Warning Jobserver and all related Jobserver integrations are end of life as of October 2024, with the exception of Public Sector customers using GovCloud or on-prem environments.
For information about using Catalog connectors on Edge, go to Overview of Catalog connectors.

Register a data source using your own driver

You can register a database as a data source using one of your own drivers.

Important

In Collibra 2024.05, we launched a new user interface (UI) for Collibra Data Intelligence Platform! You can learn more about this latest UI in the UI overview.

Prerequisites

  • You have a global role with the Catalog global permission, for example, Catalog Author.
  • You have configured one or more Jobservers in Collibra Console. If there is no available Jobserver, the Register data source actions will be grayed out in the global create menu of Collibra Data Intelligence Platform.
  • If you are using a Collibra Data Intelligence Platform environment with an on-premises Jobserver, both must have the same installer version. You can find the installer version of your environment at the bottom of the sign-in window of its Collibra Console, for example, 5.9.2-0.
  • You have a resource role with the following resource permissions on the Schema community:
    • Asset > add
    • Attribute > add
    • Domain > add
    • Attachment > add

Steps

Tip The required information varies depending on your data source type and authentication method.

  1. On the main toolbar, click the Products icon, and then click Catalog.
    The Catalog Home opens.
  2. On the main toolbar, click the button that opens the global create menu.
    The Create dialog box appears.
  3. In the Create dialog box, click Register a Data Source Using Your Own Driver.
  4. In the Register data source dialog box, click your data source type: Generic, Amazon Redshift, Cloudera Hive, Hortonworks Hive, HP Vertica, IBM DB2, MapR Hive, SQL Server, MySQL, Oracle, PostgreSQL, or Teradata.
  5. Do one of the following:
    • Click Select in the row of an existing driver to continue.
  6. Configure the data source:
    Schema Name

    This name is used in Collibra as the name of the Schema asset and must therefore be unique.

    Schema Description

    The description of the schema. This is used as the description of the Schema asset.

    Owner

    The owner of the registered data in Collibra.

    Process On

    The Jobserver that is used to ingest the data.

    <Connection properties section>

    The connection properties as defined in your JDBC driver.

    Amazon Redshift, HP Vertica, IBM DB2, Microsoft SQL Server, PostgreSQL, and Teradata

    Label | Description
    Hostname | The name of your device.
    Port | The port number.
    Database | The name of your database.
    Schema | The name of your schema.

    Cloudera Hive

    Label | Description
    URL (hostname:port) | Address of the used database. Use the format hostname:port.
    Principal | The Kerberos principal identity.
    Schema | The name of your schema.

    Hortonworks Hive and MapR Hive

    Label | Description
    URL (hostname:port) | Address of the used database. Use the format hostname:port.
    Schema | The name of your schema.

    MySQL

    Label | Description
    Hostname | The name of your device.
    Port | The port number.
    Database | The name of your database.

    Oracle DB

    Label | Description
    Hostname | The name of your device.
    Port | The port number.
    SID | The Oracle System ID, which identifies a database on a system.
    Schema | The name of your schema.

    If you want to use Kerberos authentication, you also need the following connection properties.

    Label | Description
    Principal | The Kerberos principal identity.
    Kerberos realm | The Kerberos realm name.
    Login context name | The login context name that is used as the index to the configuration.
    Jaas file name | The name of the Jaas file.
    Kerberos configuration file | The configuration file containing specific properties for Kerberos authentication.

    If you want to use NTLM authentication, you also need the following connection properties.

    Label | Description
    Security | The security that enables the authentication.
    Authentication scheme | The used authentication scheme, which is NTLM.

    If you want to use CyberArk authentication, you need the following connection properties.

    Label

    Description

    Keystore file

    The name of the keystore file. The keystore must contain the client key and client certificate or certificate chain.

    If defaultTruststore is set to false, the keystore has to contain the trusted CA certificate needed to validate the server certificate offered by CyberArk.

    The value must have the following format: file://<keystore-file name.jks>.

    Example: file://cyberark-keystore.jks

    Keystore password

    The password required to open the keystore.

    Default truststore

    The indication of the default truststore. The default value is set to False.

    • False: The certificate is validated through the keystoreFile property.
    • True: The certificate is validated through the default truststore from the Java JRE. This is recommended when CyberArk is set up to offer a server certificate that can be validated by a public CA (certification authority).
    CyberArk address

    The host and port number through which the CyberArk server is accessible. The format of the address is hostname:port.

    Example: my.cyberark.com:5502

    CyberArk application ID

    The application ID as defined in CyberArk.

    This ID should be provided by your network or system administrator.

    CyberArk query

    The CyberArk query.

    This query should be provided by your network or system administrator.

    Login Information 
    Store Credentials

    Select this option to store the credentials to access the database. With a schema refresh, you can clear this option again.

    Username

    Username to access the database.

    Note This field is ignored if your data source uses another authentication method, such as CyberArk, Kerberos, NTLM, or any certificate-based authentication method.
    Password

    Corresponding password to access the database.

    Note This field is ignored if your data source uses another authentication method, such as CyberArk, Kerberos, NTLM, or any certificate-based authentication method.

    Schedule Data Refresh

    Enable or disable a schedule to automatically refresh the data registration.
    Cron Expression

    Schedule of the data refresh as a Cron pattern.

    If you create an invalid Cron pattern, Collibra Data Intelligence Platform stops responding.

    Time Zone
    The time zone of the database.

    Store Data Profile

    Option to perform data profiling on the registered data.

    Store Sample Data

    Option to extract sample data from the registered data.

    Tables excluded from registration

    Database tables that will not be ingested.

    Note 
    • If required, you can exclude multiple tables. To do this, press Enter after typing a value, and then type the next value.
    • You can use an asterisk (*) as a wildcard to select multiple tables. For example, to exclude all tables whose names start with act_, enter act_*.
    • The table names are case sensitive.
    • You can add or remove tables from this list by refreshing the schema.
    • The Table assets that are created after ingestion have an attribute type called Table Type that defines the type of table that is declared in the data source, for example, TABLE or VIEW.
  7. Click Save & Create.
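
The CyberArk connection properties in step 6 expect values in specific formats: file://<keystore-file name.jks> for the keystore file and hostname:port for the CyberArk address. The following Python sketch shows one way to sanity-check such values before entering them. It is an illustration only, not part of Collibra or CyberArk: the function names is_valid_keystore_file and is_valid_cyberark_address are hypothetical, and the port range check (1 to 65535) is an added assumption.

```python
import re

# Hypothetical helpers, not a Collibra or CyberArk API. They only encode the
# two value formats stated in the documentation: file://<name>.jks and
# hostname:port.
KEYSTORE_PATTERN = re.compile(r"^file://.+\.jks$")


def is_valid_keystore_file(value: str) -> bool:
    """Return True if value looks like file://<keystore-file name>.jks."""
    return KEYSTORE_PATTERN.match(value) is not None


def is_valid_cyberark_address(value: str) -> bool:
    """Return True if value looks like hostname:port.

    Assumption: the port must be a number between 1 and 65535.
    """
    host, separator, port = value.rpartition(":")
    if not separator or not host:
        return False
    if not (port.isascii() and port.isdigit()):
        return False
    return 1 <= int(port) <= 65535


if __name__ == "__main__":
    print(is_valid_keystore_file("file://cyberark-keystore.jks"))
    print(is_valid_cyberark_address("my.cyberark.com:5502"))
```

Both example values from the table pass these checks. A value without the file:// prefix or without a port is rejected.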
  1. On the main toolbar, click the Products icon, and then click Catalog.
    The Catalog Home opens.
  2. On the main toolbar, click the button that opens the global create menu.
    The Create dialog box appears.
  3. In the Create dialog box, click Register data source (use your own driver).
  4. In the Register data source dialog box, click your data source type: Generic, Amazon Redshift, Cloudera Hive, Hortonworks Hive, HP Vertica, IBM DB2, MapR Hive, SQL Server, MySQL, Oracle, PostgreSQL, or Teradata.
  5. In the Register data source dialog box, enter the required information.
    Schema Name

    This name is used in Collibra as the name of the Schema asset and must therefore be unique.

    Schema Description

    The description of the schema. This is used as the description of the Schema asset.

    Owner

    The owner of the registered data in Collibra.
  6. Click Next.
  7. If required, add and configure the driver of your preference:
    1. In the JDBC driver version field, click manage drivers....

      Note By default, you see the name of the driver that was used last.
    2. Do one of the following:
      • Click Add JDBC Driver if you want to create a new JDBC driver.
      • Click the edit icon if you want to edit an existing JDBC driver.
    3. Enter the required information.
      JDBC Driver Version Name

      The name of the JDBC driver.

      Tip As a best practice, we recommend you use a strict naming convention which includes the data source and a version number. For example: Google BigQuery 1.5 or MySQL 5.9.

      Upload

      Button to upload the relevant files for the data source.

      Driver files

      This table contains a list of uploaded files.

      You can also remove driver files from this table.

    4. Click Next.
    5. Configure the JDBC connection.
      Connection

      The JDBC connection string.

      Driver Class Name

      The driver class name of the connection.

      Connection properties

      This section contains the connection properties.

      Amazon Redshift, HP Vertica, IBM DB2, PostgreSQL, and Teradata

      Label | Property | Description | Mandatory
      Hostname | host | The name of your device. | Yes
      Port | port | The port number. | Yes
      Database | database | The name of your database. | Yes
      Schema | schema | The name of your schema. | Yes

      Cloudera Hive

      Label | Property | Description | Mandatory
      URL (hostname:port) | host | Address of the used database. Use the format hostname:port. | Yes
      Principal | principal | The Kerberos principal identity. | Yes
      Schema | schema | The name of your schema. | Yes

      Hortonworks Hive and MapR Hive

      Label | Property | Description | Mandatory
      URL (hostname:port) | host | Address of the used database. Use the format hostname:port. | Yes
      Schema | schema | The name of your schema. | Yes

      Microsoft SQL Server

      Label | Property | Description | Mandatory
      Hostname | host | The name of your device. | Yes
      Port | port | The port number. | Yes
      Database | databaseName | The name of your database. | Yes
      Schema | schema | The name of your schema. | Yes

      MySQL

      Label | Property | Description | Mandatory
      Hostname | host | The name of your device. | Yes
      Port | port | The port number. | Yes
      Database | database | The name of your database. | Yes

      Oracle DB

      Label | Property | Description | Mandatory
      Hostname | host | The name of your device. | Yes
      Port | port | The port number. | Yes
      SID | sid | The Oracle System ID, which identifies a database on a system. | Yes
      Schema | schema | The name of your schema. | Yes

    6. Click Create.
  8. Enter the database connection properties.
    JDBC driver version

    The JDBC driver to connect to your database.

    Connect via

    The Jobserver that is used to ingest the data.

    <Connection properties section>

    This section contains the connection properties.

    Amazon Redshift, HP Vertica, IBM DB2, PostgreSQL, and Teradata

    Label | Property | Description | Mandatory
    Hostname | host | The name of your device. | Yes
    Port | port | The port number. | Yes
    Database | database | The name of your database. | Yes
    Schema | schema | The name of your schema. | Yes

    Cloudera Hive

    Label | Property | Description | Mandatory
    URL (hostname:port) | host | Address of the used database. Use the format hostname:port. | Yes
    Principal | principal | The Kerberos principal identity. | Yes
    Schema | schema | The name of your schema. | Yes

    Hortonworks Hive and MapR Hive

    Label | Property | Description | Mandatory
    URL (hostname:port) | host | Address of the used database. Use the format hostname:port. | Yes
    Schema | schema | The name of your schema. | Yes

    Microsoft SQL Server

    Label | Property | Description | Mandatory
    Hostname | host | The name of your device. | Yes
    Port | port | The port number. | Yes
    Database | databaseName | The name of your database. | Yes
    Schema | schema | The name of your schema. | Yes

    MySQL

    Label | Property | Description | Mandatory
    Hostname | host | The name of your device. | Yes
    Port | port | The port number. | Yes
    Database | database | The name of your database. | Yes

    Oracle DB

    Label | Property | Description | Mandatory
    Hostname | host | The name of your device. | Yes
    Port | port | The port number. | Yes
    SID | sid | The Oracle System ID, which identifies a database on a system. | Yes
    Schema | schema | The name of your schema. | Yes

    Login Information

    This section contains the login information.

    Store Credentials

    Select this option to store the credentials to access the database. With a schema refresh, you can clear this option again.

    Username

    Username to access the database.

    Note This field is ignored if your data source uses another authentication method, such as CyberArk, Kerberos, NTLM, or any certificate-based authentication method.
    Password

    Corresponding password to access the database.

    Note This field is ignored if your data source uses another authentication method, such as CyberArk, Kerberos, NTLM, or any certificate-based authentication method.

    Schedule Data Refresh

    Enable or disable a schedule to automatically refresh the data registration.
    Cron Expression

    Schedule of the data refresh as a Cron pattern.

    If you create an invalid Cron pattern, Collibra Data Intelligence Platform stops responding.

    Time Zone
    The time zone of the database.
  9. Click Next.
  10. Select the data profiling options.

    Store Data Profile

    Option to perform data profiling on the registered data.

    Note If you have not added the QueryPassthrough connection property to your Teradata driver, QueryPassthrough is disabled by default. However, if you enable Store Data Profile for Teradata, QueryPassthrough is enabled automatically. If you have added the QueryPassthrough connection property to your driver, the value that you specified is used.
    Detect advanced data types

    Option to detect advanced data types in the data source.

    Store Sample Data

    Option to extract sample data from the registered data.

    Tables excluded from registration

    Database tables that will not be ingested.

    Note 
    • If required, you can exclude multiple tables. To do this, press Enter after typing a value, and then type the next value.
    • You can use an asterisk (*) as a wildcard to select multiple tables. For example, to exclude all tables whose names start with act_, enter act_*.
    • The table names are case sensitive.
    • You can add or remove tables from this list by refreshing the schema.
    • The Table assets that are created after ingestion have an attribute type called Table Type that defines the type of table that is declared in the data source, for example, TABLE or VIEW.
  11. Click Create.
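
Step 7 of the procedure above asks for a JDBC connection string and a driver class name. As a rough illustration of how connection properties such as host, port, database, databaseName and sid typically map onto a JDBC URL, the Python sketch below builds URLs in the formats commonly documented for the PostgreSQL, MySQL, Microsoft SQL Server and Oracle JDBC drivers. The build_jdbc_url helper and the JDBC_TEMPLATES table are hypothetical and not part of Collibra. How Collibra itself combines the properties with the connection string is not described here, so confirm the exact URL syntax and driver class name in the documentation of the driver version that you upload.

```python
from typing import Tuple

# Hypothetical lookup table, not a Collibra API. Driver class names and URL
# formats follow the vendors' commonly documented JDBC conventions; verify
# them against the documentation of your driver version.
JDBC_TEMPLATES = {
    "postgresql": ("org.postgresql.Driver",
                   "jdbc:postgresql://{host}:{port}/{database}"),
    "mysql": ("com.mysql.cj.jdbc.Driver",
              "jdbc:mysql://{host}:{port}/{database}"),
    "sqlserver": ("com.microsoft.sqlserver.jdbc.SQLServerDriver",
                  "jdbc:sqlserver://{host}:{port};databaseName={databaseName}"),
    "oracle": ("oracle.jdbc.OracleDriver",
               "jdbc:oracle:thin:@{host}:{port}:{sid}"),
}


def build_jdbc_url(source_type: str, **properties: str) -> Tuple[str, str]:
    """Return (driver_class_name, jdbc_url) for the given connection properties."""
    driver_class, template = JDBC_TEMPLATES[source_type]
    try:
        return driver_class, template.format(**properties)
    except KeyError as missing:
        # A placeholder in the template had no matching property.
        raise ValueError(f"missing connection property: {missing}") from None


if __name__ == "__main__":
    # Host, port and database values are made up for the example.
    driver, url = build_jdbc_url(
        "postgresql", host="db.example.com", port="5432", database="sales"
    )
    print(driver)
    print(url)
```

The schema property does not appear in these URL formats, which is why the sketch leaves it out.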

What's next?

The data source is registered and the data is automatically ingested. The ingestion of data is executed in a job. You can see this job in the list of activities.

Tip 
  • If the database contains foreign keys, they will be registered as new assets of the Foreign Key asset type. Assets of this type contain the complex relation, which is the link between all column assets that are part of the foreign key definition.
    However, the complex relation is not created if a column is part of a table that is added to the list of Tables excluded from registration.
  • If you exclude a table during the schema refresh, the corresponding table, column assets and foreign key mapping will be deleted.
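
To see how a pattern such as act_* in Tables excluded from registration selects tables, the following Python sketch uses the standard-library function fnmatch.fnmatchcase, which performs case-sensitive wildcard matching. It only illustrates the documented behavior (asterisk wildcard, case-sensitive names). Collibra's actual matching logic is not shown in this documentation and may differ. For example, fnmatchcase also treats ? and [ ] as special characters. The function name split_tables and the table names are made up for the example.

```python
from fnmatch import fnmatchcase
from typing import Iterable, List, Tuple


def split_tables(tables: Iterable[str],
                 excluded_patterns: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Split table names into (ingested, excluded) using case-sensitive wildcards."""
    patterns = list(excluded_patterns)
    ingested: List[str] = []
    excluded: List[str] = []
    for table in tables:
        # A table is excluded as soon as any pattern matches its full name.
        if any(fnmatchcase(table, pattern) for pattern in patterns):
            excluded.append(table)
        else:
            ingested.append(table)
    return ingested, excluded


if __name__ == "__main__":
    tables = ["act_log", "act_user", "ACT_AUDIT", "customer"]
    ingested, excluded = split_tables(tables, ["act_*"])
    print("ingested:", ingested)
    print("excluded:", excluded)
```

In this example, act_log and act_user are excluded, while ACT_AUDIT is still ingested because the match is case sensitive.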