Add the Catalog JDBC ingestion capability

Before you can register a specific data source via Edge, you must add the Catalog JDBC ingestion capability to the JDBC connection for that data source.

Note If you're using a Collibra Cloud site, go to the Collibra Cloud site documentation to check if your data source is supported.

Prerequisites

Steps

  1. Open a site.
    1. On the main toolbar, click the Products icon, and then click Settings (cogwheel icon).
      The Settings page opens.
    2. In the tab pane, click Edge.
      The Sites tab opens and shows a table with an overview of your sites.
    3. In the table, click the name of the site whose status is Healthy.
      The site page opens.
  2. In the Capabilities section, click Add capability.
    The Add capability page opens.
  3. Select the Catalog JDBC ingestion capability template.
  4. Enter the required information.
    Field Description Required

    Capability

    This section contains general information about the capability.

    Name

    The name of the capability.

    Yes

    Description

    The description of the capability.

    No

    Capability template

    The capability template. The value that you select in this field determines which sections appear on the page.

    Select the following capability:

    Catalog JDBC ingestion

    Yes

    Connection

    This section contains information to connect to the data source.

    JDBC connection

    The connection to the data source.

    Yes

    JDBC data source type (Deprecated)

    Deprecated field. The field was used to indicate the type of the data source. You no longer need to change this field.

    The required value is automatically identified. The value isn't shown on this page.

    Yes

    Supports schemas

    A text field where you have to enter False to enable database registration of data sources without schemas. If the data source doesn't have a schema, Data Catalog creates a Schema asset with the same name as the full name of the database.

    If the data source has schemas, you can ignore this field.

    No

    Other Settings

    Others

    This section can contain additional capability properties.
    Click Add property (or Add Other Settings) to add a property.

    For an overview of all available properties, go to Other Settings.

    Note No validation is performed on the values you add.

    No

    General

    This section contains general information about logging.

    Debug

    An option to automatically send Edge infrastructure log files to Collibra Platform. By default, this option is set to false.

    Note We highly recommend that you send Edge infrastructure log files to Collibra Platform only when you have issues with Edge. If you set this option to true, it automatically reverts to false after 24 hours.

    For more information, go to Edge logging.

    No

    Log level

    An option to determine the verbosity level of Catalog connector log files. By default, this option is set to No logging.

    No

  5. Click Create.
    The capability is added to the Edge or Collibra Cloud site.
    The fields become read-only.

Available "Other Settings"

You can add the following properties in the Other Settings section to define how metadata is retrieved and how synchronization jobs are configured.

Note No validation is performed on the values you add.

Strategy properties

The following properties define the strategy for the retrieval of columns and tags.

Name Description Type Encryption Default value
columns-strategy

The columns-strategy controls the method used to retrieve column metadata, such as data types, descriptions, and default values, from a database.
The possible values are:

SINGLE_CALL (default)

Retrieves column metadata for all tables in the schema using a single bulk request.
Characteristics:
Optimized latency.
Higher memory usage.
Doesn't guarantee correct values in the Column Position attribute.

CALL_PER_TABLE

Retrieves column metadata for each table individually using separate requests.
Characteristics:
Optimized memory usage.
Higher network overhead.
Guarantees a correct value in the Column Position attribute.
Text Not encrypted (plain text) SINGLE_CALL
tags-strategy

For Snowflake only

The tags-strategy defines where the Snowflake source tags are retrieved from.
The possible values are:

SINGLE_CALL

This strategy retrieves the tags metadata for all tables in the schema through a single bulk request. This increases the performance of the Snowflake metadata synchronization for source tags.
In this case, the Snowflake source tags are retrieved from the SNOWFLAKE.ACCOUNT_USAGE.TAG_REFERENCES view. Therefore, you need the SELECT permission on that view.

CALL_PER_TABLE (default)

This strategy retrieves the tags metadata for each table individually using separate requests.
In this case, the Snowflake source tags are retrieved from the INFORMATION_SCHEMA.TAG_REFERENCES_ALL_COLUMNS and INFORMATION_SCHEMA.TAG_REFERENCES tables.
Therefore, you need extra permissions:
GRANT SELECT ON VIEW MY_DB.INFORMATION_SCHEMA.TAG_REFERENCES TO ROLE <role_name>;
GRANT SELECT ON VIEW MY_DB.INFORMATION_SCHEMA.TAG_REFERENCES_ALL_COLUMNS TO ROLE <role_name>;

SKIP

Source tags are not ingested during the synchronization, even if you select the option to ingest them.
Text Not encrypted (plain text) CALL_PER_TABLE
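
Because no validation is performed on the values you add, it can help to check the strategy values before you enter them. The following is a minimal sketch, not part of the product: the function name check_strategy_properties and the dictionary layout are hypothetical, and the sketch assumes that the values are case-sensitive.

```python
# Allowed values as listed for the strategy properties above.
ALLOWED_STRATEGIES = {
    "columns-strategy": {"SINGLE_CALL", "CALL_PER_TABLE"},
    "tags-strategy": {"SINGLE_CALL", "CALL_PER_TABLE", "SKIP"},
}


def check_strategy_properties(properties):
    """Return a list of problems found in a dict of property name -> value.

    Hypothetical helper for a manual pre-check; it does not talk to Edge.
    """
    problems = []
    for name, allowed in ALLOWED_STRATEGIES.items():
        if name not in properties:
            continue  # property not set, so the default value applies
        value = properties[name]
        # Assumption: values must match the documented spelling exactly.
        if value not in allowed:
            problems.append(f"{name}: {value!r} is not one of {sorted(allowed)}")
    return problems


if __name__ == "__main__":
    planned = {"columns-strategy": "CALL_PER_TABLE", "tags-strategy": "SKIP"}
    for problem in check_strategy_properties(planned):
        print(problem)
```

An empty result means that every strategy value you plan to add matches one of the documented options.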

Properties for the various synchronization jobs

The following jobs run as part of a synchronization:

  • database-list-with-metadata: This job runs when you register a data source and the Database field needs to be filled with available databases.
  • schema-list: This job runs in the Configuration tab of a Database asset to show the schemas in the database.
  • ingest-schema: This job runs when you click the Synchronize button for a Database or Schema.
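
The resource properties for these jobs follow a naming pattern: the job name, a hyphen, and a setting suffix. As an illustration only, the following sketch lists the property names per job. The helper name job_property_names is hypothetical; the job names and suffixes are taken from this page.

```python
# Job names and setting suffixes as documented on this page.
JOBS = ("database-list-with-metadata", "schema-list", "ingest-schema")
SUFFIXES = (
    "garbage-collector",
    "requests-cpu",
    "limits-cpu",
    "requests-memory",
    "limits-memory",
    "jvm-max-memory",
)


def job_property_names(job):
    """Return the resource property names for one job (hypothetical helper)."""
    return [f"{job}-{suffix}" for suffix in SUFFIXES]


for job in JOBS:
    print(job, "->", ", ".join(job_property_names(job)))
```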
Name Description Type Encryption Default value

Warning The following properties can have a significant impact on your Edge site. Only add or update them together with Collibra Support. Ensure you add the properties in the capability, not in the connection.

database-list-with-metadata-garbage-collector The garbage collector that is used by this job in the capability.
For information about other possibilities, go to the AZUL documentation.
Text Not encrypted (plain text) -XX:+UseParallelGC
database-list-with-metadata-requests-cpu The minimum amount of CPU computing power requested by this job in the capability. The amount is expressed in milliCPU.
Text Not encrypted (plain text) 100
database-list-with-metadata-limits-cpu The maximum amount of CPU computing power requested by this job in the capability. The amount is expressed in milliCPU. Text Not encrypted (plain text) 950
database-list-with-metadata-requests-memory

The minimum amount of memory requested by this job in the capability.

The amount is expressed in mebibytes (Mi).

If you add this property, you also need to add the properties: database-list-with-metadata-limits-memory and database-list-with-metadata-jvm-max-memory.

Text Not encrypted (plain text) 128
database-list-with-metadata-limits-memory

The maximum amount of memory requested by this job in the capability.

The amount is expressed in mebibytes (Mi).

If you add this property, you also need to add the properties: database-list-with-metadata-requests-memory and database-list-with-metadata-jvm-max-memory.

Text Not encrypted (plain text) 256
database-list-with-metadata-jvm-max-memory

The maximum amount of memory that can be used by the Java virtual machine (JVM) for this job. The amount is expressed in mebibytes (Mi).

If you add this property, you also need to add the properties: database-list-with-metadata-requests-memory and database-list-with-metadata-limits-memory.

Important Make sure this amount is lower than the database-list-with-metadata-limits-memory amount.

Text Not encrypted (plain text) 256
schema-list-garbage-collector The garbage collector that is used by this job in the capability.
For information about other possibilities, go to the AZUL documentation.
Text Not encrypted (plain text) -XX:+UseParallelGC
schema-list-requests-cpu The minimum amount of CPU computing power requested by this job in the capability. The amount is expressed in milliCPU.
Text Not encrypted (plain text) 100
schema-list-limits-cpu The maximum amount of CPU computing power requested by this job in the capability.
The amount is expressed in milliCPU.
Text Not encrypted (plain text) 950
schema-list-requests-memory

The minimum amount of memory requested by this job in the capability.

The amount is expressed in mebibytes (Mi).

Text Not encrypted (plain text) 128
schema-list-limits-memory

The maximum amount of memory requested by this job in the capability.

The amount is expressed in mebibytes (Mi).

If you add this property, you also need to add the properties: schema-list-requests-memory and schema-list-jvm-max-memory.

Text Not encrypted (plain text) 256
schema-list-jvm-max-memory

The maximum amount of memory that can be used by the Java virtual machine (JVM) for this job.

The amount is expressed in mebibytes (Mi).

If you add this property, you also need to add the properties: schema-list-requests-memory and schema-list-limits-memory.

Important Make sure this amount is lower than the schema-list-limits-memory amount.

Text Not encrypted (plain text) 256
ingest-schema-garbage-collector The garbage collector that is used by this job in the capability.
For information about other possibilities, go to the AZUL documentation.
Text Not encrypted (plain text) -XX:+UseParallelGC
ingest-schema-requests-cpu The minimum amount of CPU computing power requested by this job in the capability.
The amount is expressed in milliCPU.
Text Not encrypted (plain text) 100
ingest-schema-limits-cpu The maximum amount of CPU computing power requested by this job in the capability. The amount is expressed in milliCPU. Text Not encrypted (plain text) 1500
ingest-schema-requests-memory

The minimum amount of memory requested by this job in the capability.

The amount is expressed in mebibytes (Mi).

If you add this property, you also need to add the properties: ingest-schema-limits-memory and ingest-schema-jvm-max-memory.

Text Not encrypted (plain text) 128
ingest-schema-limits-memory

The maximum amount of memory requested by this job in the capability.
The amount is expressed in mebibytes (Mi).

If you add this property, you also need to add the properties: ingest-schema-requests-memory and ingest-schema-jvm-max-memory.

Text Not encrypted (plain text) 2048
ingest-schema-jvm-max-memory

The maximum amount of memory that can be used by the Java virtual machine (JVM) for this job. The amount is expressed in mebibytes (Mi).

If you add this property, you also need to add the properties: ingest-schema-requests-memory and ingest-schema-limits-memory.

Important Make sure this amount is lower than the ingest-schema-limits-memory amount.

Text Not encrypted (plain text) 2048
http-connect-timeout-seconds

The maximum amount of time allowed to create a connection with Collibra.

The value must be set to 30 or higher.

Text Not encrypted (plain text) 30
http-read-timeout-seconds

The maximum amount of time allowed to wait for a response before closing the connection.

The value must be set to 300 or higher.

Text Not encrypted (plain text) 300
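
The rules in the table above can be checked before you enter the values together with Collibra Support. The following is a minimal sketch under stated assumptions, not a product feature: the function name check_job_properties is hypothetical, the values are assumed to be plain integers entered as text, and the sketch treats the three memory properties of a job as one group that must be added together.

```python
JOBS = ("database-list-with-metadata", "schema-list", "ingest-schema")
MEMORY_SUFFIXES = ("requests-memory", "limits-memory", "jvm-max-memory")
# Minimum values stated for the timeout properties.
MIN_TIMEOUTS = {
    "http-connect-timeout-seconds": 30,
    "http-read-timeout-seconds": 300,
}


def check_job_properties(properties):
    """Return a list of problems in a dict of property name -> text value."""
    problems = []
    for job in JOBS:
        names = [f"{job}-{suffix}" for suffix in MEMORY_SUFFIXES]
        if not any(name in properties for name in names):
            continue  # no memory property set, so the defaults apply
        # Assumption: adding one memory property requires the other two.
        missing = [name for name in names if name not in properties]
        if missing:
            problems.append(f"{job}: also add {', '.join(missing)}")
            continue
        jvm_max = int(properties[f"{job}-jvm-max-memory"])
        limit = int(properties[f"{job}-limits-memory"])
        # The page asks for a JVM maximum lower than the memory limit.
        if jvm_max >= limit:
            problems.append(
                f"{job}: jvm-max-memory ({jvm_max}) must be lower than "
                f"limits-memory ({limit})"
            )
    for name, minimum in MIN_TIMEOUTS.items():
        if name in properties and int(properties[name]) < minimum:
            problems.append(f"{name}: must be {minimum} or higher")
    return problems
```

For example, a dictionary that contains only schema-list-limits-memory produces one problem that names the two missing companion properties.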

What's next

If you want to run profiling, you can already add the JDBC Profiling capability to the connection.
You can then register the data source via Edge.