Add the Amazon Redshift capability for JDBC connections

After you have created a connection to Amazon Redshift in your Edge or Collibra Cloud site, you have to add the capability to the connection.

Required permissions

You have a global role with the Product Rights > System administration global permission.
You have a global role that has the Manage connections and capabilities global permission, for example, Edge integration engineer.

Steps

Open a site.
1. On the main toolbar, click → Settings.
  The Settings page opens.
2. In the tab pane, click Edge.
  The Sites tab opens and shows a table with an overview of your sites.
3. In the table, click the name of the site whose status is Healthy.
  The site page opens.
In the Capabilities section, click Add capability.
The Add capability page appears.
Select the Technical Lineage for Amazon Redshift capability template.

Enter the required information.

Field Description Required?

Name

The name of the capability.

Yes

Description

The description of the capability.

Yes

Source ID

The name of the data source. The name must be unique and cannot contain special characters, for example, /.

Yes
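As an illustration of the Source ID rule, the following minimal Python sketch checks a candidate name before you enter it. The source only states that the name must be unique and cannot contain special characters such as /. The allowed character set (letters, digits, underscores, and hyphens) and the names check_source_id and existing_ids are assumptions made for this example, not part of the product.

```python
import re

# Assumption: only letters, digits, underscores, and hyphens are treated as safe.
# The documentation names "/" as one disallowed character but gives no full list.
SAFE_SOURCE_ID = re.compile(r"[A-Za-z0-9_-]+")


def check_source_id(candidate, existing_ids):
    """Return the candidate if it looks safe and is not already in use."""
    if not SAFE_SOURCE_ID.fullmatch(candidate):
        raise ValueError(f"Source ID {candidate!r} contains special characters")
    if candidate in existing_ids:
        raise ValueError(f"Source ID {candidate!r} is already in use")
    return candidate


# Hypothetical source IDs, for illustration only.
print(check_source_id("redshift_sales", {"redshift_hr"}))
```
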

TechLin Admin Connection (in preview)

If you want to use the OAuth authentication type to connect to the Collibra Data Lineage service instances, you have to create a Technical Lineage Admin Edge or Collibra Cloud site connection and select the OAuth authentication type. Then, in this field, specify the name of the Technical Lineage Admin connection.

For more information about the authentication types, go to Create a Technical Lineage Admin connection.

JDBC Connection

The JDBC connection that you created for Catalog JDBC ingestion.

Yes

Collibra System Name

The system or server name of the data source. This field is also the full name of your System asset in Data Catalog.

The value of this field must be the same as the full name of the System asset that you created when you registered the data source.

Yes

Database Name Override

If stitching is missing specifically because you edited the full name of your Database asset, you can use this field to specify the current name of your Database asset in Data Catalog.

Important We strongly recommend that you not edit the full name of your System, Database and Schema assets in Data Catalog. Doing so can lead to errors during the technical lineage creation process.

Queries

The queries to download all the data that is required to create technical lineage. The queries vary depending on the data source you use.

When you add a capability, default queries are shown in the code fields and the Use default value checkbox is selected. On occasion, to improve performance, we update the default queries. When that happens, the next time your data sources are synchronized, the new default queries are checked against the previous default queries. If there is a difference, the new queries are used.

Note Collibra Data Lineage can only check for changes between the new default queries and the previous set of default queries. If the queries in your Technical Lineage for Amazon Redshift capability are older than the previous set of queries, or if you have customized them, they are recognized as customized queries and cannot be updated. Therefore, you won't benefit from the performance improvements.

To benefit from the performance improvements, you can create a new capability and copy the set of default queries from it into your existing capabilities. You can then modify them to suit your needs.

If you want to use customized queries, clear the Use default value checkbox, and then enter your queries.

Note

If you use customized queries, ensure that you use only supported SQL syntax.
Collibra Support does not provide support for customized SQL queries. After synchronization, if no lineage was created, we recommend that you edit your queries or reach out to Collibra Coaching Services.

Example Enter the following filter in a Views query: where v.table_schema not in ('pg_catalog', 'information_schema');. This filter excludes the pg_catalog and information_schema schemas, which don't contain customer data. If you want to exclude other schemas, extend the list, for example: where v.table_schema not in ('pg_catalog', 'information_schema', 'another_schema');.

Query	Description
Columns	This query retrieves all columns, tables, schemas, databases or projects in the form: database or project > schema > table > column.
Views	This query retrieves the view definitions.

Yes
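If you maintain the list of excluded schemas outside the capability, a small helper can generate the filter shown in the example for the Views query. This is a minimal Python sketch. The function name build_schema_filter is hypothetical, and the sketch produces only the text of the filter. You still paste the result into the query field yourself.

```python
def build_schema_filter(excluded_schemas, column="v.table_schema"):
    """Return a WHERE clause that excludes the given schemas."""
    if not excluded_schemas:
        raise ValueError("at least one schema name is required")
    # Double any single quote so each name stays a valid SQL string literal.
    quoted = ", ".join("'" + name.replace("'", "''") + "'" for name in excluded_schemas)
    return f"where {column} not in ({quoted});"


print(build_schema_filter(["pg_catalog", "information_schema", "another_schema"]))
```
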

Property

Use this section to define custom parameters for technical lineage. Click Add property to add a parameter.

Available properties:

Type	Value type	Name	Description	Example value
Text	Plaintext	httpTimeout	Sets the HTTP timeout duration, in seconds. You can enter a value in the range of 1 to 3599. The default value is 15.	`15`
Text	Plaintext	linkedServerDatabaseMapping	Specifies the database name to use when a linked server reference does not include a database. This value is used during lineage parsing to resolve incomplete object references.	`{"LNKD1":"DB1","LNKD2":"DB2"}`
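Before you enter these property values, you can check them locally. The following Python sketch validates the documented range of httpTimeout (1 to 3599 seconds) and confirms that a linkedServerDatabaseMapping value parses as a JSON object that maps linked server names to database names, as in the example value. The function names are hypothetical, and the checks reflect only what the table states. The product may apply additional rules.

```python
import json

HTTP_TIMEOUT_RANGE = range(1, 3600)  # 1 to 3599 seconds, inclusive


def check_http_timeout(value):
    """Return the timeout as an int, or raise ValueError if it is out of range."""
    seconds = int(value)
    if seconds not in HTTP_TIMEOUT_RANGE:
        raise ValueError(f"httpTimeout must be between 1 and 3599, got {seconds}")
    return seconds


def check_linked_server_mapping(value):
    """Parse the mapping and confirm it is a JSON object of non-empty strings."""
    mapping = json.loads(value)
    if not isinstance(mapping, dict):
        raise ValueError("linkedServerDatabaseMapping must be a JSON object")
    for server, database in mapping.items():
        if not isinstance(database, str) or not database:
            raise ValueError(f"linked server {server!r} needs a database name")
    return mapping


print(check_http_timeout("15"))
print(check_linked_server_mapping('{"LNKD1":"DB1","LNKD2":"DB2"}'))
```
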

Properties for Collibra Platform for Government customers

Type	Value type	Name	Description	Example value
Text	Plaintext	techlinHost	This is the URL of the Collibra Data Lineage service instance to which you want to upload metadata.	techlin-europe-west1.collibra.com
Text	Secret	techlinKey	This is the unique API key to connect to a Collibra Data Lineage service instance. Specify a unique user key for each Collibra environment. If you're not sure what your user key is, contact your Collibra Account Team.	<your-techlin-key>

Yes for US government customers.

Dependent On Sources

This option allows you to provide table-definition details from an independent data source to a data source that is dependent on those details. This is needed to avoid analysis errors and to have a complete lineage that includes lineage from the SQL statements from dependent data sources.

To use this option, enter the source ID of the independent source.

Important If a dependent data source contains lowercase column names, this feature will only work for the following dialects: Oracle, Snowflake, and Teradata. For all other dialects:

An analyze error is raised, prompting you to provide the DDL file.
The only workaround is to consolidate your SQL statements and DDL file in a single data source.

For complete information, go to Sharing database models across data sources.

Database-System mapping

This optional field allows you to map databases to their rightful systems, to obtain stitching. This resolves missing stitching, which occurs when Collibra Data Lineage associates multiple databases with the default system name that you provide in the Collibra System Name field.

Delete Raw Metadata After Processing

Technical lineage via Edge harvests raw metadata from specified data sources and uploads it in a ZIP file to a Collibra Data Lineage service instance. This option indicates whether the raw metadata should be deleted from the Collibra Data Lineage service instance after the metadata that is targeted for ingestion in Data Catalog is processed.

Select this option to indicate that the raw source metadata is deleted after processing.

Clear the checkbox to keep the raw source metadata after processing. In this case, it is stored in the Collibra infrastructure.

Analyze Only (Deprecated)

Important This option is deprecated and will be removed in a future version of Collibra. We recommend that you no longer use it. The mandatory Processing Level setting, below, replaces this option.

The "Analyze" option in the Processing Level setting is the equivalent of selecting the Analyze Only option.
The "Sync" option in the Processing Level setting is the equivalent of clearing the Analyze Only option.

Processing Level

Important This setting replaces the deprecated Analyze Only option, which will be removed in a future version of Collibra.

For each of your data sources, you have to specify one of the following values: Load, Analyze, or Sync. Then, when you synchronize your technical lineage, the following process begins:

Metadata for all data sources is loaded, regardless of the value of this setting for a particular data source.
Metadata from data sources for which the value of this setting is either Analyze or Sync is analyzed.
Metadata from data sources for which the value of this setting is Sync is synchronized.

Value Description

Load

Harvest metadata from the data source and upload it to your Collibra environment. This allows you to inspect and, if necessary, edit the harvested metadata before uploading it to the Collibra Data Lineage service instance for analysis.

When the job is done, you can download and review the metadata:

Open the Activities list.
In the row containing the job, click Result.
The Synchronization Results dialog box appears.
Click download and save the ZIP file to your hard drive.

Tip The download link resembles the following: https://integrations.collibra-abc.com/rest/2.0/files/01944f12-7665-7d9c-8bc5-aa426b6a63cc. Take note of the file ID, in this example: 01944f12-7665-7d9c-8bc5-aa426b6a63cc. After you inspect the metadata, you can send the ZIP file for analysis by using the "Analyze files" option. Alternatively, you can upload the ZIP file using the POST /files API. In either case, you need to specify the file ID.
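The file ID is the last path segment of the download link, so you can extract it with a short script instead of copying it by hand. This Python sketch uses only the standard library. The function name file_id_from_download_link is hypothetical, and the sketch assumes that your link has the same shape as the example in the tip.

```python
from urllib.parse import urlparse


def file_id_from_download_link(link):
    """Return the last path segment of the download link, which is the file ID."""
    path = urlparse(link).path
    file_id = path.rstrip("/").rsplit("/", 1)[-1]
    if not file_id:
        raise ValueError(f"no file ID found in {link!r}")
    return file_id


link = "https://integrations.collibra-abc.com/rest/2.0/files/01944f12-7665-7d9c-8bc5-aa426b6a63cc"
print(file_id_from_download_link(link))
```
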

Analyze

Load and analyze the metadata on the Collibra Data Lineage service instance.

Synchronization does not start after analysis; it starts only after either:

You trigger synchronization of another data source for which you specify Sync in the Processing Level drop-down list.
You configure the Technical Lineage Admin Edge or Collibra Cloud site capability, and trigger synchronization via the Sync option in the Integration Configuration tab in Data Catalog.

Important If you want to synchronize multiple data sources, we strongly recommend that you select this option in the respective Edge or Collibra Cloud site capabilities for each of your data sources. This allows you to synchronize all data sources in a single job, thereby maximizing efficiency and mitigating the risk of failed synchronization jobs.

For complete information and important considerations, go to Tips for successful lineage synchronization.
For more information about the Sync option in the Technical Lineage Admin Edge or Collibra Cloud site capability, go to Technical lineage admin options.

Sync

Load, analyze, and synchronize metadata from all data sources. Synchronization starts – or is queued, if another synchronization job is running – immediately after analysis.

Important If you want to synchronize multiple data sources and you select this option, each data source is processed as a separate job. This is highly inefficient and will likely lead to failed sync jobs. For complete information and important considerations, go to Tips for successful lineage synchronization.

Yes
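The staged behavior of the Processing Level setting can be modeled as a simple lookup. The following Python sketch is only a mental model of the three stages described above (all sources are loaded, sources set to Analyze or Sync are analyzed, and sources set to Sync are synchronized). It does not call any Collibra API, and the source IDs and function names are hypothetical.

```python
STAGES = ("load", "analyze", "sync")

# The last stage that each Processing Level value reaches.
LAST_STAGE = {"Load": "load", "Analyze": "analyze", "Sync": "sync"}


def stages_for(level):
    """Return the stages that run for one data source, in order."""
    last = LAST_STAGE[level]
    return STAGES[: STAGES.index(last) + 1]


def plan(sources):
    """Group source IDs by the stage that processes them."""
    return {
        stage: [sid for sid, level in sources.items() if stage in stages_for(level)]
        for stage in STAGES
    }


# Hypothetical source IDs mapped to their Processing Level values.
sources = {"redshift_sales": "Sync", "redshift_hr": "Analyze", "redshift_raw": "Load"}
print(plan(sources))
```

In this model, a source set to Load never reaches analysis, which matches the purpose of that value: it gives you a chance to inspect the harvested metadata first.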

Active

This option determines whether to include or exclude the technical lineage of the data source.

Select this option to include the technical lineage of this data source.

Clear the checkbox to exclude the technical lineage of this data source.

Yes

Debug

An option to enable logging of a JDBC job. If you enable logging, you can download the output file of the JDBC job in the Edge Jobs dashboard (in preview). The output file contains the logs of the JDBC driver. For more information about downloading the output file, go to Download job output files.

Select one of the following values:

True: Enables logging of the JDBC job.
False: Disables logging of the JDBC job. This is the default value.

Log level

An option to determine the verbosity level of Catalog connector log files. By default, this option is set to No logging.

Click Add.
The capability is added to the Edge or Collibra Cloud site.
The fields become read-only.

What's next

You can now synchronize the technical lineage.