Add the SQL Server Integration Services capability for Shared Storage connections
Add an Edge capability to authorize your Edge site to crawl the metadata and generate the technical lineage.Important Collibra Cloud sites do not support Shared Storage connections.
Required permissions
- You have a global role with the Product Rights > System administration global permission.
- You have a global role that has the Manage connections and capabilities global permission, for example, Edge integration engineer.
Steps
- Open a site.
-
On the main toolbar, click
→
Settings.
The Settings page opens. -
In the tab pane, click Edge.
The Sites tab opens and shows a table with an overview of your sites. - In the table, click the name of the site whose status is Healthy.
The site page opens.
-
On the main toolbar, click
- In the Capabilities section, click Add capability.
The Add capability page appears. - Select the Technical Lineage for SQL Server Integration Services (SSIS) capability template.
- Enter the required information.
Field Description Required? Name
The name of the capability.
Yes
Description
The description of the capability.
No
Source ID
The name of the data source. The name must be unique and cannot contain special characters, for example,
/.
Yes
TechLin Admin Connection (in preview) If you want to use the OAuth authentication type to connect to the Collibra Data Lineage service instances, you have to create a Technical Lineage Admin Edge or Collibra Cloud site connection and select the OAuth authentication type. Then, in this field, you specify the name of the Technical Lineage Admin Edge or Collibra Cloud site connection.
No
Shared Storage Connection
The Shared Storage connection that you created.
Yes
Mask The pattern of the file names in the directory. By default, the value is
*.
No
Source Configuration
The connection definitions, where you specify relevant translations for each data source. Specify the following properties in JSON format and enter the content in this field.
If you previously created a technical lineage for this data source with connection definitions by using the lineage harvester, you can enter the content from the <source ID>.conf file in this field.Connection definition propertiesProperty
Description Required? DataSources
The parent element that contains the connection definitions of your data sources in SQL Server Integration Services.
If you specify the properties in this section and also the ConnStringRegExTranslation property for a data source, the connection definitions in the ConnStringRegExTranslation property takes precedence.
No
DataSourceNameThe name of your data source.
No
dialectThe dialect of the referenced database.
See the list of allowed values.You can enter one of the following values:
azure, for an Azure SQL Server data source.bigquery, for a Google BigQuery data source.db2, for an IBM DB2 data source.hana, for an SAP HANA data source.hana-cviews, for getting lineage from calculated views in an SAP HANA Classic on-premises data source.hana-cviews-v2, for getting lineage from calculated views in an SAP HANA Cloud/Advanced data source.ImportantTo get technical lineage including calculated views, you must harvest SAP HANA by adding two Technical Lineage for SqlDirectory capabilities with the Shared Storage connections. In one capability, specify the
hanadialect, and in the other, specify thehana-cviewsorhana-cviews-v2dialect.hive, for a HiveQL data source.greenplum, for a Greenplum data source.mssql, for a Microsoft SQL Server data source.mysql, for a MySQL data source.netezza, for a Netezza data source.oracle, for an Oracle data source.postgres, for a PostgreSQL data source.redshift, for an Amazon Redshift data source.snowflake, for a Snowflake data source.spark, for a Spark SQL data source.sybase, for a Sybase data source.teradata, for a Teradata data source.
No
collibraSystemNameThe system or server name of the data source.
Use this property with the
useCollibraSystemNameproperty in the lineage harvester configuration file to override the default Collibra System asset name for this data source.Specify this property with the same name as the name of the System asset that you create when you prepare the physical data layer in Data Catalog. If you don't prepare the physical data layer, Collibra Data Lineage cannot stitch the data objects in your technical lineage to the assets in Data Catalog.
No
ConnStringRegExTranslation
The parent element that opens the connection definitions.
If you specify this property and also the properties in the DataSources section for a data source, the connection definitions in this property takes precedence.
No
<regular expression>
A regular expression that must match one or more connection strings.
NoteImportant considerations:
- By default, the regular expression is not case sensitive. As a consequence, a regular expression can match with connection strings containing uppercase characters or lowercase characters.
- The connection string is part of the SSIS connection manager.
- SSIS connection managers are included in an SSIS package files (DTSX) or in connection manager files (CONMGR).
ExampleRegular expression:
Server=sb-dhub;User ID=SYB_USER2;Initial Catalog=STAGEDB;Port=6306.*
Explanation: The first section, up to .*, is a literal, but not case-sensitive, match of the characters. The dot (.) can match any single character. The asterisk (*) means zero or more of the previous, in this case any character.
Match: Any connection string that starts withServer=sb-dhub;User ID=SYB_USER2;Initial Catalog=STAGEDB;Port=6306.
Example:Server=sb-dhub;User ID=SYB_USER2;Initial Catalog=STAGEDB;Port=6306;Persist Security Info=True;Auto Translate=False;.
No
dbnameThe name of your database, to which the data source connection refers.
No
schemaThe name of your schema, to which the regular expression refers.
No
dialectThe dialect of the referenced database.
See the list of allowed values.You can enter one of the following values:
azure, for an Azure SQL Server data source.bigquery, for a Google BigQuery data source.db2, for an IBM DB2 data source.hana, for an SAP HANA data source.hana-cviews, for getting lineage from calculated views in an SAP HANA Classic on-premises data source.hana-cviews-v2, for getting lineage from calculated views in an SAP HANA Cloud/Advanced data source.ImportantTo get technical lineage including calculated views, you must harvest SAP HANA by adding two Technical Lineage for SqlDirectory capabilities with the Shared Storage connections. In one capability, specify the
hanadialect, and in the other, specify thehana-cviewsorhana-cviews-v2dialect.hive, for a HiveQL data source.greenplum, for a Greenplum data source.mssql, for a Microsoft SQL Server data source.mysql, for a MySQL data source.netezza, for a Netezza data source.oracle, for an Oracle data source.postgres, for a PostgreSQL data source.redshift, for an Amazon Redshift data source.snowflake, for a Snowflake data source.spark, for a Spark SQL data source.sybase, for a Sybase data source.teradata, for a Teradata data source.
No
collibraSystemNameThe system or server name of the data source.
Specify this property when you set the value of the Collibra system name setting to
True to override the default Collibra System asset name for this data source. Specify this property with the same name as the name of the System asset that you created when you registered the data source.
No
See an exampleCopy{
"DataSources": {
"dhb-sql-prod": {
"dialect": "mssql",
"collibraSystemName": "my-system-name"
}
},
"ConnStringRegExTranslation": {
"Data Source=dhb-sql-prod;Initial Catalog=SFG_repl_staging;Provider=SQLNCLI11;Integrated Security=SSPI.*": {
"dbname": "DATAHUB",
"schema": "DBO",
"dialect": "mssql",
"collibraSystemName" : "WAREHOUSE"
},
"Server=sb-dhub;User ID=SYS_USER;Initial Catalog=STAGEDB;Port=6306.*": {
"dbname": "STAGEDB",
"schema": "STAGE_OWNER",
"dialect": "sybase",
"collibraSystemName" : ""
}
}
}
No
Property This section contains the custom parameters you can specify to create technical lineage. Click Add property to add a property.
Warning If you are a Collibra Platform for Government customer, this field is required to connect to a Collibra Data Lineage service instance:Properties for Collibra Platform for Government customersType Value Type Name Value Text
Plaintext
techlinHostThis is the URL of the Collibra Data Lineage service instance to which you want to upload metadata, for example
techlin-europe-west1.collibra.com.Example: techlin-europe-west1.collibra.com Text
Secret
techlinKeyThis is the unique API key to connect to a Collibra Data Lineage service instance.
Specify a unique user key for each Collibra environment. If you're not sure what your user key is, contact your Collibra Collibra Account Team.
<your-techlin-key>
Yes for US government customers.Dependent On Sources
This option allows you to provide table-definition details from an independent data source to a data source that is dependent on those details. This is needed to avoid analysis errors and to have a complete lineage that includes lineage from the SQL statements from dependent data sources.
To use this option, enter the source ID of the independent source.
Show me an exampleLet's consider an example. Let's say that you want to create a technical lineage for two data sources:
Database1 contains the DDL that specifies that the database has a table named "Table1", which has three columns: Col1, Col2, Col3, and Col4.
Database2 contains an SQL statement: SELECT * from Database1.Schema1.Table1.
The SQL statement in Database2 refers to the table in Database1. Therefore, to get lineage from the statement in Database2, the table definition from Database1 must be known. In this case, we say that Database2 is dependent on Database1. Database1 is considered the independent data source.
To configure this option, specify the Source ID of the independent data source, in this example, Database1, as shown in the following imagee
Important If a dependent data source contains lowercase column names, this feature will only work for the following dialects: Oracle, Snowflake, and Teradata. For all other dialects:- An analyze error is raised, prompting you to provide the DDL file.
- The only workaround is to consolidate your SQL statements and DDL file in a single data source.
For complete information, go to Sharing database models across data sources.
No
Delete Raw Metadata After Processing
Technical lineage via Edge harvests raw metadata from specified data sources and uploads it in a ZIP file to a Collibra Data Lineage service instance. This option indicates whether the raw metadata should be deleted from the Collibra Data Lineage service instance after the metadata that is targeted for ingestion in Data Catalog is processed.
Select this option to indicate that the raw source metadata is deleted after processing.
Clear the checkbox to keep the raw source metadata after processing. In this case, it is stored in the Collibra infrastructure.
No
Analyze Only (Deprecated)
Important This option is deprecated and will be removed in a future version of Collibra. We recommend that you no longer use it. The mandatory Processing Level setting, below, replaces this option.- The "Analyze" option in the Processing Level setting is the equivalent of selecting the Analyze Only option.
- The "Sync" option in the Processing Level setting is the equivalent of clearing the Analyze Only option.
No
Important This setting replaces the deprecated Analyze Only option, which will be removed in a future version of Collibra.
For each of your data sources, you have to specify one of the following values: Load, Analyze, or Sync. Then, when you synchronize your technical lineage, the following process begins:
- Metadata for all data sources is loaded, regardless of the value of this setting for a particular data source.
- Metadata from data sources for which the value of this setting is either Analyze or Sync, is analyzed.
- Metadata from data sources for which the value of this setting is Sync, is synchronized.
Value Description Load Harvest metadata from the data source and upload it to your Collibra environment. This allows you to inspect and, if necessary, edit the harvested metadata before uploading it to the Collibra Data Lineage service instance for analysis.
When the job is done, you can download and review the metadata:
- Open the Activities list.
-
In the row containing the job, click Result.
The Synchronization Results dialog box appears.
- Click download and save the ZIP file to your hard drive.
Tip The download link resembles the following:
https://integrations.collibra-abc.com/rest/2.0/files/01944f12-7665-7d9c-8bc5-aa426b6a63cc. Take note of the file ID, in this example:01944f12-7665-7d9c-8bc5-aa426b6a63cc. After you inspect the metadata, you can send the ZIP file for analysis by using the "Analyze files" option. Alternatively, you can upload the ZIP file using the POST /files API. In either case, you need to specify the file ID.Analyze Load and analyze the metadata on the Collibra Data Lineage service instance.
Synchronization does not start after analysis; it starts only after either:
- You trigger synchronization of another data source for which you specify Sync in the Processing Level drop-down list.
- You configure the Technical Lineage Admin Edge or Collibra Cloud site capability, and trigger synchronization via the Sync option in the Integration Configuration tab in Data Catalog.
Important If you want to synchronize multiple data sources, we strongly recommend that you select this option in the respective Edge or Collibra Cloud site capabilities for each of your data sources. This allows you to synchronize all data sources in a single job, thereby maximizing efficiency and mitigating the risk of failed synchronization jobs.- For complete information and important considerations, go to Tips for successful lineage synchronization
- For more information about the Sync option in the Technical Lineage Admin Edge or Collibra Cloud site capability, go to Technical lineage admin options.
Sync Load, analyze, and synchronize metadata from all data sources. Synchronization starts – or is queued, if another synchronization job is running – immediately after analysis.
Important If you want to synchronize multiple data sources and you select this option, each data source is processed as a separate job. This is highly inefficient and will likely lead to failed sync jobs. For complete information and important considerations, go to Tips for successful lineage synchronization.
Yes
Active
The option determines whether to include or remove the technical lineage of the data source.
Select this option to include the technical lineage of this data source.
Clear the checkbox to exclude the technical lineage of this data source.
Yes
Debug
This setting is not valid for this integration. It should be set to false.
No
Log level
Only complete this field on the request of or together with Collibra Support.
No
- Click Create.
The capability is added to the Edge or Collibra Cloud site.
The fields become read-only.
You can now synchronize the technical lineage.
