The configuration file generator
The configuration file generator helps you create your lineage harvester configuration file by providing the structure of the file with the correct properties per data source.
The lineage harvester configuration file
The lineage harvester uses a configuration file to connect to JDBC data sources, BI tools and ETL tools. The configuration file contains references to the data sources for which you want to create a technical lineage. You have to prepare the configuration file if you want to create a technical lineage and add new relations of the type "Data Element targets / sources Data Element" between existing assets in Data Catalog, and "Column is target of / is source of Data Attribute" between assets from ingested BI sources and assets in Data Catalog.
Empty configuration file
When you run the lineage harvester for the first time, it creates an empty configuration file. To create a technical lineage, you have to manually add properties and values, per data source, to this configuration file.
The following image shows an example of the empty configuration file created by the lineage harvester.
{ "general" : { "catalog" : { "url" : "", "username" : "", }, "useCollibraSystemName" : false }, "sources" : [ { "type" : "Database", "id" : "MyDB", "hostname" : "", "username" : "", "dialect" : "", "collibraSystemName" : "", "databaseNames" : [ ], "port" : 1521 } ] }
Configuration file generator
The configuration file generator creates an example configuration file with the data source properties of your choosing:
- Select the (meta)data sources for which you want to create a technical lineage.Tip Click the Lineage harvester for GovCloud customers tab below, to view an example configuration file for GovCloud customers.
- Scroll down to the configuration file example.
- Click to copy the example.
The configuration file example is copied to your clipboard.
- Paste the example in your empty configuration file in the lineage harvesterconfig folder.
- Replace the values in the example to match your actual data source information.
Tip Make sure you understand each property and know which values you must use to access your data source information.
- Run the lineage harvester.
Warning Some browser plug-ins may slow the configuration file generator down.
- Lineage harvester configuration file
- Source-specific configuration files
- Lineage harvester for GovCloud customers
Use these options to filter the data source properties in the configuration file to your needs.
-+ Azure Data Factory -+ dbt Cloud -+ dbt Core -+ IBM InfoSphere DataStage
-+ Informatica PowerCenter -+ SQL Server Integration Services -+ Informatica Intelligent Cloud Services - Data Integration -+ Matillion -+ Tableau -+ Looker -+ MicroStrategy -+ Power BI -+ SQL Server Reporting Services (SSRS) and Power BI Report Server (PBRS) -+ Custom lineages -+ SQL files in the lineage harvester output folder (downloaded SQL files) | -+ Oracle -+ Google BigQuery -+ Snowflake -+ Other SQL data sources with connection type "JDBC":
| -+ SQL directories with connection type "folder":
|
{ "general": { "catalog" : { "url" : "https://companydomain.collibra.com", "username" : "my-Collibra-username" }, "useCollibraSystemName" : false }, "sources" : [ { "id" : "adf_source", "type" : "AzureDataFactory", "tenantDomain": "tenant-domain", "loginFlow": { "type": "ServicePrincipal", "applicationId": "application-id" }, "subscriptionId" : "subscription-id", "resourceGroupName" : "resource-group-name", "factories" : ["factoryname1","factoryname2"], "deleteRawMetadataAfterProcessing": false } { "type": "dbt", "id": "mydbt", "tokenName": "mytokenname", "adminUrl":"https://cloud.getdbt.com/api/v2", "environmentIds":"[123456, 987654]", "metadataUrl":"https://metadata.cloud.getdbt.com/graphql", "deleteRawMetadataAfterProcessing": false } { "id" : "my_dbt_source", "type" : "ExternalDirectory", "dirType" : "dbt", "path" : "/opt/Collibra/techlin/dbt-core-files/", "recursive" : true, "deleteRawMetadataAfterProcessing" : false } { "id" : "datastage_source", "type" : "ExternalDirectory", "dirType" : "DATASTAGE", "path" : "/path/to/the/datastage/folder/", "mask" : "*", "recursive" : false, "deleteRawMetadataAfterProcessing": false } { "dialect" : "oracle", "id" : "informatica_source", "type" : "ExternalDirectory", "dirType" : "powercenter", "path" : "/path/to/the/informatica/folder/", "mask" : "*", "recursive" : false, "deleteRawMetadataAfterProcessing": false } { "id" : "datastage_source", "type" : "ExternalDirectory", "dirType" : "SSIS", "path" : "/path/to/the/ssis/folder/", "mask" : "*", "recursive" : false, "deleteRawMetadataAfterProcessing": false } { "type" : "IICS", "id" : "iics_source", "loginUrl" : "https://dm-us.informaticaintelligentcloud.com", "username" : "login-iics", "deleteRawMetadataAfterProcessing": false, "objects" : [ { "path" : "Default/Sales", "type" : "Project" }, { "path" : "My Project/Statistics", "type" : "Project" } ] } { "id" : "my-matillion-project", "type" : "Matillion", "url" : "https://my-domain", "groupName" : "my-matillion-group", "projectName" : "redshift-project", "environmentName" : "redshift-environment", "dialect" : "redshift", "startTimestamp" : 1594080796911, "httpTimeout" : 30, "deleteRawMetadataAfterProcessing": false, "auth": { "type": "Basic", "username": "ec2-user" } } { "type": "Tableau", "id": "unique-ID", "url": "URL to Tableau server?", "username": "Admin", "siteIds": ["site ID of Tableau Site 1", "site ID of Tableau Site 2"], "siteNames": ["site name of Tableau Site 1", "site name of Tableau Site 2"], "restOnly": false, "domainId": "Domain-resource-ID", "excludeImages": true, "deleteRawMetadataAfterProcessing": false, "paging": { "pagination-setting": 100, "pagination-setting-2": 100 } } { "id" : "looker-source", "type" : "Looker", "lookerUrl" : "https://<instance-name.api.looker.com", "clientId" : "my-looker-api-user-name", "clientSecret": "looker-api-userkey", "domainId" : "22258f64-40b6-4b16-9c08-c95f8ec0da26", "deleteRawMetadataAfterProcessing": false } { "id": "<unique-id>", "type": "SSRS", "url": "http://<IP address or computer name>/Reports", "username": "<server-api-user-name>", "domainId": "<domain-resource-id>", "folderFilter": ["/Folder1/*", "Folder2"], "deleteRawMetadataAfterProcessing": false } { "collibraSystemName" : "custom-system-name", "id" : "MyCustomLineage", "type" : "ExternalDirectory", "dirType" : "custom-lineage", "path" : "/path/to/custom-lineage/dir", "deleteRawMetadataAfterProcessing": false } { "type" : "LoadedSource", "id" : "MySource", "zipFile" : "/path/to/source-MySource.zip", "deleteRawMetadataAfterProcessing": false } { "id" : "database_source", "type" : "Database", "username" : "MyUsername", "dialect" : "hive", "externalDbName": "CData" "databaseNames" : ["MyDefaultDbName"], "hostname" : "localhost", "collibraSystemName" : "apache-hive-system", "port" : 1521, "deleteRawMetadataAfterProcessing": false, "customConnectionProperties" : "" } { "id": "oracle-id", "type": "DatabaseOracle", "hostname": "host_url", "username": "user1", "collibraSystemName": "automation_csn", "port": 1521, "serviceNames": ["sn1", "sn2"], "databaseNames": ["db1", "db2"], "deleteRawMetadataAfterProcessing": false } { "id" : "bigquery_source", "type" : "DatabaseBigQuery", "projectIDs" : [ "bigquery_project1", "bigquery_project2" ], "region": "europe-west1" "auth" : "/path/to/the/authentication/file.json", "collibraSystemName" : "bigquery-system-name", "deleteRawMetadataAfterProcessing": false } { "id" : "snowflake_source", "type" : "DatabaseSnowflake", "mode" : "SQL|SQL-API", "collibraSystemName" : "snowflake-system-name", "auth": { "type": "KeyPair|Basic", "username": "some_username", "pathToPrivateKey": "path_to_your_private_key_file", "usePassword": "true|false" }, "hostname" : "MyAccountName.snowflakecomputing.com", "databaseNames" : ["MyFirstDbName","MySecondDbName"], "extraDatabaseDefinitions:" : ["MyFirstExternalDbName","MySecondExternalDbName"], "schemaNames" : ["MyFirstSchemaName","MySecondSchemaName"], "warehouse" : "MySnowflakeWarehouseName", "days" : "1", "deleteRawMetadataAfterProcessing": false, "customConnectionProperties" : "role=MYROLE" } { "id": "mstr", "type": "MSTR_V2", "url": "http://xc2-35-278-34-142.new.amazonaws.com:8080", "username": "Administrator", "maxParallelRequests": 5, "appUrlSuffix": "MicroStrategy/servlet/mstrWeb", "deleteRawMetadataAfterProcessing": false } { "type" : "PowerBI", "id" : "power-bi-1", "tenantDomain": "collibra3.onmicrosoft.com", "loginFlow": { "type": "ServicePrincipal", "applicationId": "be560fac-7545-4ce2-ad9f-cbce14c59af6" }, "domainId": "domain-reference-ID" "daxParserEnabled": true, "deleteRawMetadataAfterProcessing": false } { "id" : "sqldirectory_source", "type" : "SqlDirectory", "path" : "/path/to/the/sql/folder/", "mask" : "*", "recursive" : false, "dialect" : "db2", "database" : "MyDefaultDbName", "collibraSystemName" : "data-source-system", "schema" : "MyDefaultDbSchema", "verbose" : true, "deleteRawMetadataAfterProcessing": false } ] }
If the useCollibraSystemName
property in the lineage harvester configuration file is set to true
, you also need a source-specific configuration file. Use these options to only show the <source ID> or connection definition configuration files that you need.
|
Azure Data Factory
The following example shows an Azure Data Factory <source ID> configuration file.
{ "found_dbname=databasename1;found_hostname=server-name.onmicrosoft.com;found_schema=schema1": { "dbname": "mssql-database-name", "schema": "mssql-schema-name", "dialect": "mssql", "collibraSystemName": "mssql-system-name" }, "found_dbname=datafactory_linkedservice;found_hostname=*": { "dbname": "linkedservice-dbname", "schema": "linkedservice-schema", "collibraSystemName": "linkedservice-system-name" } }
dbt Cloud
The following example shows a dbt Cloud <source ID> configuration file.
{ "collibraSystemNames":{ "projects":[ {"project_id":"654321","collibraSystemName":"SystemName"} ] }, "filter":{ "jobIds":[1234], "projectIds":[654321] }, "materializedMapping":{ "ELS_MATERIALIZE_MULTIPLE_EXTERNAL_TABLES":"VIEW" } }
dbt Core
The following example shows a dbt Core <source ID> configuration file.
{ "collibraSystemNames":{ "projects":[ {"collibraSystemName":"SystemName"} ] }, "materializedMapping":{ "ELS_MATERIALIZE_MULTIPLE_EXTERNAL_TABLES":"VIEW" } }
Informatica PowerCenter
The following example shows an Informatica PowerCenter <source ID> configuration file.
{ "connectionDefinitions": { "oracle_source": { "dbname": "oracle-source-database-name1", "schema": "my Oracle source schema", "dialect": "oracle" }, "oracle_target": { "dbname": "oracle-target-database-name2", "schema": "my other oracle target schema", "dialect": "oracle" } }, "collibraSystemNames": { "databases": [ { "dbname": "oracle-source-database-name1", "collibraSystemName": "oracle-system-name1" }, { "dbname": "oracle-target-database-name2", "collibraSystemName": "oracle-system-name2" } ], "connections": [ { "connectionName": "oracle-source", "collibraSystemName": "oracle-system-name1" }, { "connectionName": "oracle-target", "collibraSystemName": "oracle-system-name2" } ] } }
SQL Server Integration Services
The following example shows an SQL Server Integration Services connection definitions configuration file.
{ "DataSources": { "dhb-sql-prod": { "dialect": "mssql", "collibraSystemName": "my-system-name" } }, "ConnStringRegExTranslation": { "Data Source=dhb-sql-prod;Initial Catalog=SFG_repl_staging;Provider=SQLNCLI11;Integrated Security=SSPI.*": { "dbname": "DATAHUB", "schema": "DBO", "dialect": "mssql", "collibraSystemName" : "WAREHOUSE" }, "Server=sb-dhub;User ID=SYS_USER;Initial Catalog=STAGEDB;Port=6306.*": { "dbname": "STAGEDB", "schema": "STAGE_OWNER", "dialect": "sybase", "collibraSystemName" : "" } } }
IBM InfoSphere DataStage
The following example shows a DataStage connection definitions configuration file.
{ "OdbcDataSources": { "oracle-data-source": { "dbname": "my-oracle-database", "schema": "my-oracle-schema", "dialect": "oracle", "collibraSystemName": "my-system" }, "mssql-data-source": { "dbname": "my-mssql-database", "schema": "my-mssql-schema", "dialect": "mssql", "collibraSystemName": "my-system" } }, "NonOdbcConnectors": { "admin@database-name": { "dbname": "my-netezza-database", "schema": "my-netezza-schema", "dialect": "netezza", "collibraSystemName": "my-system" }, "admin@second-database-name": { "dbname": "my-second-netezza-database", "schema": "my-second-netezza-schema", "dialect": "netezza", "collibraSystemName": "my-system" } }, "jobs": [ "my_job_1", "my_job_2" ], "jobParameters": [ { "name": "parameter_value_1" "value": "parameter_value_1" }, { "name": "parameter_name_2", "value": "parameter_value_2" } ] "perJobParameters": { "jobId1": [ { "name": "parameter_name_1", "value": "parameter_value_3" } ], "jobId2": [ { "name": "parameter_name_1", "value": "parameter_value_4" } ] } }
Informatica Intelligent Cloud Services
The following example shows an Informatica Intelligent Cloud Services <source ID> configuration file.
{ "collibraSystemNames": { "connections": [ { "connectionName": "DG_con_standby_cmdm_clientors", "collibraSystemName": "PUBLIC" }, { "connectionName": "DG_con_dev_dg_dgiauser_su", "collibraSystemName": "PUBLIC" } ] }, "connectionDefinitions": [ { "connectionName": "DG_con_standby_cmdm_clientors", "databaseName": "main", "schemaName": "dbo", "dialect": "oracle" }, { "connectionName": "DG_con_dev_dg_dgiauser_su", "databaseName": "main", "schemaName": "dbo", "dialect": "oracle" } ] }
Tableau
The following example shows a Tableau <source ID> configuration file.
{ "collibraSystemNames": { "files": [ {"filePath": "C:\\ProgramData\\Tableau\\Tableau Server\\data\\files\\sample.xls", "collibraSystemName": "sample-files" } ], "connectors": [ { "connectorUrl": "tableau-server-connector-url.com", "collibraSystemName": "Oracle-connector" } ], "cloudFiles": [ { "name": "file-name", "collibraSystemName": "FILE" } ] }, "hostnameMapping": { "found_dbname=*;found_hostname=abc.net;found_schema=DEFAULT": { "dbname": "CData", "schema": "Jan_1_2022", "dialect": "spark", "collibraSystemName": "TV_testing" } }, "filters": { "sites":{ "site_name":"domain_id" }, "projects":{ "site_name2 > project_name2": "domain-reference-id2", "site_name3 > project_name3 > subproject_name": "domain-reference-id2" } } }
Looker
The following example shows a Looker <source ID> configuration file.
{ "Connections":{ "connection-object1":{ "schema":"mssql-schema-name", "dbname":"mssql-database-name", "collibraSystemName":"mssql-system-name" }, "connection-object2":{ "schema":"oracle-schema-name", "dbname":"oracle-database-name", "collibraSystemName":"oracle-system-name" } }, "filters":[ { "domainId":"605fa8ae-f8c6-4261-938b-8326e2806f3d", "description":"Databricks_folder", "folderIds":[ "abc-123", "def-456" ] }, { "domainId":"245bc5a4-4c30-44b5-8356-ddbe708b56d6", "description":"personal", "folderIds":[ "hij-789*", "jkl-101112" ] } ] }
SQL Server Reporting Services and Power BI Report Server
The following example shows a SQL Server Reporting Services and Power BI Report Server <source ID> configuration file.
{ "DataSources": { "Redshift": { "dbname": "redshift-database-name", "schema": "redshift-schema-name", "dialect": "redshift", "collibraSystemName": "redshift-system-name" }, "Oracle": { "dbname": "oracle-database-name", "schema": "oracle-schema-name", "dialect": "oracle", "collibraSystemName": "oracle-system-name" } }, "CustomDataSources": { "/path to report/custom data souce name": { "dbname": "mssql-database-name", "dialect": "mssql" } } }
Power BI
The following example shows a Power BI <source ID> configuration file.
{ "found_dbname=databasename1;found_hostname=*;found_schema=schema1": { "dbname": "mssql-database-name", "schema": "mssql-schema-name", "dialect": "mssql", "collibraSystemName": "mssql-system-name" }, "found_dbname=databasename2;found_hostname=server-name.onmicrosoft.com;found_schema=schema2": { "dbname": "oracle-database-name", "schema": "oracle-schema-name", "dialect": "oracle", "collibraSystemName": "oracle-system-name" }, "filters":[ { "domainId": "<domain-ref-id>", "description": "FirstFilter", "workspaceNames": ["*"], "excludeWorkspaceIds": ["workspaceC", "workspaceD"] }, { "domainId": "<domain-ref-id>", "description": "SecondFilter", "workspaceNames": ["workspace3", "workspace4"], "capacityIds": ["id1","id2"] } ] }
Matillion
The following example shows a Matillion <source ID> configuration file.
{ "found_dbname=dbtest;found_hostname=test": { "collibraSystemName": "mssql-system-name" }, "found_dbname=testsid;found_hostname=*": { "dbname": "oracle-database-name", "schema": "oracle-schema-name", "collibraSystemName": "oracle-system-name" } }
MicroStrategy
The following example shows a MicroStrategy <source ID> configuration file.
{ "default_domain_id": "66ab7911-d84a-43d6-b9f7-40130jd84ds98", "filters": [ { "domainId": "3ae946ec-9030-46b8-ad86-a217b8f99bab", "projectIDs": ["3FF7CF7D4C5A2AEBF14D8B9802698B27"] }, { "domainId": "21cfb3ae-0616-478e-9060-114130540be5", "projectNames": ["Human Resources Analysis Module"] } ], "datasourceMapping": [ { "found_datasource": "REDSHIFT", "found_project": "*", "mapping": { "dbname": "RD_pearl", "schema_name": "Default_North", "dialect": "spark", "collibraSystemName": "TV_dev" } } ] }
The following is an example of a lineage harvester configuration file for US government customers using Power BI. The techlin
section contains the information required to connect to the Collibra Data Lineage service instance.
{ "general": { "techlin": { "url": "https://techlin-gov.collibra.com", "userKey": "<your-unique-api-key>" }, "catalog": { "url": "https://catalog-instance.collibra.com", "username": "Admin" }, "useCollibraSystemName": false }, "sources": [ { "type": "PowerBI", "id": "power-bi-id", "tenantDomain": "collibrapowerbi.onmicrosoft.com", "loginFlow": { "type": "ServicePrincipal", "applicationId": "ab123cde-1234-1234-1234-abcd12e34fg5" }, "domainId": "domain-reference-ID", "deleteRawMetadataAfterProcessing": true } ] }