The configuration file generator
The configuration file generator helps you create your lineage harvester configuration file more easily by providing the structure of the file with the correct properties per data source.
The lineage harvester configuration file
The lineage harvester uses a configuration file to connect to data sources, BI tools and ETL tools. The configuration file contains references to the data sources for which you want to create a technical lineage. You have to prepare the configuration file if you want to create a technical lineage and add new relations of the type "Data Element targets / sources Data Element" between existing assets in Data Catalog and "Column is target of / is source of Data Attribute" between assets from ingested BI sources and assets in Data Catalog.
If you use multiple lineage harvesters to create the technical lineage, you can create a configuration file for each lineage harvester and add different data source configurations to each configuration file. Note that multiple configuration files can only be used for testing purposes.
Empty configuration file
When you run the lineage harvester for the first time, it creates an empty configuration file. To create a technical lineage, you have to manually add properties and values, per data source, to this configuration file.
The following image shows an example of the empty configuration file created by the lineage harvester.
{
"general" : {
"catalog" : {
"url" : "",
"username" : "",
},
"useCollibraSystemName" : false
},
"sources" : [ {
"type" : "Database",
"id" : "MyDB",
"hostname" : "",
"username" : "",
"dialect" : "",
"collibraSystemName" : "",
"databaseNames" : [ ],
"port" : 1521
} ]
}
Configuration file generator
The configuration file generator creates an example configuration file with the data source properties of your choosing:
- Select the (meta)data sources for which you want to create a technical lineage.
- Scroll down to the configuration file example.
- Click
to copy the example.
The configuration file example is copied to your clipboard. - Paste the example in your empty configuration file in the lineage harvesterconfig folder.
- Replace the values in the example to match your actual data source information.
Tip Make sure you understand each property and know which values you must use to access your data source information.
- Run the lineage harvester.
Warning Some browser plug-ins may slow the configuration file generator down.
- Lineage harvester configuration file
- Source-specific configuration files
Use these options to filter the data source properties in the configuration file to your needs.
-+ IBM InfoSphere DataStage
-+ Informatica PowerCenter -+ SQL Server Integration Services -+ Informatica Intelligent Cloud Services - Data Integration -+ Matillion -+ Tableau -+ Looker -+ MicroStrategy -+ Power BI (deprecated) -+ Power BI -+ SQL Server Reporting Services (SSRS) and Power BI Report Server (PBRS) -+ Custom lineages -+ SQL files in the lineage harvester output folder (downloaded SQL files) | -+ Oracle -+ Google BigQuery -+ Snowflake -+ Other SQL data sources with connection type "JDBC":
| -+ SQL directories with connection type "folder":
|
{
"general": {
"catalog" : {
"url" : "https://companydomain.collibra.com",
"username" : "my-Collibra-username"
},
"useCollibraSystemName" : false
},
"sources" : [
{
"collibraSystemName" : "adf-system-name",
"id" : "adf_source",
"type" : "AzureDataFactory",
"tenantDomain": "tenant-domain",
"loginFlow": {
"type": "ServicePrincipal",
"applicationId": "application-id"
},
"subscriptionId" : "subscription-id",
"resourceGroupName" : "resource-group-name"
"deleteRawMetadataAfterProcessing": false
}
{
"collibraSystemName" : "datastage-system-name",
"id" : "datastage_source",
"type" : "ExternalDirectory",
"dirType" : "DATASTAGE",
"path" : "/path/to/the/datastage/folder/",
"mask" : "*",
"recursive" : false,
"deleteRawMetadataAfterProcessing": false
}
{
"collibraSystemName" : "infa-system-name",
"id" : "informatica_source",
"type" : "ExternalDirectory",
"dirType" : "INFA",
"path" : "/path/to/the/informatica/folder/",
"mask" : "*",
"recursive" : false,
"deleteRawMetadataAfterProcessing": false
}
{
"collibraSystemName" : "ssis-system-name",
"id" : "datastage_source",
"type" : "ExternalDirectory",
"dirType" : "SSIS",
"path" : "/path/to/the/ssis/folder/",
"mask" : "*",
"recursive" : false,
"deleteRawMetadataAfterProcessing": false
}
{
"type" : "IICS",
"id" : "iics_source",
"collibraSystemName" : "iics-development",
"loginUrl" : "https://dm-us.informaticaintelligentcloud.com",
"username" : "login-iics",
"deleteRawMetadataAfterProcessing": false,
"objects" : [
{
"path" : "Default/Sales",
"type" : "Project"
},
{
"path" : "My Project/Statistics",
"type" : "Project"
}
]
}
{
"id" : "my-matillion-project",
"type" : "Matillion",
"url" : "https://my-domain",
"groupName" : "my-matillion-group",
"projectName" : "redshift-project",
"environmentName" : "redshift-environment",
"dialect" : "redshift",
"startTimestamp" : 1594080796911,
"collibraSystemName": "Matillion-system",
"deleteRawMetadataAfterProcessing": false,
"auth": {
"type": "Basic",
"username": "ec2-user"
}
}
{
"type": "Tableau",
"id": "unique-ID",
"url": "URL to Tableau server?",
"username": "Admin",
"siteIds": ["site ID of Tableau Site 1", "site ID of Tableau Site 2"],
"siteNames": ["site name of Tableau Site 1", "site name of Tableau Site 2"],
"restOnly": false,
"domainId": "Domain-resource-ID",
"excludeImages": true,
"deleteRawMetadataAfterProcessing": false,
"paging": {
"pagination-setting": 100,
"pagination-setting-2": 100
}
}
{
"id" : "looker-source",
"type" : "Looker",
"lookerUrl" : "https://<instance-name.api.looker.com",
"clientId" : "my-looker-api-user-name",
"clientSecret": "looker-api-userkey",
"domainId" : "22258f64-40b6-4b16-9c08-c95f8ec0da26",
"deleteRawMetadataAfterProcessing": false }
{
"type" : "ExistingLineage",
"id" : "MyPowerBISourceID",
"deleteRawMetadataAfterProcessing": false
}
{
"id": "<unique-id>",
"type": "SSRS",
"url": "http://<IP address or computer name>/Reports",
"username": "<server-api-user-name>",
"domainId": "<domain-resource-id>",
"folderFilter": ["/Folder1/*", "Folder2"],
"deleteRawMetadataAfterProcessing": false
}
{
"collibraSystemName" : "custom-system-name",
"id" : "MyCustomLineage",
"type" : "ExternalDirectory",
"dirType" : "custom-lineage",
"path" : "/path/to/custom-lineage/dir",
"deleteRawMetadataAfterProcessing": false
}
{
"type" : "LoadedSource",
"id" : "MySource",
"zipFile" : "/path/to/source-MySource.zip",
"deleteRawMetadataAfterProcessing": false
}
{
"id" : "database_source",
"type" : "Database",
"username" : "MyUsername",
"dialect" : "hive",
"databaseNames" : ["MyDefaultDbName"],
"hostname" : "localhost",
"collibraSystemName" : "apache-hive-system",
"port" : 1521,
"deleteRawMetadataAfterProcessing": false,
"customConnectionProperties" : ""
}
{
"id": "oracle-id",
"type": "DatabaseOracle",
"hostname": "host_url",
"username": "user1",
"collibraSystemName": "automation_csn",
"port": 1521,
"serviceNames": ["sn1", "sn2"],
"databaseNames": ["db1", "db2"],
"deleteRawMetadataAfterProcessing": false
}
{
"id" : "bigquery_source",
"type" : "DatabaseBigQuery",
"projectIDs" : [ "bigquery_project1", "bigquery_project2" ],
"region": "europe-west1"
"auth" : "/path/to/the/authentication/file.json",
"collibraSystemName" : "bigquery-system-name",
"deleteRawMetadataAfterProcessing": false
}
{
"id" : "snowflake_source",
"type" : "DatabaseSnowflake",
"mode" : "SQL|SQL-API",
"collibraSystemName" : "snowflake-system-name",
"auth": {
"type": "KeyPair|Basic",
"username": "some_username",
"pathToPrivateKey": "path_to_your_private_key_file",
"usePassword": "true|false"
},
"hostname" : "MyAccountName.snowflakecomputing.com",
"databaseNames" : ["MyFirstDbName","MySecondDbName"],
"warehouse" : "MySnowflakeWarehouseName",
"deleteRawMetadataAfterProcessing": false,
"customConnectionProperties" : "role=MYROLE"
}
{
"type": "Microstrategy",
"id": "microstrategy-batch",
"domainId": "<domain-resource-id>",
"username": "mstr",
"hostname": "remote.postgres.com",
"port": 5432,
"databaseName": "poc_metadata",
"deleteRawMetadataAfterProcessing": false
}
{
"type" : "PowerBI",
"id" : "power-bi-1",
"tenantDomain": "collibra3.onmicrosoft.com",
"loginFlow": {
"type": "ServicePrincipal",
"applicationId": "be560fac-7545-4ce2-ad9f-cbce14c59af6"
},
"domainId": "domain-reference-ID",
"deleteRawMetadataAfterProcessing": false
}
{
"id" : "sqldirectory_source",
"type" : "SqlDirectory",
"path" : "/path/to/the/sql/folder/",
"mask" : "*",
"recursive" : false,
"dialect" : "db2",
"database" : "MyDefaultDbName",
"collibraSystemName" : "data-source-system",
"schema" : "MyDefaultDbSchema",
"verbose" : true,
"deleteRawMetadataAfterProcessing": false
} ]
}
If the useCollibraSystemName property in the lineage harvester configuration file is set to true, you also need a source-specific configuration file. Use these options to only show the <source ID> or connection definition configuration files that you need.
|
|
Informatica PowerCenter
The following example shows an Informatica PowerCenter <source ID> configuration file.
{
"connectionDefinitions": {
"oracle_source": {
"dbname": "oracle-source-database-name1",
"schema": "my Oracle source schema",
"dialect": "oracle"
},
"oracle_target": {
"dbname": "oracle-target-database-name2",
"schema": "my other oracle target schema",
"dialect": "oracle"
}
},
"collibraSystemNames": {
"databases": [
{
"dbname": "oracle-source-database-name1",
"collibraSystemName": "oracle-system-name1"
},
{
"dbname": "oracle-target-database-name2",
"collibraSystemName": "oracle-system-name2"
}
],
"connections": [
{
"connectionName": "oracle-connection-name1",
"collibraSystemName": "oracle-system-name1"
},
{
"connectionName": "oracle-connection-name2",
"collibraSystemName": "oracle-system-name2"
}
]
}
}
SQL Server Integration Services
The following example shows an SQL Server Integration Services connection definitions configuration file.
{
"ConnStringRegExTranslation": {
"Data Source=dhb-sql-prod;Initial Catalog=SFG_repl_staging;Provider=SQLNCLI11;Integrated Security=SSPI.*": {
"dbname": "DATAHUB",
"schema": "DBO",
"dialect": "mssql",
"collibraSystemName" : "WAREHOUSE"
},
"Server=sb-dhub;User ID=SYS_USER;Initial Catalog=STAGEDB;Port=6306.*": {
"dbname": "STAGEDB",
"schema": "STAGE_OWNER",
"dialect": "sybase",
"collibraSystemName" : ""
}
}
}
IBM InfoSphere DataStage
The following example shows a DataStage connection definitions configuration file.
{
"OdbcDataSources": {
"oracle-data-source": {
"dbname": "my-oracle-database",
"schema": "my-oracle-schema",
"dialect": "oracle",
"collibraSystemName": "my-system"
},
"mssql-data-source": {
"dbname": "my-mssql-database",
"schema": "my-mssql-schema",
"dialect": "mssql",
"collibraSystemName": "my-system"
}
},
"NonOdbcConnectors": {
"admin@database-name": {
"dbname": "my-netezza-database",
"schema": "my-netezza-schema",
"dialect": "netezza",
"collibraSystemName": "my-system"
},
"admin@second-database-name": {
"dbname": "my-second-netezza-database",
"schema": "my-second-netezza-schema",
"dialect": "netezza",
"collibraSystemName": "my-system"
}
}
}
Informatica Intelligent Cloud Services
The following example shows an Informatica Intelligent Cloud Services <source ID> configuration file.
{
"collibraSystemNames": {
"connections": [
{
"connectionName": "DG_con_standby_cmdm_clientors",
"collibraSystemName": "PUBLIC"
},
{
"connectionName": "DG_con_dev_dg_dgiauser_su",
"collibraSystemName": "PUBLIC"
}
]
},
"connectionDefinitions": [
{
"connectionName": "DG_con_standby_cmdm_clientors",
"databaseName": "main",
"schemaName": "dbo",
"dialect": "oracle"
},
{
"connectionName": "DG_con_dev_dg_dgiauser_su",
"databaseName": "main",
"schemaName": "dbo",
"dialect": "oracle"
}
]
}
Tableau
The following example shows a Tableau <source ID> configuration file.
{
"collibraSystemNames": {
"databases": [
{
"hostName": "database-hostname",
"collibraSystemName": "public"
}
],
"files": [
{"filePath": "C:\\ProgramData\\Tableau\\Tableau Server\\data\\files\\sample.xls",
"collibraSystemName": "sample-files"
}
],
"connectors": [
{
"connectorUrl": "tableau-server-connector-url.com",
"collibraSystemName": "Oracle-connector"
}
],
"cloudFiles": [
{
"name": "file-name",
"collibraSystemName": "FILE"
}
]
},
"databaseMapping": {
"<hostname:port>":"<actual database name>"
},
"filters": {
"sites":{
"site_name":"domain_id"
},
"projects":{
"site_name2 > project_name2": "domain-reference-id2",
"site_name3 > project_name3 > subproject_name": "domain-reference-id2"
}
}
}
Looker
The following example shows a Looker <source ID> configuration file.
{
"Connections": {
"connection-object1": {
"dialect": "mssql",
"schema": "mssql-schema-name",
"dbname": "mssql-database-name",
"collibraSystemName": "mssql-system-name"
},
"connection-object2": {
"dialect": "oracle",
"schema": "oracle-schema-name",
"dbname": "oracle-database-name",
"collibraSystemName": "oracle-system-name"
}
}
"filters":[
{
"domainId":"<reference ID>",
"description":"any-description",
"folderNames":["Folder1", "Folder2"]
},
{
"domainId":"<reference ID>",
"description":"any-description",
"folderNames":["Folder3", "Folder4"]
},
{
"domainId":"<reference ID>",
"description":"any-description",
"folderIds":["123xxxx", "456xxxx"]
}
]
}
}
SQL Server Reporting Services and Power BI Report Server
The following example shows a SQL Server Reporting Services and Power BI Report Server <source ID> configuration file.
{
"DataSources": {
"Redshift": {
"dbname": "redshift-database-name",
"schema": "redshift-schema-name",
"dialect": "redshift",
"collibraSystemName": "redshift-system-name"
},
"Oracle": {
"dbname": "oracle-database-name",
"schema": "oracle-schema-name",
"dialect": "oracle",
"collibraSystemName": "oracle-system-name"
}
},
"CustomDataSources": {
"/path to report/custom data souce name": {
"dbname": "mssql-database-name",
"dialect": "mssql"
}
}
}
Power BI
The following example shows a Power BI <source ID> configuration file.
{
"found_dbname=databasename1;found_hostname=*;found_schema=schema1": {
"dbname": "mssql-database-name",
"schema": "mssql-schema-name",
"dialect": "mssql",
"collibraSystemName": "mssql-system-name"
},
"found_dbname=databasename2;found_hostname=server-name.onmicrosoft.com;found_schema=schema2": {
"dbname": "oracle-database-name",
"schema": "oracle-schema-name",
"dialect": "oracle",
"collibraSystemName": "oracle-system-name"
},
"filters":[
{
"domainId": "<domain-ref-id>",
"description": "FirstFilter",
"workspaceNames": "*",
"excludeWorkspaceIds": ["workspaceC", "workspaceD"]
},
{
"domainId": "<domain-ref-id>",
"description": "SecondFilter",
"workspaceNames": ["workspace3", "workspace4"],
"capacityIds": ["id1","id2"]
}
]
}
Matillion
The following example shows a Matillion <source ID> configuration file.
{
"collibraSystemNames":
{
"sources":[
{
"jobName":"<name of job>",
"collibraSystemName":"<name>"
},
{
"jobName":"<name of job>",
"collibraSystemName":"<name>"
}
],
"targets":[
{
"jobName":"<name of job>",
"collibraSystemName":"<name>"
},
{
"jobName":"<name of jobv",
"collibraSystemName":"<name>"
}
]
}
}