The configuration file generator

The configuration file generator helps you create your lineage harvester configuration file by providing the structure of the file with the correct properties per data source.

The lineage harvester configuration file

The lineage harvester uses a configuration file to connect to JDBC data sources, BI tools and ETL tools. The configuration file contains references to the data sources for which you want to create a technical lineage. You have to prepare the configuration file if you want to create a technical lineage and add new relations of the type "Data Element targets / sources Data Element" between existing assets in Data Catalog, and "Column is target of / is source of Data Attribute" between assets from ingested BI sources and assets in Data Catalog.

Tip  You have to save the configuration file in the config directory in the lineage harvester folder.

Empty configuration file

When you run the lineage harvester for the first time, it creates an empty configuration file. To create a technical lineage, you have to manually add properties and values, per data source, to this configuration file.

The following image shows an example of the empty configuration file created by the lineage harvester.

{
	"general" : {
		"catalog" : {
			"url" : "",
			"username" : "",
		},
		"useCollibraSystemName" : false
	},
	"sources" : [ {
		"type" : "Database",
		"id" : "MyDB",
		"hostname" : "",
		"username" : "",
		"dialect" : "",
		"collibraSystemName" : "",
		"databaseNames" : [ ],
		"port" : 1521
	} ]
}

Configuration file generator

The configuration file generator creates an example configuration file with the data source properties of your choosing:

  1. Select the (meta)data sources for which you want to create a technical lineage.
    Tip Click the Lineage harvester for GovCloud customers tab below, to view an example configuration file for GovCloud customers.
  2. Scroll down to the configuration file example.
  3. Click Copy code to copy the example.
    The configuration file example is copied to your clipboard.
  4. Paste the example in your empty configuration file in the lineage harvesterconfig folder.
  5. Replace the values in the example to match your actual data source information.
    Tip Make sure you understand each property and know which values you must use to access your data source information.
  6. Run the lineage harvester.

Warning Some browser plug-ins may slow the configuration file generator down.

Tip 
Use these options to filter the data source properties in the configuration file to your needs.
-+ Azure Data Factory
-+ dbt Cloud
-+ dbt Core
-+ IBM InfoSphere DataStage
-+ Informatica PowerCenter
-+ SQL Server Integration Services
-+ Informatica Intelligent Cloud Services - Data Integration
-+ Matillion
-+ Tableau
-+ Looker
-+ MicroStrategy
-+ Power BI
-+ SQL Server Reporting Services (SSRS) and Power BI Report Server (PBRS)
-+ Custom lineages
-+ SQL files in the lineage harvester output folder (downloaded SQL files)
-+ Oracle
-+ Google BigQuery
-+ Snowflake
-+ Other SQL data sources with connection type "JDBC":
  • Amazon Redshift
  • Azure SQL Data Warehouse
  • Azure SQL Server
  • Greenplum
  • IBM DB2
  • PostgreSQL
  • Microsoft SQL Server
  • MySQL
  • Netezza
  • SAP HANA
  • Spark SQL
  • Sybase Adaptive Server Enterprise
  • Teradata
-+ SQL directories with connection type "folder":
  • Amazon Redshift
  • Azure SQL Data Warehouse
  • Azure SQL Server
  • Google BigQuery
  • Greenplum
  • HiveQL
  • IBM DB2
  • Oracle
  • PostgreSQL
  • Microsoft SQL Server
  • MySQL
  • Netezza
  • SAP HANA
  • Snowflake
  • Spark SQL
  • Sybase Adaptive Server Enterprise
  • Teradata

Copy code

{
	"general": {	
		"catalog" : {
			"url" : "https://companydomain.collibra.com",
			"username" : "my-Collibra-username"
			},
			"useCollibraSystemName" : false
	},
	"sources" : [ 
	
	{
		"id" : "adf_source",
		"type" : "AzureDataFactory",
		"tenantDomain": "tenant-domain",
		"loginFlow": {
		    "type": "ServicePrincipal",
		    "applicationId": "application-id"
		    },
	            "subscriptionId" : "subscription-id",
		    "resourceGroupName" : "resource-group-name",
		    "factories" : ["factoryname1","factoryname2"],
		    "deleteRawMetadataAfterProcessing": false

	}
	{
		"type": "dbt",
		"id": "mydbt",
		"tokenName": "mytokenname",
		"adminUrl":"https://cloud.getdbt.com/api/v2",
		"environmentIds":"[123456, 987654]",
		"metadataUrl":"https://metadata.cloud.getdbt.com/graphql",
		"deleteRawMetadataAfterProcessing": false
	}
			
	{
		"id" : "my_dbt_source",
		"type" : "ExternalDirectory",
		"dirType" : "dbt",
		"path" : "/opt/Collibra/techlin/dbt-core-files/",
		"recursive" : true,
		"deleteRawMetadataAfterProcessing" : false
			}

	{
		"id" : "datastage_source",
		"type" : "ExternalDirectory",
		"dirType" : "DATASTAGE",
		"path" : "/path/to/the/datastage/folder/",
		"mask" : "*",
		"recursive" : false,
		"deleteRawMetadataAfterProcessing": false
	}
	{
		"dialect" : "oracle",
		"id" : "informatica_source",
		"type" : "ExternalDirectory",
		"dirType" : "powercenter",
		"path" : "/path/to/the/informatica/folder/",
		"mask" : "*",
		"recursive" : false,
		"deleteRawMetadataAfterProcessing": false
	}
	{
		"id" : "datastage_source",
		"type" : "ExternalDirectory",
		"dirType" : "SSIS",
		"path" : "/path/to/the/ssis/folder/",
		"mask" : "*",
		"recursive" : false,
		"deleteRawMetadataAfterProcessing": false
	}
	{
		"type" : "IICS",
		"id" : "iics_source",
		"loginUrl" : "https://dm-us.informaticaintelligentcloud.com",
		"username" : "login-iics",
		"deleteRawMetadataAfterProcessing": false,
		"objects" : [
			{
				"path" : "Default/Sales",
				"type" : "Project"
			},
			{
				"path" : "My Project/Statistics",
				"type" : "Project"
			}
		]
	}
	{
		"id" : "my-matillion-project",
		"type" : "Matillion",
		"url" : "https://my-domain",
		"groupName" : "my-matillion-group",
		"projectName" : "redshift-project",
		"environmentName" : "redshift-environment",
		"dialect" : "redshift",			
		"startTimestamp" : 1594080796911,
		"deleteRawMetadataAfterProcessing": false,
		"auth": {
			"type": "Basic",
			"username": "ec2-user"
		}
	}
	{
		"type": "Tableau",
		"id": "unique-ID",
		"url": "URL to Tableau server?",
		"username": "Admin",
		"siteIds": ["site ID of Tableau Site 1", "site ID of Tableau Site 2"],
		"siteNames": ["site name of Tableau Site 1", "site name of Tableau Site 2"],
		"restOnly": false,
		"domainId": "Domain-resource-ID",
		"excludeImages": true,
		"deleteRawMetadataAfterProcessing": false,
		"paging": {
			"pagination-setting": 100,
			"pagination-setting-2": 100
		}
	}
	{
		"id" : "looker-source",
		"type" : "Looker",
		"lookerUrl" : "https://<instance-name.api.looker.com",
		"clientId" : "my-looker-api-user-name",
		"clientSecret": "looker-api-userkey",
		"domainId" : "22258f64-40b6-4b16-9c08-c95f8ec0da26",
		"deleteRawMetadataAfterProcessing": false
	}
	{
		"id": "<unique-id>",
		"type": "SSRS",
		"url": "http://<IP address or computer name>/Reports",
		"username": "<server-api-user-name>",
		"domainId": "<domain-resource-id>",
		"folderFilter":  ["/Folder1/*", "Folder2"],
		"deleteRawMetadataAfterProcessing": false
	}
	{
		"collibraSystemName" : "custom-system-name",
		"id" : "MyCustomLineage",
		"type" : "ExternalDirectory",
		"dirType" : "custom-lineage",
		"path" : "/path/to/custom-lineage/dir",
		"deleteRawMetadataAfterProcessing": false
	}
	{
		"type" : "LoadedSource",
		"id" : "MySource",
		"zipFile" : "/path/to/source-MySource.zip",
		"deleteRawMetadataAfterProcessing": false
	}
	{
		"id" : "database_source",
		"type" : "Database",
		"username" : "MyUsername",
		"dialect" : "hive",
		"externalDbName": "CData"
		"databaseNames" : ["MyDefaultDbName"],
		"hostname" : "localhost",
		"collibraSystemName" : "apache-hive-system",
		"port" : 1521,
		"deleteRawMetadataAfterProcessing": false,
		"customConnectionProperties" : ""
	}
	{
		"id": "oracle-id",
		"type": "DatabaseOracle",
		"hostname": "host_url",
		"username": "user1",
		"collibraSystemName": "automation_csn",
		"port": 1521,
		"serviceNames": ["sn1", "sn2"],
		"databaseNames": ["db1", "db2"],
		"deleteRawMetadataAfterProcessing": false
	}
	{			
		"id" : "bigquery_source",
		"type" : "DatabaseBigQuery",
		"projectIDs" : [ "bigquery_project1", "bigquery_project2" ],
		"region": "europe-west1"
		"auth" : "/path/to/the/authentication/file.json",
		"collibraSystemName" : "bigquery-system-name",
		"deleteRawMetadataAfterProcessing": false
	}
	{
		"id" : "snowflake_source",
		"type" : "DatabaseSnowflake",
		"mode" : "SQL|SQL-API",
		"collibraSystemName" : "snowflake-system-name",
		"auth": {
			"type": "KeyPair|Basic",
			"username": "some_username",
			"pathToPrivateKey": "path_to_your_private_key_file",
			"usePassword": "true|false" 
		},
		"hostname" : "MyAccountName.snowflakecomputing.com",
		"databaseNames" : ["MyFirstDbName","MySecondDbName"],
		"extraDatabaseDefinitions:" : ["MyFirstExternalDbName","MySecondExternalDbName"],
		"schemaNames" : ["MyFirstSchemaName","MySecondSchemaName"],
		"warehouse" : "MySnowflakeWarehouseName",
		"days" : "1",			
		"deleteRawMetadataAfterProcessing": false,
		"customConnectionProperties" : "role=MYROLE"
					}
	{
		"id": "mstr",
		"type": "MSTR_V2",
		"url": "http://ec2-18-216-24-232.us-east-2.compute.amazonaws.com:8080",
		"username": "Administrator",
		"maxParallelRequests": 5,
		"appUrlSuffix": "MicroStrategy/servlet/mstrWeb",
		"deleteRawMetadataAfterProcessing": false
	}
	{
		"type" : "PowerBI",
		"id" : "power-bi-1",
		"tenantDomain": "collibra3.onmicrosoft.com",
		"loginFlow": {
			"type": "ServicePrincipal",
			"applicationId": "be560fac-7545-4ce2-ad9f-cbce14c59af6"
		},
		"domainId": "domain-reference-ID"
		"daxParserEnabled": true,
		"deleteRawMetadataAfterProcessing": false
	}
	{
		"id" : "sqldirectory_source",
		"type" : "SqlDirectory",
		"path" : "/path/to/the/sql/folder/",
		"mask" : "*",
		"recursive" : false,
		"dialect" : "db2",
		"database" : "MyDefaultDbName",
		"collibraSystemName" : "data-source-system",
		"schema" : "MyDefaultDbSchema",
		"verbose" : true,
		"deleteRawMetadataAfterProcessing": false
	} ]
}
Tip 

If the useCollibraSystemName property in the lineage harvester configuration file is set to true, you also need a source-specific configuration file. Use these options to only show the <source ID> or connection definition configuration files that you need.

Azure Data Factory

The following example shows an Azure Data Factory <source ID> configuration file.

{
     "found_dbname=databasename1;found_hostname=server-name.onmicrosoft.com;found_schema=schema1": {
         "dbname": "mssql-database-name",
         "schema": "mssql-schema-name",
         "dialect": "mssql",
         "collibraSystemName": "mssql-system-name"
     },
     "found_dbname=datafactory_linkedservice;found_hostname=*": {
         "dbname": "linkedservice-dbname",
         "schema": "linkedservice-schema",
         "collibraSystemName": "linkedservice-system-name"
     }
}	

Copy code

dbt Cloud

The following example shows a dbt Cloud <source ID> configuration file.

Copy code

Copy code

{
	"collibraSystemNames":{
	    "projects":[
		{"project_id":"654321","collibraSystemName":"SystemName"}
	    ]
	},
	"filter":{
	    "jobIds":[1234],
	    "projectIds":[654321]
	},
	"materializedMapping":{
	    "ELS_MATERIALIZE_MULTIPLE_EXTERNAL_TABLES":"VIEW"
	}
}	

dbt Core

The following example shows a dbt Core <source ID> configuration file.

Copy code

Copy code

{
	"collibraSystemNames":{
	    "projects":[
		{"collibraSystemName":"SystemName"}
	    ]
	},
	"materializedMapping":{
	    "ELS_MATERIALIZE_MULTIPLE_EXTERNAL_TABLES":"VIEW"
	}
}	

Informatica PowerCenter

The following example shows an Informatica PowerCenter <source ID> configuration file.

Copy code

Copy code

{
	"connectionDefinitions": {
		"oracle_source": {
			"dbname": "oracle-source-database-name1",
			"schema": "my Oracle source schema",
			"dialect": "oracle"
		},
		"oracle_target": {
			"dbname": "oracle-target-database-name2",
			"schema": "my other oracle target schema",
			"dialect": "oracle"
		}
	},
	"collibraSystemNames": {
		"databases": [
			{
				"dbname": "oracle-source-database-name1",
				"collibraSystemName": "oracle-system-name1"
			},
			{
				"dbname": "oracle-target-database-name2",
				"collibraSystemName": "oracle-system-name2"
			}
		],
		"connections": [
			{
				"connectionName": "oracle-connection-name1",
				"collibraSystemName": "oracle-system-name1"
			},
			{
				"connectionName": "oracle-connection-name2",
				"collibraSystemName": "oracle-system-name2"
			}
		]
	}
}	

SQL Server Integration Services

The following example shows an SQL Server Integration Services connection definitions configuration file.

Copy code

Copy code

{
  "DataSources": {
    "dhb-sql-prod": {
      "dialect": "mssql",
      "collibraSystemName": "my-system-name"
    }
},
{
  "ConnStringRegExTranslation": {

    "Data Source=dhb-sql-prod;Initial Catalog=SFG_repl_staging;Provider=SQLNCLI11;Integrated Security=SSPI.*": {
      "dbname": "DATAHUB",
      "schema": "DBO",
      "dialect": "mssql",
      "collibraSystemName" : "WAREHOUSE"
    },

    "Server=sb-dhub;User ID=SYS_USER;Initial Catalog=STAGEDB;Port=6306.*": {
      "dbname": "STAGEDB",
      "schema": "STAGE_OWNER",
      "dialect": "sybase",
      "collibraSystemName" : ""
    }

  }
}

IBM InfoSphere DataStage

The following example shows a DataStage connection definitions configuration file.

Copy code

{
  "OdbcDataSources": {
    "oracle-data-source": {
      "dbname": "my-oracle-database",
      "schema": "my-oracle-schema",
      "dialect": "oracle",
      "collibraSystemName": "my-system"
    },
    "mssql-data-source": {
      "dbname": "my-mssql-database",
      "schema": "my-mssql-schema",
      "dialect": "mssql",
      "collibraSystemName": "my-system"
    }
  },
  "NonOdbcConnectors": {
    
    "admin@database-name": {
      "dbname": "my-netezza-database",
      "schema": "my-netezza-schema",
      "dialect": "netezza",
      "collibraSystemName": "my-system"
    },
    "admin@second-database-name": {
      "dbname": "my-second-netezza-database",
      "schema": "my-second-netezza-schema",
      "dialect": "netezza",
      "collibraSystemName": "my-system"
    }
  },   
  "jobs": [
    "my_job_1",
    "my_job_2"
  ],
  "jobParameters": [
    {
      "name": "job_parameter_name_1",
      "value": "job_parameter_value_1"
    },
    {
      "name": "job_parameter_name_2",
      "value": "job_parameter_value_2"
    }
  ]
}

Informatica Intelligent Cloud Services

The following example shows an Informatica Intelligent Cloud Services <source ID> configuration file.

Copy code

Copy code

{
	"collibraSystemNames": {
		"connections": [
			{
				"connectionName": "DG_con_standby_cmdm_clientors",
				"collibraSystemName": "PUBLIC"
			},
			{
				"connectionName": "DG_con_dev_dg_dgiauser_su",
				"collibraSystemName": "PUBLIC"
			}
		]
	},
	"connectionDefinitions": [
		{
			"connectionName": "DG_con_standby_cmdm_clientors",
			"databaseName": "main",
			"schemaName": "dbo",
			"dialect": "oracle"
		},
		{
			"connectionName": "DG_con_dev_dg_dgiauser_su",
			"databaseName": "main",
			"schemaName": "dbo",
			"dialect": "oracle"
		}
	]
}	

Tableau

The following example shows a Tableau <source ID> configuration file.

Copy code

{
  "collibraSystemNames": {										
    "files": [
      {"filePath": "C:\\ProgramData\\Tableau\\Tableau Server\\data\\files\\sample.xls",
        "collibraSystemName": "sample-files"
      }
    ],
    "connectors": [
      {
        "connectorUrl": "tableau-server-connector-url.com",
        "collibraSystemName": "Oracle-connector"
      }
    ],
    "cloudFiles": [
      {
        "name": "file-name",
        "collibraSystemName": "FILE"
      }
    ]
},
  "hostnameMapping": {
	"found_dbname=*;found_hostname=abc.net;found_schema=DEFAULT": {
		"dbname": "CData",
		"schema": "Jan_1_2022",
		"dialect": "spark",
		"collibraSystemName": "TV_testing"
	}
},
  "filters": {
	 "sites":{
	   "site_name":"domain_id"
	 },
        "projects":{
          "site_name2 > project_name2": "domain-reference-id2",
          "site_name3 > project_name3 > subproject_name": "domain-reference-id2"
	 }
   }	
}			

Looker

The following example shows a Looker <source ID> configuration file.

Copy code

Copy code

{
   "Connections":{
      "connection-object1":{
         "dialect":"mssql",
         "schema":"mssql-schema-name",
         "dbname":"mssql-database-name",
         "collibraSystemName":"mssql-system-name"
      },
      "connection-object2":{
         "dialect":"oracle",
         "schema":"oracle-schema-name",
         "dbname":"oracle-database-name",
         "collibraSystemName":"oracle-system-name"
      }
   },
   "filters":[
      {
         "domainId":"605fa8ae-f8c6-4261-938b-8326e2806f3d",
         "description":"Databricks_folder",
         "folderIds":[
            "abc-123",
            "def-456"
         ]
      },
      {
         "domainId":"245bc5a4-4c30-44b5-8356-ddbe708b56d6",
         "description":"personal",
         "folderIds":[
            "hij-789*",
            "jkl-101112"
         ]
      }
   ]
}	

SQL Server Reporting Services and Power BI Report Server

The following example shows a SQL Server Reporting Services and Power BI Report Server <source ID> configuration file.

Copy code

Copy code

{
	"DataSources": {
		"Redshift": {
			"dbname": "redshift-database-name",
			"schema": "redshift-schema-name",
			"dialect": "redshift",
			"collibraSystemName": "redshift-system-name"
		},
		"Oracle": {
			"dbname": "oracle-database-name",
			"schema": "oracle-schema-name",
			"dialect": "oracle",
			"collibraSystemName": "oracle-system-name"
		}
	},
	"CustomDataSources": {
		"/path to report/custom data souce name": {
			"dbname": "mssql-database-name",
			"dialect": "mssql"
		}
	}
}	

Power BI

The following example shows a Power BI <source ID> configuration file.

Copy code

Copy code

{
	"found_dbname=databasename1;found_hostname=*;found_schema=schema1": {
		"dbname": "mssql-database-name",
		"schema": "mssql-schema-name",
		"dialect": "mssql",
		"collibraSystemName": "mssql-system-name"
	},
	"found_dbname=databasename2;found_hostname=server-name.onmicrosoft.com;found_schema=schema2": {
		"dbname": "oracle-database-name",
		"schema": "oracle-schema-name",
		"dialect": "oracle",
		"collibraSystemName": "oracle-system-name"
	},
	"filters":[ 
		{
		"domainId": "<domain-ref-id>",
		"description": "FirstFilter",
		"workspaceNames": ["*"],
		"excludeWorkspaceIds": ["workspaceC", "workspaceD"]
		},
		{
		"domainId": "<domain-ref-id>",
		"description": "SecondFilter",
		"workspaceNames": ["workspace3", "workspace4"],
		"capacityIds": ["id1","id2"]
		}
	]
}	

Matillion

The following example shows a Matillion <source ID> configuration file.

Copy code

Copy code

{
    "found_dbname=dbtest;found_hostname=test": {
        "collibraSystemName": "mssql-system-name"
    },
    "found_dbname=testsid;found_hostname=*": {
        "dbname": "oracle-database-name",
        "schema": "oracle-schema-name",
        "collibraSystemName": "oracle-system-name"
    }	
}

MicroStrategy

The following example shows a MicroStrategy <source ID> configuration file.

Copy code

{
	"default_domain_id": "66ab7911-d84a-43d6-b9f7-40130jd84ds98",
	"filters": [
		{
			"domainId": "3ae946ec-9030-46b8-ad86-a217b8f99bab",
			"projectIDs": ["3FF7CF7D4C5A2AEBF14D8B9802698B27"]
		},
		{
			"domainId": "21cfb3ae-0616-478e-9060-114130540be5",
			"projectNames": ["Human Resources Analysis Module"]
		}
	],
    "datasourceMapping": [
	 {
	     "found_datasource": "REDSHIFT",
	     "found_project": "*",
	     "mapping": {
		 	"dbname": "RD_pearl",
		  	"schema": "Default_North",
		  	"dialect": "spark",
		  	"collibraSystemName": "TV_dev"
	     }
	 }
    ]
}			

Copy code

The following is an example of a lineage harvester configuration file for US government customers using Power BI. The techlin section contains the information required to connect to the Collibra Data Lineage service instance.

Copy code

{
  "general": {
	"techlin": {
	  "url": "https://techlin-gov.collibra.com",
	  "userKey": "<your-unique-api-key>"
	  },
	"catalog": {
	  "url": "https://catalog-instance.collibra.com",
	  "username": "Admin"
	  },
	"useCollibraSystemName": false
	},
	"sources": [ {
	  "type": "PowerBI",
	  "id": "power-bi-id",
	  "tenantDomain": "collibrapowerbi.onmicrosoft.com",
	  "loginFlow": {
	    "type": "ServicePrincipal",
	    "applicationId": "ab123cde-1234-1234-1234-abcd12e34fg5"
	  },
	  "domainId": "domain-reference-ID",
	  "deleteRawMetadataAfterProcessing": true
	} ]
}