The configuration file generator

The configuration file generator helps you create your lineage harvester configuration file more easily by providing the structure of the file with the correct properties per data source.

The lineage harvester configuration file

The lineage harvester uses a configuration file to connect to data sources, BI tools and ETL tools. The configuration file contains references to the data sources for which you want to create a technical lineage. You have to prepare the configuration file if you want to create a technical lineage and add new relations of the type "Data Element targets / sources Data Element" between existing assets in Data Catalog and "Column is target of / is source of Data Attribute" between assets from ingested BI sources and assets in Data Catalog.

If you use multiple lineage harvesters to create the technical lineage, you can create a configuration file for each lineage harvester and add different data source configurations to each configuration file. Note that multiple configuration files can only be used for testing purposes.

Tip  You have to save the configuration file in the config directory in the lineage harvester folder.

Empty configuration file

When you run the lineage harvester for the first time, it creates an empty configuration file. To create a technical lineage, you have to manually add properties and values, per data source, to this configuration file.

The following image shows an example of the empty configuration file created by the lineage harvester.

{
	"general" : {
		"catalog" : {
			"url" : "",
			"username" : "",
		},
		"useCollibraSystemName" : false
	},
	"sources" : [ {
		"type" : "Database",
		"id" : "MyDB",
		"hostname" : "",
		"username" : "",
		"dialect" : "",
		"collibraSystemName" : "",
		"databaseNames" : [ ],
		"port" : 1521
	} ]
}

Configuration file generator

The configuration file generator creates an example configuration file with the data source properties of your choosing:

  1. Select the (meta)data sources for which you want to create a technical lineage.
  2. Scroll down to the configuration file example.
  3. Click Copy code to copy the example.
    The configuration file example is copied to your clipboard.
  4. Paste the example in your empty configuration file in the lineage harvesterconfig folder.
  5. Replace the values in the example to match your actual data source information.
    Tip Make sure you understand each property and know which values you must use to access your data source information.
  6. Run the lineage harvester.

Warning Some browser plug-ins may slow the configuration file generator down.

Tip 
Use these options to filter the data source properties in the configuration file to your needs.
-+ Azure Data Factory (beta)
-+ IBM InfoSphere DataStage
-+ Informatica PowerCenter
-+ SQL Server Integration Services
-+ Informatica Intelligent Cloud Services - Data Integration
-+ Matillion
-+ Tableau
-+ Looker
-+ MicroStrategy
-+ Power BI (deprecated)
-+ Power BI
-+ SQL Server Reporting Services (SSRS) and Power BI Report Server (PBRS)
-+ Custom lineages
-+ SQL files in the lineage harvester output folder (downloaded SQL files)
-+ Oracle
-+ Google BigQuery
-+ Snowflake
-+ Other SQL data sources with connection type "JDBC":
  • Amazon Redshift
  • Azure SQL Data Warehouse
  • Azure SQL Server
  • Greenplum
  • IBM DB2
  • PostgreSQL
  • Microsoft SQL Server
  • MySQL
  • Netezza
  • SAP HANA
  • Spark SQL
  • Sybase Adaptive Server Enterprise
  • Teradata
-+ SQL directories with connection type "folder":
  • Amazon Redshift
  • Azure SQL Data Warehouse
  • Azure SQL Server
  • Google BigQuery
  • Greenplum
  • HiveQL
  • IBM DB2
  • Oracle
  • PostgreSQL
  • Microsoft SQL Server
  • MySQL
  • Netezza
  • SAP HANA
  • Snowflake
  • Spark SQL
  • Sybase Adaptive Server Enterprise
  • Teradata

Copy code

{
	"general": {	
		"catalog" : {
			"url" : "https://companydomain.collibra.com",
			"username" : "my-Collibra-username"
			},
			"useCollibraSystemName" : false
	},
	"sources" : [ 
	
	{
		"collibraSystemName" : "adf-system-name",
		"id" : "adf_source",
		"type" : "AzureDataFactory",
		"tenantDomain": "tenant-domain",
		"loginFlow": {
		    "type": "ServicePrincipal",
		    "applicationId": "application-id"
		    },
	            "subscriptionId" : "subscription-id",
		    "resourceGroupName" : "resource-group-name"
		    "deleteRawMetadataAfterProcessing": false
	}
	{
		"collibraSystemName" : "datastage-system-name",
		"id" : "datastage_source",
		"type" : "ExternalDirectory",
		"dirType" : "DATASTAGE",
		"path" : "/path/to/the/datastage/folder/",
		"mask" : "*",
		"recursive" : false,
		"deleteRawMetadataAfterProcessing": false
	}
	{
		"collibraSystemName" : "infa-system-name",
		"id" : "informatica_source",
		"type" : "ExternalDirectory",
		"dirType" : "INFA",
		"path" : "/path/to/the/informatica/folder/",
		"mask" : "*",
		"recursive" : false,
		"deleteRawMetadataAfterProcessing": false
	}
	{
		"collibraSystemName" : "ssis-system-name",
		"id" : "datastage_source",
		"type" : "ExternalDirectory",
		"dirType" : "SSIS",
		"path" : "/path/to/the/ssis/folder/",
		"mask" : "*",
		"recursive" : false,
		"deleteRawMetadataAfterProcessing": false
	}
	{
		"type" : "IICS",
		"id" : "iics_source",
		"collibraSystemName" : "iics-development",
		"loginUrl" : "https://dm-us.informaticaintelligentcloud.com",
		"username" : "login-iics",
		"deleteRawMetadataAfterProcessing": false,
		"objects" : [
			{
				"path" : "Default/Sales",
				"type" : "Project"
			},
			{
				"path" : "My Project/Statistics",
				"type" : "Project"
			}
		]
	}
	{
		"id" : "my-matillion-project",
		"type" : "Matillion",
		"url" : "https://my-domain",
		"groupName" : "my-matillion-group",
		"projectName" : "redshift-project",
		"environmentName" : "redshift-environment",
		"dialect" : "redshift",
		"startTimestamp" : 1594080796911,
		"collibraSystemName": "Matillion-system",
		"deleteRawMetadataAfterProcessing": false,
		"auth": {
			"type": "Basic",
			"username": "ec2-user"
		}
	}
	{
		"type": "Tableau",
		"id": "unique-ID",
		"url": "URL to Tableau server?",
		"username": "Admin",
		"siteIds": ["site ID of Tableau Site 1", "site ID of Tableau Site 2"],
		"siteNames": ["site name of Tableau Site 1", "site name of Tableau Site 2"],
		"restOnly": false,
		"domainId": "Domain-resource-ID",
		"excludeImages": true,
		"deleteRawMetadataAfterProcessing": false,
		"paging": {
			"pagination-setting": 100,
			"pagination-setting-2": 100
		}
	}
	{
		"id" : "looker-source",
		"type" : "Looker",
		"lookerUrl" : "https://<instance-name.api.looker.com",
		"clientId" : "my-looker-api-user-name",
		"clientSecret": "looker-api-userkey",
		"domainId" : "22258f64-40b6-4b16-9c08-c95f8ec0da26",
		"deleteRawMetadataAfterProcessing": false	}
	{
		"type" : "ExistingLineage",
		"id" : "MyPowerBISourceID",
		"deleteRawMetadataAfterProcessing": false
	}
	{
		"id": "<unique-id>",
		"type": "SSRS",
		"url": "http://<IP address or computer name>/Reports",
		"username": "<server-api-user-name>",
		"domainId": "<domain-resource-id>",
		"folderFilter":  ["/Folder1/*", "Folder2"],
		"deleteRawMetadataAfterProcessing": false
	}
	{
		"collibraSystemName" : "custom-system-name",
		"id" : "MyCustomLineage",
		"type" : "ExternalDirectory",
		"dirType" : "custom-lineage",
		"path" : "/path/to/custom-lineage/dir",
		"deleteRawMetadataAfterProcessing": false
	}
	{
		"type" : "LoadedSource",
		"id" : "MySource",
		"zipFile" : "/path/to/source-MySource.zip",
		"deleteRawMetadataAfterProcessing": false
	}
	{
		"id" : "database_source",
		"type" : "Database",
		"username" : "MyUsername",
		"dialect" : "hive",
		"databaseNames" : ["MyDefaultDbName"],
		"hostname" : "localhost",
		"collibraSystemName" : "apache-hive-system",
		"port" : 1521,
		"deleteRawMetadataAfterProcessing": false,
		"customConnectionProperties" : ""
	}
	{
		"id": "oracle-id",
		"type": "DatabaseOracle",
		"hostname": "host_url",
		"username": "user1",
		"collibraSystemName": "automation_csn",
		"port": 1521,
		"serviceNames": ["sn1", "sn2"],
		"databaseNames": ["db1", "db2"],
		"deleteRawMetadataAfterProcessing": false
	}
	{			
		"id" : "bigquery_source",
		"type" : "DatabaseBigQuery",
		"projectIDs" : [ "bigquery_project1", "bigquery_project2" ],
		"region": "europe-west1"
		"auth" : "/path/to/the/authentication/file.json",
		"collibraSystemName" : "bigquery-system-name",
		"deleteRawMetadataAfterProcessing": false
	}
	{
		"id" : "snowflake_source",
		"type" : "DatabaseSnowflake",
		"mode" : "SQL|SQL-API",
		"collibraSystemName" : "snowflake-system-name",
		"auth": {
			"type": "KeyPair|Basic",
			"username": "some_username",
			"pathToPrivateKey": "path_to_your_private_key_file",
			"usePassword": "true|false" 
		},
		"hostname" : "MyAccountName.snowflakecomputing.com",
		"databaseNames" : ["MyFirstDbName","MySecondDbName"],
		"warehouse" : "MySnowflakeWarehouseName",
		"deleteRawMetadataAfterProcessing": false,
		"customConnectionProperties" : "role=MYROLE"
					}
	{
		"type": "Microstrategy",
		"id": "microstrategy-batch",
		"domainId": "<domain-resource-id>",
		"username": "mstr",
		"hostname": "remote.postgres.com",
		"port": 5432,
		"databaseName": "poc_metadata",
		"deleteRawMetadataAfterProcessing": false
	}
	{
		"type" : "PowerBI",
		"id" : "power-bi-1",
		"tenantDomain": "collibra3.onmicrosoft.com",
		"loginFlow": {
			"type": "ServicePrincipal",
			"applicationId": "be560fac-7545-4ce2-ad9f-cbce14c59af6"
		},
		"domainId": "domain-reference-ID",
		"deleteRawMetadataAfterProcessing": false
	}
	{
		"id" : "sqldirectory_source",
		"type" : "SqlDirectory",
		"path" : "/path/to/the/sql/folder/",
		"mask" : "*",
		"recursive" : false,
		"dialect" : "db2",
		"database" : "MyDefaultDbName",
		"collibraSystemName" : "data-source-system",
		"schema" : "MyDefaultDbSchema",
		"verbose" : true,
		"deleteRawMetadataAfterProcessing": false
	} ]
}
Tip 

If the useCollibraSystemName property in the lineage harvester configuration file is set to true, you also need a source-specific configuration file. Use these options to only show the <source ID> or connection definition configuration files that you need.

Informatica PowerCenter

The following example shows an Informatica PowerCenter <source ID> configuration file.

Copy code

{
	"connectionDefinitions": {
		"oracle_source": {
			"dbname": "oracle-source-database-name1",
			"schema": "my Oracle source schema",
			"dialect": "oracle"
		},
		"oracle_target": {
			"dbname": "oracle-target-database-name2",
			"schema": "my other oracle target schema",
			"dialect": "oracle"
		}
	},
	"collibraSystemNames": {
		"databases": [
			{
				"dbname": "oracle-source-database-name1",
				"collibraSystemName": "oracle-system-name1"
			},
			{
				"dbname": "oracle-target-database-name2",
				"collibraSystemName": "oracle-system-name2"
			}
		],
		"connections": [
			{
				"connectionName": "oracle-connection-name1",
				"collibraSystemName": "oracle-system-name1"
			},
			{
				"connectionName": "oracle-connection-name2",
				"collibraSystemName": "oracle-system-name2"
			}
		]
	}
}	

SQL Server Integration Services

The following example shows an SQL Server Integration Services connection definitions configuration file.

Copy code

{
  "ConnStringRegExTranslation": {

    "Data Source=dhb-sql-prod;Initial Catalog=SFG_repl_staging;Provider=SQLNCLI11;Integrated Security=SSPI.*": {
      "dbname": "DATAHUB",
      "schema": "DBO",
      "dialect": "mssql",
      "collibraSystemName" : "WAREHOUSE"
    },

    "Server=sb-dhub;User ID=SYS_USER;Initial Catalog=STAGEDB;Port=6306.*": {
      "dbname": "STAGEDB",
      "schema": "STAGE_OWNER",
      "dialect": "sybase",
      "collibraSystemName" : ""
    }

  }
}

IBM InfoSphere DataStage

The following example shows a DataStage connection definitions configuration file.

Copy code

{
  "OdbcDataSources": {
    "oracle-data-source": {
      "dbname": "my-oracle-database",
      "schema": "my-oracle-schema",
      "dialect": "oracle",
      "collibraSystemName": "my-system"
    },
    "mssql-data-source": {
      "dbname": "my-mssql-database",
      "schema": "my-mssql-schema",
      "dialect": "mssql",
      "collibraSystemName": "my-system"
    }
  },
  "NonOdbcConnectors": {
    
    "admin@database-name": {
      "dbname": "my-netezza-database",
      "schema": "my-netezza-schema",
      "dialect": "netezza",
      "collibraSystemName": "my-system"
    },
    "admin@second-database-name": {
      "dbname": "my-second-netezza-database",
      "schema": "my-second-netezza-schema",
      "dialect": "netezza",
      "collibraSystemName": "my-system"
    }
  }   
}

Informatica Intelligent Cloud Services

The following example shows an Informatica Intelligent Cloud Services <source ID> configuration file.

Copy code

{
	"collibraSystemNames": {
		"connections": [
			{
				"connectionName": "DG_con_standby_cmdm_clientors",
				"collibraSystemName": "PUBLIC"
			},
			{
				"connectionName": "DG_con_dev_dg_dgiauser_su",
				"collibraSystemName": "PUBLIC"
			}
		]
	},
	"connectionDefinitions": [
		{
			"connectionName": "DG_con_standby_cmdm_clientors",
			"databaseName": "main",
			"schemaName": "dbo",
			"dialect": "oracle"
		},
		{
			"connectionName": "DG_con_dev_dg_dgiauser_su",
			"databaseName": "main",
			"schemaName": "dbo",
			"dialect": "oracle"
		}
	]
}	

Tableau

The following example shows a Tableau <source ID> configuration file.

Copy code

{
  "collibraSystemNames": {
    "databases": [
      {
        "hostName": "database-hostname",
        "collibraSystemName": "public"
      }
    ],										
    "files": [
      {"filePath": "C:\\ProgramData\\Tableau\\Tableau Server\\data\\files\\sample.xls",
        "collibraSystemName": "sample-files"
      }
    ],
    "connectors": [
      {
        "connectorUrl": "tableau-server-connector-url.com",
        "collibraSystemName": "Oracle-connector"
      }
    ],
    "cloudFiles": [
      {
        "name": "file-name",
        "collibraSystemName": "FILE"
      }
    ]
  },
  "databaseMapping": {
	 "<hostname:port>":"<actual database name>"
  },
  "filters": {
	 "sites":{
	   "site_name":"domain_id"
	 },
        "projects":{
          "site_name2 > project_name2": "domain-reference-id2",
          "site_name3 > project_name3 > subproject_name": "domain-reference-id2"
	 }
   }	
}			

Looker

The following example shows a Looker <source ID> configuration file.

Copy code

{
	"Connections": {
		"connection-object1": {
			"dialect": "mssql",
			"schema": "mssql-schema-name",
			"dbname": "mssql-database-name",
			"collibraSystemName": "mssql-system-name"
		},
		"connection-object2": {
			"dialect": "oracle",
			"schema": "oracle-schema-name",
			"dbname": "oracle-database-name",	
			"collibraSystemName": "oracle-system-name"
		}
	}
	"filters":[
		{
		"domainId":"<reference ID>",
			"description":"any-description",
			"folderNames":["Folder1", "Folder2"]
		},
		{
		"domainId":"<reference ID>",
			"description":"any-description",
			"folderNames":["Folder3", "Folder4"]
		},
		{
		"domainId":"<reference ID>",
			"description":"any-description",
			"folderIds":["123xxxx", "456xxxx"]
		}
		]
	}
}	

SQL Server Reporting Services and Power BI Report Server

The following example shows a SQL Server Reporting Services and Power BI Report Server <source ID> configuration file.

Copy code

{
	"DataSources": {
		"Redshift": {
			"dbname": "redshift-database-name",
			"schema": "redshift-schema-name",
			"dialect": "redshift",
			"collibraSystemName": "redshift-system-name"
		},
		"Oracle": {
			"dbname": "oracle-database-name",
			"schema": "oracle-schema-name",
			"dialect": "oracle",
			"collibraSystemName": "oracle-system-name"
		}
	},
	"CustomDataSources": {
		"/path to report/custom data souce name": {
			"dbname": "mssql-database-name",
			"dialect": "mssql"
		}
	}
}	

Power BI

The following example shows a Power BI <source ID> configuration file.

Copy code

{
	"found_dbname=databasename1;found_hostname=*;found_schema=schema1": {
		"dbname": "mssql-database-name",
		"schema": "mssql-schema-name",
		"dialect": "mssql",
		"collibraSystemName": "mssql-system-name"
	},
	"found_dbname=databasename2;found_hostname=server-name.onmicrosoft.com;found_schema=schema2": {
		"dbname": "oracle-database-name",
		"schema": "oracle-schema-name",
		"dialect": "oracle",
		"collibraSystemName": "oracle-system-name"
	},
	"filters":[ 
		{
		"domainId": "<domain-ref-id>",
		"description": "FirstFilter",
		"workspaceNames": "*",
		"excludeWorkspaceIds": ["workspaceC", "workspaceD"]
		},
		{
		"domainId": "<domain-ref-id>",
		"description": "SecondFilter",
		"workspaceNames": ["workspace3", "workspace4"],
		"capacityIds": ["id1","id2"]
		}
	]
}	

Matillion

The following example shows a Matillion <source ID> configuration file.

Copy code

{
	"collibraSystemNames":
		{
			"sources":[
				{
					"jobName":"<name of job>",
					"collibraSystemName":"<name>"
				},
				{
					"jobName":"<name of job>",
					"collibraSystemName":"<name>"
				}
			],
			"targets":[
				{
					"jobName":"<name of job>",
					"collibraSystemName":"<name>"
				},
				{
					"jobName":"<name of jobv",
					"collibraSystemName":"<name>"
				}
			]
		}
}