The configuration file generator

The configuration file generator helps you create your lineage harvester configuration file more easily by providing the structure of the file with the correct properties per data source.

The lineage harvester configuration file

The lineage harvester uses a configuration file when it connects to Data Catalog via the Collibra REST API. The configuration file contains references to the data sources for which you want to create a technical lineage. You have to prepare the configuration file if you want to create a technical lineage and add new relations of the type "Data Element targets / sources Data Element" between existing assets in Data Catalog, and of the type "Column is target of / is source of Data Attribute" between assets from ingested BI sources and assets in Data Catalog.

Tip  You have to save the configuration file in the config directory in the lineage harvester folder.

Empty configuration file

When you run the lineage harvester for the first time, it creates an empty configuration file. To create a technical lineage, you have to manually add properties and values, per data source, to this configuration file.

The following example shows the empty configuration file that the lineage harvester creates.

{
	"general" : {
		"catalog" : {
			"url" : "",
			"username" : ""
		},
		"useCollibraSystemName" : false
	},
	"sources" : [ {
		"type" : "Database",
		"id" : "MyDB",
		"hostname" : "",
		"username" : "",
		"dialect" : "",
		"collibraSystemName" : "",
		"databaseNames" : [ ],
		"port" : 1521
	} ]
}
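Because every empty value in this file has to be filled in by hand, it can help to list what is still missing. The following Python sketch is only an illustration and is not part of the lineage harvester: the function name find_empty_values is hypothetical, and the sketch assumes that the file is valid JSON with the structure shown above.

```python
import json


def find_empty_values(node, path=""):
    """Return the paths of all properties whose value is an empty string or an empty list."""
    empty = []
    if isinstance(node, dict):
        for key, value in node.items():
            empty += find_empty_values(value, f"{path}.{key}" if path else key)
    elif isinstance(node, list):
        if not node:
            empty.append(path)
        for index, item in enumerate(node):
            empty += find_empty_values(item, f"{path}[{index}]")
    elif node == "":
        empty.append(path)
    return empty


# Same structure as the empty configuration file shown above.
EMPTY_CONFIG = json.loads("""
{
  "general": {"catalog": {"url": "", "username": ""}, "useCollibraSystemName": false},
  "sources": [{"type": "Database", "id": "MyDB", "hostname": "", "username": "",
               "dialect": "", "collibraSystemName": "", "databaseNames": [], "port": 1521}]
}
""")

if __name__ == "__main__":
    for property_path in find_empty_values(EMPTY_CONFIG):
        print("still empty:", property_path)
```

Properties that already have a value, such as "port" or "useCollibraSystemName", are not reported.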

Configuration file generator

The configuration file generator creates an example configuration file with the data source properties of your choosing:

  1. Select the (meta)data sources for which you want to create a technical lineage.
  2. Scroll down to the configuration file example.
  3. Click Copy code to copy the example.
    The configuration file example is copied to your clipboard.
  4. Paste the example in your empty configuration file in the lineage harvester config folder.
  5. Replace the values in the example to match your actual data source information.
    Tip Make sure you understand each property and know which values you must use to access your data source information.
  6. Run the lineage harvester.
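
Between steps 5 and 6, a quick sanity check of the edited file can catch copy-and-paste mistakes before the lineage harvester runs. The Python sketch below is a hypothetical helper, not a Collibra tool: check_config is an invented name, and the rule that every source needs its own "id" is an assumption rather than something this page states.

```python
import json
from collections import Counter


def check_config(text):
    """Return a list of problems found in the text of a configuration file."""
    try:
        config = json.loads(text)
    except json.JSONDecodeError as error:
        # Typical causes: a missing comma between two sources or a trailing comma.
        return [f"not valid JSON (line {error.lineno}): {error.msg}"]
    if not isinstance(config, dict):
        return ["the top level of the file must be a JSON object"]

    problems = []
    if not config.get("general", {}).get("catalog", {}).get("url"):
        problems.append("general.catalog.url is empty")

    sources = [s for s in config.get("sources", []) if isinstance(s, dict)]
    for index, source in enumerate(sources):
        for key in ("id", "type"):
            if not source.get(key):
                problems.append(f"sources[{index}] has no {key}")

    # Assumption: two sources must not share the same id.
    id_counts = Counter(s["id"] for s in sources if s.get("id"))
    for source_id, count in id_counts.items():
        if count > 1:
            problems.append(f"id {source_id!r} is used by {count} sources")
    return problems


if __name__ == "__main__":
    # The comma between the two sources is missing on purpose.
    sample = '{"sources": [{"id": "a", "type": "Database"} {"id": "b", "type": "Database"}]}'
    for problem in check_config(sample):
        print(problem)
```

An empty list means that none of these checks failed. It does not mean that the values are correct for your data sources.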

Warning Some browser plug-ins may slow the configuration file generator down.

Tip Use these options to filter the data source properties in the configuration file to your needs.
• IBM InfoSphere DataStage
• Informatica PowerCenter
• SQL Server Integration Services
• Informatica Intelligent Cloud Services - Data Integration
• Looker
• Power BI
• Custom lineages
• SQL files in the lineage harvester output folder (downloaded SQL files)
• Oracle
• Google BigQuery
• Snowflake
• Other SQL data sources with connection type "JDBC":
  • Amazon Redshift
  • Azure SQL Server
  • Greenplum
  • IBM DB2
  • PostgreSQL
  • Microsoft SQL Server
  • MySQL
  • Netezza
  • SAP Hana
  • Spark SQL
  • Sybase Adaptive Server Enterprise
  • Teradata
• SQL directories with connection type "folder":
  • Amazon Redshift
  • Azure SQL Server
  • Google BigQuery
  • Greenplum
  • HiveQL
  • IBM DB2
  • Oracle
  • PostgreSQL
  • Microsoft SQL Server
  • MySQL
  • Netezza
  • SAP Hana
  • Snowflake
  • Spark SQL
  • Sybase Adaptive Server Enterprise
  • Teradata

{
	"general" : {
		"catalog" : {
			"url" : "https://companydomain.collibra.com",
			"username" : "my-Collibra-username"
		},
		"useCollibraSystemName" : false
	},
	"sources" : [ 
	
	{
		"collibraSystemName" : "datastage-system-name",
		"id" : "datastage_source",
		"type" : "ExternalDirectory",
		"dirType" : "DATASTAGE",
		"path" : "/path/to/the/datastage/folder/",
		"mask" : "*",
		"recursive" : false
	},
	{
		"collibraSystemName" : "infa-system-name",
		"id" : "informatica_source",
		"type" : "ExternalDirectory",
		"dirType" : "INFA",
		"path" : "/path/to/the/informatica/folder/",
		"mask" : "*",
		"recursive" : false
	},
	{
		"collibraSystemName" : "ssis-system-name",
		"id" : "ssis_source",
		"type" : "ExternalDirectory",
		"dirType" : "SSIS",
		"path" : "/path/to/the/ssis/folder/",
		"mask" : "*",
		"recursive" : false
	},
	{
		"type" : "IICS",
		"id" : "iics_source",
		"collibraSystemName" : "iics-development",
		"loginUrl" : "https://dm-us.informaticaintelligentcloud.com",
		"username" : "login-iics",
		"objects" : [
			{
				"path" : "Default/Sales",
				"type" : "Project"
			},
			{
				"path" : "My Project/Statistics",
				"type" : "Project"
			}
		]
	},
	{
		"collibraSystemName" : "looker",
		"id" : "looker-source",
		"type" : "Looker",
		"lookerUrl" : "https://<instance-name>.api.looker.com",
		"clientId" : "my-looker-api-user-name",
		"domainId" : "22258f64-40b6-4b16-9c08-c95f8ec0da26"
	},
	{
		"type" : "ExistingLineage",
		"id" : "MyPowerBISourceID"
	},
	{
		"collibraSystemName" : "custom-system-name",
		"id" : "MyCustomLineage",
		"type" : "ExternalDirectory",
		"dirType" : "custom-lineage",
		"path" : "/path/to/custom-lineage/dir/file.json"
	},
	{
		"type" : "LoadedSource",
		"id" : "MySource",
		"zipFile" : "/path/to/source-MySource.zip"
	},
	{
		"id" : "database_source",
		"type" : "Database",
		"username" : "MyUsername",
		"dialect" : "hive",
		"databaseNames" : ["MyDefaultDbName"],
		"hostname" : "localhost",
		"collibraSystemName" : "apache-hive-system",
		"port" : 1521,
		"customConnectionProperties" : ""
	},
	{
		"id" : "oracle_source",
		"type" : "Database",
		"username" : "MyUsername",
		"dialect" : "oracle",
		"databaseNames" : ["oracle-service-name"],
		"connectAsServiceName" : true,
		"hostname" : "localhost",
		"collibraSystemName" : "oracle-system-name",
		"port" : 1521
	},
	{
		"id" : "bigquery_source",
		"type" : "DatabaseBigQuery",
		"projectIDs" : [ "bigquery_project1", "bigquery_project2" ],
		"region" : "europe-west1",
		"auth" : "/path/to/the/authentication/file.json",
		"collibraSystemName" : "bigquery-system-name"
	},
	{
		"id" : "snowflake_source",
		"type" : "DatabaseSnowflake",
		"username" : "MyUsername",
		"hostname" : "MyAccountName.snowflakecomputing.com",
		"collibraSystemName" : "snowflake-system-name",
		"databaseNames" : ["MyFirstDbName","MySecondDbName"],
		"warehouse" : "MySnowflakeWarehouseName",
		"customConnectionProperties" : ""
	},
	{
		"id" : "sqldirectory_source",
		"type" : "SqlDirectory",
		"path" : "/path/to/the/sql/folder/",
		"mask" : "*",
		"recursive" : false,
		"dialect" : "db2",
		"database" : "MyDefaultDbName",
		"collibraSystemName" : "data-source-system",
		"schema" : "MyDefaultDbSchema",
		"verbose" : true
	} ]
}

Tip If the useCollibraSystemName property in the lineage harvester configuration file is set to true, you also need a source-specific configuration file. Use these options to show only the <source ID> or connection definitions configuration files that you need.

Important If you want to ingest Power BI in Data Catalog, you need both the Power BI harvester and the lineage harvester. You can find more information about the Power BI harvester configuration file and the Power BI <source ID> configuration file in the Power BI section of the documentation.

Informatica PowerCenter

The following example shows an Informatica PowerCenter <source ID> configuration file.

{
	"connectionDefinitions": {
		"oracle_source": {
			"dbname": "my Oracle source database",
			"schema": "my Oracle source schema",
			"dialect": "oracle"
		},
		"oracle_target": {
			"dbname": "my other oracle target database",
			"schema": "my other oracle target schema",
			"dialect": "oracle"
		}
	},
	"collibraSystemNames": {
		"databases": [
			{
				"dbname": "oracle-database-name1",
				"collibraSystemName": "oracle-system-name1"
			},
			{
				"dbname": "oracle-database-name2",
				"collibraSystemName": "oracle-system-name2"
			}
		],
		"connections": [
			{
				"connectionName": "oracle-connection-name1",
				"collibraSystemName": "oracle-system-name1"
			},
			{
				"connectionName": "oracle-connection-name2",
				"collibraSystemName": "oracle-system-name2"
			}
		]
	}
}	

SQL Server Integration Services

The following example shows an SQL Server Integration Services connection definitions configuration file.

{
  "ConnStringRegExTranslation": {

    "Data Source=dhb-sql-prod;Initial Catalog=SFG_repl_staging;Provider=SQLNCLI11;Integrated Security=SSPI.*": {
      "dbname": "DATAHUB",
      "schema": "DBO",
      "dialect": "mssql",
      "collibraSystemName" : "WAREHOUSE"
    },

    "Server=sb-dhub;User ID=SYS_USER;Initial Catalog=STAGEDB;Port=6306.*": {
      "dbname": "STAGEDB",
      "schema": "STAGE_OWNER",
      "dialect": "sybase",
      "collibraSystemName" : ""
    }

  }
}
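
The property name ConnStringRegExTranslation and the trailing ".*" in each key suggest that the keys are regular expressions that are compared with the connection strings found in the SSIS packages. The Python sketch below illustrates that reading only. Whether the lineage harvester matches the whole string or only the start of it is an assumption, the function name translate is hypothetical, and the connection string in the demo is made up.

```python
import re

# Keys copied from the example above; the values are shortened to what the sketch needs.
TRANSLATIONS = {
    "Data Source=dhb-sql-prod;Initial Catalog=SFG_repl_staging;Provider=SQLNCLI11;Integrated Security=SSPI.*": {
        "dbname": "DATAHUB", "schema": "DBO", "dialect": "mssql"},
    "Server=sb-dhub;User ID=SYS_USER;Initial Catalog=STAGEDB;Port=6306.*": {
        "dbname": "STAGEDB", "schema": "STAGE_OWNER", "dialect": "sybase"},
}


def translate(connection_string, translations=TRANSLATIONS):
    """Return the first entry whose key, read as a regular expression, matches the whole string."""
    for pattern, target in translations.items():
        # Assumption: the pattern has to cover the complete connection string.
        if re.fullmatch(pattern, connection_string):
            return target
    return None


if __name__ == "__main__":
    # Hypothetical connection string with an extra option after "SSPI".
    found = translate(
        "Data Source=dhb-sql-prod;Initial Catalog=SFG_repl_staging;"
        "Provider=SQLNCLI11;Integrated Security=SSPI;Auto Translate=False;"
    )
    print(found)
```

In this reading, the ".*" at the end of a key lets one entry cover connection strings that differ only in their trailing options.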

IBM InfoSphere DataStage

The following example shows a DataStage connection definitions configuration file.

{
  "OdbcDataSources": {
    "oracle-data-source": {
      "dbname": "my-oracle-database",
      "schema": "my-oracle-schema",
      "dialect": "oracle",
      "collibraSystemName": "my-system"
    },
    "mssql-data-source": {
      "dbname": "my-mssql-database",
      "schema": "my-mssql-schema",
      "dialect": "mssql",
      "collibraSystemName": "my-system"
    }
  },
  "NonOdbcConnectors": {
    
    "admin@database-name": {
      "dbname": "my-netezza-database",
      "schema": "my-netezza-schema",
      "dialect": "netezza",
      "collibraSystemName": "my-system"
    },
    "admin@second-database-name": {
      "dbname": "my-second-netezza-database",
      "schema": "my-second-netezza-schema",
      "dialect": "netezza",
      "collibraSystemName": "my-system"
    }
  }   
}

Informatica Intelligent Cloud Services

The following example shows an Informatica Intelligent Cloud Services <source ID> configuration file.

{
	"collibraSystemNames": {
		"connections": [
			{
				"connectionName": "connection-name1",
				"collibraSystemName": "system-name1"
			}
		]
	},
	"connectionDefinitions": [
		{
			"connectionName": "connection-name1",
			"databaseName": "oracle-db",
			"schemaName": "oracle-schema",
			"dialect": "oracle"
		}
	]
}	

Looker

The following example shows a Looker <source ID> configuration file.

{
	"Connections": {
		"connection-object1": {
			"dialect": "mssql",
			"schema": "mssql-schema-name",
			"dbname": "mssql-database-name",
			"collibraSystemName": "mssql-system-name"
		},
		"connection-object2": {
			"dialect": "oracle",
			"schema": "oracle-schema-name",
			"dbname": "oracle-database-name",	
			"collibraSystemName": "oracle-system-name"
		}
	}
}