Lineage harvester configuration file for the custom technical lineage

The lineage harvester uses this configuration file to extract data from the metadata of the data sources that you want to process.

When you run the lineage harvester for the first time, it creates an empty lineage harvester configuration file. You can manually add properties and values to the configuration file.

If you want to create the technical lineage for multiple data sources, use the configuration file generator to create an example configuration file with different data sources, and update the example to match your data source information.

Requirements and restrictions

  • In the configuration file, you must use UTF-8 or ISO-8859-1 characters, with the exception of SQL files, which can only be UTF-8 encoded.
  • Comments in the lineage harvester configuration file are not supported.
  • Technical lineage supports the username and password authentication method for the custom technical lineage.
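The encoding and no-comments requirements above can be checked before you run the lineage harvester. The following is a minimal sketch, not part of the lineage harvester: the function name load_harvester_config and the sample byte strings are hypothetical, and it relies only on Python's standard json module, which rejects comments and trailing commas.

```python
import json


def load_harvester_config(raw: bytes) -> dict:
    """Decode and parse a configuration file that is passed in as raw bytes."""
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        # ISO-8859-1 maps every byte to a character, so this fallback cannot fail.
        text = raw.decode("iso-8859-1")
    try:
        return json.loads(text)
    except json.JSONDecodeError as exc:
        # Strict JSON has no comment syntax, so a commented file ends up here.
        raise ValueError(f"configuration is not valid JSON: {exc}") from exc


# Hypothetical sample inputs: one valid file and one that contains a comment.
valid = b'{"general": {"catalog": {"url": "", "username": ""}}, "sources": []}'
commented = b'{\n  // my comment\n  "general": {}\n}'

print(sorted(load_harvester_config(valid)))  # ['general', 'sources']
try:
    load_harvester_config(commented)
except ValueError as exc:
    print("rejected:", exc)
```

To check a file on disk, pass it the result of pathlib.Path(...).read_bytes() for the configuration file.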

Format

{
	"general" : {
		"catalog" : {
			"url" : "",
			"username" : ""
		},
		"useCollibraSystemName" : false|true
	},
	"sources" : [ {
		"type" : "ExternalDirectory",
		"id" : "",
		"dirType" : "custom-lineage",
		"collibraSystemName" : "",
		"path" : "",
		"deleteRawMetadataAfterProcessing": false|true
	} ]
}

Properties

Description
general

Describes the connection between Collibra Data Lineage and Data Catalog.

catalog

Contains information that is necessary to connect to Data Catalog.

Note: Versions of the lineage harvester older than 1.1.2 show collibra instead of catalog.

url

The URL of your Collibra environment.

Specify the public URL of your Collibra environment. Other URLs are not accepted.

username

The username that you use to sign in to Collibra.

useCollibraSystemName
The lineage harvester ignores this property for custom technical lineage.

To use the system or server name of your data source to match the System asset in Data Catalog, specify the system data object in the tree and lineage sections in the custom technical lineage JSON file.

If you do not want to use the system or server name of your data source to match the System asset in Data Catalog, ensure that you do not add the system data object in the custom technical lineage JSON file.

sources

Contains the required information to retrieve a custom lineage. Use this property to locate the JSON file that defines the custom technical lineage.

If you want to create the technical lineage for multiple data sources, add an entry to the sources array for each data source.

type

The kind of data source. The value must be ExternalDirectory.

id

The unique ID of your custom technical lineage. This property identifies the metadata that the lineage harvester processes.

Specify a unique string for this property, for example, MyCustomLineage.

dirType

The type of external directory. The value is custom-lineage.

collibraSystemName
The lineage harvester ignores this property for custom technical lineage.

To use the system or server name of your data source to match the System asset in Data Catalog, specify the system data object in the tree and lineage sections in the custom technical lineage JSON file.

If you do not want to use the system or server name of your data source to match the System asset in Data Catalog, ensure that you do not add the system data object in the custom technical lineage JSON file.

path

The full path to the folder that contains the custom technical lineage JSON file, for example C:\path\to\custom-lineage\dir.

There must be only one JSON file that defines the lineage, and the JSON file must be named lineage.json. You can, however, add other files in the harvested directory and subdirectories and refer to those files from within the JSON file.

deleteRawMetadataAfterProcessing

The lineage harvester harvests raw metadata from specified data sources and uploads it in a ZIP file to a Collibra Data Lineage service instance, for processing.

Use this optional property to specify whether the raw metadata is deleted from the Collibra Data Lineage service instance after the metadata that is targeted for ingestion in Data Catalog is processed.

The default value is false.

If the property is set to true, the raw source metadata is deleted after processing. If set to false, it is stored in the Collibra infrastructure.

Note: Setting this property to true can negatively impact performance.
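The constraints that the property descriptions place on each sources entry (the fixed type and dirType values, a unique id, and a lineage.json file in the folder that path points to) can be checked up front. The following is a minimal sketch under stated assumptions: the function name check_sources and the problem messages are hypothetical, and which checks the lineage harvester itself performs is not described here.

```python
import tempfile
from pathlib import Path


def check_sources(config: dict) -> list:
    """Return a list of human-readable problems found in the sources entries."""
    problems = []
    seen_ids = set()
    for index, source in enumerate(config.get("sources", [])):
        label = f"sources[{index}]"
        if source.get("type") != "ExternalDirectory":
            problems.append(f"{label}: type must be ExternalDirectory")
        if source.get("dirType") != "custom-lineage":
            problems.append(f"{label}: dirType must be custom-lineage")
        source_id = source.get("id")
        if not source_id or source_id in seen_ids:
            problems.append(f"{label}: id must be a unique, non-empty string")
        seen_ids.add(source_id)
        path = source.get("path")
        # Assumption: only the presence of lineage.json in the folder is checked.
        if not path or not (Path(path) / "lineage.json").is_file():
            problems.append(f"{label}: no lineage.json found in path")
    return problems


# Demonstration with a temporary folder that stands in for the lineage directory.
with tempfile.TemporaryDirectory() as folder:
    (Path(folder) / "lineage.json").write_text("{}", encoding="utf-8")
    good = {"type": "ExternalDirectory", "id": "MyCustomLineage",
            "dirType": "custom-lineage", "path": folder}
    bad = {"type": "Database", "id": "MyCustomLineage",
           "dirType": "custom-lineage", "path": folder}
    problems = check_sources({"sources": [good, bad]})

for problem in problems:
    print(problem)
```

The second entry is reported twice: once for its type value and once because it reuses the id of the first entry.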

Example

{
	"general" : {
		"catalog" : {
			"url" : "https://companydomain.collibra.com",
			"username" : "my-Collibra-username"
		},
		"useCollibraSystemName" : false
	},
	"sources" : [ {
		"id" : "MyCustomLineage",
		"type" : "ExternalDirectory",
		"dirType" : "custom-lineage",
		"path" : "/path/to/custom-lineage/dir/",
		"collibraSystemName" : "MySystemName"
	} ]
}
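For multiple custom technical lineage folders, the configuration can also be generated instead of edited by hand. The following is a minimal sketch that builds on the example values above. The helper custom_lineage_source, the second ID MySecondCustomLineage, and the second path are hypothetical, and the sketch is not the configuration file generator that ships with the lineage harvester.

```python
import json


def custom_lineage_source(source_id: str, path: str, delete_raw: bool = False) -> dict:
    """Build one sources entry with the fixed values for custom technical lineage."""
    return {
        "type": "ExternalDirectory",
        "id": source_id,
        "dirType": "custom-lineage",
        # Ignored for custom technical lineage; kept to mirror the Format section.
        "collibraSystemName": "",
        "path": path,
        "deleteRawMetadataAfterProcessing": delete_raw,
    }


config = {
    "general": {
        "catalog": {
            "url": "https://companydomain.collibra.com",
            "username": "my-Collibra-username",
        },
        "useCollibraSystemName": False,
    },
    "sources": [
        custom_lineage_source("MyCustomLineage", "/path/to/custom-lineage/dir/"),
        # Hypothetical second folder, with raw metadata deleted after processing.
        custom_lineage_source("MySecondCustomLineage",
                              "/path/to/second-custom-lineage/dir/",
                              delete_raw=True),
    ],
}

# json.dumps emits strict JSON: no comments and no trailing commas.
text = json.dumps(config, indent="\t")
print(text)
```

Generating the file this way rules out the trailing commas and curly quotation marks that are easy to introduce when you copy and edit JSON manually.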