The lineage harvester

You use the lineage harvester to collect source code from your data sources and to create new relations in Data Catalog between data elements from your data sources and existing assets.

The lineage harvester runs close to the data source and can harvest transformation logic like SQL scripts and ETL scripts from a specific location, for example a database table or a folder on a file system.

The lineage harvester connects to different Collibra Data Lineage service instances based on your geographical location and cloud provider. Ensure that you meet the system requirements before you run the lineage harvester. If your location or cloud provider changes, the lineage harvester re-harvests all your data sources.

Note: Technical lineage is created by a cloud-based service. The only connection to the cloud is an API call that the lineage harvester triggers.

For details about the typical workflow for creating technical lineage, either by using the lineage harvester or via Edge, go to Technical lineage typical workflow.

The lineage harvester configuration file

The lineage harvester uses a configuration file to connect to JDBC data sources, BI tools, and ETL tools. The configuration file contains references to the data sources for which you want to create a technical lineage. You must prepare the configuration file before you can create a technical lineage and add new relations. These are relations of the type "Data Element targets / sources Data Element" between existing assets in Data Catalog, and relations of the type "Column is target of / is source of Data Attribute" between assets from ingested BI sources and assets in Data Catalog.

Warning: All lineage harvester files must contain only UTF-8 or ISO-8859-1 characters.
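
As a rough illustration of the encoding restriction, the following Python sketch classifies the raw bytes of a file before you hand it to the lineage harvester. It is not part of the lineage harvester; the function name classify_encoding and the "suspect" label are assumptions made for this example. Because every byte sequence is technically valid ISO-8859-1, the sketch uses a heuristic: bytes in the range 0x80 to 0x9F are control codes in ISO-8859-1 and usually indicate a different encoding, such as Windows-1252.

```python
def classify_encoding(data: bytes) -> str:
    """Return "utf-8", "iso-8859-1", or "suspect" for raw file content.

    Hypothetical helper for illustration only; not a lineage harvester API.
    """
    try:
        # Strict decoding raises UnicodeDecodeError on invalid UTF-8.
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        pass
    # Every byte sequence decodes as ISO-8859-1, so decoding proves nothing.
    # Bytes 0x80-0x9F map to C1 control codes there, which rarely appear in
    # real text and usually point to another encoding (heuristic assumption).
    if any(0x80 <= byte <= 0x9F for byte in data):
        return "suspect"
    return "iso-8859-1"


def classify_file(path: str) -> str:
    """Read a file in binary mode and classify its encoding."""
    with open(path, "rb") as handle:
        return classify_encoding(handle.read())
```

You could run classify_file on a configuration file or a harvested SQL script and review any file that is reported as "suspect" before you start a harvest.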

The lineage harvester components

The lineage harvester consists of components that harvest metadata from the data sources specified in your configuration file and send that metadata to the Collibra Data Lineage service.

Using the lineage harvester

To process data sources on different servers separately, you can connect more than one lineage harvester to a single Collibra Data Intelligence Platform instance. In this case, create a configuration file for the lineage harvester on each server and configure different data sources in each configuration file.
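
When several lineage harvesters each have their own configuration file, it can help to confirm that no data source is configured on more than one server. The following Python sketch shows one way to do that. It assumes a simplified, hypothetical configuration shape in which each configuration is a dictionary with a "sources" list and each source has an "id"; the real configuration file may use different field names, so adapt the lookups to your own files.

```python
from collections import defaultdict


def find_shared_sources(configs: dict[str, dict]) -> dict[str, list[str]]:
    """Map each data source ID to the servers that configure it more than once.

    `configs` maps a server name to its parsed configuration. The "sources"
    and "id" keys are assumptions for this sketch, not a documented schema.
    """
    owners: dict[str, list[str]] = defaultdict(list)
    for server, config in configs.items():
        for source in config.get("sources", []):
            owners[source["id"]].append(server)
    # Keep only the sources that appear on two or more servers.
    return {sid: servers for sid, servers in owners.items() if len(servers) > 1}


# Hypothetical example: two servers, one data source configured on both.
example_configs = {
    "server-a": {"sources": [{"id": "oracle_dwh"}, {"id": "etl_jobs"}]},
    "server-b": {"sources": [{"id": "etl_jobs"}, {"id": "bi_reports"}]},
}
shared = find_shared_sources(example_configs)
```

In this example, shared contains only "etl_jobs", listed for both servers, which signals that one of the two configuration files should drop that data source.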

Note: You can use different command options and arguments to perform various actions with the lineage harvester.

Permissions

You need a global role that has the System Administration global permission, for example, Sysadmin. This role must have access to all assets in the data sources in the configuration file and must be able to create new relations between these assets.