Creating a technical lineage via the lineage harvester

The following table shows the steps you take to create a technical lineage for JDBC data sources and ETL tools by using the lineage harvester.

Tip 

For information on ingesting metadata from the following BI tools and creating a technical lineage via the lineage harvester, see the dedicated sections:

For information on creating custom technical lineage by using the lineage harvester, go to Working with custom technical lineage.

Step

What?

Description

1

Prepare Data Catalog physical data layer

Before you create a technical lineage, prepare Data Catalog's physical data layer. This is required so that assets in Data Catalog can be automatically stitched to the data elements in the data source for which you want to create a technical lineage.

By preparing Data Catalog's physical data layer, you create assets of the following types:

  • System
  • Database
  • Schema
  • Table

Note If you don't prepare the Data Catalog physical data layer, you can still create a technical lineage. However, stitching will not be performed.
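To illustrate why stitching depends on the physical data layer, the following Python sketch models the System > Database > Schema > Table hierarchy and matches table references found in harvested source code against existing Table assets. It is a conceptual sketch only, not Collibra's matching logic: the `TableAsset` class, the `stitch` function, the `>` separator, and the case-insensitive comparison are all assumptions made for illustration.

```python
from dataclasses import dataclass


# Hypothetical model of the physical data layer: System > Database > Schema > Table.
@dataclass(frozen=True)
class TableAsset:
    system: str
    database: str
    schema: str
    table: str

    def full_name(self) -> str:
        # Hypothetical full-name convention: ">" separator, lowercased for
        # case-insensitive matching. Both choices are assumptions.
        return ">".join([self.system, self.database, self.schema, self.table]).lower()


def stitch(catalog_tables, harvested_refs):
    """Split harvested table references into stitched and unstitched ones.

    catalog_tables: TableAsset objects that already exist in the catalog.
    harvested_refs: (system, database, schema, table) tuples found in source code.
    """
    index = {t.full_name(): t for t in catalog_tables}
    stitched, unstitched = {}, []
    for ref in harvested_refs:
        key = TableAsset(*ref).full_name()
        if key in index:
            stitched[key] = index[key]
        else:
            # No matching asset: the reference can still appear in the lineage,
            # but it is not stitched to a catalog asset.
            unstitched.append(key)
    return stitched, unstitched


catalog = [TableAsset("pg-prod", "sales", "public", "orders")]
refs = [
    ("pg-prod", "sales", "public", "orders"),
    ("pg-prod", "sales", "public", "customers"),
]
stitched, unstitched = stitch(catalog, refs)
print(sorted(stitched))  # ['pg-prod>sales>public>orders']
print(unstitched)        # ['pg-prod>sales>public>customers']
```

The sketch mirrors the note above: a reference without a matching Table asset is not lost, it is simply not stitched.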

2

Set up the lineage harvester

You use the lineage harvester to collect source code from your data sources and to create relations between the data elements in your data source and existing assets in Data Catalog.

The lineage harvester runs close to the data source and can harvest transformation logic, such as SQL scripts and ETL scripts, from a specific location, for example, a database table or a folder on a file system.
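To make harvesting from a folder concrete, here is a minimal Python sketch of what collecting SQL scripts from a file-system folder involves. It is not the lineage harvester's implementation; the `collect_sql_scripts` function and the `*.sql` file pattern are assumptions for illustration.

```python
import tempfile
from pathlib import Path


def collect_sql_scripts(folder: str, pattern: str = "*.sql") -> dict[str, str]:
    """Read every file matching `pattern` under `folder`, recursively.

    Returns a mapping from the path relative to `folder` to the file's text.
    """
    root = Path(folder)
    scripts = {}
    for path in sorted(root.rglob(pattern)):
        if path.is_file():
            # POSIX-style relative paths keep keys stable across operating systems.
            scripts[path.relative_to(root).as_posix()] = path.read_text(encoding="utf-8")
    return scripts


# Usage: build a throwaway folder with one SQL script and one unrelated file.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "etl").mkdir()
    Path(tmp, "etl", "load_orders.sql").write_text(
        "INSERT INTO orders SELECT * FROM staging_orders;", encoding="utf-8"
    )
    Path(tmp, "README.txt").write_text("not SQL", encoding="utf-8")
    found = collect_sql_scripts(tmp)

print(list(found))  # ['etl/load_orders.sql']
```

Only files that match the pattern are collected; everything else in the folder is ignored.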

You can download the lineage harvester from the Collibra Community Downloads page.

3

If you want to create technical lineage for Azure Data Factory, complete the tasks in Azure Data Factory prerequisites.

The lineage harvester uses Azure APIs to get the information necessary to build technical lineage from Azure Data Factory. Use the Azure Data Factory prerequisites topic to register Azure Data Factory in the Azure Portal and assign the necessary permissions and access.

4

Prepare the configuration file

Prepare a configuration file that specifies the data sources for which you want to create a technical lineage. The lineage harvester uses this configuration file to extract information from those data sources.

Tip Use the configuration file generator to create an example configuration file with the properties of your choosing. You can copy this example to your configuration file and replace the values of the properties to match your data source information.
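The "copy the example, then replace the values" workflow can be scripted. The following Python sketch assumes, hypothetically, that the example is JSON and that values to replace are marked as `<name>` placeholders. The key names in `EXAMPLE_CONFIG` are invented for illustration and are not the lineage harvester's actual configuration schema; always start from the output of the configuration file generator.

```python
import json

# Hypothetical example configuration. The real key names and structure of the
# lineage harvester configuration file may differ.
EXAMPLE_CONFIG = {
    "general": {"catalogUrl": "<your-collibra-url>"},
    "sources": [
        {"id": "<source-id>", "type": "<source-type>", "host": "<host>", "port": "<port>"}
    ],
}


def is_placeholder(value):
    return isinstance(value, str) and value.startswith("<") and value.endswith(">")


def fill_placeholders(node, values):
    """Return a copy of `node` with every "<name>" string replaced by values[name].

    Placeholders without a value are left unchanged so they are easy to spot.
    """
    if isinstance(node, dict):
        return {k: fill_placeholders(v, values) for k, v in node.items()}
    if isinstance(node, list):
        return [fill_placeholders(v, values) for v in node]
    if is_placeholder(node):
        return values.get(node[1:-1], node)
    return node


def unfilled(node):
    """List the placeholder strings that still need a value."""
    if isinstance(node, dict):
        return [p for v in node.values() for p in unfilled(v)]
    if isinstance(node, list):
        return [p for v in node for p in unfilled(v)]
    return [node] if is_placeholder(node) else []


config = fill_placeholders(EXAMPLE_CONFIG, {
    "your-collibra-url": "https://example.collibra.com",
    "source-id": "pg-prod",
    "source-type": "PostgreSQL",
    "host": "db.example.internal",
})
print(unfilled(config))  # ['<port>']
print(json.dumps(config, indent=2))
```

Listing the remaining placeholders before you run the harvester is a cheap way to catch values you forgot to replace.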

When you have prepared the configuration file, you can use specific commands to perform different actions on the data sources that are defined in your configuration file.

For example, you use the full-sync command to upload the source code from the data sources in the configuration file to Collibra, where it is analyzed and processed and where the technical lineage is created.
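Because the lineage harvester is run as a command, a script or scheduled job can wrap the invocation. The following Python sketch assumes, hypothetically, that the launcher is at `lineage-harvester/bin/lineage-harvester` and that the command name, such as `full-sync`, is passed as the first argument. Check your installation and the lineage harvester documentation for the actual path, commands, and options; `harvester_command` and `run_harvester` are illustrative helpers, not part of the product.

```python
import subprocess
from pathlib import Path

# Hypothetical location of the lineage harvester launcher; adjust to your installation.
HARVESTER = Path("lineage-harvester") / "bin" / "lineage-harvester"


def harvester_command(command, harvester=HARVESTER):
    """Build the argument list for one harvester command, for example "full-sync"."""
    if not command or command.startswith("-"):
        raise ValueError(f"not a command name: {command!r}")
    return [str(harvester), command]


def run_harvester(command, harvester=HARVESTER, dry_run=False):
    """Run a harvester command and return its exit code (0 on a dry run)."""
    args = harvester_command(command, harvester)
    print("Running:", " ".join(args))
    if dry_run:
        return 0
    # check=False: report failures through the exit code instead of raising.
    return subprocess.run(args, check=False).returncode


# Usage: print the command without executing it.
exit_code = run_harvester("full-sync", dry_run=True)
```

Returning the exit code instead of raising lets a scheduler decide how to handle a failed upload.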

Tip 
  • If you want to use SQL files from a previously loaded data source, you have to download the SQL files of a data source to the lineage harvester.
  • If you want to use a data source in an external directory, for example Informatica PowerCenter, SQL Server Integration Services or IBM InfoSphere DataStage, you have to prepare the external directory folder.
  • If you want to use a JSON file to create a custom technical lineage, you have to prepare the JSON file.

5

Run the lineage harvester

After you have prepared the lineage harvester configuration file, you can run the lineage harvester.

6

View the technical lineage

After you have created the technical lineage, go to a Power BI Column, Looker Look, Column, or Table asset page and click the Technical lineage tab to view the technical lineage.

Use the Browse tab pane to search for data objects and trace their dependencies, or use the Settings tab pane to edit or export the technical lineage and to view the logs created by the lineage harvester.