About technical lineage

Technical lineage is a detailed lineage graph that shows how data transforms and flows from source to destination across its entire lifecycle. It enables you to easily discover where tables and columns are used and how they relate to each other. You use it to visualize dependencies between Table assets, Column assets, Power BI Column assets, Looker Look assets and other data objects.

During the technical lineage process, relations of the type "Data Element targets / sources Data Element" are automatically created:

  • Between data objects in your data source and assets from registered data sources.
  • Between ingested assets from BI sources and Data Catalog assets from registered data sources.

Tip If you want to ingest and create a technical lineage for Looker or Power BI, we highly advise you to read the dedicated sections.

Steps to create a technical lineage

The following table shows which steps you have to take to create a technical lineage and which prerequisites you need to execute each step.

Step

What?

Description

Prerequisites

1

Prepare Data Catalog physical data layer

Before you create a technical lineage, you prepare Data Catalog's physical data layer. This is necessary to automatically stitch assets in Data Catalog and the data elements in the data source for which you want to create a technical lineage.

By preparing Data Catalog's physical data layer, you create assets of the following types:

  • System
  • Database
  • Schema
  • Table

Note If you don't prepare the Data Catalog physical data layer, you can still create a technical lineage. However, stitching will not be performed.

  • Catalog Experience is enabled in Collibra Console.
  • You have a global role with the Catalog global permission, for example Catalog Author.
  • You have a resource role with the following resource permissions:
    • Asset: Add
    • Attribute: Add
    • Domain: Add
    • Attachment: Add

2

Set up the lineage harvester

You use the lineage harvester to collect source code from your data sources and create new relations between data elements from your data source and existing assets into Data Catalog.

You can download the lineage harvester from the Collibra Community Downloads page.

  • You have installed Java Runtime Environment version 8 or newer.
  • You have purchased Collibra Data Lineage.
  • You have Collibra Data Intelligence Cloud 5.7.3 or newer.
  • Your environment meets the hardware requirements to install and use the lineage harvester.
  • You have added Firewall rules so that the lineage harvester can connect to:
    • All Collibra Data Lineage servers within your geographical location:
      • 18.198.89.106 (techlin-aws-eu)
      • 54.242.194.190 (techlin-aws-us)
      • 15.222.200.199 (techlin-aws-ca)
      • 35.205.146.124 (techlin-gcp-eu)
      • 34.73.33.120 (techlin-gcp-us)
      • 35.197.182.41 (techlin-gcp-au)
    • The host names of all databases in the lineage harvester configuration file.

3

Prepare the configuration file

You create a configuration file to determine for which data sources you want to create a technical lineage. The configuration file is used by the lineage harvester to extract information from data sources for which you want to create a technical lineage.

Tip You can use the configuration file generator to create an example configuration file with the properties of your choosing. You can easily copy this example to your configuration file and replace the values of the properties to match your data source information.

When you have created a configuration file, you can use specific commands to perform different actions on the data sources that are defined in your configuration file.

For example, you use the full-sync command to upload the source code from the data sources in the configuration file to the Collibra Data Intelligence Cloud, where they are analyzed and processed and where the technical lineage is created.

Tip 
  • If you want to use SQL files from a previously loaded data source, you have to download the SQL files of a data source to the lineage harvester.
  • If you want to use a data source in an external directory, for example Informatica PowerCenter, SQL Server Integration Services or IBM InfoSphere DataStage, you have to prepare the external directory folder.
  • If you want to use a JSON file to create a custom technical lineage, you have to prepare the JSON file.
  • You have a global role that has the Manage all resources global permission.
  • You have a global role with the Catalog global permission, for example Catalog Author.
  • You have the Technical lineage global permission.
  • The lineage harvester is able to access all data sources in the configuration file.
  • You have the necessary permissions to all database objects that the lineage harvester accesses.
4 View the technical lineage.

After you created the technical lineage, you can go to a Power BI Column, Looker Look, Column or Table asset page and click the Technical lineage tab to view the technical lineage.

You can use the Browse tab pane to search for different data objects and trace their dependencies or use the Settings tab pane to edit or export the technical lineage and see the logs created by the lineage harvester.

Data objects

You can see two types of data objects in your technical lineage:

  • Data objects from your data source that are stitched to assets in Data Catalog and for which you created the technical lineage. These assets have a yellow background.
    Example 

  • Other objects, for example temporary tables and columns, that the lineage harvester collects from your data sources, but are not stitched to assets in Data Catalog. These objects have a gray background.
    Example 

Warning We do not support stitching for Looker assets. We do support stitching for Power BI assets, but the stitched assets still have a gray background. This is a known issue.

Naming convention

When you create a technical lineage, Data Catalog follows a strict naming convention for the full names of assets. Each asset has a display name and full name. You can freely edit the display name. However, you should never edit the full name, because Data Catalog needs it to refresh data sources for which you created the technical lineage and to refresh the technical lineage itself.

When you prepare the Data Catalog physical data layer and the configuration file, you should always use the full name as the name of the corresponding data object in your data source for the following assets:

  • Schema
  • Database
  • System

Note If you want to create a technical lineage for a Google BigQuery database, the project name in the configuration file must be the same as the full name of the Database asset.

Warning Editing the full name of the Schema, Database and System assets may lead to errors during the technical lineage creation process.