About technical lineage

Technical lineage is a detailed lineage graph that shows how data transforms and flows from source to destination across its entire lifecycle. It enables you to easily discover where tables and columns are used and how they relate to each other. You can view a technical lineage for the following asset types:

  • Table
  • Column
  • Power BI Report
  • Power BI Table
  • Power BI Column
  • SSRS Report
  • SSRS Column
  • Tableau Worksheet
  • Tableau Data Attribute
  • Looker Look

During the technical lineage process, relations of the type "Data Element targets / sources Data Element" are automatically created:

  • Between data objects in your data source and assets from registered data sources.
  • Between ingested assets from BI sources and Data Catalog assets from registered data sources.
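As an illustration only, the relations above can be pictured as edges in a directed graph, where following the edges from a column shows where it is used downstream. The sketch below is not how Collibra stores lineage; the class name `LineageGraph` and the element names are hypothetical.

```python
from collections import defaultdict, deque


class LineageGraph:
    """Directed graph: an edge source -> target means 'source sources target'."""

    def __init__(self):
        # Maps each data element to the set of elements it feeds into.
        self._targets = defaultdict(set)

    def add_relation(self, source: str, target: str) -> None:
        """Record one 'Data Element targets / sources Data Element' relation."""
        self._targets[source].add(target)

    def downstream(self, start: str) -> list[str]:
        """Return every element reachable from start, in breadth-first order."""
        seen = {start}
        order = []
        queue = deque([start])
        while queue:
            node = queue.popleft()
            # Sort for a deterministic order; .get avoids creating empty entries.
            for nxt in sorted(self._targets.get(node, ())):
                if nxt not in seen:
                    seen.add(nxt)
                    order.append(nxt)
                    queue.append(nxt)
        return order


# Hypothetical element names, for illustration only.
graph = LineageGraph()
graph.add_relation("sales.orders.amount", "staging.orders_clean.amount")
graph.add_relation("staging.orders_clean.amount", "reports.revenue.total")
graph.add_relation("staging.orders_clean.amount", "Power BI Column: Revenue")
print(graph.downstream("sales.orders.amount"))
```

Reversing the edge direction in the same structure would answer the opposite question: which sources a given report column depends on.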

Tip For detailed information on how a technical lineage is created, see the Typical workflow section in About the lineage harvester. It describes how the lineage harvester interacts with your data sources and the Collibra Data Lineage service, and how the Collibra Data Lineage service interacts with Data Catalog.

Steps to create a technical lineage

The following overview shows which steps you have to take to create a technical lineage and which prerequisites you need to execute each step. For each step, it lists the step number and name, a description, and the prerequisites, in that order.

1 Prepare Data Catalog physical data layer.

Before you create a technical lineage, you prepare Data Catalog's physical data layer. This is necessary to automatically stitch assets in Data Catalog to the data elements in the data source for which you want to create a technical lineage.

By preparing Data Catalog's physical data layer, you create assets of the following types:

  • System
  • Database
  • Schema
  • Table

Note If you don't prepare the Data Catalog physical data layer, you can still create a technical lineage. However, stitching will not be performed.

  • You have a global role with the Catalog global permission, for example Catalog Author.
  • You have a resource role with the following resource permissions:
    • Asset: Add
    • Attribute: Add
    • Domain: Add
    • Attachment: Add

2 Set up the lineage harvester.

You use the lineage harvester to collect source code from your data sources and create new relations between data elements from your data source and existing assets in Data Catalog.

You can download the lineage harvester from the Collibra Community Downloads page.

  • You have installed Java Runtime Environment version 11 or newer, or OpenJDK 11 or newer.
  • You have purchased Collibra Data Lineage.
  • You have Collibra Data Intelligence Cloud 5.7.3 or newer.
  • Your environment meets the hardware requirements to install and use the lineage harvester.
  • You have added firewall rules so that the lineage harvester can connect to:
    • The host names of all databases in the lineage harvester configuration file.
    • All Collibra Data Lineage service instances within your geographical location:
      • 15.222.200.199 (techlin-aws-ca.collibra.com)
      • 18.198.89.106 (techlin-aws-eu.collibra.com)
      • 54.242.194.190 (techlin-aws-us.collibra.com)
      • 51.105.241.132 (techlin-azure-eu.collibra.com)
      • 20.102.44.39 (techlin-azure-us.collibra.com)
      • 35.197.182.41 (techlin-gcp-au.collibra.com)
      • 34.152.20.240 (techlin-gcp-ca.collibra.com)
      • 35.205.146.124 (techlin-gcp-eu.collibra.com)
      • 34.87.122.60 (techlin-gcp-sg.collibra.com)
      • 35.234.130.150 (techlin-gcp-uk.collibra.com)
      • 34.73.33.120 (techlin-gcp-us.collibra.com)

      Note The lineage harvester connects to different instances based on your geographic location and cloud provider. If your location or cloud provider changes, the lineage harvester rescans all your data sources. You have to whitelist all Collibra Data Lineage service instances in your geographic location. In addition, we highly recommend that you always whitelist the techlin-aws-us instance as a backup, in case the lineage harvester cannot connect to other Collibra Data Lineage service instances.
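To verify the firewall rules before running the harvester, a small connectivity check can help. The following is a minimal sketch, not part of the lineage harvester: the host names are copied from the list above, but port 443, the region codes taken from the host names, and the function names are assumptions.

```python
import socket

# Host names copied from the list of Collibra Data Lineage service instances.
SERVICE_HOSTS = [
    "techlin-aws-ca.collibra.com",
    "techlin-aws-eu.collibra.com",
    "techlin-aws-us.collibra.com",
    "techlin-azure-eu.collibra.com",
    "techlin-azure-us.collibra.com",
    "techlin-gcp-au.collibra.com",
    "techlin-gcp-ca.collibra.com",
    "techlin-gcp-eu.collibra.com",
    "techlin-gcp-sg.collibra.com",
    "techlin-gcp-uk.collibra.com",
    "techlin-gcp-us.collibra.com",
]

# The note above recommends always allowing this instance as a backup.
BACKUP_HOST = "techlin-aws-us.collibra.com"


def hosts_for_region(region: str) -> list[str]:
    """All instances whose host name ends in the region code, plus the backup."""
    suffix = f"-{region}.collibra.com"
    hosts = [h for h in SERVICE_HOSTS if h.endswith(suffix)]
    if BACKUP_HOST not in hosts:
        hosts.append(BACKUP_HOST)
    return hosts


def can_connect(host: str, port: int = 443, timeout: float = 5.0) -> bool:
    """Try a plain TCP connection; port 443 is an assumption, not from the text."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections and timeouts.
        return False


def check_region(region: str) -> dict[str, bool]:
    """Map each host that should be allowed for the region to its reachability."""
    return {host: can_connect(host) for host in hosts_for_region(region)}
```

For example, `check_region("eu")` would attempt connections to the three EU instances and to the backup instance, and any `False` value points at a rule that may still be missing.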

3 Prepare the configuration file.

You create a configuration file to determine for which data sources you want to create a technical lineage. The configuration file is used by the lineage harvester to extract information from data sources for which you want to create a technical lineage.

Tip You can use the configuration file generator to create an example configuration file with the properties of your choosing. You can easily copy this example to your configuration file and replace the values of the properties to match your data source information.

When you have created a configuration file, you can use specific commands to perform different actions on the data sources that are defined in your configuration file.

For example, you use the full-sync command to upload the source code from the data sources in the configuration file to Collibra Data Intelligence Cloud, where it is analyzed and processed and where the technical lineage is created.
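If you want to trigger the harvester from a script, for example in a scheduled job, a thin wrapper is enough. The sketch below is an assumption-laden illustration: only the full-sync command name comes from the text above, while the location of the harvester executable depends on your installation and is passed in as a parameter.

```python
import subprocess


def build_harvester_command(executable: str, command: str = "full-sync") -> list[str]:
    """Build the argument list: the harvester executable followed by one command."""
    return [executable, command]


def run_harvester(executable: str, command: str = "full-sync") -> int:
    """Run the harvester and return its exit code.

    The executable path is an assumption about your installation;
    pass the path to the harvester start script that you downloaded.
    """
    completed = subprocess.run(build_harvester_command(executable, command), check=False)
    return completed.returncode
```

Passing the arguments as a list, rather than as a single shell string, avoids quoting problems when the installation path contains spaces.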

Tip 
  • If you want to use SQL files from a previously loaded data source, you have to download the SQL files of a data source to the lineage harvester.
  • If you want to use a data source in an external directory, for example Informatica PowerCenter, SQL Server Integration Services or IBM InfoSphere DataStage, you have to prepare the external directory folder.
  • If you want to use a JSON file to create a custom technical lineage, you have to prepare the JSON file.

  • A global role with the following global permissions:
    • Data Stewardship Manager
    • Manage all resources
    • System administration
    • Technical lineage
  • A resource role with the following resource permission on the community level in which you created the BI Data Catalog domain:
    • Asset: add
    • Attribute: add
    • Domain: add
    • Attachment: add
  • Necessary permissions to all database objects that the lineage harvester accesses.
    Tip 

    Some data sources require specific permissions.

    Ensure that you meet the Azure Data Factory prerequisites.

    You need read access on the SYS schema.

    You need read access on the SYS schema and the View Definition Permission in your SQL Server.

    You need read access on information_schema:

    • bigquery.datasets.get
    • bigquery.tables.get
    • bigquery.tables.list
    • bigquery.jobs.create
    • bigquery.routines.get
    • bigquery.routines.list

    GRANT SELECT, at table level. Grant this to every table for which you want to create a technical lineage.

    The role of the user that you specify in the username property in the lineage harvester configuration file must be the owner of the views in PostgreSQL.

    You need read access on information_schema. Only views that you own are processed.

    SELECT, at table level. Grant this to every table for which you want to create a technical lineage.

    A role with the LOGIN option.

    SELECT WITH GRANT OPTION, at table level.

    CONNECT ON DATABASE

    You need a role that can access the Snowflake shared read-only database. To access the shared database, the account administrator must grant the IMPORTED PRIVILEGES privilege on the shared database to the user that runs the lineage harvester.

    Tip If the default role in Snowflake does not have the IMPORTED PRIVILEGES privilege, you can use the customConnectionProperties property in the lineage harvester configuration file to assign the appropriate role to the user. For example:
    "customConnectionProperties": "role=METADATA"

    You need read access on the DBC.

    You need read access to the following dictionary views:

    • all_tab_cols
    • all_col_comments
    • all_objects
    • ALL_DB_LINKS
    • all_mviews
    • all_source
    • all_synonyms
    • all_views

    You need read access on definition_schema.

    You need Admin permission on all objects that you want to harvest.

    You have added the Matillion certificate to a Java truststore.

    You have at least a Matillion Enterprise license.

4 Run the lineage harvester.

After you have prepared the lineage harvester configuration file, you can run the lineage harvester.

Not applicable.

5 View the technical lineage.

After you have created the technical lineage, you can go to a Power BI Column, Looker Look, Column or Table asset page and click the Technical lineage tab to view the technical lineage.

You can use the Browse tab pane to search for different data objects and trace their dependencies or use the Settings tab pane to edit or export the technical lineage and see the logs created by the lineage harvester.

Tip For information on ingesting metadata from BI tools and creating a technical lineage, see the dedicated sections for those tools.

Data objects

You can see two types of data objects in your technical lineage:

  • Data objects from your data source that are stitched to assets in Data Catalog and for which you created the technical lineage. These assets have a yellow background.

  • Other objects, for example temporary tables and columns, that the lineage harvester collects from your data sources, but are not stitched to assets in Data Catalog. These objects have a gray background.

Note We do not support stitching for Looker or MicroStrategy assets.

Naming convention

When you create a technical lineage, Data Catalog follows a strict naming convention for the full names of assets. Each asset has a display name and full name. You can freely edit the display name. However, you should never edit the full name, because Data Catalog needs it to refresh data sources for which you created the technical lineage and to refresh the technical lineage itself.

When you prepare the Data Catalog physical data layer and the configuration file, you should always use the full name as the name of the corresponding data object in your data source for the following assets:

  • System
  • Database
  • Schema

Note If you want to create a technical lineage for a Google BigQuery database, the project name in the configuration file must be the same as the full name of the Database asset.

Warning Editing the full name of the Schema, Database and System assets may lead to errors during the technical lineage creation process.
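Taking the rule above literally, a quick pre-flight check can flag System, Database and Schema assets whose full name no longer matches an object name in the data source. This is a minimal sketch under stated assumptions: the `CatalogAsset` class, the shape of `source_names` and all sample values are hypothetical, and no Collibra API is called.

```python
from dataclasses import dataclass

# Asset types for which the full name must match the data source object name.
CHECKED_TYPES = ("System", "Database", "Schema")


@dataclass(frozen=True)
class CatalogAsset:
    asset_type: str    # for example "System", "Database" or "Schema"
    full_name: str     # must not be edited
    display_name: str  # can be edited freely


def find_name_mismatches(assets, source_names):
    """Return checked assets whose full name is not a known data source name.

    source_names maps an asset type to the set of object names found in the
    data source or configuration file (hypothetical input).
    """
    mismatches = []
    for asset in assets:
        if asset.asset_type not in CHECKED_TYPES:
            continue  # other asset types are outside this rule
        known = source_names.get(asset.asset_type, set())
        if asset.full_name not in known:
            mismatches.append(asset)
    return mismatches


# Hypothetical sample data.
assets = [
    CatalogAsset("System", "warehouse-prod", "Production warehouse"),
    CatalogAsset("Database", "SALES_DB", "Sales"),
    CatalogAsset("Schema", "public_renamed", "Public"),
]
source_names = {
    "System": {"warehouse-prod"},
    "Database": {"SALES_DB"},
    "Schema": {"public"},
}

for asset in find_name_mismatches(assets, source_names):
    print(f"{asset.asset_type} full name does not match: {asset.full_name}")
```

The display name is deliberately ignored in the comparison, because it can be edited freely without affecting the technical lineage.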