Collibra Data Lineage


Tip Check out our free Business and Technical Lineage training course in Collibra University.

What is Collibra Data Lineage?

Collibra Data Lineage is a cloud-only product that allows you to trace data from its source system, across the various contact points of your data landscape, to its final destination system.

Ultimately, our objective is to help you establish trust in your reports and use the data to make sound business decisions.

Collibra Data Lineage consists of two components:

  • Technical lineage
  • Business lineage

Both components provide the same value, but they are designed for different audiences.

Technical lineage

  • Designed for Data Engineers, Data Architects, and other technical stewards.
  • A detailed lineage graph that provides complete end-to-end lineage and visualizes the journey of the data objects in your external data sources.
  • Allows you to explore data objects, including temporary tables and columns, in your external data sources. You don't need to register data sources in Collibra to include them in a technical lineage.
    Tip We use the term "data objects" when referring to columns and tables in your external data sources. We use the term "assets" (specifically Column assets and Table assets) when referring to the representation of data objects in Collibra.
  • Includes all source code and data transformation details.
  • Shows you in which system data objects are used and how they are transformed from data source to data source.
  • Automatically created as part of the technical lineage process.

Business lineage

  • Designed for Analysts, Governance roles, and other business stewards.
  • Shows the relations between assets in Collibra that represent the data objects in your external data sources.
  • Refers specifically to the relation type "Data Element targets / sources Data Element" that is drawn between Column assets.
    Note During the ingestion process, relations of the type "Data Element targets / sources Data Element" are automatically created between certain assets. Any relations of this type that you manually create between assets will be deleted during the synchronization process. If you want to manually create such relations and ensure that they are maintained, you can create a custom technical lineage.
  • Shows how registered data sources relate to one another.
    Tip Registering a data source means creating assets (and the relations between the assets) in Collibra that represent the data objects in your external data sources.
  • Automatically created as part of the technical lineage process.
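To make the note about synchronization concrete, here is a minimal Python sketch under stated assumptions. It is a toy model, not the Collibra API: relations are plain `(source, relation_type, target)` tuples, the function `synchronize` and all asset names are hypothetical, and we assume that synchronization replaces every relation of the lineage type with the set derived from the technical lineage while leaving other relation types alone.

```python
# Toy model only: this is NOT the Collibra API. A relation is represented as a
# (source_asset, relation_type, target_asset) tuple; all names are hypothetical.
LINEAGE_RELATION = "Data Element targets / sources Data Element"
OTHER_RELATION = "hypothetical other relation type"


def synchronize(existing, derived_from_lineage):
    """Return the relations left after a simplified synchronization.

    Assumption: relations of other types are kept unchanged, while every
    relation of the lineage type is replaced by the relations derived from
    the technical lineage. A manually created lineage relation that the
    technical lineage does not contain is therefore dropped.
    """
    kept = {rel for rel in existing if rel[1] != LINEAGE_RELATION}
    derived = {rel for rel in derived_from_lineage if rel[1] == LINEAGE_RELATION}
    return kept | derived


# Example: one relation from ingestion, one created manually, one of another type.
from_ingestion = ("raw.orders.id", LINEAGE_RELATION, "refined.orders.id")
manual = ("raw.orders.total", LINEAGE_RELATION, "refined.sales.amount")
other = ("raw.orders.id", OTHER_RELATION, "raw.orders")

after_sync = synchronize({from_ingestion, manual, other}, {from_ingestion})
print(manual in after_sync)  # False: the manual lineage relation is gone
```

In this model, the only way for a lineage relation to survive is to be part of the lineage that synchronization derives, which mirrors the advice to create a custom technical lineage rather than adding such relations by hand.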

Tip The main difference between a technical lineage and a business lineage:
  • Technical lineage identifies data objects in your external data sources.
  • Business lineage shows assets in Collibra that represent some or all of those data objects.
Example 

Let's say that you have created a technical lineage for four different databases:

  • The first database, Oracle, is not registered in Collibra, so there are no assets in Data Catalog that represent the Oracle data objects.
  • The second database, Raw, is registered in Collibra.
    • The yellow background of the first node indicates that Table and Column assets that were created in Data Catalog are stitched to their corresponding data objects in the Raw database.
    • The other node, the one with the gray background, is a temporary table. No assets are created for temporary data objects, so stitching does not apply, which is why the node has a gray background.
  • The third and fourth databases, Refined and Consumption, are ingested in Collibra. The assets that were created in Data Catalog are stitched to their corresponding data objects in the two databases.

What we want to point out here is that technical lineage shows the data flow of all data objects across all four databases, regardless of whether they have corresponding assets in Collibra.

The corresponding business lineage shows only the relations between data objects that have corresponding assets in Data Catalog. In the following image, we see the data flow of assets from the second database, to the third, to the fourth. The first database, Oracle, is not registered in Collibra and is therefore not shown in the diagram.
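The relationship between the two views in this example can be sketched as a graph operation. The following minimal Python sketch assumes that business lineage links two registered data objects whenever the technical lineage contains a path between them whose intermediate nodes are all unregistered (such as the temporary table in the Raw database). This is a simplified model for illustration, not a description of how Collibra actually computes business lineage; the function `business_lineage` and all column names are hypothetical.

```python
from collections import defaultdict


def business_lineage(technical_edges, registered):
    """Derive business-lineage edges from technical-lineage edges (toy model).

    ASSUMPTION: two registered data objects are linked when the technical
    lineage contains a path between them whose intermediate nodes are all
    unregistered (for example, temporary tables). Unregistered objects never
    appear in the result.
    """
    graph = defaultdict(list)
    for src, dst in technical_edges:
        graph[src].append(dst)

    result = set()
    for start in registered:
        # Depth-first search that stops at the first registered node on each path.
        stack = list(graph[start])
        seen = set()
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            if node in registered:
                if node != start:
                    result.add((start, node))
            else:
                # Unregistered (e.g. temporary) node: keep walking downstream.
                stack.extend(graph[node])
    return result


# Hypothetical column-level flow loosely mirroring the four-database example.
technical_edges = [
    ("oracle.customers.id", "raw.customers.id"),
    ("raw.customers.id", "raw.tmp_customers.id"),  # temporary table, no asset
    ("raw.tmp_customers.id", "refined.customers.id"),
    ("refined.customers.id", "consumption.customer_report.id"),
]
registered = {"raw.customers.id", "refined.customers.id", "consumption.customer_report.id"}

edges = business_lineage(technical_edges, registered)
for src, dst in sorted(edges):
    print(f"{src} -> {dst}")
```

Running it prints `raw.customers.id -> refined.customers.id` and `refined.customers.id -> consumption.customer_report.id`. The Oracle column never appears: it is part of the technical lineage but has no corresponding asset, so in this model it has no place in the business lineage.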

For more information on the differences between these two components, go to Differences between technical lineage and business lineage.

For a complete list of supported data sources, go to Supported data sources for technical lineage. If you want to create a technical lineage for a data source that is not currently supported, you can create a Custom technical lineage.

BI tool integration

Business intelligence software helps organizations to collect data from the various data sources across their data ecosystem and present the data in interactive dashboards and reports, to facilitate decision-making and strategic planning.

When you integrate your BI tool in Collibra:

  • Metadata about the data objects in your external data sources is created as BI assets in Collibra.
  • Relations are created:
    • Between data objects in your external data source and assets in Collibra that represent those data objects.
      Tip These assets are created when the data source is registered, which is automatically carried out during the technical lineage process.
    • Between BI assets and the assets in Collibra that represent the data objects in your external data source.
  • Technical lineage and business lineage are automatically created.

Business value

Collibra Data Lineage has many important use cases.

How do I create a technical lineage?

There are two ways to create technical lineage and business lineage: via Edge or by using the lineage harvester.

The typical workflow for creating a technical lineage is the same whether you use the lineage harvester or Edge. If you want to use Edge and the lineage harvester together, you must use lineage harvester version 2023.04 or newer. If you want to maintain, on Edge, a technical lineage that you created with the lineage harvester, you can add technical lineage capabilities for the data sources, using the same source IDs. For details, go to Migrate the technical lineage of a data source.

For details about the typical workflow, go to Technical lineage typical workflow.

Edge

You can create a technical lineage and business lineage via Edge, for Tableau, Power BI and all supported JDBC and ETL data sources. Benefits include:

  • Seamless integration with Data Catalog.
  • The Edge User Interface (UI), instead of the command-line interface.
  • Connections via Edge, instead of lineage harvester drivers.
  • Job scheduling via Data Catalog.

The lineage harvester

The lineage harvester is a connectivity tool that allows you to create a technical lineage and business lineage.