Collibra Data Lineage
In this topic, we address the following:
- What is Collibra Data Lineage?
- BI tool integration
- Business value
- How do I create a technical lineage?
Tip Check out our free Business and Technical Lineage training course in Collibra University.
What is Collibra Data Lineage?
Collibra Data Lineage is a cloud-only product that allows you to trace data from its source system, across the various contact points of your data landscape, to its final destination system.
Ultimately, our objective is to help you establish trust in your reports and use the data to make sound business decisions.
Collibra Data Lineage consists of two components:
- Technical lineage
- Business lineage
The value of these components is the same, but they are designed for different audiences.
Technical lineage
- Designed for Data Engineers, Data Architects, and other technical stewards.
- A detailed lineage graph that provides complete end-to-end lineage, to visualize the journey of the data objects in your external data sources.
- Allows you to explore data objects, including temporary tables and columns, in your external data sources. You don't need to register data sources in Collibra to include them in a technical lineage.
  Tip We use the term "data objects" when referring to columns and tables in your external data sources. We use the term "assets" (specifically Column assets and Table assets) when referring to the representation of data objects in Collibra.
- Includes all source code and data transformation details.
- Shows you in which system data objects are used and how they are transformed from data source to data source.
- Automatically created as part of the technical lineage process.
Business lineage
- Designed for Analysts, Governance roles, and other business stewards.
- Shows the relations between assets in Collibra that represent the data objects in your external data sources.
- Refers specifically to the relation type "Data Element targets / sources Data Element" that is drawn between Column assets.
  Note During the ingestion process, relations of the type "Data Element targets / sources Data Element" are automatically created between certain assets. Any relations of this type that you manually create between assets will be deleted during the synchronization process. If you want to manually create such relations and ensure that they are maintained, you can create a custom technical lineage.
- Shows how registered data sources relate to one another.
  Tip Registering a data source means creating assets (and the relations between the assets) in Collibra that represent the data objects in your external data sources.
- Automatically created as part of the technical lineage process.
- Technical lineage identifies data objects in your external data sources.
- Business lineage shows assets in Collibra that represent some or all of those data objects.
Let's say that you have created a technical lineage for four different databases:
- The first database, Oracle, is not registered in Collibra, so there are no assets in Data Catalog that represent the Oracle data objects.
- The second database, Raw, is registered in Collibra.
- The yellow background of the first node indicates that Table and Column assets that were created in Data Catalog are stitched to their corresponding data objects in the Raw database.
- The other node, the one with the gray background, is a temporary table. No assets are created for temporary data objects and so stitching is not relevant. That is why the node has a gray background.
- The third and fourth databases, Refined and Consumption, are ingested in Collibra. The assets that were created in Data Catalog are stitched to their corresponding data objects in the two databases.
The point here is that technical lineage shows the data flow of all data objects across all four databases, regardless of whether any assets exist in Collibra.
The corresponding business lineage shows only the relations between data objects that have corresponding assets in Data Catalog. In the following image, we see the data flow of assets from the second database, to the third, to the fourth. The first database, Oracle, is not registered in Collibra and is therefore not shown in the diagram.
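The difference between the two views can be sketched in a few lines of Python. This is a simplified illustration only, not how Collibra computes lineage: the object names, the edge list, and the `business_view` function are all hypothetical, and the sketch does not attempt to reproduce how Collibra handles flows that pass through temporary data objects.

```python
# Hypothetical technical lineage: each edge is (source data object, target data object).
technical_lineage = [
    ("oracle.orders", "raw.orders"),          # Oracle is not registered in Collibra
    ("raw.orders", "raw.tmp_orders"),         # temporary table, never gets an asset
    ("raw.orders", "refined.orders"),
    ("refined.orders", "consumption.sales_report"),
]

# Hypothetical set of data objects that are stitched to assets in Data Catalog.
stitched = {"raw.orders", "refined.orders", "consumption.sales_report"}


def business_view(edges, stitched_objects):
    """Keep only the edges whose source and target both have an asset."""
    return [
        (source, target)
        for source, target in edges
        if source in stitched_objects and target in stitched_objects
    ]


business_lineage = business_view(technical_lineage, stitched)
print(business_lineage)
```

In this sketch, the technical lineage keeps all four edges, while the business view keeps only the two relations between the Raw, Refined and Consumption objects that have assets.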
For more information on the differences between these two components, go to Differences between technical lineage and business lineage.
For a complete list of supported data sources, go to Supported data sources for technical lineage. If you want to create a technical lineage for a data source that is not currently supported, you can create a Custom technical lineage.
BI tool integration
Business intelligence software helps organizations to collect data from the various data sources across their data ecosystem and present the data in interactive dashboards and reports, to facilitate decision-making and strategic planning.
When you integrate your BI tool in Collibra:
- Metadata about the data objects in your external data sources is created as BI assets in Collibra.
- Relations are created:
  - Between data objects in your external data source and assets in Collibra that represent those data objects.
    Tip These assets are created when the data source is registered, which is automatically carried out during the technical lineage process.
  - Between BI assets and the assets in Collibra that represent the data objects in your external data source.
- Technical lineage and business lineage are automatically created.
Business value
Collibra Data Lineage has many important use cases. Here are a few.
By providing transparency and traceability to the data used in a report, data lineage plays a foundational role in the report certification process:
- Review data sources and transformations associated with the data in a report, to help ensure accuracy and reliability.
- Identify the original sources of data used in the report, and how the data moves from the source system to intermediate systems.
- View and analyze the calculation rules that are used to extract and transform the data before it reaches the report.
All critical metadata is ingested during BI integration and shown on the Collibra asset pages. This includes information like data timestamps, quality metrics, data ownership, and other valuable attributes that help you to assess the reliability and quality of the data.
In Collibra 2024.05, we launched a new user interface (UI) for Collibra Data Intelligence Platform! You can learn more about this latest UI in the UI overview.
You can manually synchronize the data in Collibra or set up a synchronization schedule, to help ensure the accuracy and completeness of the data over time. This can help identify inconsistencies or gaps in the data flow and transformation processes.
Collibra Data Lineage can help you with impact analysis when making changes to data sources, adjusting the calculation rules that drive transformations, migrating data and more. It can help you assess the potential impact of changes on downstream systems, data and reports.
Example Let's say you have data in a Snowflake data source, and you need to move everything to Databricks. After migration, you can create a technical lineage to trace the movement of data from one data source to the other and ensure data integrity throughout the migration process.
Understanding data dependencies and relationships helps you to:
- Anticipate which downstream systems could be impacted if you change a data source or calculation rule.
- Anticipate how changes to a particular data object or system will propagate across your data landscape.
- Minimize risks and make better informed decisions.
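Conceptually, impact analysis is a downstream traversal of the lineage graph. The following is a minimal sketch under stated assumptions: the edge list and the `downstream` function are hypothetical and are not part of any Collibra API. It only illustrates the idea of following the data flow forward from a changed object.

```python
from collections import deque

# Hypothetical lineage edges: (source data object, target data object).
edges = [
    ("oracle.orders", "raw.orders"),
    ("raw.orders", "refined.orders"),
    ("refined.orders", "consumption.sales_report"),
    ("oracle.customers", "raw.customers"),
]


def downstream(lineage_edges, start):
    """Return every data object reachable from `start` by following the data flow."""
    children = {}
    for source, target in lineage_edges:
        children.setdefault(source, []).append(target)

    seen = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in children.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    seen.discard(start)  # a cycle should not report the start object as its own impact
    return seen


impacted = downstream(edges, "raw.orders")
print(sorted(impacted))
```

Here, a change to the hypothetical `raw.orders` object would flag `refined.orders` and `consumption.sales_report` for review, while the customer objects are unaffected.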
Collibra Data Lineage is a valuable tool for helping data analysts and engineers trace the source of data quality issues and anomalies. When you detect a discrepancy in your data, you can examine the lineage and source code to:
- Trace the issue back to the source system or process that is causing the problem.
- Analyze any calculation rules that might have affected the consistency or quality of the data.
- Identify how the issue is affecting downstream systems and reporting.
This can help you identify potential areas where the root cause might exist.
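Tracing an issue back to its source is the same idea in the opposite direction: an upstream traversal. The sketch below is illustrative only. The edge list and the `upstream` function are hypothetical and do not represent a Collibra API. It collects every object that feeds a report, plus the origin objects that have no upstream source of their own.

```python
# Hypothetical lineage edges: (source data object, target data object).
edges = [
    ("oracle.orders", "raw.orders"),
    ("oracle.customers", "raw.customers"),
    ("raw.orders", "refined.sales"),
    ("raw.customers", "refined.sales"),
    ("refined.sales", "consumption.sales_report"),
]


def upstream(lineage_edges, start):
    """Return (all upstream objects, the subset that has no upstream source itself)."""
    parents = {}
    for source, target in lineage_edges:
        parents.setdefault(target, []).append(source)

    seen = set()
    stack = [start]
    while stack:
        node = stack.pop()
        for prev in parents.get(node, []):
            if prev not in seen:
                seen.add(prev)
                stack.append(prev)
    seen.discard(start)

    # Origins are upstream objects that never appear as the target of an edge.
    origins = {node for node in seen if node not in parents}
    return seen, origins


candidates, origins = upstream(edges, "consumption.sales_report")
print(sorted(candidates))
print(sorted(origins))
```

Every object in `candidates` is a place where the discrepancy could have been introduced. The `origins` set narrows the search to the source systems themselves.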
Compliance with data privacy regulations such as GDPR and CCPA, and various security, auditing and reporting standards, often requires organizations to show end-to-end traceability across their data landscape. In the data privacy context, Collibra Data Lineage can give you a complete view of where sensitive and restricted data is processed, shared, and stored.
- Trace the information across your systems, data sources and processes.
- Monitor any migrations and transformations to the data.
- Identify who has access to the systems and data sources that consume the data.
BI integration in Collibra enables you to view all of the critical metadata about your reports and dashboards on dedicated asset pages in Data Catalog. The many attributes help you to identify the most critical reports that have the highest impact. This can help you effectively allocate your resources and minimize disruptions.
A few of the key attributes include the following:
- Document creation and modification dates: See when the report was created and updated in your BI tool.
- Visits count: See how many people have viewed the report.
  Tip Let's say that you have two reports with the same name, but one has 400 views and the other has almost none. That gives a strong indication as to which is the more helpful report.
- Owner in Source: Easily identify who owns and who certified a report, to know where to turn for additional help and information.
- Calculation Rule: See DAX calculations for calculated columns and measures on Power BI Column asset pages.
- URL: Easily access the report in your BI tool.
- Relation types allow you to immediately identify in which other reports a report is used.
How do I create a technical lineage?
There are two ways to create technical lineage and business lineage: via Edge or via the lineage harvester.
The typical workflow for creating a technical lineage is the same whether you use the lineage harvester or Edge. If you want to use Edge and the lineage harvester together, you must use lineage harvester version 2023.04 or newer. If you want to maintain on Edge the technical lineage that you created by using the lineage harvester, you can add technical lineage capabilities for the data sources with the same source IDs. For details, go to Migrate the technical lineage of a data source.
For details about the typical workflow, go to Technical lineage typical workflow.
Edge
You can create a technical lineage and business lineage via Edge, for Tableau, Power BI and all supported JDBC and ETL data sources. Benefits include:
- Seamless integration with Data Catalog.
- The Edge User Interface (UI), instead of Command Line Interface.
- Connections via Edge, instead of lineage harvester drivers.
- Job scheduling via Data Catalog.
The lineage harvester
The lineage harvester is a connectivity tool that allows you to create a technical lineage and business lineage.
- You can use the lineage harvester for any supported data source.
- You can download the latest lineage harvester from the Collibra Community Downloads page.
- You need to use the Command Line Interface in conjunction with a lineage harvester configuration file.