Automatic stitching for technical lineage
Stitching is a process that creates relations between data objects from a data source and their corresponding assets in Collibra. When the data sources are scanned, Collibra Data Lineage automatically creates new relations of the type "Data Element targets / sources Data Element":
- Between data objects in your data source and their corresponding assets in Data Catalog, including the asset that you create when preparing the Data Catalog physical data layer.
- If you are integrating a BI tool, between ingested assets from BI sources and Data Catalog assets from registered data sources.
The importance of preparing the Data Catalog physical data layer
To stitch data objects in your data sources to their corresponding assets in Collibra Data Intelligence Platform, the full names of the data objects and assets must match exactly. The full names are constructed according to the full path of the data objects in your data source:
(system name) > database name > schema name > table name > column name
However, when you register a data source via Jobserver or by running the lineage harvester, only assets of the following asset types are created in Data Catalog:
- Schema
- Table
- Column
Therefore, you have to create a Database asset and create a relation between it and the relevant Schema asset to construct the full path hierarchy required for full name matching. If you set the useCollibraSystemName property to true in your lineage harvester configuration file, you also need to create a System asset and create a relation between it and the Database asset. We refer to this as preparing the Data Catalog physical data layer.
Important: This does not apply if you register a data source via Edge, because in that case Collibra automatically creates the system > database > schema > table > column hierarchy.
Note: If you don't prepare the Data Catalog physical data layer, Data Catalog creates a technical lineage without stitching. As a result, when you click the Technical lineage tab on any Column, Table, Tableau Data Attribute, Power BI Column or SSRS Column asset page, you get the message "The current asset doesn't have a technical lineage yet." However, you can use the Browse tab pane to view the technical lineage of data objects in data sources for which you created the technical lineage.
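The full-name construction described in this section can be illustrated with a minimal sketch. It is not Collibra code: the function name build_full_path, the " > " separator (copied from the notation used in this section) and the sample names are assumptions for illustration only. The optional system argument mirrors the case in which useCollibraSystemName is set to true.

```python
from typing import Optional

# Assumption: the separator mirrors the "a > b > c" notation used in this
# section; the separator Collibra uses internally may differ.
SEPARATOR = " > "


def build_full_path(
    database: str,
    schema: str,
    table: str,
    column: str,
    system: Optional[str] = None,
) -> str:
    """Build a (system) > database > schema > table > column path.

    Hypothetical helper for illustration; not part of any Collibra API.
    """
    parts = [database, schema, table, column]
    if system is not None:
        # The system level is only present when a system name is used.
        parts.insert(0, system)
    return SEPARATOR.join(parts)


# Made-up names, without and with a system level.
without_system = build_full_path("sales_db", "public", "orders", "order_id")
with_system = build_full_path(
    "sales_db", "public", "orders", "order_id", system="warehouse"
)
```

In this sketch, leaving out the database level (or the system level, when one is expected) produces a different string, which is why the missing Database and System assets must be created before full names can match.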
Stitching issues
To stitch assets in Data Catalog to data objects collected by the lineage harvester, the Collibra Data Lineage service compares the full path of the assets in Data Catalog with the full path of the data objects in your data source. The full path has the following structure: (system) > database > schema > table > column. If the full paths match, Collibra Data Lineage automatically stitches the data objects to the existing assets in Data Catalog. To indicate this, the assets have a yellow background in the technical lineage graph. Note that in Collibra, full paths are case-sensitive.
If the full path of an asset in Data Catalog does not match the full path of a data object in your data source, Collibra Data Lineage cannot stitch them. Because full paths are case-sensitive in Collibra, a difference in letter case alone is enough to prevent a match. To indicate that a data object is not stitched, it has a gray background in your technical lineage graph. To fix stitching issues, check the full path of the assets in Data Catalog and make sure that they match the full path of the data objects that are shown in the technical lineage graph. If you change the full path, make sure to run the lineage harvester again.
Tip: To stitch assets outside of the traditional system > database > schema > table > column hierarchy, you can use Custom technical lineage with the batch-definition option.
You can use the Stitching tab page to easily find the full path of assets in Data Catalog and data objects that were collected by the lineage harvester. The Stitching tab page also shows an overview of all assets and data objects that are stitched successfully.
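The exact, case-sensitive matching rule described in this section can be sketched as follows. This is an illustrative approximation, not the Collibra Data Lineage implementation: the function classify_stitching, its status labels and the sample paths are all hypothetical. The extra "case mismatch" status is only a troubleshooting aid that points at paths differing in letter case alone.

```python
from typing import Dict, Iterable


def classify_stitching(
    catalog_paths: Iterable[str], source_paths: Iterable[str]
) -> Dict[str, str]:
    """Classify each data-source path against the Data Catalog paths.

    Hypothetical helper; the status labels are illustrative only.
    """
    catalog = set(catalog_paths)
    # Lower-cased lookup, used only to detect near misses caused by case.
    lowered = {path.lower() for path in catalog}

    result = {}
    for path in source_paths:
        if path in catalog:
            # Exact, case-sensitive match: the object can be stitched.
            result[path] = "stitched"
        elif path.lower() in lowered:
            # Same path apart from letter case: not stitched.
            result[path] = "case mismatch"
        else:
            result[path] = "not found"
    return result


# Made-up sample paths.
catalog_assets = [
    "sales_db > public > orders > id",
    "sales_db > public > Orders > total",
]
source_objects = [
    "sales_db > public > orders > id",
    "sales_db > public > orders > total",
    "sales_db > public > orders > discount",
]
statuses = classify_stitching(catalog_assets, source_objects)
```

Here only the first data object matches an asset exactly. The second differs from its asset in the case of the table name, so under the rule above it stays unstitched until one of the two full paths is corrected.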
Table assets and View assets without columns
If you ingest a table or a view that does not have any columns, the Table or View asset that is created in Data Catalog will not have any relations to Column assets. In this case:
- Stitching cannot be achieved, because stitching is based on columns, not tables or views.
- On the Stitching tab page, the Found in column shows that the table or view:
  - Is found in the Technical lineage.
  - Is not found in Catalog.