Automatic stitching for technical lineage

Stitching is a process that creates relations between data objects from a data source and their corresponding assets in Collibra. Stitching is the bridge between the metadata you ingest as assets in Data Catalog and the technical lineage. When the data sources are scanned, Collibra Data Lineage automatically creates new relations of the type "Data Element targets / sources Data Element":

  • Between data objects in your data source and their corresponding assets in Data Catalog, including the asset that you create when preparing the Data Catalog physical data layer.
  • If you are integrating a BI tool, between ingested assets from BI sources and Data Catalog assets from registered data sources.

Preflight checks: Requirements for successful stitching

Before running the lineage harvester or Edge ingestion, verify the following requirements to ensure your Data Catalog assets successfully stitch to the data objects in the technical lineage.

Verify the Data Catalog physical data layer: If you are using the CLI lineage harvester (deprecated), you have to create a Database asset and create a relation between it and the relevant Schema asset. If you set the useCollibraSystemName property to true in your lineage harvester configuration file, you also need to create a System asset and create a relation between it and the Database asset.

Note This is applicable if you are using the CLI lineage harvester (deprecated). Technical lineage via Edge automatically creates the necessary hierarchy.

Ensure exact name matching: Stitching requires a case-sensitive, exact match of the full path of data objects in your data source and their corresponding assets in Collibra: System > Database > Schema > Table > Column.

Check system name configuration:

  • If you are using the CLI lineage harvester (deprecated), and you set useCollibraSystemName to true in your configuration file, you must use the collibraSystemName property to specify the full name of the System asset you created when you prepared the physical data layer.
  • If you are using technical lineage via Edge, the value of the Collibra System Name field must be the same as the full name of the System asset in Data Catalog, which Edge creates automatically when you integrate your BI tool.

If this is not correctly defined, the system name defaults to "DEFAULT", which breaks full name matching.

Confirm BI user permissions: If integrating one of the supported BI tools, ensure that the user account used for BI integration (meaning in the BI tool) has sufficient permissions to access external data assets. Insufficient access to the data objects in your data source results in missing stitching.

Review query syntax: If integrating one of the supported BI tools, check your BI reports for unsupported custom SQL, stored procedures (Tableau), or unsupported Power Query M functions (Power BI). These can prevent the external database node from appearing in the graph entirely.

Prepare for database mapping: Be ready to map technical names (such as IP addresses) to logical asset names in Collibra. APIs often return, for example, 111.93.0.181, while the asset is named something like oracle-db. To achieve stitching, you need to map these names in the source configuration file (if using the CLI lineage harvester) or in the Source Configuration field (if using technical lineage via Edge).

Verify column existence: Ensure your tables and views contain columns. Stitching is based on columns; assets without columns cannot be stitched. For any tables or views without columns, on the Stitching tab page, the Found in column shows that the table or view is found in the technical lineage, but is not found in Data Catalog.

Full path, full name matching

To stitch assets in Data Catalog to data objects collected by the lineage harvester, the Collibra Data Lineage looks at the full path of the assets in Data Catalog and the full path of data objects in your data source. The full names are constructed according to the full path of the data objects in your data source:

(system name) > database name > schema name > table name > column name

If the full paths match, the Collibra Data Lineage automatically stitches the data objects to the existing assets in Data Catalog. To indicate this, the assets have yellow icons in the technical lineage graph. Note that in Collibra, full paths are case-sensitive.

If the full paths don't match, including for case-sensitivity, Collibra Data Lineage can't stitch them. To indicate this, the data objects have a gray background in your technical lineage graph. If you change the full path, make sure to run the lineage harvester (deprecated) again.

Tip To stitch assets outside of the traditional system > database > schema > table > column hierarchy, you can use Custom technical lineage with the batch-definition option.

You can use the Stitching tab page to find the full path of assets in Data Catalog and data objects that were collected by the lineage harvester. The Stitching tab page also shows an overview of all assets and data objects that are stitched successfully.

The importance of preparing the Data Catalog physical data layer

Note This is applicable if you are using the CLI lineage harvester (deprecated). Technical lineage via Edge automatically creates the necessary hierarchy.

As stated in the previous section, full names are constructed according to the full path of the data objects in your data source:

(system name) > database name > schema name > table name > column name

However, when you register a data source via the CLI lineage harvester (deprecated), only assets of the following types are created in Data Catalog:

  • Schema
  • Table
  • Column

Therefore, you have to create a Database asset and create a relation between it and the relevant Schema asset, to construct the full path hierarchy required for full name matching. If you set the useCollibraSystemName property to true in your lineage harvester (deprecated) configuration file, you also need to create a System asset and create a relation between it and the Database asset.

If you don't prepare the Data Catalog physical data layer, Data Catalog creates a technical lineage without stitching. As a result, when you click the Technical lineage tab on any relevant asset page, the message The current asset doesn't have a technical lineage yet is shown. However, you can use the Browse tab pane to view the technical lineage of data objects in data sources for which you created the technical lineage.

Visual indicators in the technical lineage graph

As it concerns a technical lineage graph, stitching or the lack of stitching is reflected only in the color of the nodes.

A yellow icon indicates stitching. Specifically:

  • There is an asset in Collibra with a full name that exactly matches the data object in the technical lineage.
  • A relation of the type "Data Element targets / sources Data Element" is created between the asset and the data object, and shown on the asset page.
  • In the Stitching tab, the Found In column indicates that the database table was found in both Data Catalog and the technical lineage.
    Image of the Stitching tab. The Found In column indicates that the database table was found in both Data Catalog and the technical lineage.

A gray icon indicates a lack of stitching. Specifically:

  • There is no asset in Data Catalog with a full name that exactly matches the name of the data object.
  • In the Stitching tab, the Found In column indicates that the database table was found only in the technical lineage.
    Image showing the Stitching tab. The Found In column indicates that the database table was found only in the technical lineage.

Stitching for BI tool integrations

Let's assume you want to trace the data from your data sources to your reports, and you've already integrated your BI tool. Now go to the asset page of your BI Data Model asset and click the Technical Lineage tab. There are most likely be three nodes or groupings of nodes:

  • The external database
  • The BI data model
  • The BI report

Image showing three nodes or groupings of nodes: the external database, BI data model, and BI report 

Note If you are integrating MicroStrategy, there are four groups of nodes; the BI data model node consists of two groups of nodes, one representing MicroStrategy Attributes and one representing MicroStrategy Facts.

The external database

This node represents the table from the database you used to create the report in your BI tool. This node is a prerequisite for stitching. If it is not shown in the technical lineage, stitching is not possible. The node is shown in the technical lineage as long as:

  • You have the required roles and permissions in your BI tool to access the data in your data sources.
  • There are no unsupported custom SQL transformations or functions.
  • No errors have caused the integration to fail.

The gray background icon indicates that stitching is missing. Specifically, while it's likely that there is a Table asset in Collibra that corresponds with this database table, their full names do not exactly match. For troubleshooting tips, go to Automatic stitching for technical lineage.

The BI data model

This node represents the dataset that you used to create the report in your BI tool. It is always shown in the technical lineage because it is the target of the database table and the source of the BI report. This node is always stitched because Collibra Data Lineage knows the full name of the dataset in your BI tool, and it creates the corresponding BI Data Model asset with the exact same name. This is referred to as BI stitching.

The BI report

This node (or grouping of nodes) represents the report you created in your BI tool. It is always part of the technical lineage. Like the BI data model node, this node is always stitched because Collibra Data Lineage knows the full name of the report in your BI tool, and it creates the corresponding BI Report asset with the exact same name.

Tip If you're viewing the lineage at the column level, and the attribute that the column represents is not used in the report, there is no arrow leading to the report node in the technical lineage. In that case, right-click on the data model node and click Table lineage to pull back and view the table-level lineage. The BI report node appears and you see which columns, or data attributes, are used in the report.

Automatic stitching for technical lineage

Note The example images are from the classic Technical lineage viewer UI. The latest viewer UI displays elements differently. For more information about the latest and classic Technical lineage viewer UI, go to Technical lineage viewer.

You can use the Stitching tab page to find the full path of assets in Data Catalog and data objects that were collected by the lineage harvester.

Common reasons for missing stitching

Database names don't match

During integration of your BI tool, the API may return a technical name, IP address, or host name of the database, instead of the true name of the database.

Power BI example

Let's look at the name of the unstitched database table in the following example technical lineage graph: MODEL.PRODUCT CATEGORY [ADVENTUREWORKS::database].

Image showing three nodes or groupings of nodes: the external database, BI data model, and BI report 

We can identify the following:

  • The database name: ADVENTUREWORKS
  • The schema name: MODEL
  • The table name: PRODUCT CATEGORY

Image showing the database name, schema name, and table name in the Browse tab.

Now let's find the table in the Stitching tab:

  1. Click the Settings tab.
  2. Click Show status.
  3. Click the Stitching tab.
    The Stitching tab shows a list of all tables that exist in Data Catalog and on the Collibra Data Lineage service instance.

Use the Search field to find the unstitched database table PRODUCT CATEGORY.

Image showing the Search field used to find the unstitched database table PRODUCT CATEGORY.

In the Found in column, the value "Technical Lineage" confirms that the table was found only in the technical lineage. An exactly matching asset was not identified in Collibra.

Now try to find a likely match. Look for a table that has the same name and the value "Catalog" in the Found in column. The table shown in the following image looks like a match. The schema and table names match exactly; only the database names differ.

Image showing that the schema and table names match exactly, but the database names differ.

Solution
  1. Configure the database mapping section in your source ID configuration file (if you're using the CLI lineage harvester) or the Source Configuration field (if you're using Edge) to map the technical name (ADVENTUREWORKS) to the logical name in Data Catalog (AAS-MODEL):
    {
    	"found_dbname=adventureworks;found_hostname=*": {
    		"dbname": "aas-model",
    	}
    }
    
  2. Synchronize the data source again.
    Stitching is achieved:
  3. Image showing successful stitching.

Tableau example

First, let's look at the name of the unstitched database table in the example technical lineage graph: DBADMIN.TESTING1 [111.93.0.181::database].

We can identify the following:

  • The database name: 111.93.0.181
  • The schema name: DBADMIN
  • The table name: TESTING1

Image showing the identified database name, schema name, and table name.

Now let's find the table in the Stitching tab:

  1. Click the Settings tab.
  2. Click Show status.
  3. Click the Stitching tab.

    The Stitching tab shows a list of all tables that exist in Data Catalog and on the Collibra Data Lineage service instance.

Use the Search field to find the unstitched table TESTING1.

Image showing the Search field used to find the unstitched table TESTING1.

In the Found in column, the value "Technical Lineage" confirms that the table was found only in the technical lineage. An exact match was not identified in Collibra. The table shown in the following image looks like a match. The schema and table names match exactly; only the database names differ.

Image showing that the schema and table names match exactly; only the database names differ.

Solution
  1. Configure the hostnameMapping section in your source ID configuration file (if you're using the CLI lineage harvester) or the Source Configuration field (if you're using Edge), to map the IP address (111.93.0.181) to the logical name in Data Catalog (oracle-db):
    "hostnameMapping": {
    	"found_dbname=111.93.0.181;found_hostname=*: {
    		"dbname": "oracle-db",
    		"dialect": "oracle"
    		}
    }
    
  2. Synchronize the data source again.
    Stitching is achieved
    Image showing successful stitching

System names are not specified

If you are using the CLI lineage harvester (deprecated) and you set the useCollibraSystemName property to true in your configuration file, but you don't define the collibraSystemName in the source ID configuration file, the system name in the technical lineage is "DEFAULT". In that case, stitching is missing.

Solution

  • If you are using the CLI lineage harvester (deprecated), and you set useCollibraSystemName to true in your configuration file, you must use the collibraSystemName property to specify the full name of the System asset you created when you prepared the physical data layer.
  • If you are using technical lineage via Edge, the value of the Collibra System Name field must be the same as the full name of the System asset in Data Catalog, which Edge creates automatically when you integrate your BI tool.

Power BI: Unsupported Power Query M function or calculated columns

Collibra Data Lineage does not support DAX. Therefore, calculated columns result in missing lineage, as do unsupported Power Query M functions. In these situations, the relations required to recognize the database are not exposed, resulting in "missing stitching".

In the following example image, notice that the database node, which should be situated to the left of the BI data model node, is missing from the technical lineage graph.

Image showing that the database node, which should be situated to the left of the BI data model node, is missing from the technical lineage graph.

Solution

Examine the BI dataset table, to see if you can identify a problem.

  1. Click the Settings tab.
  2. Click Show status.

    The Sources tab shows a list of all data sources on the Collibra Data Lineage service instance.
  3. Click one or more checkboxes, to show the transformations and source code fragments for specific data sources, or clear all checkboxes, to show for all data sources.

Transformations and source code fragments are shown in the transformations table below. We can quickly identify that there was an analyze error, because the MySQL.Database function is not supported.

Image showing transformations and source code fragments in the transformations table

To achieve stitching, ensure that your queries only include supported Power Query M functions. We also encourage you to create an Ideation ticket via the Collibra Ideation Portal, if you'd like to request support for a particular function.

Power BI: Object Level Security (OLS)

the Microsoft APIs do not return metadata for any data objects with Object Level Security (OLS) applied. Therefore, if you have OLS applied to one or more columns in the semantic model that was used to build your report, lineage between the Power BI Semantic Model node and the Power BI Report node is missing.

As shown in the following example image, if the semantic model metadata is not provided by the Microsoft APIs, an analysis error is produced and shown in the Technical lineage Sources tab for the unsupported semantic model.

Image showing an analysis error that is produced and shown in the Technical lineage Sources tab for the unsupported semantic model.

Tableau: Insufficient permissions

If you use a Viewer role or an Explorer role in Tableau, without access to external assets, stitching is always missing. To achieve stitching, ensure that your Tableau user has access to the external assets for which you want lineage.

For complete details on the effect of roles and permissions on technical lineage and stitching, including how to provide necessary permissions to get lineage and automatic stitching with an Explorer role,go to Tableau roles and permissions.

Tableau: Unsupported custom SQL

Tableau Catalog doesn't support all custom SQL and stored procedures. If a Tableau data object uses an unsupported custom SQL or stored procedure, the external database node, which should be situated to the left of the BI data model node, is missing in the technical lineage graph.

Image showing that the external database node, which should be situated to the left of the BI data model node, is missing in the technical lineage graph.

For complete information, see the Tableau documentation Tableau Catalog support for custom SQL.

MicroStrategy: Missing relations

If all of your assets were successfully ingested, but relations are missing between MicroStrategy Data Entity assets (meaning MicroStrategy attributes and facts) and MicroStrategy Data Attributes (meaning MicroStrategy columns) and there is no lineage – only orphaned boxes and no database nodes – it means the API call to the tables endpoint api/model/tables was unsuccessful.

Solution

Ensure that your MicroStrategy permissions are correctly configured and that your proxy (if applicable) can connect to this endpoint.

For complete information on MicroStrategy REST, see the MicroStrategy documentation.

Table assets and View assets without columns

If you ingest a table or a view that does not have any columns, the Table or View asset that is created in Data Catalog does not have any relations to Column assets. In this case, stitching cannot be achieved, because stitching is based on columns, not tables or views.