Prepare the Data Catalog physical data layer for technical lineage

Important In Collibra 2024.05, we launched a new user interface (UI) for Collibra Data Intelligence Platform! You can learn more about this latest UI in the UI overview.

Important This topic does not apply if you register a data source via Edge, because in that case Collibra automatically creates the system > database > schema > table > column hierarchy.

To stitch data objects in your data sources to their corresponding assets in Collibra Data Intelligence Platform, the full names of the data objects and assets must match exactly. The full names are constructed according to the full path of the data objects in your data source:

(system name) > database name > schema name > table name > column name
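
To illustrate why the names must match exactly, the following Python sketch joins the path components of a data object into a full name and compares two full names case-sensitively. The `>` separator and the helper names are assumptions made for illustration only; check the full names of existing assets in your Collibra environment for the actual format.

```python
from typing import Optional

# Assumption: full names are the path components joined with ">".
# Verify this against real asset full names in your environment.
SEPARATOR = ">"


def build_full_name(database: str, schema: str, table: str,
                    column: Optional[str] = None,
                    system: Optional[str] = None) -> str:
    """Join the path of a data object into a full name (hypothetical helper)."""
    parts = [system, database, schema, table, column]
    # The system name and column are optional, so skip levels that are not given.
    return SEPARATOR.join(p for p in parts if p is not None)


def names_match(source_full_name: str, asset_full_name: str) -> bool:
    """Stitching needs an exact match, so compare without normalizing case."""
    return source_full_name == asset_full_name


if __name__ == "__main__":
    from_source = build_full_name("sales_db", "public", "orders", "order_id")
    print(from_source)  # sales_db>public>orders>order_id
    print(names_match(from_source, "sales_db>public>orders>order_id"))  # True
    print(names_match(from_source, "SALES_DB>public>orders>order_id"))  # False
```

The last comparison fails only because of case, which is the kind of mismatch that prevents stitching.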

However, when you register a data source via Jobserver or by running the lineage harvester, only assets of the following asset types are created in Data Catalog:

  • Schema
  • Table
  • Column

Therefore, to construct the full path hierarchy required for full name matching, you have to create a Database asset and relate it to the relevant Schema asset. If you set the useCollibraSystemName property to true in your lineage harvester configuration file, you also need to create a System asset and relate it to the Database asset. We refer to this as preparing the Data Catalog physical data layer.
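
The following Python sketch restates this rule: given the value of useCollibraSystemName, it lists the hierarchy levels that are missing after registration and that you therefore create yourself. The function name is hypothetical, and the code does not call any Collibra API.

```python
from typing import List

# Asset types that registering a data source via Jobserver or the lineage
# harvester creates, according to this topic.
HARVESTER_CREATED = {"Schema", "Table", "Column"}


def assets_to_create(use_collibra_system_name: bool) -> List[str]:
    """Return the hierarchy levels you create manually, top level first."""
    hierarchy = ["Database", "Schema", "Table", "Column"]
    if use_collibra_system_name:
        # The System level is only part of the full name when this property is true.
        hierarchy.insert(0, "System")
    return [level for level in hierarchy if level not in HARVESTER_CREATED]


print(assets_to_create(False))  # ['Database']
print(assets_to_create(True))   # ['System', 'Database']
```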

For more information, see Automatic stitching for technical lineage.

Prerequisites

  • You have a global role with the Catalog global permission, for example, Catalog Author.
  • You have a resource role with the following resource permissions on the Schema community if you use Jobserver, or on the Database community if you use Edge:
    • Asset > add
    • Attribute > add
    • Domain > add
    • Attachment > add

Additional prerequisites for JDBC data source types

If you are working with a JDBC data source type, you also need to meet the following prerequisites:

  • You have the permissions to retrieve the metadata of the following database components through the database metadata methods of the JDBC driver:
    • Schemas
    • Tables
    • Columns
  • You have set up the JDBC driver for your data source, for example, MySQL.
  • You have registered a data source.
    Tip The full name of your Schema asset must exactly match, including case, the name of the schema in the data source that you register in the configuration file.
    If you use Jobservers in Collibra Console and no Jobserver is available, the Register data source actions are grayed out in the global create menu in Collibra.

Steps

  1. Create a System asset:
    Important This is only required if you set the useCollibraSystemName property to true in your lineage harvester configuration file.
    Tip The full name of the System asset must exactly match, including case, the name of the system of the data source that you register in the configuration file.
  2. Create a Database asset:
    Tip The full name of your Database asset must exactly match, including case, the name of the database (or, for Google BigQuery, the project) that you register in the configuration file.
  3. Create a relation between the System asset and the Database asset using the "Technology Asset groups / is grouped by Technology Asset" relation type.
    Important This step is only relevant if you created a System asset in step 1.
  4. Create a relation between the Database asset and the relevant Schema asset using the "Technology Asset has / belongs to Schema" relation type.
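
To keep steps 3 and 4 straight, the Python sketch below lists the relations to create for one schema as (source, relation type, target) tuples. The relation type names are copied from the steps; the asset names are example values and the helper is hypothetical, so you still create the relations in Collibra yourself.

```python
from typing import List, NamedTuple, Optional


class Relation(NamedTuple):
    source: str         # name of the asset the relation starts from
    relation_type: str  # relation type name as given in the steps
    target: str         # name of the asset the relation points to


def plan_relations(database: str, schema: str,
                   system: Optional[str] = None) -> List[Relation]:
    """List the relations for steps 3 and 4 (hypothetical helper)."""
    relations = []
    if system is not None:
        # Step 3 applies only if you created a System asset in step 1.
        relations.append(Relation(
            system, "Technology Asset groups / is grouped by Technology Asset", database))
    # Step 4 always applies.
    relations.append(Relation(
        database, "Technology Asset has / belongs to Schema", schema))
    return relations


for r in plan_relations("sales_db", "public", system="mysql-prod"):
    print(r.source, "->", r.relation_type, "->", r.target)
# mysql-prod -> Technology Asset groups / is grouped by Technology Asset -> sales_db
# sales_db -> Technology Asset has / belongs to Schema -> public
```

Without a system name, the plan contains only the Database-to-Schema relation from step 4.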

What's next?

If you haven't created a configuration file yet, create it now.

If you have created the configuration file and prepared the physical data layer, you can run the lineage harvester to start the technical lineage process.

When the technical lineage process is finished, and if you have the required permissions, you can go to the asset page of a Table or Column asset from the data source that you added in the configuration file and visualize its technical lineage. The process also creates new relations of the type "Data Element targets / sources Data Element" between assets in Data Catalog.