Prepare the Data Catalog physical data layer for technical lineage

To stitch data objects in your data sources to their corresponding assets in Collibra Data Intelligence Cloud, the full names of the data objects and assets must match exactly. The full names are constructed according to the full path of the data objects in your data source:

(system name) > database name > schema name > table name > column name

However, when you register a data source via Jobserver or via the lineage harvester, only assets of the following asset types are created in Data Catalog:

  • Schema
  • Table
  • Column

Therefore, you must create a Database asset and relate it to the Schema asset to construct the full path hierarchy required for full name matching. If you set the useCollibraSystemName property to true in your lineage harvester configuration file, you must also create a System asset and relate it to the Database asset. We refer to this as preparing the Data Catalog physical data layer.
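The following is a minimal sketch of the matching rule described above, not Collibra's actual implementation. The function names and the " > " separator are assumptions that mirror the path notation used in this topic; the point is only that the path is built top-down and compared exactly, including letter case.

```python
from typing import Optional

# Assumption: the separator mirrors the "a > b > c" notation used in this topic.
SEPARATOR = " > "


def build_full_name(
    database: str,
    schema: str,
    table: Optional[str] = None,
    column: Optional[str] = None,
    system: Optional[str] = None,
) -> str:
    """Join the path parts top-down, skipping any level that is not given.

    Pass `system` only when useCollibraSystemName is set to true.
    """
    parts = [system, database, schema, table, column]
    return SEPARATOR.join(part for part in parts if part)


def names_match(data_object_name: str, asset_full_name: str) -> bool:
    """Exact, case-sensitive comparison: no trimming and no case folding."""
    return data_object_name == asset_full_name
```

With this sketch, a Database asset named "Sales_DB" would not match a database named "sales_db", which is why the tips in the steps below insist on exact names.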

Note This topic does not apply if you register a data source via Edge because in that case, Collibra automatically creates the system > database > schema > table > column hierarchy.

For more information, see Automatic stitching for technical lineage.

Prerequisites

  • You have a global role with the Catalog global permission, for example Catalog Author.
  • You have set up the JDBC driver of your source data, for example MySQL.
  • You have registered a data source.
    Tip The full name of your Schema asset must exactly match the name of the schema in the data source that you register in the configuration file, including letter case.
    If you use a Jobserver in Collibra Console and no Jobserver is available, the Register data source actions are grayed out in the global create menu in Collibra.
  • You have a resource role with the following resource permissions on the Schema community if you use a Jobserver, or on the Database community if you use Edge:
    • Asset > add
    • Attribute > add
    • Domain > add
    • Attachment > add
  • You have the permissions to retrieve the metadata of the following database components through the JDBC Driver Database Metadata methods:
    • Schemas
    • Tables
    • Columns

Steps

  1. Create a System asset:
    Important This is only required if you set the useCollibraSystemName property to true in your lineage harvester configuration file.
    Tip The full name of the System asset must exactly match the system name of the data source that you register in the configuration file, including letter case.
    1. Open the product for which you want to create an asset (for example, Business Glossary).
    2. On the main toolbar, click the create button.
      The Create dialog box appears.
    3. On the Assets tab, click System.
      The Create Asset dialog box appears.
    4. Enter the required information:
      • Type: The asset type of the asset that you are creating.
      • Domain: The domain to which the asset will belong.
        Tip Ensure that the domain type of the selected domain is assigned to the selected asset type.
      • Name: A name to identify the asset.
        Tip You can simultaneously create multiple assets. To do so, after typing the name, press Enter, and then type the next name. Depending on the settings, asset names may need to be unique in their domain. If you enter a name that already exists, it appears in the strike-through style.

    5. Click Create.
      A message stating that one or more assets are created appears in the upper-right corner of the page.
  2. Create a Database asset:
    Tip The full name of your Database asset must exactly match the name of the database (or the project, in the case of Google BigQuery) that you register in the configuration file, including letter case.
    1. Open the product for which you want to create an asset (for example, Business Glossary).
    2. On the main toolbar, click the create button.
      The Create dialog box appears.
    3. On the Assets tab, click Database.
      The Create Asset dialog box appears.
    4. Enter the required information:
      • Type: The asset type of the asset that you are creating.
      • Domain: The domain to which the asset will belong.
        Tip Ensure that the domain type of the selected domain is assigned to the selected asset type.
      • Name: A name to identify the asset.
        Tip You can simultaneously create multiple assets. To do so, after typing the name, press Enter, and then type the next name. Depending on the settings, asset names may need to be unique in their domain. If you enter a name that already exists, it appears in the strike-through style.

    5. Click Create.
      A message stating that one or more assets are created appears in the upper-right corner of the page.
  3. Create a relation between the System asset and the Database asset using the "Technology Asset groups / is grouped by Technology Asset" relation type.
    Important This step is only relevant if you created a System asset in step 1.
    1. On the System asset page, in the tab pane, click Add Characteristic.
      The Add a characteristic dialog box appears.
    2. Click Relations.
    3. Search for and click groups Technology asset.
      The Add groups Technology asset dialog box appears.
    4. Enter the required information:
      • Assets: The name of the database.
      • Filter suggested assets by organization: Option to filter the suggestions based on selected communities and domains. If this option is selected, the organization tree appears. You can then filter and select domains and communities.
      • Start date: Optionally enter the date on which the relation between the assets becomes applicable. Leave this field empty to create a permanent relation.
      • End date: Optionally enter the date on which the relation between the assets is no longer applicable. Leave this field empty to create a permanent relation.
    5. Click Save.
  4. Create a relation between the Database asset and the Schema asset using the "Technology Asset has / belongs to Schema" relation type.
    1. On the Database asset page, in the tab pane, click Add Characteristic.
      The Add a characteristic dialog box appears.
    2. Click Relations.
    3. Search for and click has schema.
      The Add has schema dialog box appears.
    4. Enter the required information:
      • Assets: The name of the schema.
      • Filter suggested assets by organization: Option to filter the suggestions based on selected communities and domains. If this option is selected, the organization tree appears. You can then filter and select domains and communities.
      • Start date: Optionally enter the date on which the relation between the assets becomes applicable. Leave this field empty to create a permanent relation.
      • End date: Optionally enter the date on which the relation between the assets is no longer applicable. Leave this field empty to create a permanent relation.
    5. Click Save.
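
If you prefer to script the steps above, the following is a minimal sketch that only builds request bodies and sends nothing. It assumes that the Collibra REST API 2.0 accepts POST /rest/2.0/assets and POST /rest/2.0/relations with the field names shown; verify both against the API documentation of your environment. The helper names and all IDs are hypothetical placeholders.

```python
from typing import Dict


def asset_payload(name: str, domain_id: str, type_id: str) -> Dict[str, str]:
    """Body for creating a System or Database asset (assumed: POST /rest/2.0/assets)."""
    # The name must match the source name exactly, including letter case.
    return {"name": name, "domainId": domain_id, "typeId": type_id}


def relation_payload(source_id: str, target_id: str, type_id: str) -> Dict[str, str]:
    """Body for relating two assets (assumed: POST /rest/2.0/relations).

    Step 3: source = System asset, target = Database asset.
    Step 4: source = Database asset, target = Schema asset.
    """
    return {"sourceId": source_id, "targetId": target_id, "typeId": type_id}


# Hypothetical placeholder IDs; look up the real asset, domain, and type IDs
# in your own environment.
database_body = asset_payload("sales_db", "<domain-id>", "<database-type-id>")
has_schema_body = relation_payload(
    "<database-asset-id>", "<schema-asset-id>", "<has-schema-relation-type-id>"
)
```

Omitting start and end dates from the relation body corresponds to leaving those fields empty in the dialog box, which creates a permanent relation.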

What's next?

If you haven't created a configuration file yet, create one now.

If you created the configuration file and prepared the physical data layer, you can run the lineage harvester to start the technical lineage process.

When the technical lineage process is finished and you have the required permissions, you can go to the asset page of a Table or Column asset from the data source that you added in the configuration file and visualize the technical lineage. At the same time, new relations of the type "Data Element targets / sources Data Element" between assets in Data Catalog are created.

The lineage harvester also uses scheduled jobs to automate the technical lineage process.