Custom technical lineage

You can create a technical lineage for a particular data source, even if Collibra Data Lineage does not currently support integration of that data source. We refer to this as custom technical lineage.

Example You want to create a technical lineage that shows relations between tables and columns from system A and system B, to system C, to system D (A and B -> C -> D). Systems A, B, and D are supported data sources, but system C is a custom application. Depending on the JSON file formatting option you choose, you can create one or more JSON files that define the metadata of system C, to generate a technical lineage.

You can use your custom technical lineage JSON files as the only lineage source, or you can harvest metadata from supported data sources, such as Oracle and Tableau, to visualize lineage alongside the custom technical lineage defined by your JSON files.

How to create a custom technical lineage

You can create a custom technical lineage via the lineage harvester or via Edge.

JSON file formatting options: Single-file or batch definition

There are two formatting options for defining your custom technical lineage.

Important The single-file definition option will be deprecated in a future version of Collibra.

To create a custom technical lineage via the batch definition option, you define the custom technical lineage in any number of JSON files, and refer to the directory that contains the files in the lineage harvester configuration file. Collibra Data Lineage generates a technical lineage based on the definitions in your JSON files and, optionally, any source code files you've added in the same directory.

Note With the batch definition option, we no longer draw a distinction between simple and advanced custom technical lineage. You can still choose to include source code files to provide additional details on where the lineage comes from, but we no longer differentiate between a custom technical lineage that includes source code and one that does not.

Benefits

  • You can define any number of JSON files, which allows for more efficient processing.
  • Multiple organizational teams can contribute to the lineage, without having to merge their definitions into a single JSON file.
  • You can show table-level lineage, column-level lineage, and indirect lineage.
  • Stitching is natively available for any asset type.

Batch file contents

In a CUSTOM-LINEAGE folder, you need:

  • Exactly one metadata file.
    The name of the file must be metadata.json.
  • Optionally, one or more assets files. These are required if you want to achieve stitching.
    File names must follow the format: assets<something unique>.json.
  • One or more lineage files, describing table-level, column-level, and indirect lineage.
    File names must follow the format: lineage<something unique>.json.
  • Optionally, a source codes subdirectory with source code files. You refer to the source code files from within your lineage files.
    Note Collibra Data Lineage doesn't parse the files in the source codes directory, meaning it does not extract lineage from the SQL and PY files.

For more information about these JSON files, go to Custom technical lineage JSON file details.

Example Directory tree:
CUSTOM-LINEAGE
    ├── metadata.json
    ├── assets_domain1.json
    ├── assets1.json
    ├── lineage.abc.json
    ├── lineage-extra.json
    └── source_codes
        ├── sc1.sql
        └── sc2.py
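
The file-name rules listed above can be checked before you run the harvester. The following is a minimal sketch in Python, not part of Collibra Data Lineage: the function name check_batch_folder is hypothetical, and it checks only the file names described in this topic, not the contents of the JSON files.

```python
from pathlib import Path
import tempfile


def check_batch_folder(folder):
    """Return a list of problems found in a batch definition folder.

    Hypothetical helper: only the file-name rules are checked;
    the contents of the files are not read.
    """
    names = [p.name for p in Path(folder).iterdir() if p.is_file()]
    problems = []

    # Exactly one metadata file, and it must be named metadata.json.
    if "metadata.json" not in names:
        problems.append("metadata.json is missing")

    # One or more lineage files named lineage<something unique>.json.
    if not any(n.startswith("lineage") and n.endswith(".json") for n in names):
        problems.append("no lineage<something unique>.json file found")

    # Assets files (assets<something unique>.json) and the source code
    # subdirectory are optional, so their absence is not reported.
    return problems


if __name__ == "__main__":
    # Build a throwaway folder that mirrors part of the example tree.
    with tempfile.TemporaryDirectory() as tmp:
        for name in ("metadata.json", "assets1.json", "lineage.abc.json"):
            (Path(tmp) / name).write_text("{}")
        print(check_batch_folder(tmp))
```

An empty list means the folder satisfies the naming rules checked here. Keep in mind that assets files remain necessary if you want to achieve stitching, even though this sketch treats them as optional.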

To create a custom technical lineage via the single-file definition option, you have to define the custom technical lineage in a single JSON file and refer to the JSON file in the lineage harvester configuration file. Collibra Data Lineage generates the technical lineage based on your JSON file and, optionally, any source code files to which you refer.

Limitations

  • The custom technical lineage must be defined in a single JSON file. This can be challenging if, for example, you have several organizational teams contributing to the lineage definition.
  • You can only show column-level lineage, meaning the target of your lineage relationship always has to be a column.
  • Stitching is only available natively for Column assets. Stitching is available for other asset types, but only if those assets are discovered by another scanner.

Simple and advanced custom technical lineage

If you opt for the single-file definition option, there are two types of custom technical lineages that you can create:

  • Simple custom technical lineage, which defines a basic object hierarchy and creates a lineage between two or more data objects. To create a simple custom technical lineage, you need to include assets and lineages sections in your JSON file. You can add the transformation code in the lineages section.
  • Advanced custom technical lineage, which contains a simple custom technical lineage and uses separate source code files that include transformation details, to create the lineage. To create an advanced custom technical lineage, you need to include assets, lineages and codebase_files sections in your JSON file. You add references to the transformation code in source code files in the codebase_files section.

Transformation code in both simple and advanced custom technical lineages is shown in the source code pane at the bottom of the technical lineage graph.
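
The difference between the two types comes down to which sections the JSON file contains. The following Python sketch illustrates that rule under one assumption: that assets, lineages, and codebase_files appear as top-level keys of the JSON object. The function name classify_single_file is hypothetical, and the contents of each section are left as empty placeholders because their structure is not described here.

```python
import json


def classify_single_file(definition):
    """Classify a parsed single-file definition as 'simple' or 'advanced'.

    Assumption: the sections named in this topic are top-level keys.
    """
    # Both types need the assets and lineages sections.
    missing = {"assets", "lineages"} - set(definition)
    if missing:
        raise ValueError(f"missing sections: {sorted(missing)}")

    # An advanced definition additionally refers to source code files.
    return "advanced" if "codebase_files" in definition else "simple"


if __name__ == "__main__":
    # Placeholder document; real section contents are omitted.
    text = '{"assets": {}, "lineages": [], "codebase_files": {}}'
    print(classify_single_file(json.loads(text)))
```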

Note With the batch definition option, we no longer draw a distinction between simple and advanced custom technical lineage. You can still include, or not include, source code files to create the lineage, but we no longer differentiate between a custom technical lineage that includes source code and one that does not.

Limitations

Issue Description

The "Data Element targets / sources Data Element" relations aren't created for some asset types.

As part of the technical lineage process, Collibra Data Lineage automatically creates relations of the type "Data Element targets / sources Data Element" between assets in Collibra that represent the data objects in your external data sources. However, the "Data Element targets / sources Data Element" relation can only be created between Data Element assets or assets of child asset types, for example:

  • Column assets.
  • Data Attribute assets.
  • Any custom asset types that are a child of the Data Element asset type.

In other words, the "Data Element targets / sources Data Element" relation can't be created between other asset types; therefore, that lineage can't be shown in a business lineage for other asset types.
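
The rule above amounts to walking up the asset type hierarchy from each end of the relation. The following Python sketch shows that check; it is an illustration only, not how Collibra implements it. The PARENT_TYPE map is a simplified, hypothetical hierarchy: the Column and Data Attribute entries follow this topic, while the Table entry is assumed for the sake of a negative example.

```python
# Simplified, hypothetical asset type hierarchy (child -> parent).
PARENT_TYPE = {
    "Data Element": None,
    "Column": "Data Element",
    "Data Attribute": "Data Element",
    "Table": "Data Structure",  # assumed for illustration
    "Data Structure": None,
}


def is_data_element(asset_type, parents=PARENT_TYPE):
    """Return True if asset_type is Data Element or one of its descendants."""
    while asset_type is not None:
        if asset_type == "Data Element":
            return True
        # Move one level up; unknown types end the walk.
        asset_type = parents.get(asset_type)
    return False


def relation_possible(source_type, target_type):
    """Both ends of the relation must be Data Element assets or child types."""
    return is_data_element(source_type) and is_data_element(target_type)


if __name__ == "__main__":
    print(relation_possible("Column", "Data Attribute"))
    print(relation_possible("Table", "Column"))
```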

The Technical Lineage tab is not included on an asset page.

The Technical Lineage tab, by which you access the technical lineage viewer, is not available on the asset pages of all asset types. To view the technical lineage for an asset whose asset page does not include the Technical Lineage tab, access the lineage via an asset of a type whose asset page does include the tab.

Keep in mind that the Technical Lineage tab is only visible if you have the following permissions:

Missing stitching for Column assets

Stitching cannot be achieved for Column assets if they are not part of the traditional Database > Schema > Table > Column hierarchy, because such columns are not returned by the API.