Create a custom technical lineage

You can create a custom technical lineage to include the metadata of data sources that Collibra Data Lineage does not support.

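You can create two types of custom technical lineages:

  • A simple custom technical lineage, in which you add the source code of each mapping directly to the JSON file.
  • An advanced custom technical lineage, in which you store the source code in separate files in the same folder as the JSON file and reference those files, and positions within them, from the JSON file.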

Note The local folder that you create can contain only one JSON file. However, you can add other files to the harvested directory and its subdirectories and refer to those files from within the JSON file.
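Before you run the lineage harvester, you can check the one-JSON-file rule with a short script. The following is a minimal Python sketch, not part of Collibra's tooling; the function name check_single_json is hypothetical, and the sketch assumes that JSON files in subdirectories also count toward the limit, which is the stricter reading of the note.

```python
from pathlib import Path


def check_single_json(folder: str) -> Path:
    """Return the only .json file under `folder`, or raise ValueError.

    Subdirectories are searched too (assumption: a JSON file anywhere in the
    harvested directory counts toward the one-file limit).
    """
    found = sorted(
        p for p in Path(folder).rglob("*")
        if p.is_file() and p.suffix.lower() == ".json"
    )
    if len(found) != 1:
        names = [str(p) for p in found]
        raise ValueError(
            f"expected exactly one JSON file in {folder}, found {len(found)}: {names}"
        )
    return found[0]
```

For example, check_single_json("my_lineage_folder") returns the path of lineage.json when it is the only JSON file in that (hypothetical) folder, and raises an error when a second JSON file is present.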

Prerequisites

  • You have downloaded the lineage harvester, and your system meets the requirements to run it.
  • You have the necessary permissions for all database objects that the lineage harvester accesses.
  • You have prepared the physical data layer in Data Catalog.
    Note To stitch the data objects of the data sources in the JSON file to Data Catalog assets, you must first register those data sources in Data Catalog and use a structure that matches the structure of the ingested assets in Data Catalog.

Create a simple custom technical lineage

  1. Create a local folder.
  2. Create a new JSON file in the local folder.
  3. Name the JSON file lineage.json.
  4. Add the following mandatory sections to the JSON file:

    version
        The version of the JSON architecture.
        Note Currently, you can only use version 1.0.

    tree
        This section contains the tree definitions of the data objects between which you can define lineages. Each node of a tree contains the name and type properties and, optionally, the children or leaves property; together, the nodes form a hierarchy of data objects. Repeat these properties in the nested nodes to map every level of the hierarchy.
        Tip Usually, the structure you map is system > database > schema > table > column. The system is optional, unless the useCollibraSystemName property is set to true in the lineage harvester configuration file. Collibra Data Lineage can stitch these data objects to assets in Data Catalog. You can also map custom objects, for example dashboards and reports, but custom objects cannot be stitched to assets in Data Catalog.

        name
            The name of your data object: the system name, database name, schema name, table name or column name.
            Warning
            • The names are case sensitive.
            • The names of data objects of the same type must be unique.

        type
            The type of your data object, for example system, database, schema, table or column.

        children
            The sub-objects that have a hierarchical relation to the defined data object. Each child also has the name and type properties and can have children of its own, except for a child on the penultimate level, which has leaves instead of children. Leaves are children that have no children of their own.
            Note Use the children property to define sub-objects, but use the leaves property for an object on the penultimate level, for example to define the columns of a table node.

        leaves
            The sub-objects of an object that is itself defined in a children property. Leaves cannot have sub-objects of their own.
            Note Technical lineage only shows relations between leaf nodes of the tree. Leaves are usually the columns of a table node in the tree structure.

    lineages
        This section contains the paths from sources to targets and defines the mappings and transformations that the Collibra Data Lineage server processes.
        Note If you create a lineage between data objects that are also assets in Data Catalog, the Collibra Data Lineage server automatically stitches the data objects to those assets. You can also create a lineage between custom data objects that are not assets in Data Catalog, for example reports and dashboards.

        src_path
            The hierarchical path to the source data object. This data object is a leaf in the tree section.

            <data objects>
                All data object names in the hierarchical path to the source leaf.
                Example of data objects that can be stitched: system > database > schema > table > column.
                Example of data objects that cannot be stitched: dashboard > report > column.

        trg_path
            The hierarchical path to the target data object. This data object is a leaf in the tree section.

            <data objects>
                All data object names in the hierarchical path to the target leaf.
                Example of data objects that can be stitched: system > database > schema > table > column.
                Example of data objects that cannot be stitched: dashboard > report > column.

        mapping
            The mapping name. It refers to the queries used in the technical lineage.

        source_code
            The transformation code, which determines how the technical lineage is constructed.
            Tip The source code can be a SQL statement or other code that manipulates data.

  5. In your lineage harvester configuration file, add the path to the JSON file.
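A minimal Python sketch of the simple JSON structure follows. It writes a lineage.json for two hypothetical tables (staging.orders_raw and mart.orders in a hypothetical dwh database) and checks that both paths end on a leaf, because technical lineage only shows relations between leaf nodes. The key names come from the property table in step 4; the list shape of tree and lineages and the single-key {"type": "name"} objects in src_path and trg_path are assumptions, so compare the output with a reference example before relying on it.

```python
import json
from pathlib import Path

# Sketch of a simple lineage.json. Key names come from the property table;
# the list shapes and the {"type": "name"} path entries are assumptions.


def node(name, type_, children=None, leaves=None):
    """Build one tree node with either children or leaves (never both)."""
    if children and leaves:
        raise ValueError("a node has children or leaves, not both")
    n = {"name": name, "type": type_}
    if children:
        n["children"] = children
    if leaves:
        n["leaves"] = leaves
    return n


def ends_on_leaf(tree, steps):
    """True if the (type, name) steps walk down the tree and stop on a leaf."""
    candidates, in_leaves, matched_leaf = tree, False, False
    for type_, name in steps:
        match = next(
            (x for x in candidates if x["type"] == type_ and x["name"] == name), None
        )
        if match is None:
            return False
        matched_leaf = in_leaves  # True when this node came from a leaves list
        in_leaves = "leaves" in match
        candidates = match["leaves"] if in_leaves else match.get("children", [])
    return matched_leaf


# Hypothetical objects. No system level: it is optional unless
# useCollibraSystemName is true in the harvester configuration.
SRC = [("database", "dwh"), ("schema", "staging"), ("table", "orders_raw"), ("column", "amount")]
TRG = [("database", "dwh"), ("schema", "mart"), ("table", "orders"), ("column", "total_amount")]


def build_simple_lineage():
    tree = [node("dwh", "database", children=[
        node("staging", "schema", children=[
            node("orders_raw", "table", leaves=[node("amount", "column")])]),
        node("mart", "schema", children=[
            node("orders", "table", leaves=[node("total_amount", "column")])]),
    ])]
    for steps in (SRC, TRG):
        if not ends_on_leaf(tree, steps):
            raise ValueError(f"path does not end on a leaf: {steps}")
    return {
        "version": "1.0",  # currently the only supported version
        "tree": tree,
        "lineages": [{
            "src_path": [{t: n} for t, n in SRC],
            "trg_path": [{t: n} for t, n in TRG],
            "mapping": "load_orders",
            "source_code": "INSERT INTO mart.orders (total_amount) "
                           "SELECT amount FROM staging.orders_raw",
        }],
    }


def write_lineage(folder):
    """Write lineage.json into an existing `folder` and return its path."""
    path = Path(folder) / "lineage.json"
    path.write_text(json.dumps(build_simple_lineage(), indent=2), encoding="utf-8")
    return path
```

Calling write_lineage("my_lineage_folder") creates my_lineage_folder/lineage.json; the folder name is hypothetical and the folder must already exist. Validating paths before writing catches a path that stops at a table instead of a column, which would otherwise produce no visible lineage.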

Create an advanced custom technical lineage

  1. Create a local folder.
  2. Create a new JSON file in the local folder.
  3. Name the JSON file lineage.json.
  4. In the same local folder, store all of the source code files that you want to reference in the JSON file.
  5. Add the following sections to the JSON file:

    version
        The version of the JSON architecture.
        Note Currently, you can only use version 1.0.

    tree
        This section contains the tree definitions of the data objects between which you can define lineages. Each node of a tree contains the name and type properties and, optionally, the children or leaves property; together, the nodes form a hierarchy of data objects. Repeat these properties in the nested nodes to map every level of the hierarchy.
        Tip Usually, the structure you map is system > database > schema > table > column. The system is optional, unless the useCollibraSystemName property is set to true in the lineage harvester configuration file. Collibra Data Lineage can stitch these data objects to assets in Data Catalog. You can also map custom objects, for example dashboards and reports, but custom objects cannot be stitched to assets in Data Catalog.

        name
            The name of your data object: the system name, database name, schema name, table name or column name.
            Warning
            • The names are case sensitive.
            • The names of data objects of the same type must be unique.

        type
            The type of your data object, for example system, database, schema, table or column.

        children
            The sub-objects that have a hierarchical relation to the defined data object. Each child also has the name and type properties and can have children of its own, except for a child on the penultimate level, which has leaves instead of children. Leaves are children that have no children of their own.
            Note Use the children property to define sub-objects, but use the leaves property for an object on the penultimate level, for example to define the columns of a table node.

        leaves
            The sub-objects of an object that is itself defined in a children property. Leaves cannot have sub-objects of their own.
            Note Technical lineage only shows relations between leaf nodes of the tree. Leaves are usually the columns of a table node in the tree structure.

    lineages
        This section contains the paths from sources to targets and defines the mappings and transformations that the Collibra Data Lineage server processes.
        Note If you create a lineage between data objects that are also assets in Data Catalog, the Collibra Data Lineage server automatically stitches the data objects to those assets. You can also create a lineage between custom data objects that are not assets in Data Catalog, for example reports and dashboards.

        src_path
            The hierarchical path to the source data object. This data object is a leaf in the tree section.

            <data objects>
                All data object names in the hierarchical path to the source leaf.
                Example of data objects that can be stitched: system > database > schema > table > column.
                Example of data objects that cannot be stitched: dashboard > report > column.

        trg_path
            The hierarchical path to the target data object. This data object is a leaf in the tree section.

            <data objects>
                All data object names in the hierarchical path to the target leaf.
                Example of data objects that can be stitched: system > database > schema > table > column.
                Example of data objects that cannot be stitched: dashboard > report > column.

        mapping_ref
            The mapping of the source code files that are located in the same directory as the JSON file, and their positions in the technical lineage.
            Note The positions are zero-based: the first character in a sequence has position 0.

            source_code
                The source code file for which you provide a path in the codebase_files section.
                Tip The source code can be a SQL statement or other code that manipulates data.

            mapping
                The name of a mapping that is defined in the codebase_files section.

            codebase_pos
                The positions in a source code file that is located in the same directory as the JSON file. These positions are highlighted in the technical lineage of a column.

    codebase_files
        This section defines the references to source code files that are stored in the same directory as the JSON file.

        <source code path>
            The reference to a source code file that is located in the same directory as the JSON file. It contains the mappings of the source code and their positions.

            mapping_refs
                The mapping of the source code for which you provided the path in <source code path>.

  6. In your lineage harvester configuration file, add the path to the JSON file.
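The advanced structure can also be sketched in Python. This minimal sketch writes a hypothetical load_orders.sql next to lineage.json, computes zero-based character positions with str.find, and references the file from mapping_ref and codebase_files. The key names come from the property table in step 5; how mapping_ref, codebase_files and mapping_refs nest, the {"type": "name"} path entries, and the [start, end) form of a position (end exclusive) are assumptions made for illustration, not a confirmed Collibra schema.

```python
import json
from pathlib import Path

# Sketch of an advanced lineage.json that points into a separate SQL file.
# Key names come from the property table; the nesting of mapping_ref,
# codebase_files and mapping_refs and the [start, end) positions are assumptions.

STATEMENT = ("INSERT INTO mart.orders (total_amount)\n"
             "SELECT amount FROM staging.orders_raw;")
SQL_TEXT = "-- nightly load\n" + STATEMENT + "\n"
SQL_PATH = "load_orders.sql"  # hypothetical file in the same folder as lineage.json

TREE = [{"name": "dwh", "type": "database", "children": [
    {"name": "staging", "type": "schema", "children": [
        {"name": "orders_raw", "type": "table",
         "leaves": [{"name": "amount", "type": "column"}]}]},
    {"name": "mart", "type": "schema", "children": [
        {"name": "orders", "type": "table",
         "leaves": [{"name": "total_amount", "type": "column"}]}]},
]}]


def char_span(text, snippet):
    """Zero-based [start, end) character offsets of snippet in text."""
    start = text.find(snippet)
    if start < 0:
        raise ValueError(f"snippet not found: {snippet!r}")
    return [start, start + len(snippet)]


def write_advanced_lineage(folder):
    """Write the SQL file and lineage.json into an existing folder."""
    root = Path(folder)
    # newline="" writes "\n" unchanged, so offsets match the file on every OS.
    with open(root / SQL_PATH, "w", encoding="utf-8", newline="") as f:
        f.write(SQL_TEXT)

    lineage = {
        "version": "1.0",
        "tree": TREE,
        "lineages": [{
            "src_path": [{"database": "dwh"}, {"schema": "staging"},
                         {"table": "orders_raw"}, {"column": "amount"}],
            "trg_path": [{"database": "dwh"}, {"schema": "mart"},
                         {"table": "orders"}, {"column": "total_amount"}],
            "mapping_ref": {
                "source_code": SQL_PATH,
                "mapping": "load_orders",
                # Highlight only the expression that feeds total_amount.
                "codebase_pos": [char_span(SQL_TEXT, "SELECT amount")],
            },
        }],
        "codebase_files": {
            # The whole statement is recorded as the position of the mapping.
            SQL_PATH: {"mapping_refs": {"load_orders": [char_span(SQL_TEXT, STATEMENT)]}},
        },
    }
    out = root / "lineage.json"
    out.write_text(json.dumps(lineage, indent=2), encoding="utf-8")
    return out
```

Writing the SQL file with newline="" matters on Windows: without it, Python would write \r\n line endings, and every position after the first line break would be off by one character per preceding line.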

What's next

When you have finished the JSON file, prepare the lineage harvester configuration file and enter the properties that are needed to create a technical lineage from the JSON file.