Create a JSON file with a predefined technical lineage

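You can create a custom technical lineage for data sources that are not supported. To do so, you create a JSON file with a predefined lineage. You can create two types of predefined technical lineage: a simple one, in which the transformation code is written directly in the JSON file, and an advanced one, in which the JSON file refers to source code files that are stored in the same folder.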

Note The local folder can contain only one JSON file. However, you can add other files to the harvested directory and its subdirectories and refer to them in the JSON file.

Prerequisites

  • You have downloaded the lineage harvester and your system meets the requirements to run it.
  • You have the necessary permissions for all database objects that the lineage harvester accesses.
  • You have prepared the physical data layer in Data Catalog.
    Note To stitch the data objects of data sources mentioned in the JSON file with Data Catalog assets, you first have to register those data sources in Data Catalog and you have to use a structure that matches the structure of ingested assets in Data Catalog.

Create a JSON file with a simple predefined technical lineage

  1. Create a local folder.
  2. Configure the predefined technical lineage:
    1. Create a new JSON file in the local folder.
    2. Name the JSON file lineage.json.
    3. Add the following mandatory sections to the JSON file:

      Property: Description

      version: The version of the JSON architecture.
        Note Currently, you can only use version 1.0.

      tree: This section contains tree definitions of data objects between which lineages can be defined. Each node of a tree contains the name and type properties and, optionally, the children or leaves property, which together form a hierarchy of data objects. You can reuse the same properties in one node to map all data objects in the hierarchy.
        Tip Usually, the structure you map is the following: system > database > schema > table > column. The system is optional, unless the useCollibraSystemName property is set to true in the lineage harvester configuration file. Collibra Data Lineage can stitch these data objects to assets in Data Catalog. However, you can also map custom objects, for example dashboards and reports. Custom objects cannot be stitched to assets in Data Catalog.

        name: The name of your data object. This is the system name, database name, schema name, table name or column name.
          Warning Consider the following:
          • The names are case sensitive.
          • The names of data objects of the same type must be unique.

        type: The type of your data object. For example: system, database, schema, table or column.

        children: The sub-objects that have a hierarchical relation to the defined data object. Each child also has the name and type properties and can have children of its own, except for the penultimate child, which has leaves instead of children. Leaves are sub-objects that have no sub-objects of their own.
          Note Use the children property to define sub-objects, but use the leaves property if the object is on the penultimate level. For example, use leaves to define the columns of a table node.

        leaves: The sub-objects of an object that is defined in a children property. Leaves cannot have sub-objects of their own.
          Note Technical lineage only shows relations between leaf nodes of the tree. Leaves are usually columns that have a relation to a table node in the tree structure.

      lineages: This section contains the path from a source to a target and defines the mappings and transformations that should be processed by the Collibra Data Lineage server.
        Note If you create a lineage between data objects that are also assets in Data Catalog, the Collibra Data Lineage server automatically stitches the data objects to the assets in Data Catalog. However, you can also create a lineage between custom data objects that are not assets in Data Catalog, for example reports and dashboards.

        src_path: The hierarchical path to the source data object. This data object is shown as a leaf in the tree node.

          <data objects>: All data object names in the hierarchical path to the source leaf.
            Example of data objects that can be stitched: system > database > schema > table > column.
            Example of data objects that cannot be stitched: dashboard > report > column.

        trg_path: The hierarchical path to the target data object. This data object is shown as a leaf in the tree node.

          <data objects>: All data object names in the hierarchical path to the target leaf.
            Example of data objects that can be stitched: system > database > schema > table > column.
            Example of data objects that cannot be stitched: dashboard > report > column.

        mapping: The mapping name. This refers to the queries used in the technical lineage.

        source_code: The transformation code. This determines how the technical lineage is constructed.
          Tip The source code can be a SQL statement or code that manipulates data.

  3. Add the path to the JSON file to the configuration file.
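
As an illustration of the sections described above, the following Python sketch assembles a small simple lineage.json. It is not taken from the product reference: the database, schema, table and column names are invented, and the shapes used here (tree and lineages as JSON arrays, each src_path and trg_path entry as a single-key object that names one level, version as a string) are assumptions that you should check against the JSON format your lineage harvester version expects.

```python
import json
from pathlib import Path


def node(name, type_, children=None, leaves=None):
    """Build one tree node; 'children' and 'leaves' are optional."""
    result = {"name": name, "type": type_}
    if children:
        result["children"] = children
    if leaves:
        result["leaves"] = leaves
    return result


def object_path(**levels):
    """Assumed path layout: one single-key object per level, in order."""
    return [{level: name} for level, name in levels.items()]


def build_simple_lineage():
    """Return a document with the version, tree and lineages sections."""
    order_id = [node("ORDER_ID", "column")]
    tree = [
        node("SALES_DB", "database", children=[
            node("STAGING", "schema", children=[
                # Tables sit on the penultimate level, so they use leaves.
                node("ORDERS_RAW", "table", leaves=order_id),
                node("ORDERS", "table", leaves=order_id),
            ]),
        ]),
    ]
    lineages = [{
        "src_path": object_path(database="SALES_DB", schema="STAGING",
                                table="ORDERS_RAW", column="ORDER_ID"),
        "trg_path": object_path(database="SALES_DB", schema="STAGING",
                                table="ORDERS", column="ORDER_ID"),
        "mapping": "load_orders",  # hypothetical mapping name
        "source_code": "INSERT INTO ORDERS (ORDER_ID) "
                       "SELECT ORDER_ID FROM ORDERS_RAW",
    }]
    return {"version": "1.0", "tree": tree, "lineages": lineages}


def write_lineage(folder):
    """Write the document to <folder>/lineage.json and return the file path."""
    target = Path(folder)
    target.mkdir(parents=True, exist_ok=True)
    file_path = target / "lineage.json"
    file_path.write_text(json.dumps(build_simple_lineage(), indent=2),
                         encoding="utf-8")
    return file_path
```

Calling write_lineage("custom-lineage") would create the folder if needed and write custom-lineage/lineage.json. The folder name is only an example.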

Create a JSON file with an advanced predefined technical lineage

  1. Create a local folder.
  2. Configure the predefined technical lineage:
    1. Create a new JSON file in the local folder.
    2. Name the JSON file lineage.json.
    3. Store all source code files that you want to reference from the JSON file in the same local folder.
    4. Add the following sections to the JSON file:

      Property: Description

      version: The version of the JSON architecture.
        Note Currently, you can only use version 1.0.

      tree: This section contains tree definitions of data objects between which lineages can be defined. Each node of a tree contains the name and type properties and, optionally, the children or leaves property, which together form a hierarchy of data objects. You can reuse the same properties in one node to map all data objects in the hierarchy.
        Tip Usually, the structure you map is the following: system > database > schema > table > column. The system is optional, unless the useCollibraSystemName property is set to true in the lineage harvester configuration file. Collibra Data Lineage can stitch these data objects to assets in Data Catalog. However, you can also map custom objects, for example dashboards and reports. Custom objects cannot be stitched to assets in Data Catalog.

        name: The name of your data object. This is the system name, database name, schema name, table name or column name.
          Warning Consider the following:
          • The names are case sensitive.
          • The names of data objects of the same type must be unique.

        type: The type of your data object. For example: system, database, schema, table or column.

        children: The sub-objects that have a hierarchical relation to the defined data object. Each child also has the name and type properties and can have children of its own, except for the penultimate child, which has leaves instead of children. Leaves are sub-objects that have no sub-objects of their own.
          Note Use the children property to define sub-objects, but use the leaves property if the object is on the penultimate level. For example, use leaves to define the columns of a table node.

        leaves: The sub-objects of an object that is defined in a children property. Leaves cannot have sub-objects of their own.
          Note Technical lineage only shows relations between leaf nodes of the tree. Leaves are usually columns that have a relation to a table node in the tree structure.

      lineages: This section contains the path from a source to a target and defines the mappings and transformations that should be processed by the Collibra Data Lineage server.
        Note If you create a lineage between data objects that are also assets in Data Catalog, the Collibra Data Lineage server automatically stitches the data objects to the assets in Data Catalog. However, you can also create a lineage between custom data objects that are not assets in Data Catalog, for example reports and dashboards.

        src_path: The hierarchical path to the source data object. This data object is shown as a leaf in the tree node.

          <data objects>: All data object names in the hierarchical path to the source leaf.
            Example of data objects that can be stitched: system > database > schema > table > column.
            Example of data objects that cannot be stitched: dashboard > report > column.

        trg_path: The hierarchical path to the target data object. This data object is shown as a leaf in the tree node.

          <data objects>: All data object names in the hierarchical path to the target leaf.
            Example of data objects that can be stitched: system > database > schema > table > column.
            Example of data objects that cannot be stitched: dashboard > report > column.

        mapping_ref: The mapping of the source code files that are located in the same directory as the JSON file, and their positions in the technical lineage.
          Note The positions are zero based. The first character in a sequence has position 0.

          source_code: The source code for which you provide a path in the codebase_files node.
            Tip The source code can be a SQL statement or code that manipulates data.

          mapping: The mapping name of the mapping defined in the codebase_files node.

          codebase_pos: The positions in a source code file that is located in the same directory as the JSON file. These source code positions are highlighted under the technical lineage of a column.

      codebase_files: This section defines the reference to source code files that are stored in the same directory as the JSON file.

        <source code path>: The reference to source code that is located in the same directory as the JSON file. This contains mappings of the source code and their positions.

          mapping_refs: The mapping of the source code for which you provided the path in <source code path>.

  3. Add the JSON connection information to the configuration file.
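
The advanced variant described above can be sketched in Python as follows. The sketch writes a source code file next to lineage.json and computes zero-based positions for the statement to highlight. Several details are assumptions rather than documented facts: the pos_start and pos_len key names, the use of JSON arrays for tree, lineages and the position lists, the single-key objects in src_path and trg_path, and every object, file and mapping name. Verify them against the JSON format your lineage harvester version expects.

```python
import json
from pathlib import Path


def span(text, fragment):
    """Return the zero-based start and the length of fragment in text."""
    start = text.find(fragment)
    if start < 0:
        raise ValueError(f"{fragment!r} not found in source code")
    # Assumed key names for a position entry.
    return {"pos_start": start, "pos_len": len(fragment)}


def build_advanced_lineage(folder, sql_name="load_orders.sql"):
    """Write a source code file and a lineage.json that refers to it."""
    folder = Path(folder)
    statement = "INSERT INTO ORDERS (ORDER_ID) SELECT ORDER_ID FROM ORDERS_RAW"
    sql_text = "-- nightly load\n" + statement + ";\n"
    # Write bytes so that line endings, and therefore positions, stay as computed.
    (folder / sql_name).write_bytes(sql_text.encode("utf-8"))

    def table(name):
        return {"name": name, "type": "table",
                "leaves": [{"name": "ORDER_ID", "type": "column"}]}

    def column_path(table_name):
        return [{"database": "SALES_DB"}, {"schema": "STAGING"},
                {"table": table_name}, {"column": "ORDER_ID"}]

    position = span(sql_text, statement)
    document = {
        "version": "1.0",
        "tree": [{"name": "SALES_DB", "type": "database", "children": [
            {"name": "STAGING", "type": "schema",
             "children": [table("ORDERS_RAW"), table("ORDERS")]}]}],
        "lineages": [{
            "src_path": column_path("ORDERS_RAW"),
            "trg_path": column_path("ORDERS"),
            "mapping_ref": {
                "source_code": sql_name,       # file in the same folder
                "mapping": "load_orders",      # must match codebase_files
                "codebase_pos": [position],
            },
        }],
        "codebase_files": {
            sql_name: {"mapping_refs": {"load_orders": [position]}},
        },
    }
    (folder / "lineage.json").write_text(json.dumps(document, indent=2),
                                         encoding="utf-8")
    return document
```

Because positions are zero based, a statement that starts at the very beginning of a file gets a start position of 0. In this sketch the statement follows a one-line comment, so its start position equals the length of that comment line including the line break.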

What's next

After you have prepared the JSON file with a predefined technical lineage, you can prepare the lineage harvester configuration file and enter the correct properties to create a technical lineage using the JSON file.
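
Before you pass the JSON file to the lineage harvester, it can help to check it against the rules stated on this page. The following Python sketch is one possible check, not a product feature. It assumes that tree is a JSON array of nodes and that the uniqueness rule for names applies to objects of the same type under the same parent. Both points are interpretations, so adjust them if your environment behaves differently.

```python
def validate_nodes(nodes, trail=(), leaf_level=False):
    """Collect rule violations for a list of tree nodes."""
    problems = []
    seen = set()
    for item in nodes:
        name, type_ = item.get("name"), item.get("type")
        here = " > ".join(trail + (str(name),))
        if not name or not type_:
            problems.append(f"{here}: 'name' and 'type' are required")
        # Names are case sensitive, so 'Orders' and 'ORDERS' are different.
        if (name, type_) in seen:
            problems.append(f"{here}: duplicate name for type {type_!r}")
        seen.add((name, type_))
        has_children, has_leaves = "children" in item, "leaves" in item
        if leaf_level and (has_children or has_leaves):
            problems.append(f"{here}: a leaf cannot have sub-objects")
        elif has_children and has_leaves:
            problems.append(f"{here}: use either 'children' or 'leaves'")
        next_trail = trail + (str(name),)
        problems += validate_nodes(item.get("children", []), next_trail)
        problems += validate_nodes(item.get("leaves", []), next_trail,
                                   leaf_level=True)
    return problems


def validate_document(document):
    """Check the version, the mandatory sections and the tree."""
    problems = []
    # Accept the version as a string or as a number.
    if str(document.get("version")) != "1.0":
        problems.append("version must be 1.0")
    for section in ("tree", "lineages"):
        if section not in document:
            problems.append(f"missing section {section!r}")
    return problems + validate_nodes(document.get("tree", []))
```

An empty list returned by validate_document means that none of these checks failed. It does not guarantee that the lineage harvester accepts the file.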