Custom technical lineage JSON file

In the lineage.json file, you can define a basic data object hierarchy, the lineage between two or more data objects, and the transformations that create the custom technical lineage.

The following sections in the JSON file define different parts in the resulting Collibra technical lineage graph:

  • tree, which defines the data object hierarchy. The data objects are shown as nodes in the technical lineage graph.
  • lineages, which defines the lineage relation. The lineage relations are shown as edges in the technical lineage graph. The edges represent the data flow from a source to a target.
  • codebase_files, which points to transformation definitions in a source code file.

If you want to create a simple custom technical lineage, specify the tree and lineages sections. You can add the transformation code in the lineages section.

If you want to create an advanced custom technical lineage, specify the tree, lineages and codebase_files sections. Add references to transformation code in source code files in the codebase_files section.

Transformation code in both simple and advanced custom technical lineages is displayed in the bottom part of the Collibra technical lineage graph.
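As an illustration, a simple custom technical lineage with only the tree and lineages sections can be sketched as follows. The snippet builds the structure as a Python dictionary and prints it as JSON. All object names are hypothetical, and the layout of the src_path and trg_path entries (one single-key object per level) is an assumption that should be checked against the Custom technical lineage JSON file example.

```python
import json


def column_path(database, schema, table, column):
    """Build an ordered path from the top of the tree down to a leaf.

    Assumption: each path entry is an object keyed by the data object type.
    """
    return [
        {"database": database},
        {"schema": schema},
        {"table": table},
        {"column": column},
    ]


# Hypothetical data objects: one database defined once, with two tables.
simple_lineage = {
    "version": "1.0",
    "tree": [
        {
            "name": "SALES_DB",
            "type": "database",
            "children": [
                {
                    "name": "PUBLIC",
                    "type": "schema",
                    "children": [
                        {
                            "name": "ORDERS",
                            "type": "table",
                            "leaves": [{"name": "AMOUNT", "type": "column"}],
                        },
                        {
                            "name": "ORDERS_SUMMARY",
                            "type": "table",
                            "leaves": [{"name": "TOTAL_AMOUNT", "type": "column"}],
                        },
                    ],
                }
            ],
        }
    ],
    "lineages": [
        {
            "src_path": column_path("SALES_DB", "PUBLIC", "ORDERS", "AMOUNT"),
            "trg_path": column_path(
                "SALES_DB", "PUBLIC", "ORDERS_SUMMARY", "TOTAL_AMOUNT"
            ),
            # In a simple lineage, the transformation code is stored inline.
            "mapping": "orders_to_summary",
            "source_code": (
                "INSERT INTO ORDERS_SUMMARY (TOTAL_AMOUNT) "
                "SELECT SUM(AMOUNT) FROM ORDERS"
            ),
        }
    ],
}

lineage_json = json.dumps(simple_lineage, indent=2)
print(lineage_json)
```

The system level is left out on purpose in this sketch, so the path starts at the database.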

Requirements and restrictions

The source code files must be in the same directory as the lineage.json file. Otherwise, an error occurs indicating that the lineage harvester cannot find the source code files.
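Before you run the lineage harvester, the directory requirement can be checked with a short script. The following is a minimal sketch, not part of any Collibra tooling. It assumes that the codebase_files section is an object whose keys are the source code file names, and the function name missing_source_files is hypothetical.

```python
import json
from pathlib import Path


def missing_source_files(lineage_json_path):
    """Return the referenced source code files that are not next to lineage.json."""
    lineage_path = Path(lineage_json_path)
    spec = json.loads(lineage_path.read_text(encoding="utf-8"))
    # Assumption: the keys of codebase_files are the source code file names.
    file_names = spec.get("codebase_files", {})
    return [
        name
        for name in file_names
        if not (lineage_path.parent / name).is_file()
    ]
```

An empty list means that every referenced file was found in the same directory as the lineage.json file.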

Sections

Sections

Description

version

The version of the JSON architecture. Specify the value of 1.0, which is the only supported version.

tree

This section contains tree definitions of data objects between which lineages can be defined. The data objects are systems, databases, schemas, tables, views, columns, dashboards and reports.

Each node of a tree contains the name, type and optionally children or leaves properties which form a hierarchy of data objects. You must define a node only once in this section. With the nested tree format, you can reuse the properties of one node for multiple children. For example, you can define a database once and use the children array to define multiple tables in the database.

Tip: Usually, the structure you map is the following: system > database > schema > table > column. The system is optional, unless the useCollibraSystemName property is set to true in the lineage harvester configuration file. Collibra Data Lineage can stitch these data objects to assets in Data Catalog. However, you can also map custom objects, for example dashboards and reports. Custom objects cannot be stitched to assets in Data Catalog.

lineages

This section contains the path from a source to a target and defines the transformation code or transformation references to be processed by the Collibra Data Lineage service.

codebase_files

This optional section defines the reference to source code files. Store the source code files that contain the transformation code in the same directory as the lineage.json file.

Include this section only when you create an advanced custom technical lineage.

tree section properties

Properties

Description
name

The name of your data object. Specify this property with the system name, database name, schema name, table name, view name or column name.

The following rules apply when you specify this property:

  • The names are case sensitive.
  • The names of children and leaves can be identical if the children and leaves with the same names are in different parent nodes.

Note: If you do not want to use the system or server name of your data source to match the System asset in Data Catalog, ensure that you do not add the system data object. The useCollibraSystemName property in the lineage harvester configuration file for custom technical lineage is ignored. If you add the system data object in the JSON file, Collibra Data Lineage always uses the full path including the system (system > database > schema > table > column) for stitching, regardless of whether the useCollibraSystemName property is set to true or false.

type

The type of your data object. You can specify one of the following options: system, database, schema, table, view, column, dashboard or report.

children

The sub-objects that have a hierarchical relation to the defined data object.

Each child can contain its own children property, except for a child at the penultimate level of the hierarchy. A child at the penultimate level must contain the leaves property instead. The leaves property cannot contain a children property.

For example, you can use the children property to define a table and use the leaves properties to define columns that have a relation to the table node.

Each child and leaf has the name and type properties and the optional catalog_fullname, catalog_domain_id, catalog_asset_type_name and catalog_asset_type_uuid properties.

leaves

The sub-objects of an object that is defined in a children property, but cannot have sub-objects of their own.

A technical lineage is defined as relations between leaf nodes of the tree.

The value of the type property of the leaves property must be column or report. Indirect and table-level technical lineages are not supported. For the workarounds to create a table-level or indirect technical lineage, see Programming considerations.
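The tree rules described above can be checked before the file is submitted. The following is a minimal sketch that covers only the rules stated in this section: required name and type, no repeated names under the same parent, leaf types limited to column or report, and no sub-objects under a leaf. The function names are hypothetical.

```python
LEAF_TYPES = {"column", "report"}


def validate_leaves(leaves, parents):
    """Return rule violations for the leaves of one parent node."""
    problems = []
    seen = set()
    for leaf in leaves:
        name = leaf.get("name")
        label = " > ".join(parents + (str(name),))
        if name in seen:
            problems.append(f"{label}: defined more than once under one parent")
        seen.add(name)
        if leaf.get("type") not in LEAF_TYPES:
            problems.append(f"{label}: leaf type must be column or report")
        if "children" in leaf or "leaves" in leaf:
            problems.append(f"{label}: a leaf cannot have sub-objects")
    return problems


def validate_tree(nodes, parents=()):
    """Return rule violations for a list of sibling nodes and their descendants."""
    problems = []
    seen = set()
    for node in nodes:
        name = node.get("name")
        here = parents + (str(name),)
        label = " > ".join(here)
        if not name or not node.get("type"):
            problems.append(f"{label}: name and type are required")
        # Names are compared as-is, so the check is case sensitive.
        if name in seen:
            problems.append(f"{label}: defined more than once under one parent")
        seen.add(name)
        problems += validate_leaves(node.get("leaves", []), here)
        problems += validate_tree(node.get("children", []), here)
    return problems
```

Calling validate_tree with the value of the tree section returns an empty list when none of these rules is violated. Identical names in different parent nodes are accepted, because the uniqueness check is scoped to one parent.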

lineage section properties

Properties

Required

Description
src_path
Yes

The hierarchical path to the source data object. This data object is defined as a leaf in the tree section.

This property represents where the data comes from for a transformation.

trg_path
Yes

The hierarchical path to the target data object. This data object is defined as a leaf in the tree section.

This property represents where the data flows to.

<data objects>
Yes

An ordered array of data object names. This array is required to define the sub-objects of the src_path and trg_path properties.

Specify the array with the data object names that start from the top of the tree section and finish at a leaf node.

This example shows data objects that can be stitched: system > database > schema > table > column.

This example shows data objects that cannot be stitched: dashboard > report > column.


Note: If you do not want to use the system or server name of your data source to match the System asset in Data Catalog, ensure that you do not add the system data object. The useCollibraSystemName property in the lineage harvester configuration file for custom technical lineage is ignored. If you add the system data object in the JSON file, Collibra Data Lineage always uses the full path including the system (system > database > schema > table > column) for stitching, regardless of whether the useCollibraSystemName property is set to true or false.

mapping

Yes

Simple custom technical lineage only

The mapping name. This property specifies a name for the transformation code.

source_code

Yes

Simple custom technical lineage only

The transformation code, which determines how the technical lineage is constructed.

The transformation code can be a descriptive string or a SQL statement that manipulates data.

mapping_ref

No

Advanced custom technical lineage only

This property contains the name of the mapping reference to the transformation code in source code files. This property also contains the position and length of the transformation code to be highlighted in the technical lineage graph.

source_code

No

Advanced custom technical lineage only

The name of the source code file that contains the transformation code. The transformation code can be a SQL statement, code that manipulates data or a descriptive string.

The source code file must be in the same directory as the lineage.json file.

mapping

No

Advanced custom technical lineage only

The unique descriptor of a part of transformation code in a source code file that is in the same directory as the lineage.json file.

A source code file can contain different parts of transformation code that represent different data flows. This property indicates the referenced data flow.

The value of this property must match a <mapping> name that is defined in the mapping_refs property in the codebase_files section.

codebase_pos

No

Advanced custom technical lineage only

The positions indicate a string of the transformation code in a source code file to be highlighted in the bottom part of the Collibra technical lineage graph. The whole lines that include the transformation code are highlighted.

The string must be a subset of the string of the transformation code that is defined by the pos_start and pos_len properties of the mapping_refs property in the codebase_files section.

pos_start

No

Advanced custom technical lineage only

The start position of the string of the transformation code to be highlighted. The start position is in characters, not bytes.

The value must be equal to or greater than the value of the pos_start property of the mapping_refs property in the codebase_files section.

pos_len

No

Advanced custom technical lineage only

The length of the string of the transformation code to be highlighted. The length is in characters, not bytes.

Specify a value in the following range:

  • Equal to or greater than 1.
  • Less than or equal to the length of the string that is defined by the pos_len property of the mapping_refs property in the codebase_files section.

For example, if you specify "pos_start": 10 and "pos_len": 160 in the codebase_files section, specify a value for this property in the range of 1 - 160.
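The requirement that src_path and trg_path start at the top of the tree section and finish at a leaf node can be checked with a small helper. The following is a minimal sketch: it takes the ordered data object names as a plain list, and the function name resolve_leaf and the sample names are hypothetical.

```python
def resolve_leaf(tree, names):
    """Follow names from the top of the tree and return the leaf, or None.

    All names except the last must match nested nodes; the last name must
    match an entry in the leaves property of the node reached so far.
    """
    siblings = tree
    node = None
    for index, name in enumerate(names):
        is_last = index == len(names) - 1
        if is_last:
            candidates = node.get("leaves", []) if node else []
        else:
            candidates = siblings
        # Exact comparison, because names are case sensitive.
        match = next((item for item in candidates if item["name"] == name), None)
        if match is None:
            return None
        if is_last:
            return match
        node = match
        siblings = node.get("children", [])
    return None


# Hypothetical tree section with one column.
tree = [
    {
        "name": "SALES_DB",
        "type": "database",
        "children": [
            {
                "name": "PUBLIC",
                "type": "schema",
                "children": [
                    {
                        "name": "ORDERS",
                        "type": "table",
                        "leaves": [{"name": "AMOUNT", "type": "column"}],
                    }
                ],
            }
        ],
    }
]

print(resolve_leaf(tree, ["SALES_DB", "PUBLIC", "ORDERS", "AMOUNT"]))
```

A path that stops at a table, or that differs only in letter case, resolves to None in this sketch.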

codebase_files section properties

Properties

Description
<source code path>

The file path to source code files that contain the transformation code. The transformation code can be a SQL statement or code that manipulates data.

The source code file must be in the same directory as the lineage.json file.

mapping_refs

The mapping of the transformation code and the position of the transformation code that is shown in the bottom part of the technical lineage graph.

This property defines a string of the transformation code in the source code file to be shown in the technical lineage graph. The string must include the string that is defined by the pos_start and pos_len properties of the mapping property in the lineage section.

<mapping>

The unique descriptor of a part of transformation code in a source code file that is in the same directory as the lineage.json file.

A source code file can contain different parts of transformation code that represent different data flows. This property indicates the referenced data flow.

The value must match the value of the mapping property in the lineage section.

pos_start

The start position of the string of the transformation code. The start position is in characters, not bytes.

Specify a value in the following range:

  • Equal to or greater than 0.
  • Less than or equal to the value of the pos_start property in the mapping property in the lineage section.
pos_len

The length of the string of the transformation code. The length is in characters, not bytes.

Specify a value in the following range:

  • Greater than or equal to 1.
  • Less than or equal to the length of the source code file minus the start position.

For example, if you specify "pos_start": 10 and the file length is 160 characters, specify a value for this property in the range of 1 - 150.
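Because the positions are counted in characters, it can be easier to compute them from the source text than to count them by hand. The following is a minimal sketch that derives pos_start and pos_len for a block of transformation code and for a highlighted part of it, then checks the range rules from the lineage and codebase_files sections. The file name load.sql, the mapping name t1_to_t2, the helper names, and the nesting of mapping_ref and codebase_pos are assumptions for illustration. A character is treated here as one Python string element.

```python
def span_of(source_text, snippet, search_from=0):
    """Return the 0-based character position and length of snippet."""
    start = source_text.find(snippet, search_from)
    if start < 0:
        raise ValueError(f"snippet not found: {snippet!r}")
    return {"pos_start": start, "pos_len": len(snippet)}


def block_fits_file(block, file_length):
    """codebase_files rule: pos_start >= 0 and 1 <= pos_len <= file length - pos_start."""
    return (
        block["pos_start"] >= 0
        and 1 <= block["pos_len"] <= file_length - block["pos_start"]
    )


def highlight_fits_block(highlight, block):
    """lineages rule: the highlighted string lies inside the mapping_refs string."""
    block_end = block["pos_start"] + block["pos_len"]
    highlight_end = highlight["pos_start"] + highlight["pos_len"]
    return (
        highlight["pos_len"] >= 1
        and highlight["pos_start"] >= block["pos_start"]
        and highlight_end <= block_end
    )


# Hypothetical content of load.sql; the comment contains non-ASCII characters,
# so character positions and byte positions differ.
source_text = "-- r\u00e9sum\u00e9 load\nINSERT INTO T2 (C) SELECT C FROM T1;\n"

block = span_of(source_text, "INSERT INTO T2 (C) SELECT C FROM T1;")
highlight = span_of(source_text, "SELECT C FROM T1", block["pos_start"])

# Assumed nesting of the two sections that must agree with each other.
codebase_files = {"load.sql": {"mapping_refs": {"t1_to_t2": block}}}
mapping_ref = {
    "source_code": "load.sql",
    "mapping": "t1_to_t2",
    "codebase_pos": [highlight],
}

print(block, highlight)
print(block_fits_file(block, len(source_text)), highlight_fits_block(highlight, block))
```

The same mapping name appears in both structures, which is what ties a lineage entry to its part of the source code file.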

Programming considerations

Currently, there is no native support for indirect and table-level lineages. As a workaround, you can specify "type": "column" and "name": "*" for the leaves property to create a table-level or indirect technical lineage. With this specification, the indirect technical lineage is shown as a solid line instead of a dashed line in the Collibra technical lineage graph, and is always shown, regardless of whether the Show indirect dependencies option is enabled or disabled.
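The workaround can be sketched as a small helper that produces a table node with a single placeholder leaf. The function name table_level_node and the table names are hypothetical.

```python
import json


def table_level_node(table_name):
    """Return a table node whose only leaf is the "*" placeholder column."""
    return {
        "name": table_name,
        "type": "table",
        "leaves": [{"name": "*", "type": "column"}],
    }


# Hypothetical source and target tables for a table-level lineage.
source_table = table_level_node("ORDERS")
target_table = table_level_node("ORDERS_ARCHIVE")

print(json.dumps([source_table, target_table], indent=2))
```

A lineage between the two tables then uses "*" as the last name in both src_path and trg_path.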

Example

For sample JSON files that define a simple custom technical lineage and an advanced custom technical lineage, see Custom technical lineage JSON file example.