Custom technical lineage JSON file
To create a custom technical lineage, you must create the custom technical lineage JSON file and store it, along with any source code files, in the folder that you created for the Shared Storage connection.
In the lineage.json file, you can define a basic data object hierarchy, a lineage between two or more data objects and transformations that create the custom technical lineage.
The following sections in the JSON file define different parts in the resulting Collibra technical lineage graph:
- tree, which defines the data object hierarchy. The data objects are shown as nodes in the technical lineage graph.
- lineages, which defines the lineage relation. The lineage relations are shown as edges in the technical lineage graph. The edges represent the data flow from a source to a target.
- codebase_files, which points to transformation definitions in a source code file.
If you want to create a simple custom technical lineage, specify the tree and lineages sections. You can add the transformation code in the lineages section.
If you want to create an advanced custom technical lineage, specify the tree, lineages and codebase_files sections. Add references to transformation code in source code files in the codebase_files section.
Transformation code in both simple and advanced custom technical lineages is displayed in the bottom part of the Collibra technical lineage graph.
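As a rough illustration of the simple case, the following Python sketch assembles a document that contains only the tree and lineages sections and serializes it to JSON. The object names (`sales_db`, `orders` and so on), the `"<version>"` placeholder, and the shape of `src_path` and `trg_path` as plain lists of names are assumptions made for illustration; they are not values confirmed by this page.

```python
import json


def node(name, type_, children=None, leaves=None):
    """Build one tree node with name, type and optional children or leaves."""
    result = {"name": name, "type": type_}
    if children:
        result["children"] = children
    if leaves:
        result["leaves"] = leaves
    return result


def build_simple_lineage():
    """Return a dict with the tree and lineages sections of a simple lineage."""
    # Hypothetical hierarchy: database > schema > table > column (system omitted).
    tree = [
        node("sales_db", "database", children=[
            node("public", "schema", children=[
                node("orders", "table", leaves=[node("amount", "column")]),
                node("orders_summary", "table", leaves=[node("total", "column")]),
            ])
        ])
    ]
    lineages = [{
        # Assumption: a path is an ordered list of names from the top of the tree.
        "src_path": ["sales_db", "public", "orders", "amount"],
        "trg_path": ["sales_db", "public", "orders_summary", "total"],
        # In a simple lineage, the transformation code sits in the lineages section.
        "mapping": "sum_orders",
        "source_code": "INSERT INTO orders_summary (total) SELECT SUM(amount) FROM orders",
    }]
    # "<version>" is a placeholder; replace it with the value your environment requires.
    return {"version": "<version>", "tree": tree, "lineages": lineages}


if __name__ == "__main__":
    print(json.dumps(build_simple_lineage(), indent=2))
```

Generating the file from code rather than by hand can make it easier to keep the tree and the lineage paths consistent with each other.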
Requirements and restrictions
Store the lineage.json file in the folder that you created when you created the Shared Storage connection. The source code files must be in the same directory as the lineage.json file. Otherwise, an error occurs indicating that the source code files cannot be found.
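Before uploading, it can help to confirm that every referenced source code file really sits next to lineage.json. The sketch below is one way to do that in Python; it assumes that codebase_files is a JSON object keyed by source code file path, and the function name `missing_source_files` is hypothetical.

```python
import json
from pathlib import Path


def missing_source_files(lineage_path):
    """Return referenced source code files that are not found next to lineage.json."""
    lineage_path = Path(lineage_path)
    doc = json.loads(lineage_path.read_text(encoding="utf-8"))
    folder = lineage_path.parent
    # Assumption: codebase_files is an object whose keys are source code file paths.
    referenced = doc.get("codebase_files", {})
    # Each key is resolved relative to the folder that holds lineage.json.
    return sorted(name for name in referenced if not (folder / name).is_file())
```

An empty list means that all referenced files were found; any returned name points to a file that would likely trigger the "cannot be found" error.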
| Sections | Description |
|---|---|
| version | The version of the JSON architecture. Specify the value of |
| tree | This section contains tree definitions of data objects between which lineages can be defined. The data objects are systems, databases, schemas, tables, views, columns, dashboards and reports. Each node of a tree contains the name, type and optionally children or leaves properties, which form a hierarchy of data objects. You must define a node only once in this section. With the nested tree format, you can reuse the properties of one node for multiple children. For example, you can define a database once and use the Tip: Usually, the structure you map is the following: system > database > schema > table > column. The system is optional, unless the |
| lineages | This section contains the path from a source to a target and defines the transformation code or transformation references to be processed by the Collibra Data Lineage service. |
| codebase_files | This optional section defines the reference to source code files. Store the source code files in the Shared Storage connection folder. Include this section only when you create an advanced custom technical lineage. |
| Properties | Description |
|---|---|
| name | The name of your data object. Specify this property with the system name, database name, schema name, table name, view name or column name. The following rules apply when you specify this property: Note: If you do not want to use the system or server name of your data source to match the System asset in Data Catalog, ensure that you do not add the system data object. The Collibra system name setting for custom technical lineage is ignored. If you add the system data object in the JSON file, Collibra Data Lineage always uses the full path including the system (system > database > schema > table > column) for stitching, regardless of whether Collibra system name is set to |
| type | The type of your data object. You can specify one of the following options: Note: If you do not want to use the system or server name of your data source to match the System asset in Data Catalog, ensure that you do not add the system data object. The Collibra system name setting for custom technical lineage is ignored. If you add the system data object in the JSON file, Collibra Data Lineage always uses the full path including the system (system > database > schema > table > column) for stitching, regardless of whether Collibra system name is set to |
| children | The sub-objects that have a hierarchical relation to the defined data object. Each child can contain For example, you can use the Each child and leaf has the |
| leaves | The sub-objects of an object that is defined in a A technical lineage is defined as relations between leaf nodes of the tree. The value of the |
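Because a technical lineage is defined as relations between leaf nodes, it can be useful to list the full path of every leaf in the tree section and to flag any leaf that is defined more than once. The following Python sketch does this for nodes that carry the name, children and leaves properties described above; the helper names `leaf_paths` and `duplicate_leaf_paths` are hypothetical.

```python
def leaf_paths(tree):
    """Yield the full path (a tuple of names) of every leaf in a nested tree section."""
    def walk(node, prefix):
        path = prefix + (node["name"],)
        # Leaves end a branch: emit one path per leaf.
        for leaf in node.get("leaves", []):
            yield path + (leaf["name"],)
        # Children continue the hierarchy: recurse one level deeper.
        for child in node.get("children", []):
            yield from walk(child, path)

    for root in tree:
        yield from walk(root, ())


def duplicate_leaf_paths(tree):
    """Return leaf paths that occur more than once, in the order they repeat."""
    seen, duplicates = set(), []
    for path in leaf_paths(tree):
        if path in seen:
            duplicates.append(path)
        seen.add(path)
    return duplicates
```

The paths produced by `leaf_paths` follow the top-down order of the hierarchy, for example database > schema > table > column when the system is omitted.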
| Properties | Required | Description |
|---|---|---|
| src_path | Yes | The hierarchical path to the source data object. This data object is defined as a leaf in the This property represents where the data comes from for a transformation. |
| trg_path | Yes | The hierarchical path to the target data object. This data object is defined as a leaf in the This property represents where the data flows to. |
| <data objects> | Yes | An ordered array of data object names. This array is required to define the sub-objects of the Specify the array with the data object names that start from the top of the This example shows data objects that can be stitched: system > database > schema > table > column. This example shows data objects that cannot be stitched: dashboard > report > column. Note: If you do not want to use the system or server name of your data source to match the System asset in Data Catalog, ensure that you do not add the system data object. The Collibra system name setting for custom technical lineage is ignored. If you add the system data object in the JSON file, Collibra Data Lineage always uses the full path including the system (system > database > schema > table > column) for stitching, regardless of whether Collibra system name is set to |
| mapping | Yes (simple custom technical lineage only) | The mapping name. This property specifies a name for the transformation code. |
| source_code | Yes (simple custom technical lineage only) | The transformation code, which determines how the technical lineage is constructed. The transformation code can be a descriptive string or a SQL statement that manipulates data. |
| mapping_ref | No (advanced custom technical lineage only) | This property contains the name of the mapping reference to the transformation code in source code files. This property also contains the position and length of the transformation code to be highlighted in the technical lineage graph. |
| source_code | No (advanced custom technical lineage only) | The name of the source code file that contains the transformation code. The transformation code can be a SQL statement, code that manipulates data or a descriptive string. |
| mapping | No (advanced custom technical lineage only) | The unique descriptor of a part of transformation code in a source code file that is in the Shared Storage connection folder. A source code file can contain different parts of transformation code that represent different data flows. This property indicates the referenced data flow. The value of this property is the same as the value of the |
| codebase_pos | No (advanced custom technical lineage only) | The positions indicate a string of the transformation code in a source code file to be highlighted in the bottom part of the Collibra technical lineage graph. The whole lines that include the transformation code are highlighted. The string must be a subset of the string of the transformation code that is defined by the |
| pos_start | No (advanced custom technical lineage only) | The start position of the string of the transformation code to be highlighted. The start position is in characters, not bytes. The value must be equal to or greater than the value of the |
| pos_len | No (advanced custom technical lineage only) | The length of the string of the transformation code to be highlighted. The length is in characters, not bytes. Specify a value in the following range: For example, if you specify |
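The pos_start and pos_len properties count characters, not bytes, and whole lines that contain the selected string are highlighted. The Python sketch below illustrates that arithmetic on an in-memory string. It assumes that positions are zero-based, which this page does not confirm, and the helper names are hypothetical.

```python
def highlighted_span(code, pos_start, pos_len):
    """Return the substring selected by a character offset and a character length."""
    if pos_start < 0 or pos_len < 0 or pos_start + pos_len > len(code):
        raise ValueError("span falls outside the source code")
    # Python str indexing works on characters, so no byte conversion is needed.
    return code[pos_start:pos_start + pos_len]


def highlighted_lines(code, pos_start, pos_len):
    """Return the whole lines that contain the selected span."""
    highlighted_span(code, pos_start, pos_len)  # validates the range
    # The line index of a position equals the number of newlines before it.
    first = code.count("\n", 0, pos_start)
    last = code.count("\n", 0, max(pos_start, pos_start + pos_len - 1))
    return code.split("\n")[first:last + 1]


def is_within(inner_start, inner_len, outer_start, outer_len):
    """Check that the inner span is a subset of the outer span."""
    return outer_start <= inner_start and inner_start + inner_len <= outer_start + outer_len
```

For text that contains non-ASCII characters, the character count and the UTF-8 byte count differ, so computing offsets with `len(text)` on a decoded string rather than on raw bytes avoids off-by-some errors.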
| Properties | Description |
|---|---|
| <source code path> | The file path to source code files that contain the transformation code. The transformation code can be a SQL statement or code that manipulates data. |
| | The mapping of the transformation code and the position of the transformation code that is shown in the bottom part of the technical lineage graph. This property defines a string of the transformation code in the source code file to be shown in the technical lineage graph. The string must include the string that is defined by the |
| <mapping> | The unique descriptor of a part of transformation code in a source code file that is in the Shared Storage connection folder. A source code file can contain different parts of transformation code that represent different data flows. This property indicates the referenced data flow. The value must match the value of the |
| pos_start | The start position of the string of the transformation code. The start position is in characters, not bytes. Specify a value in the following range: |
| pos_len | The length of the string of the transformation code. The length is in characters, not bytes. Specify a value in the following range: For example, if you specify |
Programming considerations
Currently, there is no native support for indirect and table-level lineages. As a workaround, you can specify "type": "column" and "name": "*" for the leaves property to create a table-level or indirect technical lineage. With this specification, the indirect technical lineage is shown as a solid line instead of a dashed line in the Collibra technical lineage graph, and is always shown, regardless of whether the Show indirect dependencies option is enabled or disabled.
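The wildcard workaround can be sketched in Python as a table node whose only leaf is the `*` column. The table name used in the usage note is made up, the helper names are hypothetical, and representing a lineage path as a list of names is an assumption.

```python
def table_level_node(table_name):
    """Build a table node whose only leaf is the wildcard column."""
    return {
        "name": table_name,
        "type": "table",
        # The workaround: a column leaf named "*" stands in for the whole table.
        "leaves": [{"name": "*", "type": "column"}],
    }


def is_table_level_path(path):
    """Return True when a lineage path (assumed to be a list of names) ends in "*"."""
    return bool(path) and path[-1] == "*"
```

For example, `table_level_node("orders")` yields a node that a lineage path ending in `"*"` can reference as its source or target.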
Example
For sample custom technical lineage definitions that define a simple custom technical lineage and an advanced custom technical lineage, go to Custom technical lineage JSON file examples.