Custom technical lineage JSON file details

This topic describes the properties that you need to include in your JSON files, for both the single-file and batch definition options.

Batch definition
Single-file definition

If you opt for the batch definition option, you need to create a folder with all of your JSON files and specify the folder in your lineage harvester configuration file. The harvester then accesses the folder, zips the content and ingests it for processing.

Which files do you need in your batch folder?

Let's say that you create a folder and name it custom-lineage. In this folder, you need the following:

Exactly one metadata file, to provide the JSON architecture version, the data source type, and asset type UUIDs of the assets you want to include in the technical lineage.
Optionally, one or more asset files, to provide a list of data objects you want to include in the technical lineage and define the data object hierarchy to achieve stitching.
One or more lineage files, to define the lineage relation between two or more data objects.
Optionally, a subfolder of source code files that contain the transformation code.

Example

custom-lineage
    ├── assets-domain1.json
    ├── assets1.json
    ├── lineage.json
    ├── lineage-extra.json
    ├── metadata.json
    └── source_codes
        ├── sc1.sql
        └── sc2.py
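
As a rough illustration of the file requirements above, the following Python sketch checks a batch folder before you point the lineage harvester at it. The function name `check_batch_folder` and the wording of its messages are assumptions made for this example. The lineage harvester performs its own validation, which may differ.

```python
from pathlib import Path


def check_batch_folder(folder):
    """Return a list of problems found in a batch definition folder.

    Hypothetical helper: it only looks at file names, not at file content.
    """
    root = Path(folder)
    names = sorted(p.name for p in root.iterdir() if p.is_file())
    problems = []

    # The metadata file must be named exactly metadata.json.
    if "metadata.json" not in names:
        problems.append("missing metadata.json")

    # Lineage files follow the pattern lineage<something-unique>.json.
    lineage = [n for n in names if n.startswith("lineage") and n.endswith(".json")]
    if not lineage:
        problems.append("no lineage*.json file found")

    # Asset files are optional and follow assets<something-unique>.json.
    assets = [n for n in names if n.startswith("assets") and n.endswith(".json")]

    # Any other JSON file in the folder is probably misnamed.
    known = {"metadata.json", *lineage, *assets}
    for name in names:
        if name.endswith(".json") and name not in known:
            problems.append(f"unexpected JSON file: {name}")
    return problems
```

An empty list means that the file names look plausible. It says nothing about whether the content of the files is valid.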

Metadata file

Your metadata file must be named metadata.json. Format the file as shown in the following example.

Example

{
  "version": 3, 
  "application_name": "databricks",
  "asset_types":{
    "Column":{"uuid": "00000000-0000-0000-0000-000000031008"},
    "Table":{"uuid": "00000000-0000-0000-0000-000000031007"},
    "Database":{"uuid": "00000000-0000-0000-0000-000000031006"},
    "Schema":{"uuid": "00000000-0000-0000-0001-000400000002"}
  }
}

Section	Description
version	The version of the JSON architecture. For the batch definition option, the value must be `3`.
application_name	The type of data source for which you are creating a technical lineage. This helps us to better understand your needs and make more informed decisions concerning future integrations.
asset_types	The asset types, and the UUIDs of those asset types, that you want to include in the technical lineage.
	Important: If you choose to include asset files in your batch definition, the asset types that you specify in this property must match the values that you specify in the `type` properties in your asset files. Likewise, they must match the asset types that you mention in your lineage files.
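
The matching rule for `asset_types` can be checked mechanically. The following Python sketch, which is not part of the lineage harvester, compares the `type` values used in asset or lineage entries with the keys declared in the metadata file. The helper names `used_types` and `undeclared_types` are hypothetical, and the comparison is assumed to be an exact, case-sensitive match.

```python
import json

# The metadata example from this topic.
METADATA = json.loads("""
{
  "version": 3,
  "application_name": "databricks",
  "asset_types": {
    "Column": {"uuid": "00000000-0000-0000-0000-000000031008"},
    "Table": {"uuid": "00000000-0000-0000-0000-000000031007"},
    "Database": {"uuid": "00000000-0000-0000-0000-000000031006"},
    "Schema": {"uuid": "00000000-0000-0000-0001-000400000002"}
  }
}
""")


def used_types(entry):
    """Collect the `type` values of one nodes/parent/leaf path."""
    types = [node["type"] for node in entry.get("nodes", [])]
    for key in ("parent", "leaf"):
        if key in entry:
            types.append(entry[key]["type"])
    return types


def undeclared_types(metadata, entries):
    """Return, sorted, the types used in entries but absent from asset_types."""
    declared = set(metadata["asset_types"])
    used = {t for entry in entries for t in used_types(entry)}
    return sorted(used - declared)
```

Any name that the function returns would have to be added to `asset_types`, with its UUID, before the batch is submitted.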

Assets files

Optionally, you can include one or more assets files. You use asset files to provide the list of data objects you want to include in the technical lineage and define the data object hierarchy. The props property allows you to specify the full names and domain IDs of the assets.

Tip

Don't use asset files in the following scenarios:

Your data source consists of the traditional (System) > Database > Schema > Table > Column asset types and hierarchy. In that case, full names are constructed automatically and correctly.
You are working with assets that are not part of that traditional asset hierarchy (in which case, you need to use the props property to achieve stitching) and you define props in one or more lineage files.

The names of your assets files have to follow the format assets<something-unique>.json.

Asset files can consist of nodes, parent, and leaf kinds of assets. In the following example code, we used the nodes property to specify the highest levels of the data object hierarchy that we want to view in the technical lineage: Database and Schema. We then used the parent and leaf properties to build out the lower levels of the data object hierarchy: Table and Column, respectively.

parent assets represent what we traditionally refer to as table-level lineage. leaf assets represent what we traditionally refer to as column-level lineage.

Keep in mind that the property names nodes, parent, and leaf are designed to be non-restrictive, so you can define a hierarchy to reflect the hierarchy of any asset types (similar to the database > schema > table > column hierarchy), including your custom asset types.

Tip For examples of how to configure the props property, as shown in the following code examples, see Using the props property.

JSON schema for assets files

{
    "$schema": "https://json-schema.org/draft/2020-12/schema",
    "$defs": {
        "assetData": {
            "type": "object",
            "properties": {
                "name": {
                    "type": "string"
                },
                "type": {
                    "type": "string"
                }
            },
            "required": [
                "name",
                "type"
            ]
        },
        "props": {
            "type": ["object", "null"],
            "properties": {
                "fullname": {
                    "type": "string"
                },
                "domain_id": {
                    "type": "string"
                }
            },
            "required": [
                "fullname"
            ]
        }
    },
    "anyOf": [
        {
            "type": "object",
            "properties": {
                "nodes": {
                    "type": "array",
                    "items": {
                        "$ref": "#/$defs/assetData"
                    }
                },
                "props": {
                    "$ref": "#/$defs/props"
                },
                "parent": {
                    "$ref": "#/$defs/assetData"
                },
                "leaf": {
                    "$ref": "#/$defs/assetData"
                }
            },
            "required": [
                "nodes",
                "parent",
                "leaf"
            ]
        },
        {
            "type": "object",
            "properties": {
                "nodes": {
                    "type": "array",
                    "items": {
                        "$ref": "#/$defs/assetData"
                    }
                },
                "props": {
                    "$ref": "#/$defs/props"
                },
                "parent": {
                    "$ref": "#/$defs/assetData"
                }
            },
            "required": [
                "nodes",
                "parent"
            ]
        },
        {
            "type": "object",
            "properties": {
                "nodes": {
                    "type": "array",
                    "items": {
                        "$ref": "#/$defs/assetData"
                    }
                },
                "props": {
                    "$ref": "#/$defs/props"
                }
            },
            "required": [
                "nodes"
            ]
        }
    ]
}
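
To show what the schema asks for in practice, the following Python sketch applies similar structural checks by hand to a single entry. The function name `check_asset_entry` is hypothetical. The sketch also reports a leaf that is specified without a parent, which this topic requires in its property descriptions but which the `anyOf` branches of the schema do not enforce on their own. It is not a substitute for validating against the schema itself.

```python
def check_asset_entry(entry):
    """Return a list of structural problems for one assets-file entry."""
    problems = []

    def check_asset(label, asset):
        # Mirrors the assetData definition: name and type are required strings.
        for key in ("name", "type"):
            if not isinstance(asset, dict) or not isinstance(asset.get(key), str):
                problems.append(f"{label}: missing or non-string '{key}'")

    nodes = entry.get("nodes")
    if not isinstance(nodes, list):
        problems.append("nodes: required array is missing")
    else:
        for i, node in enumerate(nodes):
            check_asset(f"nodes[{i}]", node)

    for key in ("parent", "leaf"):
        if key in entry:
            check_asset(key, entry[key])

    # Stricter than the schema: the prose says a leaf needs its parent.
    if "leaf" in entry and "parent" not in entry:
        problems.append("leaf: specified without parent")

    # props may be absent or null; if present, fullname is required.
    props = entry.get("props")
    if props is not None:
        if not isinstance(props, dict) or not isinstance(props.get("fullname"), str):
            problems.append("props: 'fullname' is required")
    return problems
```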

Property	Description
nodes	A JSON element in which you specify the highest levels of the hierarchy. In the example code, the nodes specify the hierarchy of GCS File System > GCS Bucket.
	Example:
	{
	  "nodes": [{ "name": "GCS1", "type": "GCS File System" }, { "name": "GCS-B1", "type": "GCS Bucket" }],
	  "props": { "fullname": "<full name of the GCS Bucket asset>", "domain_id": "<domain of the GCS Bucket asset>" }
	}
name	The name of the node data object. The value is case-sensitive.
	Case-sensitivity exception: The value of the `name` property is not case-sensitive for Database, Schema, Table and Column assets. For those assets, any capitalization discrepancies are rectified during processing, and the names always appear in uppercase in the technical lineage. However, assets files are not needed or recommended for these asset types.
type	The type of data object of the specified node, for example: `System`, `Database`, `Dashboard`, or `Report`. The value is case-sensitive.
	Important: The values (meaning the asset types) that you specify for this property must match the values that you specify in the `asset_types` property in your metadata file.
parent	A lower-level data object in a hierarchy for which the highest levels are specified in the `nodes` section. The `parent` property represents what we traditionally refer to as the table-level lineage. When specifying parent data objects, you also have to include the nodes information, as shown in the following example code.
	Example:
	{
	  "nodes": [{ "name": "GCS1", "type": "GCS File System" }, { "name": "GCS-B1", "type": "GCS Bucket" }],
	  "parent": { "name": "DIR1", "type": "Directory" },
	  "props": { "fullname": "<full name of the Directory asset>", "domain_id": "<domain of the Directory asset>" }
	}
	Important: If the `useCollibraSystemName` property is set to `false` in your lineage harvester configuration file, do not specify the system data object in this section, or else stitching will fail.
	Tip: Each parent object can contain `leaf` data objects. For example, you can use the `parent` property to specify a table, and use the `leaf` properties to specify the columns in the table.
name	The name of the parent data object. The value is case-sensitive.
	Case-sensitivity exception: The value of the `name` property is not case-sensitive for Database, Schema, Table and Column assets. For those assets, any capitalization discrepancies are rectified during processing, and the names always appear in uppercase in the technical lineage. However, assets files are not needed or recommended for these asset types.
type	The asset type of the parent data object, for example: `Table`, `Directory`, `Dashboard`, or `Report`. The value is case-sensitive.
	Important: The values (meaning the asset types) that you specify for this property must match the values that you specify in the `asset_types` property in your metadata file.
leaf	The lowest level data object in your hierarchy. The `leaf` property represents what we traditionally refer to as the column-level lineage. When specifying leaf data objects, you also have to include the nodes and parent information, as shown in the following example code. The names of parents and leaf data objects can be identical if the data objects with the same names are sub-objects of different `nodes` data objects.
	Example:
	{
	  "nodes": [{ "name": "GCS1", "type": "GCS File System" }, { "name": "GCS-B1", "type": "GCS Bucket" }],
	  "parent": { "name": "DIR1", "type": "Directory" },
	  "leaf": { "name": "data.xls", "type": "File" },
	  "props": { "fullname": "<full name of the File asset>", "domain_id": "<domain of the File asset>" }
	}
name	The name of the leaf data object. The value is case-sensitive.
	Case-sensitivity exception: The value of the `name` property is not case-sensitive for Database, Schema, Table and Column assets. For those assets, any capitalization discrepancies are rectified during processing, and the names always appear in uppercase in the technical lineage. However, assets files are not needed or recommended for these asset types.
type	The asset type of the leaf data object, for example: `Column`, `Dashboard`, or `Report`. The value is case-sensitive.
	Important: The values (meaning the asset types) that you specify for this property must match the values that you specify in the `asset_types` property in your metadata file.
props	This property allows you to specify the full name and domain ID of an asset for the purpose of stitching, regardless of asset type hierarchy. When you add the props property to define the full name of an asset, it applies to the last asset in the array.
	Tip: For examples of how to configure the `props` property and how to use it for a custom hierarchy, see Using the props property.
	Important considerations:
	You don't need to use this property for the traditional (System) > Database > Schema > Table > Column asset types and hierarchy. In fact, assets files are not needed or recommended for those asset types, as the full name is constructed automatically and correctly for that hierarchy. Instead, use this property to specify the full names of assets that are not part of that traditional asset hierarchy.
	You must specify in your metadata file the asset types and UUIDs of all the asset types used.
	If the `useCollibraSystemName` property in your lineage harvester configuration file is set to `true`, the system data object is used to stitch to the System asset in Data Catalog.
	If the `useCollibraSystemName` property is set to `false` in your lineage harvester configuration file, do not specify the system data object in this section, or else stitching will fail.
	A word about file processing order and inadvertently specifying the same asset more than once:
	Assets files and lineage files are processed in the following order: first, all assets files in alphabetical order, followed by all lineage files in alphabetical order. If you choose to specify `props` for an asset, we recommend that you do so in either an assets file or a lineage file; not both. For any asset that is inadvertently defined more than once, the first occurrence, with respect to the processing order, is the occurrence that is used. In other words:
	If you inadvertently define a single asset, with `props`, in both an assets file and a lineage file, the `props` values in the assets file are used.
	If you inadvertently define a single asset, with `props`, more than once in a single assets file, or in multiple assets files, the first occurrence of the asset, with respect to the processing order, is used along with the `props` values defined for that occurrence of the asset.
fullname	The full name of the asset in Collibra. The value is case-sensitive.
domain_id	The reference ID of the domain in which the asset exists in Collibra.
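
The processing order and the first-occurrence rule described for `props` can be modeled in a few lines. The following Python sketch is only a model of that description, not the harvester's implementation. The names `processing_order` and `effective_props` are hypothetical, and "alphabetical order" is assumed to mean Python's default string ordering, which may differ from the ordering that is actually applied.

```python
def processing_order(filenames):
    """Assets files first, then lineage files, each group alphabetically."""
    assets = sorted(n for n in filenames if n.startswith("assets"))
    lineage = sorted(n for n in filenames if n.startswith("lineage"))
    return assets + lineage


def effective_props(definitions):
    """Resolve props per asset; the first occurrence in processing order wins.

    `definitions` maps a file name to a list of (asset_key, props) pairs,
    where asset_key is any hashable identifier of the asset path.
    """
    resolved = {}
    for filename in processing_order(definitions):
        for asset_key, props in definitions[filename]:
            # setdefault keeps the value stored by the earliest occurrence.
            resolved.setdefault(asset_key, props)
    return resolved
```

With this model, an asset that is defined in both assets1.json and lineage.json resolves to the `props` from assets1.json, which matches the rule stated above.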

Using the props property

The following examples offer some guidance as to when to use the props property and how to configure it.
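
One point that is easy to miss is which asset a `props` object describes. Going by the GCS examples in this topic, it is the deepest asset in the entry: the leaf if there is one, otherwise the parent, otherwise the last entry in `nodes`. The following Python sketch expresses that reading. The function name `props_target` is hypothetical.

```python
def props_target(entry):
    """Return the asset that `props` describes: the deepest one in the path."""
    if "leaf" in entry:
        return entry["leaf"]
    if "parent" in entry:
        return entry["parent"]
    return entry["nodes"][-1]


# The nodes of the GCS File System > GCS Bucket example used in this topic.
GCS_NODES = [
    {"name": "GCS1", "type": "GCS File System"},
    {"name": "GCS-B1", "type": "GCS Bucket"},
]

bucket_entry = {"nodes": GCS_NODES}
directory_entry = {"nodes": GCS_NODES, "parent": {"name": "DIR1", "type": "Directory"}}
file_entry = {
    "nodes": GCS_NODES,
    "parent": {"name": "DIR1", "type": "Directory"},
    "leaf": {"name": "data.xls", "type": "File"},
}
```

For these three entries, the function returns the GCS Bucket, the Directory and the File asset respectively, so the `fullname` and `domain_id` in each entry should be those of that asset.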

Lineage files

You can have one or more lineage files in the folder. The names of your lineage files have to follow the format lineage<something-unique>.json.

You use the lineage file to define the lineage relation between two or more data objects. The lineage relations are shown as edges in the technical lineage graph. The edges represent the data flow from a source to a target.

This section contains the path from a source to a target and defines the transformation code or transformation references to be processed by the Collibra Data Lineage service.

Note If the useCollibraSystemName property in your lineage harvester configuration file is set to true, the system data object is used to stitch to the System asset in Data Catalog. If the useCollibraSystemName property is set to false in your lineage harvester configuration file, do not specify the system data object in these files, or else stitching will fail.

JSON schema for lineage files

{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "$defs": {
    "assetData": {
      "type": "object",
      "properties": {
        "name": {
          "type": "string"
        },
        "type": {
          "type": "string"
        }
      },
      "required": [
        "name",
        "type"
      ]
    },
    "props": {
      "type": ["object", "null"],
      "properties": {
        "fullname": {
          "type": "string"
        },
        "domain_id": {
          "type": "string"
        }
      },
      "required": [
        "fullname"
      ]
    }
  },
  "type": "object",
  "properties": {
    "src": {
      "anyOf": [
        {
          "type": "object",
          "properties": {
            "nodes": {
              "type": "array",
              "items": {
                "$ref": "#/$defs/assetData"
              }
            },
            "parent": {
              "$ref": "#/$defs/assetData"
            },
            "leaf": {
              "$ref": "#/$defs/assetData"
            },
            "props": {
              "$ref": "#/$defs/props"
            }
          },
          "required": [
            "nodes",
            "parent",
            "leaf"
          ]
        },
        {
          "type": "object",
          "properties": {
            "nodes": {
              "type": "array",
              "items": {
                "$ref": "#/$defs/assetData"
              }
            },
            "parent": {
              "$ref": "#/$defs/assetData"
            },
            "props": {
              "$ref": "#/$defs/props"
            }
          },
          "required": [
            "nodes",
            "parent"
          ]
        }
      ]
    },
    "trg": {
      "anyOf": [
        {
          "type": "object",
          "properties": {
            "nodes": {
              "type": "array",
              "items": {
                "$ref": "#/$defs/assetData"
              }
            },
            "parent": {
              "$ref": "#/$defs/assetData"
            },
            "leaf": {
              "$ref": "#/$defs/assetData"
            },
            "props": {
              "$ref": "#/$defs/props"
            }
          },
          "required": [
            "nodes",
            "parent",
            "leaf"
          ]
        },
        {
          "type": "object",
          "properties": {
            "nodes": {
              "type": "array",
              "items": {
                "$ref": "#/$defs/assetData"
              }
            },
            "parent": {
              "$ref": "#/$defs/assetData"
            },
            "props": {
              "$ref": "#/$defs/props"
            }
          },
          "required": [
            "nodes",
            "parent"
          ]
        }
      ]
    },
    "source_code": {
      "type": "object",
      "properties": {
        "path": {
          "type": "string"
        },
        "highlights": {
          "type": ["array", "null"],
          "items": {
            "type": "object",
            "properties": {
              "start": {
                "type": "integer"
              },
              "len": {
                "type": "integer"
              }
            },
            "required": [
              "len",
              "start"
            ]
          }
        }
      },
      "required": [
        "path"
      ]
    }
  },
  "required": [
    "src",
    "trg"
  ]
}

Example

[
  {
    "src": {
      "nodes": [{"name": "DB1", "type": "Database"}, {"name": "SCH1", "type": "Schema"}],
      "parent": {"name": "TB1", "type": "Table"},
      "leaf": {"name": "COL1", "type": "Column"},
      "props": {
        "fullname": "<full name of the leaf asset>",
        "domain_id": "<domain of the leaf asset>"
      }
    },
    "trg": {
      "nodes": [{"name": "DB1", "type": "Database"}, {"name": "SCH1", "type": "Schema"}],
      "parent": {"name": "TB2", "type": "Table"},
      "props": {
        "fullname": "<full name of the parent asset>",
        "domain_id": "<domain of the parent asset>"
      }
    },
    "source_code": {
      "path": "<folder name>/sc1.sql",
      "highlights": [{"start": 71, "len": 69}, ...],
      "transformation_display_name": "middle bubble"
    }
  }
]

Properties	Description
src	The hierarchical path to the source data object. This property represents where the data comes from for a transformation.
	Important: The source of a lineage can only be a parent or a leaf.
	Example:
	{
	  "src": {
	    "nodes": [{"name": "DB1", "type": "Database"}, {"name": "SCH1", "type": "Schema"}],
	    "parent": {"name": "TB1", "type": "Table"},
	    "leaf": {"name": "COL1", "type": "Column"}
	  }
	}
trg	The hierarchical path to the target data object. This property represents where the data flows to.
	Important: The target can be a parent or a leaf; however, if the source is a parent, the target must be a parent.
	Tip: If the target asset is a parent asset and the source asset is a leaf asset, we refer to the lineage as "indirect lineage". If the target asset is a parent asset and the source asset is a parent asset, we refer to the lineage as "table-level lineage".
	Example:
	{
	  "trg": {
	    "nodes": [{"name": "DB1", "type": "Database"}, {"name": "SCH1", "type": "Schema"}],
	    "parent": {"name": "TB2", "type": "Table"}
	  }
	}
props	An optional property that allows you to specify the full name and domain of an asset, for the purpose of stitching. This property is not required for Database, Schema, Table and Column asset types.
	A word about file processing order and inadvertently specifying the same asset more than once:
	Assets files and lineage files are processed in the following order: first, all assets files in alphabetical order, followed by all lineage files in alphabetical order. If you choose to specify `props` for an asset, we recommend that you do so in either an assets file or a lineage file; not both. For any asset that is inadvertently defined more than once, the first occurrence, with respect to the processing order, is the occurrence that is used. In other words:
	If you inadvertently define a single asset, with `props`, in both an assets file and a lineage file, the `props` values in the assets file are used.
	If you inadvertently define a single asset, with `props`, more than once in a single assets file, or in multiple assets files, the first occurrence of the asset, with respect to the processing order, is used along with the `props` values defined for that occurrence of the asset.
source_code	The transformation code that determines how the technical lineage is constructed. This can be a descriptive string or a SQL statement that manipulates data. This section is optional.
path	The path and name of the source code file that contains the transformation code. The path is relative to the folder that contains the lineage JSON files, so it includes the name of the source code subfolder, for example `source_codes/sc1.sql`.
highlights	This optional property identifies a string of transformation code in a source code file to be highlighted in the source code pane at the bottom part of the technical lineage graph. The entire lines that include the transformation code are highlighted. The string must be a subset of the string of transformation code that is defined by the `start` and `len` properties.
start	The start position of the string of the transformation code to be highlighted. The start position is in characters, not bytes.
len	The length of the string of the transformation code to be highlighted. The length is in characters, not bytes.
transformation_display_name	The name that is shown for the transformation in the transformations view of the technical lineage viewer.
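
The rules for combining `src` and `trg` can be summarized in code. The following Python sketch classifies one lineage entry by whether each side ends at a leaf or at a parent. The function name `lineage_kind` and the label "column-level" for a leaf-to-leaf relation are assumptions made for this example. "indirect" and "table-level" follow the terms used in this topic.

```python
def lineage_kind(edge):
    """Classify a lineage entry by the depth of its source and target."""
    src_is_leaf = "leaf" in edge["src"]
    trg_is_leaf = "leaf" in edge["trg"]
    if src_is_leaf and trg_is_leaf:
        return "column-level"
    if src_is_leaf:
        # Leaf source, parent target.
        return "indirect"
    if not trg_is_leaf:
        # Parent source, parent target.
        return "table-level"
    # Parent source with a leaf target is not allowed.
    raise ValueError("if the source is a parent, the target must be a parent")


NODES = [{"name": "DB1", "type": "Database"}, {"name": "SCH1", "type": "Schema"}]

# The same shape as the lineage example in this topic: COL1 of TB1 flows into TB2.
EXAMPLE_EDGE = {
    "src": {
        "nodes": NODES,
        "parent": {"name": "TB1", "type": "Table"},
        "leaf": {"name": "COL1", "type": "Column"},
    },
    "trg": {"nodes": NODES, "parent": {"name": "TB2", "type": "Table"}},
}
```

For `EXAMPLE_EDGE`, the function returns "indirect", because the source is a leaf and the target is a parent.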

Source codes subfolder and files

You can provide a subfolder of source code files that define the transformation details. The source code subfolder must be in the custom-lineage folder, along with the JSON files. If it is not, an error occurs indicating that the lineage harvester cannot find the source code files.

The source code paths are relative to the custom-lineage folder.

Example

source_codes/sc1.sql
source_codes/another-subfolder/sc2.sql

Important: Paths must not contain occurrences of `./`. The following path fails:

source_codes/./sc1.sql
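
The path rules above lend themselves to a quick check before you run the lineage harvester. The following Python sketch is an illustration only. The function name `check_source_path` and its messages are hypothetical, and the sketch assumes that paths are resolved against the batch folder, as described in this section.

```python
from pathlib import Path


def check_source_path(batch_folder, path):
    """Return a problem description, or None if the path looks usable."""
    # A literal reading of the rule: any occurrence of "./" is rejected,
    # which also rejects "../".
    if "./" in path:
        return "path contains './'"
    if not (Path(batch_folder) / path).is_file():
        return "file not found under the batch folder"
    return None
```

You could call the function for every `path` value found in your lineage files and fix any path for which it returns a message.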

What happens if I choose not to provide source code files?

If you are using the lineage harvester and there are no source code files to analyze, the batch stats are empty, as shown below. This is expected: batch stats are directly linked to the source code files, so without those files there is nothing to report. The lineage relations are still created.

Batch stats:
	Parsing errors: 0
	Analysis errors: 0
	Done: 1

The Done: 1 result is a dummy entry, so that the source appears in the Sources tab page.

Example JSON files

For some example JSON files, go to Custom technical lineage JSON file examples.

Single-file definition

If you opt for the single-file definition option, you use a lineage.json file to define the lineage between two or more data objects, and optionally include transformation details to create the custom technical lineage.

The following sections in the JSON file define different parts in the resulting Collibra technical lineage graph:

tree, which defines the data object hierarchy. The data objects are shown as nodes in the technical lineage graph.
lineages, which defines the lineage relation. The lineage relations are shown as edges in the technical lineage graph. The edges represent the data flow from a source to a target.
codebase_files, which points to the source code files that include transformation details.

To create a simple custom technical lineage, you need to include the tree and lineages sections in your JSON file. You can add the transformation code in the lineages section.

To create an advanced custom technical lineage, you need to include the tree, lineages and codebase_files sections in your JSON file. You add references to the transformation code in source code files in the codebase_files section.

Transformation code in both simple and advanced custom technical lineages is shown in the source code pane at the bottom part of the technical lineage graph.

Requirements and restrictions

The source code files must be in the same directory as the lineage.json file. Otherwise, an error occurs indicating that the lineage harvester cannot find the source code files.

Sections
Sections	Description
version	The version of the JSON architecture. Specify the value of `1.0`, which is the only supported version.
tree	This section contains tree definitions of data objects between which lineages can be defined. The data objects are systems, databases, schemas, tables, views, columns, dashboards and reports. Each node of a tree contains the name, type and optionally children or leaves properties which form a hierarchy of data objects. You must define a node only once in this section. With the nested tree format, you can reuse the properties of one node for multiple children. For example, you can define a database once and use the `children` array to define multiple tables in the database.
	Tip: Usually, the structure you map is the following: system > database > schema > table > column. The system is optional, unless the `useCollibraSystemName` property is set to `true` in your lineage harvester configuration file. Collibra Data Lineage can stitch these data objects to assets in Data Catalog. However, you can also map custom objects, for example dashboards and reports. Custom objects cannot be stitched to assets in Data Catalog.
	Important: If the `useCollibraSystemName` property is set to `false` in your lineage harvester configuration file, do not specify the system data object in this section, or else stitching will fail.
lineages	This section contains the path from a source to a target and defines the transformation code or transformation references to be processed by the Collibra Data Lineage service.
	Important: If the `useCollibraSystemName` property is set to `false` in your lineage harvester configuration file, do not specify the system data object in this section, or else stitching will fail.
codebase_files	This optional section defines the reference to source code files. Store the source code files that contain the transformation code in the same directory as the lineage.json file. Include this section only when you create an advanced custom technical lineage.

tree section properties
Properties	Description
name	The name of your data object. Specify this property with the system name, database name, schema name, table name, view name or column name. The following rules apply when you specify this property:
	The names are case-sensitive. You cannot, however, have two nodes with the same name, but different case, under the same parent node.
	The names of children and leaves can be identical if the children and leaves with the same names are in different parent nodes.
type	The type of your data object. You can specify one of the following options: `system`, `database`, `schema`, `table`, `view`, `column`, `dashboard` or `report`. If the `useCollibraSystemName` property in your lineage harvester configuration file is set to `true`, the system data object is used to stitch to the System asset in Data Catalog. If the `useCollibraSystemName` property is set to `false` in your lineage harvester configuration file, do not specify the system data object in this section, or else stitching will fail.
children	The sub-objects that have a hierarchical relation to the defined data object. Each child can contain `children` properties, except for the penultimate child. The penultimate `children` property must contain the `leaves` property. The `leaves` property cannot contain a `children` property. For example, you can use the `children` property to define a table and use the `leaves` properties to define columns that have a relation to the table node. Each child and each leaf has the `name` and `type` properties and the optional `catalog_fullname`, `catalog_domain_id`, `catalog_asset_type_name` and `catalog_asset_type_uuid` properties.
leaves	The sub-objects of an object that is defined in a `children` property, but cannot have sub-objects of their own. A technical lineage is defined as relations between leaf nodes of the tree. The value of the `type` property of the `leaves` property must be `column` or `report`. Indirect and table-level technical lineages are not supported. For the workarounds to create a table level or indirect technical lineage, see Programming considerations.

lineages section properties
Properties	Required	Description
src_path	Yes	The hierarchical path to the source data object. This data object is defined as a leaf in the `tree` section. This property represents where the data comes from for a transformation.
trg_path	Yes	The hierarchical path to the target data object. This data object is defined as a leaf in the `tree` section. This property represents where the data flows to.
<data objects>	Yes	An ordered array of data object names. This array is required to define the sub-objects of the `src_path` and `trg_path` properties. Specify the array with the data object names that start from the top of the `tree` section and finish at a leaf node. This example shows data objects that can be stitched: system > database > schema > table > column. This example shows data objects that cannot be stitched: dashboard > report > column. If the `useCollibraSystemName` property in your lineage harvester configuration file is set to `true`, the system data object is used to stitch to the System asset in Data Catalog. If the `useCollibraSystemName` property is set to `false` in your lineage harvester configuration file, do not specify the system data object in this section, or else stitching will fail.
mapping	Yes (simple custom technical lineage only)	The mapping name. This property specifies a name for the transformation code.
source_code	Yes (simple custom technical lineage only)	The transformation code, which determines how the technical lineage is constructed. The transformation code can be a descriptive string or a SQL statement that manipulates data.
mapping_ref	No (advanced custom technical lineage only)	This property contains the name of the mapping reference to the transformation code in source code files. This property also contains the position and length of the transformation code to be highlighted in the technical lineage graph.
source_code	No (advanced custom technical lineage only)	The name of the source code file that contains the transformation code. The transformation code can be a SQL statement, code that manipulates data or a descriptive string. The source code file must be in the same folder as the lineage.json file.
mapping	No (advanced custom technical lineage only)	The unique descriptor of a part of transformation code in a source code file that is in the same directory as the lineage.json file. A source code file can contain different parts of transformation code that represent different data flows. This property indicates the referenced data flow. The value of this property is the same as the value of the `mapping_refs` property in the `codebase_files` section.
codebase_pos	No (advanced custom technical lineage only)	The positions indicate a string of the transformation code in a source code file to be highlighted in the bottom part of the Collibra technical lineage graph. The whole lines that include the transformation code are highlighted. The string must be a subset of the string of the transformation code that is defined by the `pos_start` and `pos_len` properties of the `mapping_refs` property in the `codebase_files` section.
pos_start	No (advanced custom technical lineage only)	The start position of the string of the transformation code to be highlighted. The start position is in characters, not bytes. The value must be equal to or greater than the value of the `pos_start` property of the `mapping_refs` property in the `codebase_files` section.
pos_len	No (advanced custom technical lineage only)	The length of the string of the transformation code to be highlighted. The length is in characters, not bytes. Specify a value in the following range: Equal to or greater than 1. Less than or equal to the length of the string that is defined by the `pos_len` property of the `mapping_refs` property in the `codebase_files` section. For example, if you specify `"pos_start": 10` and `"pos_len": 160` in the `codebase_files` section, specify a value for this property in the range of 1 - 160.
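
Because `pos_start` and `pos_len` are counted in characters, not bytes, offsets taken from raw bytes can drift when a source code file contains non-ASCII characters. The following Python sketch shows one way to derive both values for a piece of transformation code. The helper name `char_span` and the sample SQL are hypothetical, and the sketch assumes that one character corresponds to one Unicode code point, which this topic does not specify.

```python
def char_span(source_text: str, snippet: str) -> dict:
    """Return pos_start and pos_len, in characters, of the first occurrence of snippet."""
    start = source_text.find(snippet)  # str.find counts characters, not bytes
    if start < 0:
        raise ValueError("snippet not found in source text")
    return {"pos_start": start, "pos_len": len(snippet)}


# Hypothetical file content; the comment line contains a non-ASCII character.
sql = "-- caf\u00e9 totals\nINSERT INTO tgt SELECT a, b FROM src;\n"
snippet = "INSERT INTO tgt SELECT a, b FROM src;"
span = char_span(sql, snippet)

# For comparison only: the UTF-8 byte offset is one larger than the character
# offset here, because the accented letter takes two bytes.
byte_start = sql.encode("utf-8").find(b"INSERT")
```

In this sample, using the byte offset would shift the highlighted string by one character.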

codebase_files section properties
Properties	Description
<source code path>	The file path to source code files that contain the transformation code. The transformation code can be a SQL statement or code that manipulates data. The source code file must be in the same directory as the lineage.json file.
mapping_refs	The mapping of the transformation code and the position of the transformation code that is shown in the bottom part of the technical lineage graph. This property defines a string of the transformation code in the source code file to be shown in the technical lineage graph. The string must include the string that is defined by the `pos_start` and `pos_len` properties of the `codebase_pos` property in the `lineage` section.
<mapping>	The unique descriptor of a part of transformation code in a source code file that is in the same directory as the lineage.json file. A source code file can contain different parts of transformation code that represent different data flows. This property indicates the referenced data flow. The value must match the value of the `mapping` property in the `lineage` section.
pos_start	The start position of the string of the transformation code. The start position is in characters, not bytes. Specify a value in the following range: Equal to or greater than 0. Less than or equal to the value of the `pos_start` property in the `codebase_pos` property in the `lineage` section.
pos_len	The length of the string of the transformation code. The length is in characters, not bytes. Specify a value in the following range: Greater than or equal to 1. Less than or equal to the length of the source code file minus the start position. For example, if you specify `"pos_start": 10` and the file length is 160 characters, specify a value for this property in the range of 1 - 150.
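
The range rules in the two tables above depend on each other, so it can help to check them together before you run the harvester. The following Python sketch is one possible pre-check, not part of any Collibra tooling. The function name `check_spans` and the parameters `ref` (a `mapping_refs` entry from the `codebase_files` section) and `highlight` (a `codebase_pos` entry from the `lineage` section) are hypothetical.

```python
def check_spans(file_length: int, ref: dict, highlight: dict) -> list:
    """Return a list of rule violations; an empty list means the spans are consistent."""
    problems = []
    ref_end = ref["pos_start"] + ref["pos_len"]
    hl_end = highlight["pos_start"] + highlight["pos_len"]

    # Rules for the mapping_refs entry in the codebase_files section.
    if ref["pos_start"] < 0:
        problems.append("mapping_refs pos_start must be 0 or greater")
    if ref["pos_len"] < 1:
        problems.append("mapping_refs pos_len must be 1 or greater")
    if ref_end > file_length:
        problems.append("mapping_refs span runs past the end of the file")

    # Rules for the codebase_pos entry in the lineage section.
    if highlight["pos_len"] < 1:
        problems.append("codebase_pos pos_len must be 1 or greater")
    if highlight["pos_start"] < ref["pos_start"] or hl_end > ref_end:
        problems.append("codebase_pos span is not inside the mapping_refs span")
    return problems


# A 160-character file with a mapping_refs span that starts at position 10.
ref = {"pos_start": 10, "pos_len": 150}
ok = check_spans(160, ref, {"pos_start": 20, "pos_len": 40})   # no violations
bad = check_spans(160, ref, {"pos_start": 5, "pos_len": 40})   # starts before the span
```

Here `ok` is an empty list, while `bad` reports that the highlighted string starts before the `mapping_refs` string.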

Programming considerations

Currently, there is no native support for indirect and table-level lineages. As a workaround, you can specify "type": "column" and "name": "*" for the leaves property to create a table-level or indirect technical lineage. With this specification, the indirect technical lineage is shown as a solid line instead of a dashed line in the Collibra technical lineage graph, and is always shown, regardless of whether the Show indirect dependencies option is enabled or disabled.
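
As a rough illustration of the workaround, the following Python sketch builds a table node whose only leaf is the wildcard column. Only the "type": "column" and "name": "*" values come from this topic. The function name `table_level_node`, the table name, and the "name" and "type" keys on the parent node are assumptions, so check them against the `tree` section properties before you use them.

```python
import json


def table_level_node(table_name: str) -> dict:
    """Build a table node whose only leaf is the wildcard column from the workaround."""
    return {
        "name": table_name,  # assumed key for the name of the table node
        "type": "table",     # assumed key and value for the type of the node
        "leaves": [{"type": "column", "name": "*"}],  # values from the workaround
    }


node = table_level_node("orders")  # "orders" is a placeholder table name
print(json.dumps(node, indent=2))
```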

Example

For some example JSON files, go to Custom technical lineage JSON file examples.
