Custom technical lineage JSON file examples

In this section, we provide three detailed examples of how to configure your metadata, assets and lineage JSON files for the batch definition option.

Example 1 is the least complex. It shows a configuration for working with data sources that conform to the traditional (System) > Database > Schema > Table > Column hierarchy. Examples 2 and 3 are more complex.

Helpful considerations to keep in mind

  • Don't use asset files in the following scenarios:

    • Your data source consists of the traditional (System) > Database > Schema > Table > Column asset types and hierarchy. In that case, full names are constructed automatically and correctly.
    • You are working with assets that are not part of that traditional asset hierarchy (in which case, you need to use the props property to achieve stitching) and you define props in one or more lineage files.
  • Don't use the props property for the traditional (System) > Database > Schema > Table > Column hierarchy.

Examples

This section shows some example lineage.json files for simple custom technical lineage and advanced custom technical lineage.

Each example can be used to generate technical lineage graphs in Collibra to represent the IOT_JSON and IOT_DEVICES_PER_COUNTRY tables with the following columns:

  • IOT_JSON: CCA3, DEVICE_ID
  • IOT_DEVICES_PER_COUNTRY: COUNTRY, NUMBER_DEVICES

Example JSON file for a simple custom technical lineage

In the following example, the tree section defines the IOT_JSON and IOT_DEVICES_PER_COUNTRY tables and columns. The tables are in a schema named COLLIBRA. The COLLIBRA schema is in a database named COLLIBRA and a system named Databricks.

Important If you define the System asset in your lineage.json file, the useCollibraSystemName property in your lineage harvester configuration file must be set to true; otherwise, relations will not be created between the relevant assets in Collibra and stitching will fail.

To show the transformation code at the bottom of the technical lineage graph, specify the mapping and source_code properties in the lineages section.

{
  "version": "1.0",
  "tree": [
    {
      "name": "Databricks",
      "type": "system",
      "children": [
        {
          "name": "COLLIBRA",
          "type": "database",
          "children": [
            {
              "name": "COLLIBRA",
              "type": "schema",
              "children": [
                {
                  "name": "IOT_JSON",
                  "type": "table",
                  "leaves": [
                    { "name": "CCA3", "type": "column" },
                    { "name": "DEVICE_ID", "type": "column" }
                  ]
                },
                {
                  "name": "IOT_DEVICES_PER_COUNTRY",
                  "type": "table",
                  "leaves": [
                    { "name": "COUNTRY", "type": "column" },
                    { "name": "NUMBER_DEVICES", "type": "column" }
                  ]
                }
              ]
            }
          ]
        }
      ]
    }
  ],
  "lineages": [
    {
      "src_path": [
        { "system": "Databricks" },
        { "database": "COLLIBRA" },
        { "schema": "COLLIBRA" },
        { "table": "IOT_JSON" },
        { "column": "CCA3" }
      ],
      "trg_path": [
        { "system": "Databricks" },
        { "database": "COLLIBRA" },
        { "schema": "COLLIBRA" },
        { "table": "IOT_DEVICES_PER_COUNTRY" },
        { "column": "COUNTRY" }
      ],
      "mapping": "dev_no_bat_per_country_view",
      "source_code": "INSERT INTO ... SELECT CCA3 AS COUNTRY...FROM IOT_JSON"
    }
  ]
}
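
A quick way to catch typos before running the lineage harvester is to confirm that every src_path and trg_path names nodes that the tree section actually defines. The following Python sketch is not part of any Collibra tooling. The function names path_exists and unresolved_paths are hypothetical, and the code assumes only the structure shown in the example above: children for nested containers, leaves for columns, and single-key objects in each path.

```python
def path_exists(tree, path):
    """Return True if every step of a src_path/trg_path matches a node in the tree."""
    nodes = tree
    for step in path:
        # Assumption: each step is a single-key object such as {"table": "IOT_JSON"}.
        (node_type, name), = step.items()
        match = next(
            (n for n in nodes if n.get("type") == node_type and n.get("name") == name),
            None,
        )
        if match is None:
            return False
        # Descend into nested containers ("children") and columns ("leaves").
        nodes = match.get("children", []) + match.get("leaves", [])
    return True


def unresolved_paths(doc):
    """List (lineage index, path key) pairs whose path is not defined in the tree."""
    missing = []
    for index, lineage in enumerate(doc.get("lineages", [])):
        for key in ("src_path", "trg_path"):
            if not path_exists(doc.get("tree", []), lineage.get(key, [])):
                missing.append((index, key))
    return missing


# Trimmed version of the example file, with one column per table for brevity.
example_doc = {
    "version": "1.0",
    "tree": [{
        "name": "Databricks", "type": "system", "children": [{
            "name": "COLLIBRA", "type": "database", "children": [{
                "name": "COLLIBRA", "type": "schema", "children": [
                    {"name": "IOT_JSON", "type": "table",
                     "leaves": [{"name": "CCA3", "type": "column"}]},
                    {"name": "IOT_DEVICES_PER_COUNTRY", "type": "table",
                     "leaves": [{"name": "COUNTRY", "type": "column"}]},
                ],
            }],
        }],
    }],
    "lineages": [{
        "src_path": [{"system": "Databricks"}, {"database": "COLLIBRA"},
                     {"schema": "COLLIBRA"}, {"table": "IOT_JSON"},
                     {"column": "CCA3"}],
        "trg_path": [{"system": "Databricks"}, {"database": "COLLIBRA"},
                     {"schema": "COLLIBRA"}, {"table": "IOT_DEVICES_PER_COUNTRY"},
                     {"column": "COUNTRY"}],
    }],
}

print(unresolved_paths(example_doc))  # prints []
```

An empty list means that every path resolves. A misspelled column or table name shows up as an entry such as (0, "src_path").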

Example JSON file for an advanced custom technical lineage

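In the following example, the tree section defines the IOT_JSON and IOT_DEVICES_PER_COUNTRY tables and columns. The tables are in a schema named COLLIBRA. The COLLIBRA schema is in a database named COLLIBRA and a system named Databricks.

Important If you define the System asset in your lineage.json file, the useCollibraSystemName property in your lineage harvester configuration file must be set to true; otherwise, relations will not be created between the relevant assets in Collibra and stitching will fail.

Instead of inline mapping and source_code properties, the lineages section in this example uses a mapping_ref property that names a source code file (transforms.sql), a mapping, and the positions of the code to highlight. The codebase_files section lists that file and a position range for the mapping.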

{
  "version": "1.0",
  "tree": [
    {
      "name": "Databricks",
      "type": "system",
      "children": [
        {
          "name": "COLLIBRA",
          "type": "database",
          "children": [
            {
              "name": "COLLIBRA",
              "type": "schema",
              "children": [
                {
                  "name": "IOT_JSON",
                  "type": "table",
                  "leaves": [
                    { "name": "CCA3", "type": "column" },
                    { "name": "DEVICE_ID", "type": "column" }
                  ]
                },
                {
                  "name": "IOT_DEVICES_PER_COUNTRY",
                  "type": "table",
                  "leaves": [
                    { "name": "COUNTRY", "type": "column" },
                    { "name": "NUMBER_DEVICES", "type": "column" }
                  ]
                }
              ]
            }
          ]
        }
      ]
    }
  ],
  "lineages": [
    {
      "src_path": [
        { "system": "Databricks" },
        { "database": "COLLIBRA" },
        { "schema": "COLLIBRA" },
        { "table": "IOT_JSON" },
        { "column": "CCA3" }
      ],
      "trg_path": [
        { "system": "Databricks" },
        { "database": "COLLIBRA" },
        { "schema": "COLLIBRA" },
        { "table": "IOT_DEVICES_PER_COUNTRY" },
        { "column": "COUNTRY" }
      ],
      "mapping_ref": {
        "source_code": "transforms.sql",
        "mapping": "dev_no_bat_per_country_view",
        "codebase_pos": [
          { "pos_start": 71, "pos_len": 69 }
        ]
      }
    }
  ],
  "codebase_files": {
    "transforms.sql": {
      "mapping_refs": {
        "dev_no_bat_per_country_view": {
          "pos_start": 0,
          "pos_len": 246
        }
      }
    }
  }
}
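
The mapping_ref property and the codebase_files section have to agree on the file name and the mapping name, so a small cross-check can help when you write these files by hand. The Python sketch below is illustrative only; check_mapping_refs is a hypothetical helper, not a Collibra API. It also assumes that the codebase_pos offsets and the mapping_refs offsets are measured in the same way (as offsets into the same file), which this page does not state explicitly, so treat the range check as optional.

```python
def check_mapping_refs(doc):
    """Return a list of human-readable problems; an empty list means no problem found."""
    problems = []
    files = doc.get("codebase_files", {})
    for index, lineage in enumerate(doc.get("lineages", [])):
        ref = lineage.get("mapping_ref")
        if ref is None:
            continue  # This lineage uses inline mapping/source_code, or neither.
        file_name = ref["source_code"]
        mapping_name = ref["mapping"]
        file_entry = files.get(file_name)
        if file_entry is None:
            problems.append(f"lineage {index}: {file_name} is not in codebase_files")
            continue
        mapping = file_entry.get("mapping_refs", {}).get(mapping_name)
        if mapping is None:
            problems.append(f"lineage {index}: {mapping_name} is not in {file_name}")
            continue
        # Assumption: both ranges are offsets into the same file.
        low = mapping["pos_start"]
        high = low + mapping["pos_len"]
        for pos in ref.get("codebase_pos", []):
            start = pos["pos_start"]
            end = start + pos["pos_len"]
            if start < low or end > high:
                problems.append(
                    f"lineage {index}: range {start}-{end} is outside {low}-{high}"
                )
    return problems


# Only the parts of the example file that this check reads.
advanced_doc = {
    "lineages": [{
        "mapping_ref": {
            "source_code": "transforms.sql",
            "mapping": "dev_no_bat_per_country_view",
            "codebase_pos": [{"pos_start": 71, "pos_len": 69}],
        },
    }],
    "codebase_files": {
        "transforms.sql": {
            "mapping_refs": {
                "dev_no_bat_per_country_view": {"pos_start": 0, "pos_len": 246},
            },
        },
    },
}

print(check_mapping_refs(advanced_doc))  # prints []
```

With the values from the example, the highlighted range runs from 71 to 140, which lies inside the mapping range from 0 to 246.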

Example technical lineage graphs

Both example lineage.json files generate the following technical lineage graph, which contains 2 nodes and 1 edge.

The following technical lineage graph is generated by using the example lineage.json file for an advanced custom technical lineage. The bottom part shows the transformation code that generated the data flow.

In the lineages section, the pos_start property is set to 71 and the pos_len property is set to 69. These values indicate that the 69 characters of transformation code starting at position 71 are highlighted in blue. Line 2 in the technical lineage graph contains the highlighted transformation code.
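
To see which text a pos_start and pos_len pair selects, you can slice the source code yourself. The following Python sketch assumes that positions are zero-based character offsets, which this page does not confirm. The SQL text is a made-up stand-in, not the contents of the transforms.sql file from the example, and highlighted_text is a hypothetical helper name.

```python
def highlighted_text(code, pos_start, pos_len):
    """Return the pos_len characters of code that start at offset pos_start."""
    # Assumption: offsets are zero-based and count characters, not bytes.
    return code[pos_start:pos_start + pos_len]


# Hypothetical transformation code, used only to illustrate the slicing.
sql = (
    "CREATE VIEW dev_no_bat_per_country_view AS\n"
    "SELECT CCA3 AS COUNTRY FROM IOT_JSON;\n"
)

# Derive the offsets from the text instead of hard-coding them.
fragment = "SELECT CCA3 AS COUNTRY"
pos_start = sql.index(fragment)
pos_len = len(fragment)

print({"pos_start": pos_start, "pos_len": pos_len})
print(highlighted_text(sql, pos_start, pos_len))
```

Working in the other direction, as this sketch does, is a convenient way to produce the pos_start and pos_len values for a fragment that you want highlighted.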