The technical lineage graph

The technical lineage graph consists of nodes and edges. Each node represents a corresponding object in a data source. Each edge shows a relation between nodes.

Nodes and edges in the technical lineage graph show how data flows from source to destination. A better understanding of nodes and edges helps you get more out of technical lineage.
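To make the node and edge terminology concrete, the following is a minimal sketch of such a graph in Python. It is not how Collibra Data Lineage stores lineage; the `Column` and `LineageGraph` names are hypothetical, and the sketch assumes that nodes are tables and that edges connect a source column to a target column.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Column:
    """Hypothetical column-level data object, identified by table and name."""
    table: str
    name: str


@dataclass
class LineageGraph:
    """Assumption: nodes are tables, edges link a source column to a target column."""
    edges: set = field(default_factory=set)

    def add_flow(self, source: Column, target: Column) -> None:
        # One edge means: data flows from `source` to `target`.
        self.edges.add((source, target))

    def nodes(self) -> set:
        # Every table that appears at either end of an edge is a node.
        return {col.table for edge in self.edges for col in edge}

    def targets_of(self, source: Column) -> set:
        # Columns that directly receive data from `source`.
        return {t for s, t in self.edges if s == source}


graph = LineageGraph()
graph.add_flow(Column("dbo.users", "email"), Column("user_info", "email"))
print(sorted(graph.nodes()))
```

Storing edges at column level while deriving nodes from table names mirrors the way the graph groups columns inside table nodes.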

Consider the following visual elements in the technical lineage graph:

Relation types

The technical lineage graph shows relations between columns. Collibra Data Lineage creates and shows the following relation type between stitched assets and other data objects:

Head | Role | Co-role | Tail | ID

Data Element | targets | sources | Data Element | 00000000-0000-0000-0000-000000007069

Messages

The technical lineage graph might show different messages to alert you. The following messages are the most common:

Message

Description

No object found, try using a wildcard %

This message is shown when you enter a data object name in the search field on the Browse tab pane and the data object does not exist, or when you enter a system name.

The following rules apply when you search for a data object:

  • Use the percent sign (%) wildcard character if needed.
  • Enter a database, schema, table or column name.
  • Do not enter a system name.
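The search rules above can be illustrated with a small Python sketch. It assumes that `%` matches any run of characters, as it does in SQL `LIKE`; the actual matching rules of the Browse tab pane search are not specified here, and the function names are hypothetical.

```python
import re


def wildcard_to_regex(pattern: str) -> re.Pattern:
    """Translate a pattern in which % matches any run of characters (assumed)."""
    # Escape each literal piece so that characters such as "." stay literal.
    parts = (re.escape(part) for part in pattern.split("%"))
    return re.compile(".*".join(parts))


def search(pattern: str, names: list) -> list:
    """Return the names that match the whole pattern."""
    regex = wildcard_to_regex(pattern)
    return [name for name in names if regex.fullmatch(name)]


names = ["customers", "customer_orders", "orders"]
print(search("customer%", names))
print(search("%orders", names))
```

A pattern without `%` only matches a name exactly, which is why adding the wildcard can help when a search returns no object.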

Nodes count exceeds the limit 350.

Edges count exceeds the limit 1,000.

The technical lineage graph exceeds the limit of 350 nodes or 1,000 edges and is too large to display. This happens, for example, if you have a table with many columns and you try to show the technical lineage of all columns in a table in one graph.

Note You cannot manually change this limit.
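As an illustration of the limits, the following Python sketch produces the two messages for a given graph size. The limits and message texts come from this section; the function name is hypothetical, and treating "exceeds" as strictly greater than the limit is an assumption.

```python
MAX_NODES = 350    # node limit stated in this section
MAX_EDGES = 1_000  # edge limit stated in this section


def limit_messages(node_count: int, edge_count: int) -> list:
    """Return the limit messages that apply to a graph of the given size."""
    messages = []
    # Assumption: a graph exactly at the limit is still displayed.
    if node_count > MAX_NODES:
        messages.append(f"Nodes count exceeds the limit {MAX_NODES}.")
    if edge_count > MAX_EDGES:
        messages.append(f"Edges count exceeds the limit {MAX_EDGES:,}.")
    return messages


print(limit_messages(node_count=400, edge_count=120))
```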

Depth was auto-adjusted to <number>. Graph was too large to display at once.

The technical lineage graph exceeds the edge limit, which results in the automatic adjustment of the flow depth for the graph. The adjusted depth value is determined by the number of edges that exceed the maximum edge limit.

When the flow depth is automatically adjusted to a lower value than the actual graph size, an icon appears in the technical lineage graph. To view the truncated lineage, right-click the innermost node and select Table lineage from the menu. The lineage information of the selected table is then displayed.
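The exact rule that Collibra Data Lineage uses to pick the adjusted depth is not described here. As a rough sketch of the idea only, the following Python code lowers a requested depth until the edges within that depth fit the edge limit. The function names and the adjacency-dictionary input are hypothetical.

```python
from collections import deque


def edges_within_depth(adjacency: dict, start: str, depth: int) -> set:
    """Collect the edges reachable from `start` in at most `depth` hops."""
    seen = {start}
    edges = set()
    queue = deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        if dist == depth:
            continue  # do not expand nodes that sit at the depth boundary
        for nxt in adjacency.get(node, ()):
            edges.add((node, nxt))
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return edges


def adjust_depth(adjacency: dict, start: str, requested: int, max_edges: int = 1_000) -> int:
    """Assumed strategy: reduce the depth until the edge count fits the limit."""
    depth = requested
    while depth > 0 and len(edges_within_depth(adjacency, start, depth)) > max_edges:
        depth -= 1
    return depth


flows = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": ["e"]}
print(adjust_depth(flows, "a", requested=3, max_edges=4))
```

In this small example a depth of 3 would need five edges, so the sketch settles on a depth of 2, which needs four.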

The current asset doesn't have a technical lineage yet.

This message is shown if you didn't create a technical lineage for the data source of the asset.

Use the Browse tab pane to navigate through the data objects for which a technical lineage graph is available.

Technical lineage cannot be shown.

The technical lineage graph cannot be shown because there are too many data objects. This happens, for example, when you created a technical lineage for multiple data sources and you click All data objects in the Browse tab pane.

Use the Browse tab pane to view specific parts of the technical lineage graph or click the suggested data objects to see their graph.

Colors

The technical lineage graph shows different colors to indicate which data objects are stitched to assets in Data Catalog and which are not.

Background colors

The background color of a node indicates whether or not the data object was stitched to an asset in Data Catalog, and whether something went wrong.

A node has one of three background colors:

Color

Description

Yellow

Data objects from your data source that are stitched to assets in Data Catalog

Gray

Data objects, for example temporary tables and columns, that Collibra Data Lineage collects from your data sources, but are not stitched to assets in Data Catalog.

Note Collibra Data Lineage does not support stitching for Looker assets.

Red

Attributes that are automatically assigned to a data object because of missing DDL statements. If you want to remove objects with a red background, change the statements and then rerun the lineage harvester or, if you use technical lineage via Edge, synchronize the technical lineage again.

Because a technical lineage shows how data flows from source to destination, a single lineage graph can contain yellow, red, and gray nodes.
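The background color rules can be summarized in a short Python sketch. The `stitched` and `missing_ddl` flags are hypothetical inputs, and giving red precedence over the other two colors is an assumption, not documented behavior.

```python
from enum import Enum


class Background(Enum):
    YELLOW = "yellow"  # stitched to an asset in Data Catalog
    GRAY = "gray"      # collected from the data source, but not stitched
    RED = "red"        # assigned automatically because DDL statements are missing


def background_color(stitched: bool, missing_ddl: bool = False) -> Background:
    """Map the state of a data object to its node background color."""
    # Assumption: a missing DDL statement takes precedence over stitching.
    if missing_ddl:
        return Background.RED
    return Background.YELLOW if stitched else Background.GRAY


print(background_color(stitched=True).value)
print(background_color(stitched=False).value)
```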

Example The following technical lineage graph shows two nodes with a gray background and three nodes with a yellow background. Nodes 1 and 4 contain data objects that are not stitched to assets in Data Catalog, while nodes 2, 3 and 5 contain existing assets in Data Catalog that were stitched to the corresponding data objects when you created the technical lineage.

Font colors

The font color of data objects in the technical lineage graph indicates whether or not there is a relation between this data object and one or more other data objects.

A node has one of two font colors:

Color

Description

Black

At least one direct or indirect relation exists between the data object and another.

Tip When a column flows from one table to another, the lineage reflects the direct dependency between the column in the source table and the column in the target table. This is considered a direct lineage. An indirect lineage, on the other hand, shows indirect dependencies. For example, if a JOIN clause is used in a query, the columns in the resulting view are generated by the JOIN clause; in other words, by an indirect dependency, not an actual flow of data.

Gray

No relation exists between the data object and another.

Example The following technical lineage graph shows three nodes. Node 1 contains data objects that have no incoming or outgoing edges to other data objects in the technical lineage. Nodes 2 and 3 only contain data objects that have a relation to other data objects in the technical lineage.
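The font color rule amounts to checking whether a data object appears at either end of any edge. The following Python sketch shows that check under the assumption that edges are given as (source, target) pairs; the function name and sample column names are hypothetical.

```python
def font_colors(columns: list, edges: list) -> dict:
    """Return "black" for columns with at least one relation, otherwise "gray"."""
    # A column is related if it is the source or the target of any edge.
    related = {column for edge in edges for column in edge}
    return {column: ("black" if column in related else "gray") for column in columns}


columns = ["staging.tmp_id", "dbo.users.email", "user_info.email"]
edges = [("dbo.users.email", "user_info.email")]
print(font_colors(columns, edges))
```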

Icons

Collibra uses various icons in the technical lineage graph.

Icon

Description

The name of a table was found by the full-text search in the source code on which the analysis failed. Consequently, the lineage flow of the table is probably incomplete.

If you click Show failed SQLs in the right-click menu of the table, the failed SQL queries appear in the source code pane at the bottom of the page.

The lineage is cyclic, for example A → B → C → A. This icon only appears if you enabled the only ending points option in the Settings tab pane.

A relation for the data objects exists, but it isn't shown, for example because you set the technical lineage flow depth to a lower value than the actual graph size.

Example The following technical lineage graph shows two nodes. The first node has an icon to indicate that the lineage flow you currently see is probably incomplete. The second node has three data objects that have a relation to other data objects, but the edges that represent that relation are not shown.
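To illustrate what a cyclic lineage such as A → B → C → A means, the following Python sketch finds a cycle with a depth-first search. This is a generic graph technique, not a description of how Collibra Data Lineage detects cycles; the function name and the adjacency-dictionary input are hypothetical.

```python
def find_cycle(adjacency: dict):
    """Return one cycle as a list of nodes, or None if there is no cycle."""
    UNSEEN, ACTIVE, DONE = 0, 1, 2
    state = {}
    path = []  # nodes on the current depth-first search path

    def visit(node):
        state[node] = ACTIVE
        path.append(node)
        for nxt in adjacency.get(node, ()):
            if state.get(nxt, UNSEEN) == ACTIVE:
                # `nxt` is already on the current path, so the path loops back.
                return path[path.index(nxt):] + [nxt]
            if state.get(nxt, UNSEEN) == UNSEEN:
                cycle = visit(nxt)
                if cycle:
                    return cycle
        path.pop()
        state[node] = DONE
        return None

    for node in list(adjacency):
        if state.get(node, UNSEEN) == UNSEEN:
            cycle = visit(node)
            if cycle:
                return cycle
    return None


print(find_cycle({"A": ["B"], "B": ["C"], "C": ["A"]}))
```

The sketch is recursive, so a very deep lineage would need an iterative variant.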

Arrows

Arrows are incoming or outgoing edges that show how the data flows from source to destination. They represent relations of the type "Data Element sources / targets Data Element".

There are two ways in which an arrow can be shown:

Arrow type

Description

Single

Shows the full lineage without skipping certain data objects.

Double

Shows that there are hidden data objects in the technical lineage graph. This happens when only the endpoints of the technical lineage flow are shown.

Example The following technical lineage graph shows three nodes. Edges with double arrows are shown between nodes 1 and 3. These edges indicate that there are other nodes between these nodes in the full technical lineage flow. Node 2 has outgoing edges with single arrows. These edges indicate that there is a direct relation between nodes 2 and 3.
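The difference between single and double arrows can be sketched in Python as the difference between a direct edge and a path that runs through hidden nodes. The function name, the node names, and the adjacency-dictionary input are hypothetical, and the sketch assumes that a direct edge is always drawn as a single arrow.

```python
def arrow_type(adjacency: dict, source: str, target: str):
    """Return "single" for a direct edge, "double" for a path through hidden
    nodes, or None if `target` cannot be reached from `source`."""
    if target in adjacency.get(source, ()):
        return "single"
    # Depth-first search for an indirect path via intermediate nodes.
    stack = list(adjacency.get(source, ()))
    seen = set(stack)
    while stack:
        node = stack.pop()
        if node == target:
            return "double"
        for nxt in adjacency.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return None


flows = {"node1": ["hidden"], "hidden": ["node3"], "node2": ["node3"]}
print(arrow_type(flows, "node1", "node3"))
print(arrow_type(flows, "node2", "node3"))
```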

Collapsed attributes menu

If you select a specific column in a table with multiple columns, you can click Collapsed attributes [menu] to show all columns, collapse all columns or only show selected columns in the same table.

Right-click menu

If you right-click a node, you can perform several specific actions on that node.

Functionality

Description

Column/Table lineage

Switch to the technical lineage graph of the selected column or table.

Transformation (IN)

Show the transformation logic of the incoming source code fragments in the source code pane.

Transformation (OUT)

Show the transformation logic of the outgoing source code fragments in the source code pane.

Lineage tree

Show an alternative way to view the flow of data objects, called the lineage tree. The lineage tree is particularly useful if there are many nodes in a lineage. It enables you to see the entire lineage in one pop-up, which means you no longer have to scroll through the technical lineage graph to see the full lineage.

The lineage tree uses arrows to visualize the traceability of data objects:

  • Green arrows represent outgoing edges.
  • Black arrows represent incoming edges.

Custom features

When the lineage flow of the table is incomplete or there is an issue in the source code of a data object, the right-click menu shows the Show failed SQLs option. If you click this option, the source code pane opens and shows the SQL queries that failed.

SELECT statements that result in "-RES" tables in the lineage

If you have SQL SELECT statements like the following, the results are not put into a table because they are not used in a DDL or DML query, such as INSERT or CREATE VIEW AS.

SELECT username, email
FROM dbo.users

In such cases, Collibra Data Lineage creates a dummy table, so that a complete lineage can be achieved. The dummy table has the name of the SQL file, and is appended with "-RES", as follows: "<filename>.SQL-RES".
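If you want to find such statements before you create the technical lineage, a naive pre-check along the following lines might help. This Python sketch is not part of Collibra Data Lineage and the function name is hypothetical. It assumes that statements are separated by semicolons, and it ignores comments, string literals and statements that start with a WITH clause.

```python
import re


def bare_selects(sql_text: str) -> list:
    """Return the statements that start with SELECT, which means they are not
    wrapped in a statement such as INSERT or CREATE VIEW AS."""
    # Naive split: assumes no semicolons inside strings or comments.
    statements = [statement.strip() for statement in sql_text.split(";")]
    return [s for s in statements if re.match(r"select\b", s, re.IGNORECASE)]


script = """
SELECT username, email
FROM dbo.users;

CREATE VIEW user_info AS
SELECT username, email
FROM dbo.users;
"""
print(len(bare_selects(script)))
```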

To avoid the need for Collibra Data Lineage to create a dummy table, you can add an INSERT or CREATE VIEW statement before your SELECT statement, for example:

CREATE VIEW user_info AS
SELECT username, email
FROM dbo.users

The resulting lineage is as follows: