Prepare the SQL directory

To create technical lineage for JDBC data sources by using the Shared Storage Connection, you must provide SQL files that include your SQL queries. Collibra Data Lineage processes the metadata based on your queries to create the technical lineage.

For more information about the connection types for different data sources, go to Supported data sources for technical lineage.

Before you begin

Create a Shared Storage Connection.

Steps

  1. Create your SQL files. Ensure that the following requirements are met for the SQL files:

    • The SQL files must be UTF-8 encoded.
    • For better ingestion, include one SQL statement in one SQL file.
    • Collibra Data Lineage processes the SQL files in alphabetical order. The SQL files that include the Data Definition Language (DDL) statements must be processed before the SQL files that include the Data Manipulation Language (DML) statements. To ensure this order, name the SQL files such that those containing DDL statements come before those containing DML statements alphabetically.
    • The database and schema names in the SQL statements in your SQL files take precedence over the values that you provide for the Database and Schema fields in the technical lineage for SqlDirectory capability. If your SQL statements contain database and schema names, Collibra Data Lineage uses them for stitching. If your SQL statements do not contain database and schema names, Collibra Data Lineage uses the values of the Database and Schema fields in the capability for stitching. For more information, go to Add a technical lineage capability to an Edge site and Automatic stitching for technical lineage.

    For more information, go to Supported SQL syntax.

  2. Store the SQL files in the folder that you created when you created the Shared Storage connection.

Example 1 SQL statements do not include schema and database names

This example shows the SQL files that include the queries on the Persons and JobInformation tables and the JobTitleView view. The SQL statements don't contain the database and schema values, so technical lineage via Edge uses the values of the Database and Schema fields in the technical lineage for SqlDirectory capability for stitching. The SQL files are named in a way that ensures the DDL statements are processed before the DML statement.

  • The ddl-persons.sql file
    CREATE TABLE Persons (
        PersonID int,
        LastName varchar(255),
        FirstName varchar(255),
        Address varchar(255),
        City varchar(255)
    );	
  • The ddl-jobinformation.sql file
    CREATE TABLE JobInformation (
        PersonID int,
        Department varchar(255),
        Title varchar(255)
    );
  • The view-jobtitle.sql file
    CREATE VIEW JobTitleView AS
    SELECT
        Persons.PersonID,
        Persons.FirstName,
        Persons.LastName,
        JobInformation.Title
    from
        Persons
        INNER JOIN JobInformation ON Persons.PersonID = JobInformation.PersonId

Example 2 SQL statements include schema and database names

This example shows SQL files that include the queries on the Persons and JobInformation tables and the JobTitleView view. The SQL statements contain the database and schema names for each table and view, and Collibra Data Lineage uses them for stitching. The SQL files are named in a way that ensures the DDL statements are processed before the DML statement.

  • The ddl-db1-schemaA-persons.sql file
    CREATE TABLE DB1.SchemaA.Persons (
        PersonID int,
        LastName varchar(255),
        FirstName varchar(255),
        Address varchar(255),
        City varchar(255)
    );
  • The ddl-db2-schemaB-jobinformation.sql file
    CREATE TABLE DB2.SchemaB.JobInformation (
        PersonID int,
        Department varchar(255),
        Title varchar(255)
    );
  • The view-db2-schemaC-jobtitleview.sql file
    CREATE VIEW DB2.SchemaC.JobTitleView AS
    SELECT
        Persons.PersonID,
        Persons.FirstName,
        Persons.LastName,
        JobInformation.Title
    from
       DB1.SchemaA.Persons
       INNER JOIN DB2.SchemaB.JobInformation ON Persons.PersonID = JobInformation.PersonId

What's next

Add the Technical Lineage for SqlDirectory capability.