Prepare an SQL directory

To create technical lineage for JDBC data sources by using the folder connection type, you must provide SQL files that include your SQL queries. Collibra Data Lineage processes the metadata based on your queries to create the technical lineage.

For more information about the connection types for different data sources, go to Supported data sources for technical lineage.

Note For best technical lineage results, use the JDBC connection to ingest JDBC sources when possible, rather than using the folder connection type with the SQL files.

Steps

  1. Create a local folder.
  2. Create your SQL files. Ensure that the following requirements are met for the SQL files:

    • The SQL files must be UTF-8 encoded.
    • The SQL files can't have white spaces in their names.
    • For better ingestion, include one SQL statement in one SQL file.
    • Collibra Data Lineage processes the SQL files in alphabetical order. The SQL files that include the Data Definition Language (DDL) statements must be processed before the SQL files that include the Data Manipulation Language (DML) statements. To ensure this order, name the SQL files such that those containing DDL statements come before those containing DML statements alphabetically.
    • The database and schema names in the SQL statements in your SQL files take precedence over the values that you provide for the database and schema properties in the lineage harvester configuration file. If your SQL statements contain database and schema names, Collibra Data Lineage uses them for stitching. If your SQL statements do not contain database and schema names, Collibra Data Lineage uses the values of the database and schema properties in the configuration file for stitching. For more information, go to lineage harvester configuration file and Automatic stitching for technical lineage.
    • For Collibra Data Lineage to correctly highlight the transformation logic in the Source code pane, we strongly recommend that your SQL files have Unix line endings. Non-Unix line endings, for example Carriage Return (CR) and Line Feed (LF) line breaks, do not influence the extracted lineage and can result in incorrect highlighting.

    For more information, go to Supported SQL syntax.

  3. Store the SQL files in the local folder.
Note When using a SQL directory, it is important to provide the database model of all involved databases in the SQL statements. Analysis errors can occur if the complete Data Definition Language (DDL) is not provided. You can provide the DDL via one of the following means:
  • Add the SQL files that contain the CREATE TABLE statements in the SQL directory and ensure that they are analysed first.
  • Use the shared database model feature, to provide table-definition details when it is available in a different source.

Example 1 SQL statements do not include schema and database names

This example shows the SQL files that include the queries on the Persons and JobInformation tables and the JobTitleView view. The SQL statements don't contain the database and schema values, so the lineage harvester uses the values of the database and schema properties that you specify in the lineage harvester configuration file for stitching. The SQL files are named in a way that ensures the DDL statements are processed before the DML statement.

  • The ddl-persons.sql file
    CREATE TABLE Persons (
        PersonID int,
        LastName varchar(255),
        FirstName varchar(255),
        Address varchar(255),
        City varchar(255)
    );	
  • The ddl-jobinformation.sql file
    CREATE TABLE JobInformation (
        PersonID int,
        Department varchar(255),
        Title varchar(255)
    );
  • The view-jobtitle.sql file
    CREATE VIEW JobTitleView AS
    SELECT
        Persons.PersonID,
        Persons.FirstName,
        Persons.LastName,
        JobInformation.Title
    from
        Persons
        INNER JOIN JobInformation ON Persons.PersonID = JobInformation.PersonId

Example 2 SQL statements include schema and database names

This example shows SQL files that include the queries on the Persons and JobInformation tables and the JobTitleView view. The SQL statements contain the database and schema names for each table and view, and Collibra Data Lineage uses them for stitching. The SQL files are named in a way that ensures the DDL statements are processed before the DML statement.

  • The ddl-db1-schemaA-persons.sql file
    CREATE TABLE DB1.SchemaA.Persons (
        PersonID int,
        LastName varchar(255),
        FirstName varchar(255),
        Address varchar(255),
        City varchar(255)
    );
  • The ddl-db2-schemaB-jobinformation.sql file
    CREATE TABLE DB2.SchemaB.JobInformation (
        PersonID int,
        Department varchar(255),
        Title varchar(255)
    );
  • The view-db2-schemaC-jobtitleview.sql file
    CREATE VIEW DB2.SchemaC.JobTitleView AS
    SELECT
        Persons.PersonID,
        Persons.FirstName,
        Persons.LastName,
        JobInformation.Title
    from
       DB1.SchemaA.Persons
       INNER JOIN DB2.SchemaB.JobInformation ON Persons.PersonID = JobInformation.PersonId

What's next

Add your data source information in the lineage harvester configuration file.