Technical lineage for Snowflake ingestion methods
To create technical lineage for Snowflake, you can use the following connection types:
- The JDBC connection. With this connection type, you can choose to use the SQL or SQL-API Snowflake ingestion modes.
- The folder connection type if you use the lineage harvester or the Shared Storage connection if you use technical lineage via Edge.
You can use different ingestion modes and connection types to collect and process the metadata of your Snowflake data sources with one technical lineage license. For example, you can use both the SQL-API ingestion mode and the folder or Shared Storage connection type. In this way, technical lineage is created based on the query execution and also provides a full coverage of stored procedures.
The JDBC connection type
You can use the JDBC connection type to establish connection to your Snowflake data sources. Collibra Data Lineage collects and processes the metadata from the data sources to create technical lineage.
With the JDBC connection type, you can choose to use the SQL or SQL-API Snowflake ingestion modes. These modes are complementary and are designed to address different needs and use cases.
SQL Snowflake ingestion mode
With this ingestion mode, Collibra Data Lineage retrieves lineage from the database schema, views, and stored procedures, providing a design lineage. You can understand the data flow at the schema level from the generated technical lineage.
SQL-API Snowflake ingestion mode
Note If a metadata batch contains multiple versions of a schema, lineage is shown only for the latest version.
Introduced in the 2023.02 release, the SQL-API mode retrieves lineage from views and executed database queries, providing an operational style of lineage. This mode accesses much more information and may take longer for lineage processing.
Stored procedures are supported. However, if a stored procedure is defined but not executed, the generated technical lineage does not include lineage for that stored procedure.
The technical lineage is based on Snowflake's interpretation of modified objects. Therefore, Collibra Data Lineage cannot show lineage for queries that Snowflake does not interpret or interprets differently than expected. For example, technical lineage does not include indirect lineage, as Snowflake does not interpret indirect lineage. Indirect lineage is the lineage that includes a column that does not appear in the target table but is used as a filter for data moving to the target table.
Additionally, if database queries contain conditional statements, the technical lineage includes lineage only for the conditions that were executed. Only the executed path of a CASE WHEN/THEN
or IF
statement is shown in lineage for each executed query instance.
If you use the lineage harvester, set the mode
property in the lineage harvester configuration file to indicate which ingestion mode you want to use.
If you use technical lineage via Edge, use the Ingestion Method field in the technical lineage for Snowflake capability to select the ingestion mode you want to use.
The folder or Shared Storage connection type
With this connection type, you must prepare the SQL queries. The SQL queries can come from a log, stored procedure definitions, and so on. Collibra Data Lineage processes each conditional statement to create the technical lineage for all possible conditions.
If you use the lineage harvester, you must prepare a SQL directory and add your SQL queries to the folder.
If you use technical lineage via Edge, you must add your SQL queries to the Shared Storage connection folder and use the Technical Lineage for SqlDirectory capability to create the technical lineage.
See the following table for a summary of the connection types and ingestion modes.
Connection type |
Ingestion mode |
Details | Release date |
---|---|---|---|
JDBC | SQL |
Collibra Data Lineage extracts metadata and information about the Snowflake database schemas and views to calculate lineage. This is the default mode. You can use the technical lineage to understand the flow of data at the schema level. |
2020 |
SQL-API |
Collibra Data Lineage parses SQL from views and schemas, and additionally gets lineage information from the ACCESS_HISTORY system view, which is a log of all queries that are run on the system. The SQL-API mode supports stored procedures and other orchestration methods, for example, application queries and ad-hoc queries. You can use the technical lineage to see the operational lineage from executed queries. |
2023.02 | |
Folder or Shared Storage connection |
Not applicable | Collibra Data Lineage retrieves lineage from the SQL queries that you upload to a SQL directory. The technical lineage captures all lineage paths from the SQL queries. |
Folder - 2020 Shared Storage connection - 2023.05 |
For more information about the supported queries and transformation, go to Supported transformation details.
For an overview of the steps to create technical lineage, go to Create a technical lineage via the lineage harvester and Create a technical lineage via Edge.
For more information about Snowflake, go to Snowflake Documentation.