Technical lineage for Snowflake ingestion methods
When creating a technical lineage for your Snowflake data sources, you can choose between a number of connection types and ingestion modes.
Note Collibra Data Lineage does not create lineage between "baseSources" and "directSources". For complete information, see the knowledge base article in the Collibra Support Portal.
The JDBC connection type
With the JDBC connection type, you can choose to use the SQL or SQL-API Snowflake ingestion modes. These modes are complementary and are designed to address different needs and use cases.
Tip You cannot configure two separate lineage Edge capabilities for the same Snowflake connection, for example to extract lineage from the same Snowflake source - one using the SQL ingestion mode and the other using the SQL-API ingestion mode. As a workaround, consider creating a second Edge connection and adding the second capability to that connection.
SQL Snowflake ingestion mode
With this ingestion mode, Collibra Data Lineage retrieves lineage from the database schema, views, and stored procedures, providing a design lineage. The generated technical lineage provides a schema-level view of the data flow.
SQL-API Snowflake ingestion mode
Note If a metadata batch contains multiple versions of a schema, lineage is shown only for the latest version.
The SQL-API mode retrieves lineage from views and executed database queries, providing an operational style of lineage. This mode accesses much more information and may take longer for lineage processing.
Stored procedures are supported; however, if a stored procedure is defined but not executed, lineage is not generated for the stored procedure.
The technical lineage is based on Snowflake's interpretation of modified objects. Therefore, Collibra Data Lineage cannot show lineage for queries that Snowflake does not interpret or interprets differently than expected. For example, the technical lineage doesn't include indirect lineage, as Snowflake doesn't interpret indirect lineage. Indirect lineage includes column that don't appear in the target table but are used as filters for data moving to the target table.
If database queries contain conditional statements, the technical lineage includes lineage only for the conditions that are executed. Only the executed path of a CASE WHEN/THEN or IF statement is shown in lineage for each executed query instance.
If you use Edge, use the Ingestion Method field when adding the Edge capability, to select the ingestion mode you want to use.
If you use the CLI lineage harvester (deprecated), set the mode property in the lineage harvester configuration file to indicate which ingestion mode you want to use.
Folder, Shared Storage, and Cloud Storage connections
For these connection types, you have to prepare the SQL queries. The SQL queries can come from a log, stored procedure definitions, and so on. Collibra Data Lineage processes each conditional statement to create the technical lineage for all possible conditions.
- If you use the CLI lineage harvester (deprecated), you have to prepare a SQL directory and add your SQL queries to the folder.
- If you use technical lineage via Edge:
- For Shared Storage connection, you have to put your SQL files in your Shared Storage connection folder.
- For Cloud Storage connection, you have put your SQL files in your cloud-based storage system. The files must be in one of the following:
- An AWS S3 bucket.
- An Azure Data Lake Storage container.
- A Google Cloud Storage bucket.
See the following table for a summary of the connection types and ingestion modes.
|
Connection type |
Ingestion mode |
Details |
|---|---|---|
| JDBC | SQL |
Collibra Data Lineage extracts metadata and information about the Snowflake database schemas and views to calculate lineage. This is the default mode. The generated technical lineage provides a schema-level view of the data flow. |
| SQL-API |
Collibra Data Lineage parses SQL from views and schemas, and gets lineage information from the ACCESS_HISTORY system view, which is a log of all queries that are run on the system. The SQL-API mode supports stored procedures and other orchestration methods, for example, application queries and ad-hoc queries. You can use the technical lineage to see the operational lineage from executed queries. |
|
|
Not applicable | Collibra Data Lineage retrieves lineage from the SQL queries that you upload to a SQL directory. The technical lineage captures all lineage paths from the SQL queries. |
| Cloud Storage | Not applicable | Collibra Data Lineage retrieves lineage from the SQL queries that you upload to your cloud-based storage system. The technical lineage captures all lineage paths from the SQL queries. |
For more information about the supported queries and transformation, go to Snowflake: Supported transformation details.
For an overview of the steps to create technical lineage, go to Steps overview: Integrate Snowflake via Edge.
For more information about Snowflake, go to Snowflake documentation.