Harvesting materialized views that were generated via an external script

Warning The CLI lineage harvester is now deprecated and will officially reach its end-of-life on July 31, 2026. To ensure a smooth transition, we encourage you to begin creating technical lineage via Edge, if you haven't already.

The CLI lineage harvester can harvest materialized views that are native to a data source, meaning the data flow is performed by SQL code stored in the data source. If, however, an external script - not the database itself - creates your materialized views, the lineage harvester cannot see the data flow automatically. To fix this, you can use a script to generate the missing SQL code on the fly.

Alternatively, you can create a custom technical lineage.

Create a script to generate a list of SQL queries

For each pair of source (view) and target (materialized view table), create a script as follows:

Copy
INSERT INTO 'dhw.sales.mv_customers'
    SELECT * FROM 'dhw.sales.v_customers';

The generated SQL queries then need to be harvested by the CLI lineage harvester. There are two options for this, depending on where you choose to store the generated SQL code.

Option 1: Store SQL in text files

If your external script saves the INSERT INTO statements as physical text files, you can ingest them in Collibra as a separate source.

  • Method: Use an additional SqlDirectory type source in your configuration.
  • Benefit: This is ideal if you already have a process that logs SQL transformations into a folder or repository.

Option 2: Store SQL in a database table

If your external script writes the SQL code into a table within your data source, you don't need to create separate files.

  1. Method: You modify your harvesting query so that it pulls the code directly from that table.
  2. On-the-fly logic: In this scenario, the SQL queries are generated or retrieved by the harvesting query itself during the sync process.
  3. Benefit: You do not need to manage or store the generated SQL queries anywhere outside of the database.
Copy
SELECT
    t.table_name,
    t.ddl as sourceCode,
    CONCAT(t.table_schema, '.', t.table_name) as groupName,
    t.table_schema as schemaName
FROM `##PROJECT_ID##`.`##DSNAME##`.`INFORMATION_SCHEMA.TABLES` t
WHERE t.table_type IN ('MATERIALIZED VIEW','VIEW')

UNION ALL

SELECT
    CONCAT('m', t.table_name),
    CONCAT('INSERT INTO `m', t.table_name, '` SELECT * FROM `', t.table_name, '`') as sourceCode,
    CONCAT('Generated m', t.table_schema, '.', t.table_name) as groupName,
    t.table_schema as schemaName
FROM `##PROJECT_ID##`.`##DSNAME##`.`INFORMATION_SCHEMA.TABLES` t
WHERE
    t.table_type IN ('VIEW')
    AND STARTS_WITH(t.table_name, 'v_')

The second SELECT generates the necessary INSERT INTOs for all views in your data source that have a name starting with v_.