Harvesting materialized views that were generated via an external script
The CLI lineage harvester can harvest materialized views that are native to a data source, meaning the data flow is performed by SQL code stored in the data source. If, however, an external script - not the database itself - creates your materialized views, the lineage harvester cannot see the data flow automatically. To fix this, you can use a script to generate the missing SQL code on the fly.
Alternatively, you can create a custom technical lineage.
Create a script to generate a list of SQL queries
For each pair of source (view) and target (materialized view table), create a script as follows:
INSERT INTO 'dhw.sales.mv_customers'
SELECT * FROM 'dhw.sales.v_customers';
The generated SQL queries then need to be harvested by the CLI lineage harvester. There are two options for this, depending on where you choose to store the generated SQL code.
Option 1: Store SQL in text files
If your external script saves the INSERT INTO statements as physical text files, you can ingest them in Collibra as a separate source.
- Method: Use an additional SqlDirectory type source in your configuration.
- Benefit: This is ideal if you already have a process that logs SQL transformations into a folder or repository.
Option 2: Store SQL in a database table
If your external script writes the SQL code into a table within your data source, you don't need to create separate files.
- Method: You modify your harvesting query so that it pulls the code directly from that table.
- On-the-fly logic: In this scenario, the SQL queries are generated or retrieved by the harvesting query itself during the sync process.
- Benefit: You do not need to manage or store the generated SQL queries anywhere outside of the database.
SELECT
t.table_name,
t.ddl as sourceCode,
CONCAT(t.table_schema, '.', t.table_name) as groupName,
t.table_schema as schemaName
FROM `##PROJECT_ID##`.`##DSNAME##`.`INFORMATION_SCHEMA.TABLES` t
WHERE t.table_type IN ('MATERIALIZED VIEW','VIEW')
UNION ALL
SELECT
CONCAT('m', t.table_name),
CONCAT('INSERT INTO `m', t.table_name, '` SELECT * FROM `', t.table_name, '`') as sourceCode,
CONCAT('Generated m', t.table_schema, '.', t.table_name) as groupName,
t.table_schema as schemaName
FROM `##PROJECT_ID##`.`##DSNAME##`.`INFORMATION_SCHEMA.TABLES` t
WHERE
t.table_type IN ('VIEW')
AND STARTS_WITH(t.table_name, 'v_')
The second SELECT generates the necessary INSERT INTOs for all views in your data source that have a name starting with v_.