Download SQL files to the lineage harvester folder
You can download the SQL files of a data source that is stored locally and cannot be accessed via the network. The lineage harvester then stores the data source information in a ZIP file.
To create a technical lineage for these data sources, you only have to include the ID of the data source and the path to the ZIP file in the configuration file.
Note See the list of all supported data sources.
Prerequisites
- You have downloaded the lineage harvester, and your system meets the requirements to run it.
- You have the necessary permissions on all database objects that the lineage harvester accesses.
Tip
Some data sources require specific permissions.
Data source type permissions:
- Ensure that you meet the Azure Data Factory prerequisites.
- You need read access on the SYS schema.
- You need read access on the SYS schema and the View Definition Permission in your SQL Server.
- You need read access on information_schema:
  - bigquery.datasets.get
  - bigquery.tables.get
  - bigquery.tables.list
  - bigquery.jobs.create
  - bigquery.routines.get
  - bigquery.routines.list
- GRANT SELECT, at table level. Grant this to every table for which you want to create a technical lineage.
- You need read access on information_schema. Only views that you own are processed.
- SELECT, at table level. Grant this to every table for which you want to create a technical lineage.
- The role of the user that you specify in the username property in the lineage harvester configuration file must be the owner of the views in PostgreSQL.
- A role with the LOGIN option; SELECT WITH GRANT OPTION, at table level; CONNECT ON DATABASE.
- You need a role that can access the Snowflake shared read-only database. To access the shared database, the account administrator must grant the IMPORTED PRIVILEGES privilege on the shared database to the role of the user that runs the lineage harvester.
  Note These permissions are the same regardless of the ingestion mode: SQL or SQL-API.
  Tip If the default role in Snowflake does not have the IMPORTED PRIVILEGES privilege, you can use the customConnectionProperties property in the lineage harvester configuration file to assign the appropriate role to the user. For example: "customConnectionProperties": "role=METADATA"
- You need read access on the DBC.
- You need read access to the following dictionary views:
  - all_tab_cols
  - all_col_comments
  - all_objects
  - ALL_DB_LINKS
  - all_mviews
  - all_source
  - all_synonyms
  - all_views
- You need read access on definition_schema.
- Your user role must have privileges to export assets.
- You must have read permission on all assets that you want to export.
- You have added the Matillion certificate to a Java truststore.
- You have at least a Matillion Enterprise license.
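For the BigQuery entry, the required permissions form a fixed checklist, so you can compare them against what the harvester's account actually has before the first run. The following is a minimal Python sketch, assuming you have already obtained the list of granted permission strings from your own IAM tooling (how you obtain it is outside this sketch). The names REQUIRED_BIGQUERY_PERMISSIONS and missing_bigquery_permissions are hypothetical helpers, not part of the lineage harvester.

```python
from typing import Iterable, List

# The six BigQuery permissions from the permissions list.
REQUIRED_BIGQUERY_PERMISSIONS = (
    "bigquery.datasets.get",
    "bigquery.tables.get",
    "bigquery.tables.list",
    "bigquery.jobs.create",
    "bigquery.routines.get",
    "bigquery.routines.list",
)


def missing_bigquery_permissions(granted: Iterable[str]) -> List[str]:
    """Return the required permissions absent from `granted`, in the documented order.

    Hypothetical helper: `granted` is assumed to be a plain list of permission
    strings that you collected yourself for the harvester's account.
    """
    granted_set = set(granted)
    return [perm for perm in REQUIRED_BIGQUERY_PERMISSIONS if perm not in granted_set]
```

An empty result means every listed permission is present; anything returned is a permission to request before running the harvester.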
Steps
- Start the lineage harvester to create an empty lineage harvester configuration file by entering the following command:
- Windows:
.\bin\lineage-harvester.bat
- For other operating systems:
chmod +x bin/lineage-harvester and then bin/lineage-harvester
An empty configuration file is created in the config folder.
- Save the configuration file in the config directory in the lineage harvester folder.
- Prepare the configuration file.
Tip Use the configuration file generator to easily create a configuration file.
- When prompted, enter the passwords to connect to Collibra and your data sources. Do one of the following:
- Enter the passwords in the console. The passwords are encrypted and stored in /config/pwd.conf.
- Provide the passwords via the command line. The passwords are stored locally and not in your lineage harvester folder.
- Start the lineage harvester again and do one of the following:
- To download the SQL files of all data sources in the configuration file, run the following command:
./bin/lineage-harvester load-sources
- To download the SQL files of specific data sources in the configuration file, run the following command:
./bin/lineage-harvester load-sources -s "ID of the data source"
Tip This command downloads only the SQL files of the data sources you specify and does not refresh the SQL files of the other data sources in the configuration file, which reduces download time. To download the SQL files of multiple data sources, add -s "ID of another data source" to the command for each additional data source.
The lineage harvester downloads the SQL files of the data sources and stores them in a ZIP file per data source in the lineage harvester output folder.
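If you script these steps, the only moving parts are the platform-specific launcher and one -s pair per data source ID. The following is a minimal Python sketch of that logic, assuming Python 3.8 or later. The helpers harvester_command, load_sources_command, and run are hypothetical names for this sketch, not part of the lineage harvester. The steps show load-sources only with ./bin/lineage-harvester; reusing the same arguments with the Windows .bat launcher is an assumption.

```python
import platform
import stat
import subprocess
from pathlib import Path
from typing import List, Optional, Sequence


def harvester_command(system: Optional[str] = None) -> List[str]:
    """Return the launcher, relative to the lineage harvester folder, for an OS."""
    system = system or platform.system()
    if system == "Windows":
        # Assumption: the .bat launcher accepts the same arguments as the shell script.
        return [r".\bin\lineage-harvester.bat"]
    return ["./bin/lineage-harvester"]


def load_sources_command(source_ids: Sequence[str] = (),
                         system: Optional[str] = None) -> List[str]:
    """Build the argv for `load-sources`.

    No IDs: download the SQL files of all data sources in the configuration file.
    Each ID adds its own `-s <ID>` pair, one pair per data source.
    """
    cmd = harvester_command(system) + ["load-sources"]
    for source_id in source_ids:
        cmd += ["-s", source_id]
    return cmd


def run(cmd: List[str], harvester_dir: str) -> None:
    """Run `cmd` with the lineage harvester folder as the working directory."""
    root = Path(harvester_dir).resolve()
    launcher = root / cmd[0]  # cmd[0] is the launcher path relative to the folder
    if platform.system() != "Windows":
        # Rough equivalent of `chmod +x bin/lineage-harvester`.
        launcher.chmod(launcher.stat().st_mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
    # An absolute launcher path avoids OS differences in resolving relative executables.
    # stdin is inherited, so password prompts still appear in the console.
    subprocess.run([str(launcher)] + cmd[1:], cwd=root, check=True)
```

For example, run(harvester_command(), harvester_dir) corresponds to the first start that creates the empty configuration file, and run(load_sources_command(["ID of the data source"]), harvester_dir) corresponds to the targeted download. Passing arguments as a list rather than a shell string keeps a data source ID that contains spaces as a single argument without extra quoting.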
What's next?
You can now prepare a configuration file for the SQL files of the data sources that you want to include in your technical lineage.