Download SQL files to the lineage harvester folder
You can download the SQL files of a data source that is stored locally and cannot be accessed via the network. The lineage harvester then stores the data source information in a ZIP file.
To create a technical lineage for these data sources, you only have to include the ID of the data source and the path to the ZIP file in the configuration file.
Prerequisites
- You have downloaded the lineage harvester and your system meets the requirements to run it.
- You have the necessary permissions to all database objects that the lineage harvester accesses.
Tip
Some data sources require specific permissions.
Data source type permissions:
You need read access on the SYS schema.
You need read access on the SYS schema and the View Definition Permission in your SQL Server.
You need read access on information_schema.
You need read access on information_schema. Only views that you own are processed.
SELECT, at table level. Grant this to every table for which you want to create a technical lineage.
A role with the LOGIN option.
Only SQL statements.
CONNECT ON DATABASE
You need a role that can access the Snowflake shared read-only database. To access the shared database, the account administrator must grant IMPORTED PRIVILEGES on the shared database to the user that runs the lineage harvester.
Tip If the role is not assigned in Snowflake, you can use the customConnectionProperties property in the lineage harvester configuration file to assign the Default role to the user. For example:
"customConnectionProperties": "role=default"
You need read access on the DBC.
You need read access to the following dictionary views:
- all_tab_cols
- all_col_comments
- all_objects
- ALL_DB_LINKS
- all_mviews
- all_source
- all_synonyms
- all_views
You need read access on definition_schema.
You need Admin permission on all objects that you want to harvest.
You have added the Matillion certificate to a Java truststore.
You have at least a Matillion Enterprise license.
You need a role with user access to the server from which you want to ingest:
- You have a system-level role, which is at least a System user role.
- You have an item-level role, which is at least a Content Manager role.
You need a role with user access to the relevant server, and you must be able to access the metadata that is stored there.
Make sure that the lineage harvester can reach Power BI by registering Power BI in Azure and setting the necessary permission to harvest the metadata.
We highly recommend that you read about supported authentication methods before you register Power BI in Microsoft Azure. For more details, see Register Power BI in Microsoft Azure and set permissions.
You need the following minimum roles and permissions to harvest Tableau metadata:
- You have a View permission on Tableau projects, workbooks and data sources you want to ingest.
- You have a Viewer or Explorer (can publish) role with access to the Tableau REST API.
For a full ingestion, we recommend the following roles and permissions in Tableau:
- You have at least a View permission on Tableau projects, workbooks and data sources you want to ingest.
- You have the Explorer role with the Data Management Add-on.
Steps
- Run the following command to start the lineage harvester:
- Windows:
.\bin\lineage-harvester.bat
- For other operating systems:
chmod +x bin/lineage-harvester and then bin/lineage-harvester
An empty configuration file is created in the config folder.
- Save the configuration file in the config directory in the lineage harvester folder.
- Prepare the configuration file.
Tip Use the configuration file generator to easily create a configuration file.
- When prompted, enter the passwords to connect to Collibra and your data sources. Do one of the following:
- Enter the passwords in the console. The passwords are encrypted and stored in /config/pwd.conf.
- Provide the passwords via command line. The passwords are stored locally and not in your lineage harvester folder.
- Start the lineage harvester again and do one of the following:
- To download the SQL files of all data sources in the configuration file, run the following command:
./bin/lineage-harvester load-sources
- To download the SQL files of specific data sources in the configuration file, run the following command:
./bin/lineage-harvester load-sources -s "ID of the data source"
Tip This command allows you to download specific SQL files in the configuration file, without refreshing other SQL files. This reduces the time you need to download your SQL files, since you only download specific ones without affecting the others. If you want to download SQL files of multiple data sources, add -s "ID of another data source" per data source to the command.
The lineage harvester downloads the SQL files of the data sources and stores them in a ZIP file per data source in the lineage harvester output folder.
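When you refresh several data sources at once, the repeated -s options can be assembled from a list of IDs. The following is a minimal sketch, not part of the lineage harvester itself: the function name build_load_cmd and the example IDs are hypothetical, and the launcher path assumes you run the script from the lineage harvester folder. It only prints the command so you can review it before you run it.

```shell
#!/bin/sh
# Sketch: print a load-sources command with one -s option per data source ID.
# build_load_cmd is a hypothetical helper; it does not run the harvester.

build_load_cmd() {
  # Start with the command that downloads all data sources.
  cmd="./bin/lineage-harvester load-sources"
  # Append -s "ID" for every ID passed to the function.
  for id in "$@"; do
    cmd="$cmd -s \"$id\""
  done
  printf '%s\n' "$cmd"
}

# Example with two placeholder IDs; replace them with IDs from your configuration file.
build_load_cmd "my_first_source" "my_second_source"
```

If you call the function without any IDs, it prints the plain load-sources command, which downloads the SQL files of all data sources in the configuration file.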
What's next?
You can now prepare a configuration file for the SQL files of data sources that you want to include in your technical lineage.