Download SQL files to the lineage harvester folder

You can download the SQL files of a data source that is stored locally and cannot be accessed via the network. The lineage harvester then stores the data source information in a ZIP file.

To create a technical lineage for these data sources, you only have to include the ID of the data source and the path to the ZIP file in the configuration file.

Note See the list of all supported data sources.

Prerequisites

You have downloaded the lineage harvester and your system meets the requirements to run it.

You have the necessary permissions for all database objects that the lineage harvester accesses.

Tip

Some data sources require specific permissions.

You need read access on the SYS schema.

You need read access on the SYS schema and the VIEW DEFINITION permission in your SQL Server.

You need read access on information_schema.

You need read access on information_schema. Only views that you own are processed.

SELECT, at table level. Grant this to every table for which you want to create a technical lineage.

A role with the LOGIN option.

Only SQL statements.

CONNECT ON DATABASE

You need a role that can access the Snowflake shared read-only database. To access the shared database, the account administrator must grant IMPORTED PRIVILEGES on the shared database to the user that runs the lineage harvester.

Tip If the role is not assigned in Snowflake, you can use the customConnectionProperties property in the lineage harvester configuration file to assign the Default role to the user. For example:
"customConnectionProperties": "role=default"

You need read access on the DBC.

You need read access to the following dictionary views:

all_tab_cols
all_col_comments
all_objects
all_db_links
all_mviews
all_source
all_synonyms
all_views

You need read access on definition_schema.

You need Admin permission on all objects that you want to harvest.

You have added the Matillion certificate to a Java truststore.

You have at least a Matillion Enterprise license.
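Adding a certificate to a Java truststore is typically done with the JDK keytool utility. The following is a minimal sketch, not an official procedure: the function name import_cert, the file names matillion.crt and truststore.jks, and the alias matillion are hypothetical placeholders, and the KEYTOOL override exists only so that you can preview the command before running it.

```shell
#!/usr/bin/env bash
# Sketch: import a certificate into a Java truststore with keytool.
# All file names and the alias below are assumptions; replace them with your own.

import_cert() {
  local cert_file="$1"             # certificate exported from Matillion (hypothetical name)
  local truststore="$2"            # truststore used by the Java runtime (hypothetical name)
  local alias_name="${3:-matillion}"

  # KEYTOOL can be set to "echo" to print the arguments instead of running keytool.
  # keytool asks for the truststore password interactively.
  "${KEYTOOL:-keytool}" -importcert -noprompt \
    -alias "$alias_name" \
    -file "$cert_file" \
    -keystore "$truststore"
}

# Example call (commented out so that sourcing this file changes nothing):
# import_cert matillion.crt truststore.jks
```

Running the example call with KEYTOOL=echo prints the keytool arguments, which lets you check the paths and alias first.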

You need a role with user access to the server from which you want to ingest:

You have a system-level role, which is at least a System user role.
You have an item-level role, which is at least a Content Manager role.

You need a role with user access to the relevant server and permission to access the metadata that is stored there.

Make sure that the lineage harvester can reach Power BI by registering Power BI in Azure and setting the necessary permissions to harvest the metadata.

We highly recommend that you read about supported authentication methods before you register Power BI in Microsoft Azure. For more details, see Register Power BI in Microsoft Azure and set permissions.

You need the following minimum roles and permissions to harvest Tableau metadata:

You have a View permission on Tableau projects, workbooks and data sources you want to ingest.
You have a Viewer or Explorer (can publish) role with access to the Tableau REST API.

For a full ingestion, we recommend the following roles and permissions in Tableau:

You have at least a View permission on Tableau projects, workbooks and data sources you want to ingest.
You have the Explorer role with the Data Management Add-on.

Steps

Run the following command to start the lineage harvester:
- Windows: .\bin\lineage-harvester.bat
- For other operating systems: run chmod +x bin/lineage-harvester and then ./bin/lineage-harvester
An empty configuration file is created in the config folder.
Save the configuration file in the config directory in the lineage harvester folder.
Prepare the configuration file.
Tip Use the configuration file generator to easily create a configuration file.
When prompted, enter the passwords to connect to Collibra and your data sources. Do one of the following:
- Enter the passwords in the console.
  The passwords are encrypted and stored in /config/pwd.conf.
- Provide the passwords via command line.
  The passwords are stored locally and not in your lineage harvester folder.
Start the lineage harvester again and do one of the following:
- To download the SQL files of all data sources in the configuration file, run the following command:
```
./bin/lineage-harvester load-sources
```
- To download the SQL files of specific data sources in the configuration file, run the following command:
```
./bin/lineage-harvester load-sources -s "ID of the data source"
```
  Tip This command downloads the SQL files of only the specified data sources and leaves the other SQL files untouched, which reduces the download time. To download the SQL files of multiple data sources, add -s "ID of another data source" to the command for each additional data source.
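The repeated -s option described above can be assembled from a list of IDs. The following is a minimal sketch: the function name load_sources and the HARVESTER override are hypothetical helpers introduced only for illustration, and the IDs in the example call are placeholders for the IDs in your configuration file.

```shell
#!/usr/bin/env bash
# Sketch: run load-sources with one -s option per data source ID.
# With no IDs, this runs plain "load-sources", which covers all data sources
# in the configuration file.

load_sources() {
  # HARVESTER can be set to "echo" to print the arguments instead of running the harvester.
  local harvester="${HARVESTER:-./bin/lineage-harvester}"
  local args=(load-sources)
  local id

  for id in "$@"; do
    args+=(-s "$id")   # one -s per data source ID
  done

  "$harvester" "${args[@]}"
}

# Example call (commented out so that sourcing this file runs nothing):
# load_sources "ID of the data source" "ID of another data source"
```

Quoting each ID keeps IDs that contain spaces intact as a single argument.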

What's next?

You can now prepare a configuration file for the SQL files of data sources that you want to include in your technical lineage.