Download SQL files to the lineage harvester folder

You can download the SQL files of a data source that is stored locally and cannot be accessed via the network. The lineage harvester then stores the data source information in a ZIP file.

To create a technical lineage for these data sources, you only have to include the ID of the data source and the path to the ZIP file in the configuration file.

Note For the data sources that you can use with this procedure, see the list of supported data sources.

Prerequisites

You have downloaded the lineage harvester and your system meets the requirements to run it.

You have the necessary permissions to all database objects that the lineage harvester accesses.

Tip

Some data sources require specific permissions.

Ensure that you meet the Azure Data Factory prerequisites.

You need read access on the SYS schema.

You need read access on the SYS schema and the View Definition Permission in your SQL Server.

You need read access on information_schema:

bigquery.datasets.get
bigquery.tables.get
bigquery.tables.list
bigquery.jobs.create
bigquery.routines.get
bigquery.routines.list

GRANT SELECT, at table level. Grant this to every table for which you want to create a technical lineage.

You need read access on information_schema. Only views that you own are processed.

SELECT, at table level. Grant this to every table for which you want to create a technical lineage.

The role of the user that you specify in the username property in the lineage harvester configuration file must be the owner of the views in PostgreSQL.

A role with the LOGIN option.

SELECT WITH GRANT OPTION, at table level.

CONNECT ON DATABASE

Note The following permissions are the same for both ingestion modes, SQL and SQL-API.

You need a role that can access the Snowflake shared read-only database. To access the shared database, the account administrator must grant the IMPORTED PRIVILEGES privilege on the shared database to the user that runs the lineage harvester.

Tip If the default role in Snowflake does not have the IMPORTED PRIVILEGES privilege, you can use the customConnectionProperties property in the lineage harvester configuration file to assign the appropriate role to the user. For example:
"customConnectionProperties": "role=METADATA"

You need read access on the DBC.

You need read access to the following dictionary views:

all_tab_cols
all_col_comments
all_objects
ALL_DB_LINKS
all_mviews
all_source
all_synonyms
all_views

You need read access on definition_schema.

Your user role must have privileges to export assets.
You must have read permission on all assets that you want to export.

You have added the Matillion certificate to a Java truststore.
You have at least a Matillion Enterprise license.

Steps

Start the lineage harvester to create an empty lineage harvester configuration file by entering the following command:
- Windows: .\bin\lineage-harvester.bat
- Other operating systems: run chmod +x bin/lineage-harvester, and then run bin/lineage-harvester
An empty configuration file is created in the config folder.
Save the configuration file in the config directory in the lineage harvester folder.
Prepare the configuration file.
Tip Use the configuration file generator to easily create a configuration file.
When prompted, enter the passwords to connect to Collibra and your data sources. Do one of the following:
- Enter the passwords in the console.
  The passwords are encrypted and stored in /config/pwd.conf.
- Provide the passwords via command line.
  The passwords are stored locally and not in your lineage harvester folder.
Start the lineage harvester again and do one of the following:
- To download the SQL files of all data sources in the configuration file, run the following command:
```
./bin/lineage-harvester load-sources
```
- To download the SQL files of specific data sources in the configuration file, run the following command:
```
./bin/lineage-harvester load-sources -s "ID of the data source"
```
  Tip This command downloads the SQL files of only the data sources that you specify and leaves the other SQL files unchanged, which shortens the download time. To download the SQL files of multiple data sources, add -s "ID of another data source" to the command for each additional data source.
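
As a rough sketch of the multi-source form described in the tip above, the following helper prints a load-sources command with one -s option per data source. The function name build_load_sources_cmd and the IDs sales_dwh and finance_dwh are hypothetical placeholders; use the IDs from your own configuration file. The script only prints the command so that you can review it before running it yourself.

```shell
#!/bin/sh
# Print a load-sources command with one -s option per data source ID.
# The helper name and the IDs below are illustrative assumptions, not part of the harvester.
build_load_sources_cmd() {
  cmd="./bin/lineage-harvester load-sources"
  for id in "$@"; do
    # Append one -s "ID" pair for each data source passed to the function.
    cmd="$cmd -s \"$id\""
  done
  printf '%s\n' "$cmd"
}

# Dry run: prints the command instead of executing it.
build_load_sources_cmd "sales_dwh" "finance_dwh"
```

With the two placeholder IDs, the printed command contains two -s options, one for each data source.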

What's next?

You can now prepare a configuration file for the SQL files of the data sources that you want to include in your technical lineage.