Run the lineage harvester

After you have prepared the lineage harvester configuration file, run the lineage harvester to create the technical lineage.

Before you begin

If you use a proxy server, connect to the proxy server. For more information, go to Connecting to a proxy server.

Requirements and permissions

Collibra Platform:
- You have purchased Collibra Data Lineage. Ensure that you use lineage harvester version 2024.05 or newer.
- A global role with the following global permissions:
  - Catalog, for example Catalog Author
  - Data Stewardship Manager
  - Manage all resources
  - System administration
  - Technical lineage
- A resource role with the following resource permissions on the community in which you created the domain:
  - Asset > Add
  - Attribute > Add
  - Domain > Add
  - Attachment > Add

Data sources:
- Necessary permissions on all database objects that the lineage harvester accesses.

Steps

Start the lineage harvester by entering the full-sync command.
- To process data from all data sources in the configuration file:
  For Windows:
```
.\bin\lineage-harvester.bat full-sync
```
  For other operating systems:
```
./bin/lineage-harvester full-sync
```
- To process data from specific data sources in the configuration file:
  For Windows:
```
.\bin\lineage-harvester.bat full-sync -s "ID of the data source"
```
  For other operating systems:
```
./bin/lineage-harvester full-sync -s "ID of the data source"
```
Tip: For more information and command options, go to Lineage harvesting app command options and arguments.
Note: If you have Snowflake data sources in your lineage harvester configuration file, set the JAVA_OPTS environment variable first. For example, to process data from all data sources, including the Snowflake data sources, take the following steps:
On Windows
1. Enter one of the following commands:
  If you use OpenJDK 16:
```
set JAVA_OPTS="-Djdk.module.illegalAccess=permit"
```
  If you use OpenJDK 17 or higher:
```
set JAVA_OPTS="--add-opens=java.base/java.nio=ALL-UNNAMED"
```
2. In the same Command Prompt window, enter the following command:
```
.\bin\lineage-harvester.bat full-sync
```
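Note: The set command is specific to the Windows Command Shell (cmd.exe). In PowerShell, set the variable with $env:JAVA_OPTS instead, for example $env:JAVA_OPTS = "--add-opens=java.base/java.nio=ALL-UNNAMED", and then run .\bin\lineage-harvester.bat full-sync in the same PowerShell session.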
On Linux
Enter one of the following commands:
- If you use OpenJDK 16:
```
JAVA_OPTS="-Djdk.module.illegalAccess=permit" ./bin/lineage-harvester full-sync
```
- If you use OpenJDK 17 or higher:
```
JAVA_OPTS="--add-opens=java.base/java.nio=ALL-UNNAMED" ./bin/lineage-harvester full-sync
```
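If you are not sure which OpenJDK version the lineage harvester runs on, you can automate the choice between the two Linux JAVA_OPTS values. The following Bash sketch is not part of the lineage harvester. Its function names (java_major_from_string, java_major_version, snowflake_java_opts, snowflake_full_sync) are hypothetical. It assumes that the java command on your PATH is the JVM that the harvester uses and that you run it from your lineage harvester folder. It only knows the two JAVA_OPTS values listed above. Save it to a file, load it with source, and then run snowflake_full_sync, optionally followed by -s "ID of the data source".

```shell
# Hypothetical helper, not shipped with the lineage harvester.
# Assumptions: Bash, the `java` on PATH is the JVM the harvester uses,
# and the current directory is the lineage harvester folder.

# Turn a `java -version` version string into a major version:
# "17.0.9" -> 17, "1.8.0_392" -> 8, "21-ea" -> 21.
java_major_from_string() {
  local v=$1
  case "$v" in
    1.*) v=${v#1.} ;;          # legacy numbering: 1.8.0_392 -> 8.0_392
  esac
  v=${v%%[!0-9]*}              # keep only the leading digits
  [ -n "$v" ] || return 1
  printf '%s\n' "$v"
}

# `java -version` prints to stderr, e.g.: openjdk version "17.0.9" 2023-10-17
java_major_version() {
  local raw
  raw=$(java -version 2>&1 | awk -F'"' '/version/ { print $2; exit }')
  java_major_from_string "$raw"
}

# Map a major version to the JAVA_OPTS value documented for Snowflake sources.
snowflake_java_opts() {
  local major=$1
  case "$major" in
    '' | *[!0-9]*) echo "Not a Java major version: '$major'" >&2; return 1 ;;
  esac
  if [ "$major" -ge 17 ]; then
    printf '%s\n' "--add-opens=java.base/java.nio=ALL-UNNAMED"
  elif [ "$major" -eq 16 ]; then
    printf '%s\n' "-Djdk.module.illegalAccess=permit"
  else
    echo "Java $major: no JAVA_OPTS value is documented for this version" >&2
    return 1
  fi
}

# Run a full sync with the matching JAVA_OPTS. Extra arguments, such as
# -s "ID of the data source", are passed through unchanged.
snowflake_full_sync() {
  local major opts
  major=$(java_major_version) || { echo "Could not detect the Java version" >&2; return 1; }
  opts=$(snowflake_java_opts "$major") || return 1
  JAVA_OPTS="$opts" ./bin/lineage-harvester full-sync "$@"
}
```

The sketch stops with an error instead of guessing when the detected version is older than 16, because this page documents JAVA_OPTS values only for OpenJDK 16 and for OpenJDK 17 or higher.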
When prompted, enter the passwords to connect to Collibra and your data sources. Do one of the following:
- Enter the passwords in the console.
  The passwords are encrypted and stored in /config/pwd.conf.
- Provide the passwords on the command line.
  The passwords are stored locally, not in your lineage harvester folder.
If you are creating technical lineage for dbt Cloud and are prompted for your API token, enter the value of the service token that you specified in the tokenName property of the lineage harvester configuration file for dbt Cloud.

What's next

The lineage harvester sends the data source information to the Collibra Data Lineage service through the Collibra REST API. The service parses and analyzes the information, creates the technical lineage, and shows it in Data Catalog. For more information about viewing the technical lineage, go to Technical lineage viewer.

You can check the progress of the technical lineage creation in Activities in your Collibra Platform environment. The Results field indicates how many relations were imported into Data Catalog. Go to the status page to see the log files of the SQL analysis.

If the lineage harvester log shows an error message or the harvesting process fails, see Technical lineage common errors and issues in the Collibra Support Portal for help fixing the error.

To synchronize the data sources at fixed times, you can use scheduled jobs.