Run the lineage harvester
After you have prepared the lineage harvester configuration file, run the lineage harvester to create the technical lineage.
Before you begin
If you use a proxy server, connect to the proxy server. For more information, go to Connecting to a proxy server.
Requirements and permissions
- Collibra Data Intelligence Platform.
- You have purchased Collibra Data Lineage.
- A global role with the following global permissions:
  - Catalog, for example Catalog Author
  - Data Stewardship Manager
  - Manage all resources
  - System administration
  - Technical lineage
- A resource role with the following resource permissions on the community in which you created the domain:
  - Asset: add
  - Attribute: add
  - Domain: add
  - Attachment: add
- The necessary permissions on all database objects that the lineage harvester accesses.
Steps
- Start the lineage harvester by entering the full-sync command:
  - To process data from all data sources in the configuration file:
    For Windows:
    .\bin\lineage-harvester.bat full-sync
    For other operating systems:
    ./bin/lineage-harvester full-sync
  - To process data from specific data sources in the configuration file:
    For Windows:
    .\bin\lineage-harvester.bat full-sync -s "ID of the data source"
    For other operating systems:
    ./bin/lineage-harvester full-sync -s "ID of the data source"
Tip: For more information and command options, go to Lineage harvesting app command options and arguments.
Note: If you have Snowflake data sources in your lineage harvester configuration file, set the JAVA_OPTS environment variable first. For example, to process data from all data sources, including the Snowflake data sources, take the following steps:
On Windows:
- Enter one of the following commands:
- If you use OpenJDK 16:
set JAVA_OPTS="-Djdk.module.illegalAccess=permit"
- If you use OpenJDK 17 or higher:
set JAVA_OPTS="--add-opens=java.base/java.nio=ALL-UNNAMED"
- In the same Command Prompt window, enter the following command:
.\bin\lineage-harvester.bat full-sync
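Note: The set command is specific to the Windows Command Shell (cmd.exe). The command is different if you use PowerShell; there, you set the variable with $env:JAVA_OPTS, for example $env:JAVA_OPTS="--add-opens=java.base/java.nio=ALL-UNNAMED".
On Linux:
Enter one of the following commands: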
- If you use OpenJDK 16:
JAVA_OPTS="-Djdk.module.illegalAccess=permit" ./bin/lineage-harvester full-sync
- If you use OpenJDK 17 or higher:
JAVA_OPTS="--add-opens=java.base/java.nio=ALL-UNNAMED" ./bin/lineage-harvester full-sync
- When prompted, enter the passwords to connect to Collibra and your data sources. Do one of the following:
  - Enter the passwords in the console. The passwords are encrypted and stored in /config/pwd.conf.
  - Provide the passwords via the command line. The passwords are stored locally and not in your lineage harvester folder.
- If you are creating technical lineage for dbt Cloud and are prompted to enter your API token, enter the value of the service token that you specified for the tokenName property in the lineage harvester configuration file for dbt Cloud.
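On Linux, the choice between the two JAVA_OPTS values in the Snowflake note depends only on the OpenJDK major version, so you can script it. The following is a minimal POSIX shell sketch, assuming java is on your PATH and that you run it from the lineage harvester folder. The function names java_major_version and java_opts_for_version are hypothetical helpers for illustration; they are not part of the lineage harvester.

```shell
#!/bin/sh
# Hypothetical helper (not part of the lineage harvester): prints the major
# version from the first line of `java -version`, for example
# 'openjdk version "17.0.2" 2022-01-18' -> 17. Assumes java is on PATH.
java_major_version() {
  java -version 2>&1 | head -n 1 | sed -n 's/.*version "\([0-9]*\).*/\1/p'
}

# Hypothetical helper: prints the JAVA_OPTS value from the Snowflake note
# for a given OpenJDK major version.
java_opts_for_version() {
  major="$1"
  # Reject empty or non-numeric input instead of passing it to [ -eq ].
  case "$major" in
    ''|*[!0-9]*) return 1 ;;
  esac
  if [ "$major" -eq 16 ]; then
    # OpenJDK 16
    printf '%s\n' "-Djdk.module.illegalAccess=permit"
  elif [ "$major" -ge 17 ]; then
    # OpenJDK 17 or higher
    printf '%s\n' "--add-opens=java.base/java.nio=ALL-UNNAMED"
  else
    # Versions below 16 are not covered by the note: print nothing, fail.
    return 1
  fi
}

# Example usage:
# JAVA_OPTS="$(java_opts_for_version "$(java_major_version)")" ./bin/lineage-harvester full-sync
```

As in the Linux commands in the note, prefixing the harvester command with JAVA_OPTS=... sets the variable only for that single run, so it does not affect other Java programs in the same shell.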
What's next
The lineage harvester sends the data source information to the Collibra Data Lineage service by using the Collibra REST API. The service parses and analyzes the information, and the resulting technical lineage is created and shown in Data Catalog. For more information about viewing the technical lineage, go to Technical lineage viewer.
You can check the progress of the technical lineage creation in Activities in your Collibra Data Intelligence Platform environment. The Results field indicates how many relations were imported into Data Catalog. Go to the status page to see the log files of the SQL analysis.
If the lineage harvester log shows an error message or the harvesting process fails, see the technical lineage common errors and issues in the Collibra Support Portal for help with fixing the error.
If you want to synchronize the data sources at fixed times, you can use scheduled jobs.