Run the lineage harvester
After you specify the lineage harvester configuration file, run the lineage harvester to create the technical lineage.
Before you begin
If you use a proxy server, connect to the proxy server. For more information, go to Connecting to a proxy server.
Steps
- Start the lineage harvester by entering the full-sync command.
  - To process data from all data sources in the configuration file:
    For Windows:
    .\bin\lineage-harvester.bat full-sync
    For other operating systems:
    ./bin/lineage-harvester full-sync
  - To process data from specific data sources in the configuration file:
    For Windows:
    .\bin\lineage-harvester.bat full-sync -s "ID of the data source"
    For other operating systems:
    ./bin/lineage-harvester full-sync -s "ID of the data source"
  Note: If you have Snowflake data sources in your lineage harvester configuration file, set the JAVA_OPTS environment variable with the full-sync command. For example, to process data from all data sources including the Snowflake data sources, enter the following command:
  For Windows:
  JAVA_OPTS='--add-opens java.base/java.nio=ALL-UNNAMED' .\bin\lineage-harvester.bat full-sync
  For other operating systems:
  JAVA_OPTS='--add-opens java.base/java.nio=ALL-UNNAMED' ./bin/lineage-harvester full-sync
  For more information, see Lineage harvesting app command options and arguments.
- When prompted, enter the passwords to connect to Collibra and your data sources. Do one of the following:
  - Enter the passwords in the console. The passwords are encrypted and stored in /config/pwd.conf.
  - Provide the passwords via the command line. The passwords are stored locally and not in your lineage harvester folder.
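The full-sync invocations in the steps above can also be assembled from a script. The following is a minimal Python sketch, not part of the lineage harvester itself: the function names (build_full_sync_command, build_environment, run_full_sync) and the has_snowflake parameter are hypothetical. It assumes the script is started from the lineage harvester folder and that any password prompts are still answered in the console.

```python
import os
import platform
import subprocess


def build_full_sync_command(source_id=None, windows=None):
    """Return the argument list for a full-sync run.

    source_id: optional data source ID, passed with -s as in the steps above.
    windows:   force the Windows or non-Windows launcher; detected when None.
    """
    if windows is None:
        windows = platform.system() == "Windows"
    launcher = r".\bin\lineage-harvester.bat" if windows else "./bin/lineage-harvester"
    command = [launcher, "full-sync"]
    if source_id is not None:
        command += ["-s", source_id]
    return command


def build_environment(has_snowflake=False, base_env=None):
    """Return a copy of the environment, setting JAVA_OPTS for Snowflake sources."""
    env = dict(os.environ if base_env is None else base_env)
    if has_snowflake:
        # Value taken from the Snowflake note above.
        env["JAVA_OPTS"] = "--add-opens java.base/java.nio=ALL-UNNAMED"
    return env


def run_full_sync(source_id=None, has_snowflake=False):
    """Run the harvester (hypothetical helper); raises if it exits with an error."""
    return subprocess.run(
        build_full_sync_command(source_id),
        env=build_environment(has_snowflake),
        check=True,
    )
```

Passing the environment to the child process keeps JAVA_OPTS scoped to the harvester run instead of changing it for the whole session.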
What's next
The lineage harvester uses the Collibra REST API to send the data source information to the Collibra Data Lineage service, where it is parsed and analyzed. As a result, the technical lineage is created and shown in Data Catalog. You can view the technical lineage. For more information, go to Technical lineage viewer.
You can check the progress of the technical lineage creation in Activities in your Collibra Data Intelligence Cloud environment. The Results field indicates how many relations were imported into Data Catalog. Go to the status page to see the log files of the SQL analysis.
If the lineage harvester log shows an error message or the harvesting process fails, you can use the technical lineage troubleshooting guide or the Collibra Support Portal to fix your issue.
If you want to synchronize the data sources at fixed times, you can use scheduled jobs.
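A scheduled job is normally set up with the operating system's own scheduler. As an illustration of the idea only, the following Python sketch runs a command once a day at a fixed time. The names next_run and run_daily are hypothetical, and the sketch assumes the harvester can run without interactive password prompts.

```python
import datetime
import subprocess
import time


def next_run(now, hour, minute=0):
    """Return the next datetime at hour:minute that is strictly after `now`."""
    candidate = now.replace(hour=hour, minute=minute, second=0, microsecond=0)
    if candidate <= now:
        # Today's slot has already passed, so schedule for tomorrow.
        candidate += datetime.timedelta(days=1)
    return candidate


def run_daily(command, hour, minute=0):
    """Run `command` every day at the given local time (loops until interrupted)."""
    while True:
        now = datetime.datetime.now()
        wait_seconds = (next_run(now, hour, minute) - now).total_seconds()
        time.sleep(max(wait_seconds, 0))
        # check=False: a failed run should not stop later scheduled runs.
        subprocess.run(command, check=False)
```

For example, run_daily(["./bin/lineage-harvester", "full-sync"], hour=2) would start a full synchronization at 02:00 local time each day on a non-Windows system.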