Run the lineage harvester

After you have prepared the lineage harvester configuration file, run the lineage harvester to create the technical lineage.

Before you begin

If you use a proxy server, connect to the proxy server. For more information, go to Connecting to a proxy server.

Requirements and permissions

  • Collibra Data Intelligence Platform.
  • You have purchased Collibra Data Lineage.
  • A global role with the following global permissions:
    • Catalog, for example Catalog Author
    • Data Stewardship Manager
    • Manage all resources
    • System administration
    • Technical lineage
  • A resource role with the following resource permissions at the level of the community in which you created the domain:
    • Asset: add
    • Attribute: add
    • Domain: add
    • Attachment: add
  • Necessary permissions to all database objects that the lineage harvester accesses.

Steps

  1. Start the lineage harvester by entering the full-sync command.
    • To process data from all data sources in the configuration file:
      For Windows:
      .\bin\lineage-harvester.bat full-sync
      For other operating systems:
      ./bin/lineage-harvester full-sync
    • To process data from specific data sources in the configuration file:
      For Windows:
      .\bin\lineage-harvester.bat full-sync -s "ID of the data source"
      For other operating systems:
      ./bin/lineage-harvester full-sync -s "ID of the data source"
    Tip For more information and command options, go to Lineage harvesting app command options and arguments.
    Note If you have Snowflake data sources in your lineage harvester configuration file, set the JAVA_OPTS environment variable first. For example, to process data from all data sources including the Snowflake data sources, take the following steps:

  2. When prompted, enter the passwords to connect to Collibra and your data sources. Do one of the following:
    • Enter the passwords in the console.
      The passwords are encrypted and stored in /config/pwd.conf.
    • Provide the passwords via command line.
      The passwords are stored locally and not in your lineage harvester folder.
  3. If you are creating technical lineage for dbt Cloud and are prompted to enter your API token, enter the value of the service token that you specified in the tokenName property in the lineage harvester configuration file for dbt Cloud.
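
The commands in step 1 can be wrapped in a small helper for non-Windows systems. The following is a minimal sketch, not part of the product: the function name sync_sources and the HARVESTER variable are hypothetical, and it assumes that the lineage harvester exits with a non-zero status when a run fails. It runs one full-sync per data source ID, or a single full-sync for all data sources when no ID is given.

```shell
#!/bin/sh
# Hypothetical helper: HARVESTER defaults to the path used in the steps above.
HARVESTER="${HARVESTER:-./bin/lineage-harvester}"

sync_sources() {
  # With no arguments, process every data source in the configuration file.
  if [ "$#" -eq 0 ]; then
    "$HARVESTER" full-sync
    return
  fi
  # Otherwise run one full-sync per data source ID.
  # Assumption: a failed run returns a non-zero exit status, so stop there.
  for id in "$@"; do
    "$HARVESTER" full-sync -s "$id" || return 1
  done
}
```

For example, sync_sources "source_a" "source_b" would run the harvester twice, where source_a and source_b stand for IDs from your own configuration file. On the first run you are still prompted for passwords as described in step 2.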

What's next

The lineage harvester sends the data source information to the Collibra Data Lineage service by using the Collibra REST API. The service parses and analyzes the information, and then creates the technical lineage, which is shown in Data Catalog. To view the technical lineage, go to Technical lineage viewer.

You can check the progress of the technical lineage creation in Activities in your Collibra Data Intelligence Platform environment. The Results field indicates how many relations were imported into Data Catalog. Go to the status page to see the log files of the SQL analysis.

If the lineage harvester log shows an error message or the harvesting process fails, you can use the technical lineage common errors and issues in Collibra Support Portal to fix the error.

If you want to synchronize the data sources at fixed times, you can use scheduled jobs.
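
As an illustration of a scheduled job on a non-Windows system, the sketch below assumes that an operating system scheduler such as cron is used, that the lineage harvester is installed in /opt/lineage-harvester, and that the passwords were already stored during an earlier interactive run. The function name run_scheduled_sync, the installation directory, the script name run-sync.sh, and the log file name are all hypothetical.

```shell
#!/bin/sh
# Hypothetical wrapper for a scheduler such as cron.
# Example crontab entry (daily at 02:00), assuming a script that calls this
# function is saved as /opt/lineage-harvester/run-sync.sh:
#   0 2 * * * /opt/lineage-harvester/run-sync.sh
run_scheduled_sync() {
  # $1: harvester installation directory (assumed default shown here).
  harvester_home="${1:-/opt/lineage-harvester}"
  # $2: log file that collects the console output of each run.
  log_file="${2:-harvest.log}"
  cd "$harvester_home" || return 1
  ./bin/lineage-harvester full-sync >> "$log_file" 2>&1
}
```

Appending the output to a log file keeps the messages from unattended runs available for troubleshooting.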