About the lineage harvester installation

You use the lineage harvester to collect source code from your data sources and to create new relations between data elements in those data sources and existing assets in Data Catalog.

The lineage harvester runs close to the data source and can harvest transformation logic like SQL scripts and ETL scripts from a specific location, for example a database table or a folder on a file system.

Note Collibra Data Lineage is a cloud-only feature.

Requirements

Software

Minimum requirements:

  • Java Runtime Environment (JRE) 11 or newer, or OpenJDK 11 or newer.

Recommended requirements:

  • Java Runtime Environment (JRE) 11 or newer, or OpenJDK 11 or newer.
Hardware

Minimum requirements:

  • 2 GB RAM
  • 1 GB free disk space

Recommended requirements:

  • 4 GB RAM
    Tip 4 GB RAM is sufficient in most cases, but larger harvesting tasks may need more memory. For instructions on how to increase the maximum heap size, see Technical lineage general troubleshooting.
  • 20 GB free disk space
Network

Firewall rules so that the lineage harvester can connect to:

  • The host names of all data sources in the lineage harvester configuration file.
  • All Collibra Data Lineage servers in your geographic location:
    • 15.222.200.199 (techlin-aws-ca)
    • 18.198.89.106 (techlin-aws-eu)
    • 54.242.194.190 (techlin-aws-us)
    • 51.105.241.132 (techlin-azure-eu)
    • 20.102.44.39 (techlin-azure-us)
    • 35.197.182.41 (techlin-gcp-au)
    • 34.152.20.240 (techlin-gcp-ca)
    • 35.205.146.124 (techlin-gcp-eu)
    • 34.87.122.60 (techlin-gcp-sg)
    • 35.234.130.150 (techlin-gcp-uk)
    • 34.73.33.120 (techlin-gcp-us)

    Note The lineage harvester connects to different servers based on your geographic location and cloud provider. If your location or cloud provider changes, the lineage harvester rescans all your data sources. You have to whitelist all Collibra Data Lineage servers in your geographic location. In addition, we highly recommend that you always whitelist the techlin-aws-us server as a backup, in case the lineage harvester cannot connect to other Collibra Data Lineage servers.

Note The lineage harvester connects to the Collibra Data Lineage servers on port 443.
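You can check the requirements above on the host before you install. The following Python sketch is not part of the lineage harvester. All function and constant names in it (parse_java_major, classify_hardware, can_connect, check_servers, and so on) are hypothetical. It makes three assumptions. First, it reads "GB" as GiB. Second, it reads total RAM through os.sysconf, so that helper works only on POSIX systems such as Linux. Third, it treats a successful TCP connection on port 443 as evidence that the firewall allows the traffic. It does not test TLS, and it does not test whether the lineage harvester itself can authenticate.

```python
import os
import re
import shutil
import socket
import subprocess

GIB = 1024 ** 3  # assumption: "GB" in the requirements is read as GiB
MIN_JAVA_MAJOR = 11
MIN_RAM, MIN_DISK = 2 * GIB, 1 * GIB
REC_RAM, REC_DISK = 4 * GIB, 20 * GIB
HTTPS_PORT = 443

# Collibra Data Lineage servers from the list above (IP address -> name).
LINEAGE_SERVERS = {
    "15.222.200.199": "techlin-aws-ca",
    "18.198.89.106": "techlin-aws-eu",
    "54.242.194.190": "techlin-aws-us",
    "51.105.241.132": "techlin-azure-eu",
    "20.102.44.39": "techlin-azure-us",
    "35.197.182.41": "techlin-gcp-au",
    "34.152.20.240": "techlin-gcp-ca",
    "35.205.146.124": "techlin-gcp-eu",
    "34.87.122.60": "techlin-gcp-sg",
    "35.234.130.150": "techlin-gcp-uk",
    "34.73.33.120": "techlin-gcp-us",
}


def parse_java_major(version_output: str) -> int:
    """Extract the major version from `java -version` output.

    Handles the legacy "1.8.0_361" scheme and the "11.0.20" / "17" scheme.
    """
    match = re.search(r'version "([^"]+)"', version_output)
    if match is None:
        raise ValueError(f"no version string found in: {version_output!r}")
    parts = re.split(r"[._+-]", match.group(1))
    return int(parts[1]) if parts[0] == "1" else int(parts[0])


def installed_java_major() -> int:
    """Run `java -version` on this host and return its major version."""
    # `java -version` prints its output to stderr, not stdout.
    result = subprocess.run(["java", "-version"], capture_output=True, text=True, check=True)
    return parse_java_major(result.stderr)


def classify_hardware(ram_bytes: int, free_disk_bytes: int) -> str:
    """Compare RAM and free disk space against the minimum and recommended values."""
    if ram_bytes >= REC_RAM and free_disk_bytes >= REC_DISK:
        return "recommended"
    if ram_bytes >= MIN_RAM and free_disk_bytes >= MIN_DISK:
        return "minimum"
    return "insufficient"


def total_ram_bytes() -> int:
    # POSIX-only (for example Linux): os.sysconf does not exist on Windows.
    return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")


def free_disk_bytes(path: str = ".") -> int:
    """Free space on the file system that contains `path`."""
    return shutil.disk_usage(path).free


def can_connect(host: str, port: int = HTTPS_PORT, timeout: float = 5.0) -> bool:
    """True if a TCP connection to host:port opens within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # connection refused, host unreachable, or timed out
        return False


def check_servers(servers: dict, port: int = HTTPS_PORT, timeout: float = 5.0) -> dict:
    """Map each server name to whether a TCP connection to it succeeds."""
    return {name: can_connect(ip, port, timeout) for ip, name in servers.items()}
```

Check only the servers for your own location, together with the recommended techlin-aws-us backup. For a hypothetical GCP deployment in Europe, you would pass a filtered dictionary to check_servers, such as {ip: name for ip, name in LINEAGE_SERVERS.items() if name in ("techlin-gcp-eu", "techlin-aws-us")}. You can use can_connect in the same way for the data source hosts, with each data source's own port.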

Installing the lineage harvester

If you purchased Collibra Data Lineage, you can access the lineage harvester on the downloads page. To install the lineage harvester, do the following:

  1. Download the lineage harvester.
  2. Unzip the archive.
    You can now access the lineage harvester folder.
  3. Run the following command to start the lineage harvester:
    • Windows: .\bin\lineage-harvester.bat
    • Other operating systems: make the script executable with chmod +x bin/lineage-harvester, and then run bin/lineage-harvester
    An empty configuration file is created in the config folder.
  4. The lineage harvester is installed automatically. To check the installation, run ./bin/lineage-harvester --help (on Windows, .\bin\lineage-harvester.bat --help).
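You can script steps 3 and 4, for example when you install the lineage harvester on several hosts. The following Python sketch is not part of the product, and launcher, make_executable, and verify_installation are hypothetical names. It assumes that you have already downloaded and unzipped the archive, and that harvester_dir points to the unzipped lineage harvester folder. On systems other than Windows, it makes the start script executable. It then runs the script with --help from that folder.

```python
import stat
import subprocess
import sys
from pathlib import Path


def launcher(harvester_dir: Path, platform: str = sys.platform) -> Path:
    """Return the start script from step 3 for the given platform."""
    bin_dir = harvester_dir / "bin"
    if platform.startswith("win"):
        return bin_dir / "lineage-harvester.bat"
    return bin_dir / "lineage-harvester"


def make_executable(script: Path) -> None:
    """Roughly `chmod +x`: add the execute bit for user, group, and others."""
    mode = script.stat().st_mode
    script.chmod(mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)


def verify_installation(harvester_dir: Path) -> str:
    """Run the start script with --help from the harvester folder and return its output.

    Raises subprocess.CalledProcessError if the script exits with a non-zero status.
    """
    script = launcher(harvester_dir)
    if not sys.platform.startswith("win"):
        make_executable(script)
    result = subprocess.run(
        # Absolute path, because a relative path would be resolved against cwd.
        [str(script.resolve()), "--help"],
        cwd=harvester_dir,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

The explicit chmod matters because some unzip tools do not restore Unix permission bits, so the extracted start script may not be executable. Python's own zipfile module is one such tool.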

Note We highly recommend that you always install and use the latest available version of the lineage harvester.