Lineage harvester system requirements

Warning: The CLI lineage harvester is deprecated and reaches its end of life on July 31, 2026. To ensure a smooth transition, we encourage you to start creating technical lineage via Edge, if you haven't already.

To install and run the lineage harvester, you have to meet the following requirements.

Software requirements

Java Runtime Environment (JRE) version 17 or newer, or OpenJDK 17 or newer.
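
One way to confirm the Java requirement before installing is to parse the output of `java -version`. The following is a minimal Python sketch, not part of the lineage harvester itself; the helper names `parse_java_major` and `installed_java_major` are hypothetical. It accounts for both the legacy `1.8.0_x` numbering and the current `17.0.x` numbering.

```python
import re
import subprocess

REQUIRED_MAJOR = 17  # taken from the software requirement above


def parse_java_major(version_output: str) -> int:
    """Extract the major Java version from `java -version` output."""
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if match is None:
        raise ValueError("could not find a version string in the output")
    first = int(match.group(1))
    # Releases before Java 9 report themselves as 1.x (for example 1.8.0_391).
    if first == 1 and match.group(2) is not None:
        return int(match.group(2))
    return first


def installed_java_major() -> int:
    """Run `java -version`; the JVM prints its version banner to stderr."""
    result = subprocess.run(
        ["java", "-version"], capture_output=True, text=True, check=True
    )
    return parse_java_major(result.stderr)
```

On a machine where `java` is on the PATH, `installed_java_major() >= REQUIRED_MAJOR` indicates whether the requirement is met.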

Hardware requirements

The machine on which you install and run the lineage harvester must meet the following hardware requirements.

Minimum hardware requirements

At a minimum, you need:

  • 2 GB RAM
  • 1 GB free disk space

Recommended hardware requirements

The minimum requirements are most likely insufficient for production environments. We recommend the following hardware requirements:

  • 4 GB RAM
    Tip: 4 GB RAM is sufficient in most cases, but larger harvesting tasks might need more memory. For instructions on how to increase the maximum heap size, see the advice on resolving Java heap space errors in the Collibra Support Portal.
  • 20 GB free disk space
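
The disk space thresholds above can be checked with a short script. This is a minimal Python sketch under stated assumptions: it treats 1 GB as 1024³ bytes, and the function names `classify_free_disk` and `check_install_dir` are hypothetical. It checks disk space only; available RAM is not covered because the Python standard library has no portable way to read it.

```python
import shutil

GIB = 1024 ** 3                   # assumption: 1 GB is treated as 1024^3 bytes
MINIMUM_FREE_DISK = 1 * GIB       # minimum hardware requirement
RECOMMENDED_FREE_DISK = 20 * GIB  # recommended hardware requirement


def classify_free_disk(free_bytes: int) -> str:
    """Compare free disk space against the documented thresholds."""
    if free_bytes >= RECOMMENDED_FREE_DISK:
        return "recommended"
    if free_bytes >= MINIMUM_FREE_DISK:
        return "minimum"
    return "insufficient"


def check_install_dir(path: str) -> str:
    """Classify the free space on the volume that contains `path`."""
    return classify_free_disk(shutil.disk_usage(path).free)
```

For example, `check_install_dir(".")` classifies the volume that holds the current directory.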

Specific requirements for Power BI

  • If you have more than 50,000 actively used workspaces, we recommend increasing the heap size to at least 6 GB.
  • Ensure that the Power BI certificate exists in the Java TrustStore.

Network requirements

By default, the lineage harvester communicates over HTTPS on port 443.

Configure your firewall rules so that the lineage harvester can connect to:

  • The host names of all data sources in your lineage harvester configuration file.
  • All Collibra Data Lineage service instances in your geographic location:
    Region    DNS name
    aws-ca    techlin-ca-central-1.collibra.com
    aws-eu    techlin-eu-central-1.collibra.com
    aws-me    techlin-me-central-1.collibra.com
    aws-sg    techlin-ap-southeast-1.collibra.com
    aws-us    techlin-us-east-1.collibra.com
    gcp-au    techlin-australia-southeast1.collibra.com
    gcp-ca    techlin-northamerica-northeast1.collibra.com
    gcp-eu    techlin-europe-west1.collibra.com
    gcp-sg    techlin-asia-southeast1.collibra.com
    gcp-uk    techlin-europe-west2.collibra.com
    gcp-us    techlin-us-east1.collibra.com
    Note: The lineage harvester connects to different Collibra Data Lineage service instances based on your geographic location and cloud provider. If your location or cloud provider changes, the lineage harvester rescans all your data sources.

    • You have to allow all Collibra Data Lineage service instances in your geographic location.
    • Regardless of your geographic location, you have to allow one of the following. This is required to retrieve the API Key. No other data is transferred.
      Region    DNS name
      aws-ca    techlin-ca-central-1.collibra.com
      aws-eu    techlin-eu-central-1.collibra.com
      aws-us    techlin-us-east-1.collibra.com
      gcp-us    techlin-us-east1.collibra.com

    In addition, we highly recommend that you always allow the "techlin-us-east-1" instance as a backup for license verification, in case the lineage harvester cannot connect to other Collibra Data Lineage service instances.

    Region    DNS name
    aws-us    techlin-us-east-1.collibra.com
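
To test whether your firewall rules allow the connections described in this section, you can try to open a TCP connection to each host on port 443. The following is a minimal Python sketch, not a Collibra tool; the function names `can_connect` and `unreachable_hosts` are hypothetical. It only confirms that the port is reachable. It does not validate TLS certificates or proxy settings.

```python
import socket


def can_connect(host: str, port: int = 443, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections, and timeouts.
        return False


def unreachable_hosts(hosts, port: int = 443, timeout: float = 5.0):
    """Return the hosts from `hosts` that could not be reached."""
    return [host for host in hosts if not can_connect(host, port, timeout)]
```

Pass `unreachable_hosts` the DNS names that apply to your region, together with the host names of the data sources in your configuration file. An empty result means that every host accepted a connection on port 443.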