Lineage harvester system requirements

To install and run the lineage harvester, you have to meet the following requirements.

Software requirements

Java Runtime Environment (JRE) version 17 or newer, or OpenJDK 17 or newer.

Note: To create technical lineage for SAP HANA with lineage harvester version 2024.04 or newer, use JRE 17 or newer, or OpenJDK 17 or newer. If you use an older version of the lineage harvester, you need a JRE or OpenJDK version earlier than 15.
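To check an installation against these rules before you run the harvester, you can parse the output of `java -version`. The following is a minimal sketch, not a Collibra tool: the function names are hypothetical, and it assumes harvester versions are written as "YYYY.MM" (for example, "2024.04").

```python
import re
import subprocess

MIN_JAVA_MAJOR = 17  # General requirement: JRE/OpenJDK 17 or newer.


def parse_java_major(version_output: str) -> int:
    """Extract the major Java version from `java -version` output."""
    match = re.search(r'version "([^"]+)"', version_output)
    if match is None:
        raise ValueError("no Java version string found")
    parts = re.split(r"[._+-]", match.group(1))
    # Legacy scheme (Java 8 and earlier): "1.8.0_292" means major version 8.
    if parts[0] == "1" and len(parts) > 1:
        return int(parts[1])
    return int(parts[0])


def installed_java_major() -> int:
    """Run `java -version`; the JVM prints its version banner to stderr."""
    result = subprocess.run(
        ["java", "-version"], capture_output=True, text=True, check=True
    )
    return parse_java_major(result.stderr)


def java_ok_for_sap_hana(java_major: int, harvester_version: str) -> bool:
    """Apply the SAP HANA rule from the note (hypothetical helper).

    Assumes harvester versions look like "YYYY.MM", e.g. "2024.04".
    """
    year, month = (int(p) for p in harvester_version.split(".")[:2])
    if (year, month) >= (2024, 4):
        return java_major >= MIN_JAVA_MAJOR
    # Older harvester versions need a Java version earlier than 15.
    return java_major < 15
```

For example, `java_ok_for_sap_hana(installed_java_major(), "2024.04")` returns `True` on a machine with OpenJDK 17 on its `PATH`.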

Hardware requirements

The machine that runs the lineage harvester must meet the following hardware requirements.

Minimum hardware requirements

The minimum hardware requirements are:

  • 2 GB RAM
  • 1 GB free disk space

Recommended hardware requirements

The minimum requirements are most likely insufficient for production environments. We recommend the following hardware:

  • 4 GB RAM
    Tip: 4 GB RAM is sufficient in most cases, but larger harvesting tasks might need more memory. For instructions on how to increase the maximum heap size, see the Collibra Support Portal article on resolving Java heap space errors.
  • 20 GB free disk space
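The thresholds above can be checked with a short script on the target machine. This is a hedged sketch, not part of the lineage harvester: the function names are hypothetical, it assumes 1 GB = 1024³ bytes, and `total_ram_gb` relies on `os.sysconf`, which is available on Linux and macOS but not on Windows.

```python
import os
import shutil

# Thresholds from the requirements above (assumes 1 GB = 1024**3 bytes).
MIN_RAM_GB, MIN_DISK_GB = 2, 1
REC_RAM_GB, REC_DISK_GB = 4, 20
GB = 1024 ** 3


def classify(ram_gb: float, free_disk_gb: float) -> str:
    """Return 'recommended', 'minimum', or 'insufficient'."""
    if ram_gb >= REC_RAM_GB and free_disk_gb >= REC_DISK_GB:
        return "recommended"
    if ram_gb >= MIN_RAM_GB and free_disk_gb >= MIN_DISK_GB:
        return "minimum"
    return "insufficient"


def free_disk_gb(path: str = ".") -> float:
    """Free space on the file system that contains `path`."""
    return shutil.disk_usage(path).free / GB


def total_ram_gb() -> float:
    """Physical memory; POSIX only, because os.sysconf is absent on Windows."""
    return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES") / GB
```

Run `classify(total_ram_gb(), free_disk_gb("/path/to/harvester"))` from the directory where you plan to install the harvester; the path is a placeholder.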

Network requirements

By default, the lineage harvester communicates over HTTPS on port 443.
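To confirm that firewall rules let outbound traffic reach a host on port 443, you can attempt a TCP connection from the machine that runs the harvester. The sketch below uses only the Python standard library; `can_connect` is a hypothetical helper, and it checks TCP reachability only, not TLS or proxy configuration.

```python
import socket


def can_connect(host: str, port: int = 443, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the host name and tries each address in turn.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections, and timeouts.
        return False
```

For example, `can_connect("techlin-us-east-1.collibra.com")` should return `True` once the firewall allows that instance.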

The minimum network requirements are:

  • Firewall rules so that the lineage harvester can connect to:
    • The host names of all data sources in your lineage harvester configuration file.
    • All Collibra Data Lineage service instances in your geographic location:
      Region  Current IP address  Current DNS name             New DNS name
      aws-ca  15.222.200.199      techlin-aws-ca.collibra.com  techlin-ca-central-1.collibra.com
      aws-eu  18.198.89.106       techlin-aws-eu.collibra.com  techlin-eu-central-1.collibra.com
      aws-sg  13.228.38.245       techlin-aws-sg.collibra.com  techlin-ap-southeast-1.collibra.com
      aws-us  54.242.194.190      techlin-aws-us.collibra.com  techlin-us-east-1.collibra.com
      gcp-au  35.197.182.41       techlin-gcp-au.collibra.com  techlin-australia-southeast1.collibra.com
      gcp-ca  34.152.20.240       techlin-gcp-ca.collibra.com  techlin-northamerica-northeast1.collibra.com
      gcp-eu  35.205.146.124      techlin-gcp-eu.collibra.com  techlin-europe-west1.collibra.com
      gcp-sg  34.87.122.60        techlin-gcp-sg.collibra.com  techlin-asia-southeast1.collibra.com
      gcp-uk  35.234.130.150      techlin-gcp-uk.collibra.com  techlin-europe-west2.collibra.com
      gcp-us  34.73.33.120        techlin-gcp-us.collibra.com  techlin-us-east1.collibra.com
      Important: We are migrating the Collibra Data Lineage service instances to new DNS names. The migration will be completed by October 31, 2024. You can already refer to the new DNS names in addition to the existing ones. If your network infrastructure requires traffic to be explicitly configured, we recommend that you update your configuration to also allow the new DNS names by October 31, 2024, to avoid service interruptions. You do not need to remove any existing DNS names during this migration, and you should not remove them before the end of October 2024.

      The IP addresses for the new DNS names differ from the existing ones. Because the IP addresses of the new instances are subject to periodic change, we recommend that you use only the new DNS names in your network configuration. If you must use the IP addresses of the new instances in your network configuration, query the DNS with a command-line utility such as nslookup to obtain the current mapping between domain names and IP addresses. For more information, see the Technical Lineage DNS Change article in the Support Portal.

      Note: The lineage harvester connects to different Collibra Data Lineage service instances based on your geographic location and cloud provider. If your location or cloud provider changes, the lineage harvester rescans all your data sources.

      • You must allow all Collibra Data Lineage service instances in your geographic location.
      • Regardless of your geographic location, you must also allow one of the following instances. This is required to retrieve the API key; no other data is transferred.
        Region  Current IP address  Current DNS name             New DNS name
        aws-ca  15.222.200.199      techlin-aws-ca.collibra.com  techlin-ca-central-1.collibra.com
        aws-eu  18.198.89.106       techlin-aws-eu.collibra.com  techlin-eu-central-1.collibra.com
        aws-us  54.242.194.190      techlin-aws-us.collibra.com  techlin-us-east-1.collibra.com
        gcp-us  34.73.33.120        techlin-gcp-us.collibra.com  techlin-us-east1.collibra.com

      In addition, we highly recommend that you always allow the techlin-aws-us instance as a backup, in case the lineage harvester cannot connect to other Collibra Data Lineage service instances.

      Region  Current IP address  Current DNS name             New DNS name
      aws-us  54.242.194.190      techlin-aws-us.collibra.com  techlin-us-east-1.collibra.com
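The allowlist rules in the network requirements can be sketched as a small script: allow every instance in your geographic location, one instance that serves the API key, and the aws-us instance as a backup, then resolve the DNS names the way nslookup would. The region keys and DNS names are copied from the tables in this section; the function names and data layout are hypothetical, and which regions make up "your geographic location" is an input you supply.

```python
import socket

# Region -> (current DNS name, new DNS name), copied from the tables above.
DNS = {
    "aws-ca": ("techlin-aws-ca.collibra.com", "techlin-ca-central-1.collibra.com"),
    "aws-eu": ("techlin-aws-eu.collibra.com", "techlin-eu-central-1.collibra.com"),
    "aws-sg": ("techlin-aws-sg.collibra.com", "techlin-ap-southeast-1.collibra.com"),
    "aws-us": ("techlin-aws-us.collibra.com", "techlin-us-east-1.collibra.com"),
    "gcp-au": ("techlin-gcp-au.collibra.com", "techlin-australia-southeast1.collibra.com"),
    "gcp-ca": ("techlin-gcp-ca.collibra.com", "techlin-northamerica-northeast1.collibra.com"),
    "gcp-eu": ("techlin-gcp-eu.collibra.com", "techlin-europe-west1.collibra.com"),
    "gcp-sg": ("techlin-gcp-sg.collibra.com", "techlin-asia-southeast1.collibra.com"),
    "gcp-uk": ("techlin-gcp-uk.collibra.com", "techlin-europe-west2.collibra.com"),
    "gcp-us": ("techlin-gcp-us.collibra.com", "techlin-us-east1.collibra.com"),
}
API_KEY_REGIONS = {"aws-ca", "aws-eu", "aws-us", "gcp-us"}  # Any one is required.
BACKUP_REGION = "aws-us"  # Recommended as a fallback.


def build_allowlist(local_regions, api_key_region, include_current=True,
                    include_backup=True):
    """Return the sorted DNS names to allow (hypothetical helper)."""
    if api_key_region not in API_KEY_REGIONS:
        raise ValueError(f"{api_key_region} cannot serve the API key")
    regions = set(local_regions) | {api_key_region}
    if include_backup:
        regions.add(BACKUP_REGION)
    names = set()
    for region in regions:
        current, new = DNS[region]  # KeyError for an unknown region key.
        names.add(new)
        if include_current:
            # Existing names stay allowed until the migration completes.
            names.add(current)
    return sorted(names)


def resolve_ipv4(hostname, resolver=socket.getaddrinfo):
    """Look up the IPv4 addresses currently mapped to hostname, like nslookup."""
    infos = resolver(hostname, 443, family=socket.AF_INET, type=socket.SOCK_STREAM)
    # Each entry's sockaddr is (ip, port); deduplicate the IP addresses.
    return sorted({info[4][0] for info in infos})
```

For example, `build_allowlist(["aws-eu", "gcp-eu"], api_key_region="aws-eu")` lists the current and new names for the two EU instances plus the aws-us backup. Because the IP addresses of the new instances change periodically, rerun `resolve_ipv4` rather than storing its results.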