Install the lineage harvester

Before you can use the lineage harvester, you need to download and install it. You can download it from the Collibra Community downloads page.

Requirements and permissions

  • Collibra Data Intelligence Platform.
  • You have purchased Collibra Data Lineage.
  • A global role with the following global permissions:
    • Catalog, for example Catalog Author
    • Data Stewardship Manager
    • Manage all resources
    • System administration
    • Technical lineage
  • A resource role with the following resource permissions on the community level in which you created the domain:
    • Asset: add
    • Attribute: add
    • Domain: add
    • Attachment: add
  • You meet the minimum system requirements.
  • You have added firewall rules so that the lineage harvester can connect to:
    • The host names of all databases in the lineage harvester configuration file.
    • All Collibra Data Lineage service instances within your geographical location:

      Region  Current IP address  Current DNS name             New DNS name
      aws-ca  15.222.200.199      techlin-aws-ca.collibra.com  techlin-ca-central-1.collibra.com
      aws-eu  18.198.89.106       techlin-aws-eu.collibra.com  techlin-eu-central-1.collibra.com
      aws-sg  13.228.38.245       techlin-aws-sg.collibra.com  techlin-ap-southeast-1.collibra.com
      aws-us  54.242.194.190      techlin-aws-us.collibra.com  techlin-us-east-1.collibra.com
      gcp-au  35.197.182.41       techlin-gcp-au.collibra.com  techlin-australia-southeast1.collibra.com
      gcp-ca  34.152.20.240       techlin-gcp-ca.collibra.com  techlin-northamerica-northeast1.collibra.com
      gcp-eu  35.205.146.124      techlin-gcp-eu.collibra.com  techlin-europe-west1.collibra.com
      gcp-sg  34.87.122.60        techlin-gcp-sg.collibra.com  techlin-asia-southeast1.collibra.com
      gcp-uk  35.234.130.150      techlin-gcp-uk.collibra.com  techlin-europe-west2.collibra.com
      gcp-us  34.73.33.120        techlin-gcp-us.collibra.com  techlin-us-east1.collibra.com

      Important: We are migrating the Collibra Data Lineage service instances to new DNS names. The migration will be completed by October 31, 2024. You can already refer to the new DNS names in addition to the existing ones. If your network infrastructure requires traffic to be explicitly allowed, we recommend that you update your configuration to also accommodate the new DNS names by October 31, 2024, to avoid service interruptions. You do not need to remove the existing DNS names during this migration, and you should not remove them before the end of October 2024.

      The IP addresses for the new DNS names are different from the existing ones. We recommend that you refer to the new instances only by their DNS names in your network configurations, because the IP addresses of the new instances are subject to change periodically. If you need to use the IP addresses of the new instances in your network configuration, we recommend using a command line utility such as nslookup to query the DNS and obtain the mapping between domain name and IP address. For more information, see the Technical Lineage DNS Change article in the Support Portal.

      Note: The lineage harvester connects to different instances based on your geographic location and cloud provider. If your location or cloud provider changes, the lineage harvester rescans all your data sources. You have to allow all Collibra Data Lineage service instances in your geographic location. In addition, we highly recommend that you always allow the techlin-aws-us instance as a backup, in case the lineage harvester cannot connect to other Collibra Data Lineage service instances.
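
As an alternative to running nslookup by hand for each DNS name, the lookup can be scripted. The following is a minimal Python sketch, not an official Collibra tool: it collects the current and new DNS names listed above for the regions you pass in, adds the recommended aws-us backup, and asks your DNS resolver for the current IP addresses. The function names are hypothetical, and which regions belong to your geographic location is an assumption you have to supply yourself.

```python
import socket

# Region -> (current DNS name, new DNS name), copied from the instances listed above.
SERVICE_INSTANCES = {
    "aws-ca": ("techlin-aws-ca.collibra.com", "techlin-ca-central-1.collibra.com"),
    "aws-eu": ("techlin-aws-eu.collibra.com", "techlin-eu-central-1.collibra.com"),
    "aws-sg": ("techlin-aws-sg.collibra.com", "techlin-ap-southeast-1.collibra.com"),
    "aws-us": ("techlin-aws-us.collibra.com", "techlin-us-east-1.collibra.com"),
    "gcp-au": ("techlin-gcp-au.collibra.com", "techlin-australia-southeast1.collibra.com"),
    "gcp-ca": ("techlin-gcp-ca.collibra.com", "techlin-northamerica-northeast1.collibra.com"),
    "gcp-eu": ("techlin-gcp-eu.collibra.com", "techlin-europe-west1.collibra.com"),
    "gcp-sg": ("techlin-gcp-sg.collibra.com", "techlin-asia-southeast1.collibra.com"),
    "gcp-uk": ("techlin-gcp-uk.collibra.com", "techlin-europe-west2.collibra.com"),
    "gcp-us": ("techlin-gcp-us.collibra.com", "techlin-us-east1.collibra.com"),
}

# The note above recommends always allowing this instance as a fallback.
BACKUP_REGION = "aws-us"


def hosts_to_allow(regions):
    """Return the DNS names for the given regions plus the backup, without duplicates."""
    hosts = []
    for region in list(regions) + [BACKUP_REGION]:
        for name in SERVICE_INSTANCES[region]:
            if name not in hosts:
                hosts.append(name)
    return hosts


def resolve(host):
    """Return the sorted IP addresses that DNS currently reports for host."""
    infos = socket.getaddrinfo(host, None, proto=socket.IPPROTO_TCP)
    return sorted({info[4][0] for info in infos})


def print_dns_mapping(regions):
    """Print one line per DNS name with the IP addresses it resolves to right now."""
    for host in hosts_to_allow(regions):
        try:
            print(host, ", ".join(resolve(host)))
        except socket.gaierror as err:
            print(host, "could not be resolved:", err)
```

For example, calling print_dns_mapping(["aws-eu", "gcp-eu"]) prints the DNS names of both EU instances and of the aws-us backup, each followed by the addresses returned at that moment. Because the IP addresses of the new instances can change, the output is only a snapshot and the lookup would need to be repeated periodically.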

Steps

  1. Download the newest lineage harvester.
  2. Unzip the archive.
    You can now access the lineage harvester folder. The lineage harvester folder name is unique per version.
  3. Start the lineage harvester to create an empty lineage harvester configuration file by entering the following command:
    • Windows: .\bin\lineage-harvester.bat
    • Other operating systems: chmod +x bin/lineage-harvester, and then ./bin/lineage-harvester
    An empty configuration file is created in the config folder.
  4. The lineage harvester is installed automatically. You can check the installation by running ./bin/lineage-harvester --help.
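
If you install the lineage harvester on several machines, steps 2 to 4 above can be scripted. The following is a minimal Python sketch under stated assumptions: the archive path you pass in, the target directory, and all function names are hypothetical, and the script only relies on the bin folder and launcher names mentioned in the steps. Because the folder name is unique per version, the sketch searches for the launcher instead of hard-coding the folder.

```python
import platform
import stat
import subprocess
import zipfile
from pathlib import Path


def launcher_name(system=None):
    """Return the launcher file name from step 3 for the given operating system."""
    system = system or platform.system()
    return "lineage-harvester.bat" if system == "Windows" else "lineage-harvester"


def extract_and_find_launcher(archive, target_dir, system=None):
    """Unzip the archive (step 2) and return the launcher inside its bin folder."""
    target = Path(target_dir)
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(target)

    name = launcher_name(system)
    for candidate in sorted(target.rglob(name)):
        if candidate.parent.name != "bin":
            continue
        if name == "lineage-harvester":
            # Rough equivalent of: chmod +x bin/lineage-harvester
            mode = candidate.stat().st_mode
            candidate.chmod(mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
        return candidate
    raise FileNotFoundError(f"No bin/{name} found under {target}")


def run_launcher(launcher, *args):
    """Run the launcher from the lineage harvester folder, for example with --help."""
    harvester_dir = launcher.parent.parent
    return subprocess.run([str(launcher), *args], cwd=harvester_dir, check=False)
```

A typical call sequence would be extract_and_find_launcher with the path of the downloaded archive and a target directory, followed by run_launcher(launcher) to create the empty configuration file and run_launcher(launcher, "--help") to check the installation.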

What's next?

Prepare the lineage harvester configuration file.