Lineage harvester system requirements
To install and run the lineage harvester, your system must meet the following requirements.
Software requirements
Java Runtime Environment version 11.0.18 or newer, or OpenJDK 11.0.18 or newer.
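To confirm that an installed runtime meets this requirement before you run the harvester, you can compare the output of java -version against 11.0.18. The following Python sketch is not part of the lineage harvester, and its function names are hypothetical. It assumes that java is on the PATH and that the version string uses the common version "X.Y.Z" form. An old-style string such as "1.8.0_361" parses as major version 1, so Java 8 correctly fails the check.

```python
from __future__ import annotations

import re
import subprocess

# Minimum Java version from the software requirements
MINIMUM_VERSION = (11, 0, 18)


def parse_java_version(output: str) -> tuple[int, int, int]:
    """Return (major, minor, patch) from java -version output.

    Assumes a line such as: openjdk version "17.0.2" 2022-01-18
    Missing minor or patch parts (for example "21") are treated as 0.
    """
    match = re.search(r'version "(\d+)(?:\.(\d+))?(?:\.(\d+))?', output)
    if match is None:
        raise ValueError("no version string found in java -version output")
    major, minor, patch = (int(part) if part else 0 for part in match.groups())
    return major, minor, patch


def meets_minimum(version: tuple[int, int, int]) -> bool:
    # Tuples compare element by element, so (17, 0, 2) >= (11, 0, 18)
    return version >= MINIMUM_VERSION


def installed_java_version() -> tuple[int, int, int]:
    # java -version prints its banner to stderr, not stdout
    result = subprocess.run(
        ["java", "-version"], capture_output=True, text=True, check=True
    )
    return parse_java_version(result.stderr)


# Example: meets_minimum(installed_java_version())
```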
If you use OpenJDK 16 or newer, you have to set the JAVA_OPTS environment variable when you run the lineage harvester. For example, to process data from all data sources, including Snowflake data sources, take the following steps:
On Windows:
- Enter one of the following commands:
  - If you use OpenJDK 16:
    set JAVA_OPTS="-Djdk.module.illegalAccess=permit"
  - If you use OpenJDK 17 or higher:
    set JAVA_OPTS="--add-opens=java.base/java.nio=ALL-UNNAMED"
- In the same Command Prompt window, enter the following command:
  .\bin\lineage-harvester.bat full-sync
Note The set command is specific to the Windows Command Shell. The command is different if you are using PowerShell.
On Linux or macOS, enter one of the following commands:
  - If you use OpenJDK 16:
    JAVA_OPTS="-Djdk.module.illegalAccess=permit" ./bin/lineage-harvester full-sync
  - If you use OpenJDK 17 or higher:
    JAVA_OPTS="--add-opens=java.base/java.nio=ALL-UNNAMED" ./bin/lineage-harvester full-sync
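If you script the harvester launch, the steps above reduce to choosing a JAVA_OPTS value from the Java major version and starting the platform-specific launcher. The Python sketch below is a hypothetical helper, not part of the product. It assumes the harvester is installed in harvester_dir with the bin/lineage-harvester and bin\lineage-harvester.bat launchers shown above, and that the launchers read JAVA_OPTS from their environment, as the commands above imply. Because this section only lists values for OpenJDK 16 and for 17 or higher, the sketch assumes that earlier versions need no JAVA_OPTS value.

```python
from __future__ import annotations

import os
import platform
import subprocess


def java_opts_for(java_major: int) -> str | None:
    """Return the JAVA_OPTS value listed above for a Java major version."""
    if java_major >= 17:
        return "--add-opens=java.base/java.nio=ALL-UNNAMED"
    if java_major == 16:
        return "-Djdk.module.illegalAccess=permit"
    # Assumption: earlier versions need no JAVA_OPTS value
    return None


def launcher_command(harvester_dir: str, system: str) -> list[str]:
    # Windows uses the .bat launcher; other systems use the shell script
    name = "lineage-harvester.bat" if system == "Windows" else "lineage-harvester"
    launcher = os.path.abspath(os.path.join(harvester_dir, "bin", name))
    return [launcher, "full-sync"]


def run_full_sync(harvester_dir: str, java_major: int) -> int:
    """Run a full sync with JAVA_OPTS set only for the child process."""
    env = os.environ.copy()
    opts = java_opts_for(java_major)
    if opts is not None:
        env["JAVA_OPTS"] = opts
    command = launcher_command(harvester_dir, platform.system())
    return subprocess.run(command, cwd=harvester_dir, env=env).returncode


# Example (hypothetical install path): run_full_sync("/opt/lineage-harvester", 17)
```

Passing JAVA_OPTS through the child process environment gives the launcher the same variable that the set command or the inline assignment provides, and it sidesteps the syntax difference between Command Prompt and PowerShell.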
Hardware requirements
The machine that runs the lineage harvester must meet the following hardware requirements.
Minimum hardware requirements
At a minimum, you need:
- 2 GB RAM
- 1 GB free disk space
Recommended hardware requirements
The minimum requirements are likely insufficient for production environments. We recommend:
- 4 GB RAM
  Tip 4 GB RAM is sufficient in most cases, but larger harvesting tasks might need more memory. For instructions on how to increase the maximum heap size, see Technical lineage general troubleshooting.
- 20 GB free disk space
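A quick pre-installation check can compare a machine against these thresholds. This Python sketch is hypothetical and not part of the product. It treats 1 GB as 10^9 bytes, reads free disk space with shutil.disk_usage, and reads physical memory through os.sysconf, which works on Linux but is not available on Windows.

```python
from __future__ import annotations

import os
import shutil

GB = 1000 ** 3  # Assumption: decimal gigabytes

# Thresholds from the minimum and recommended hardware requirements
MINIMUM = {"ram": 2 * GB, "disk": 1 * GB}
RECOMMENDED = {"ram": 4 * GB, "disk": 20 * GB}


def classify(ram_bytes: int, free_disk: int) -> str:
    """Return "recommended", "minimum", or "insufficient"."""

    def meets(requirement: dict[str, int]) -> bool:
        return ram_bytes >= requirement["ram"] and free_disk >= requirement["disk"]

    if meets(RECOMMENDED):
        return "recommended"
    if meets(MINIMUM):
        return "minimum"
    return "insufficient"


def total_ram_bytes() -> int:
    # Physical memory via sysconf; works on Linux, not on Windows
    return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")


def free_disk_bytes(path: str = ".") -> int:
    # Free space on the file system that contains path
    return shutil.disk_usage(path).free


# Example on Linux: classify(total_ram_bytes(), free_disk_bytes("/opt"))
```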
Network requirements
By default, the lineage harvester communicates over HTTPS on port 443.
The minimum network requirements are:
- Firewall rules so that the lineage harvester can connect to:
  - The host names of all data sources in the lineage harvester configuration file.
  - All Collibra Data Lineage service instances in your geographic location:
    - 15.222.200.199 (techlin-aws-ca.collibra.com)
    - 18.198.89.106 (techlin-aws-eu.collibra.com)
    - 13.228.38.245 (techlin-aws-sg.collibra.com)
    - 54.242.194.190 (techlin-aws-us.collibra.com)
    - 51.105.241.132 (techlin-azure-eu.collibra.com)
    - 20.102.44.39 (techlin-azure-us.collibra.com)
    - 35.197.182.41 (techlin-gcp-au.collibra.com)
    - 34.152.20.240 (techlin-gcp-ca.collibra.com)
    - 35.205.146.124 (techlin-gcp-eu.collibra.com)
    - 34.87.122.60 (techlin-gcp-sg.collibra.com)
    - 35.234.130.150 (techlin-gcp-uk.collibra.com)
    - 34.73.33.120 (techlin-gcp-us.collibra.com)
Note The lineage harvester connects to different Collibra Data Lineage service instances based on your geographic location and cloud provider. If your location or cloud provider changes, the lineage harvester rescans all your data sources. You have to allow all Collibra Data Lineage service instances in your geographic location. In addition, we highly recommend that you always allow the techlin-aws-us instance as a backup, in case the lineage harvester cannot connect to other Collibra Data Lineage service instances.
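To verify the firewall rules from the machine that runs the harvester, you can attempt a TCP connection to port 443 on each Collibra Data Lineage service instance for your location, plus techlin-aws-us as the recommended backup. This Python sketch is a hypothetical helper, not part of the product. A successful TCP connection shows only that the port is reachable, not that HTTPS requests succeed, and the sketch does not check data source host names, whose ports depend on each data source.

```python
from __future__ import annotations

import socket

# Recommended backup instance from the note above
BACKUP_HOST = "techlin-aws-us.collibra.com"


def can_connect(host: str, port: int = 443, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections, and timeouts
        return False


def hosts_to_check(regional_hosts: list[str]) -> list[str]:
    # Always include the backup instance, without adding it twice
    hosts = list(regional_hosts)
    if BACKUP_HOST not in hosts:
        hosts.append(BACKUP_HOST)
    return hosts


def check_service_instances(regional_hosts: list[str]) -> dict[str, bool]:
    return {host: can_connect(host) for host in hosts_to_check(regional_hosts)}


# Hypothetical selection; use the instances for your own location:
# check_service_instances([
#     "techlin-aws-eu.collibra.com",
#     "techlin-azure-eu.collibra.com",
#     "techlin-gcp-eu.collibra.com",
# ])
```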