System Requirements

Before you install Collibra Data Quality & Observability, you need all of the following information to ensure an easy, successful installation process. This section focuses only on the requirements of Collibra DQ itself; it does not cover connections to the data sources from which you ingest data.

Supported Web Browsers

Browser                       Version
Google Chrome (recommended)   70.0.3538.102 or newer
Mozilla Firefox               52.8.0 or newer
Safari                        12.0.1 or newer

Encryption

Java Cryptography Extension (JCE) Unlimited Strength Jurisdiction Policy Files

PostgreSQL version

  • Collibra Data Quality & Observability comes prepackaged with PostgreSQL 11.4.
  • PostgreSQL 9.6.5 or newer is supported if you want to use an external metastore.

Collibra recommends installing the DQ Metastore on an external PostgreSQL database.
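If you plan to use an external metastore, you can verify that the target database meets the minimum version with a standard PostgreSQL client. The host, user, and database names below are placeholders for your environment:

    # Check the server version of the external PostgreSQL instance
    psql -h metastore-host -U dq_user -d postgres -c "SELECT version();"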

Installation packages/files (BOM)

  • demoscripts.tar.gz
  • log4j*
  • owlcheck
  • owl-core-2.1.0-jar-with-dependencies.jar
  • owl-webapp-2.1.0.jar
  • owl-agent-2.1.0.jar
  • setup.sh
  • owl-postgres.tar.gz
  • notebooks.tar.gz
  • owlmanage.sh

Installation-specific requirements

Important To install and use the upcoming Collibra Data Quality & Observability 2025.02 release, you must upgrade to Java 17 and Spark 3.5.3.

Java and Spark compatibility matrix
2025.01 and earlier
  • Java 8: Yes
  • Java 11: Yes
  • Java 17: No
  • Spark versions:
    • 2.3.0 (Java 8 only)
    • 2.4.5 (Java 8 only)
    • 3.0.1 (Java 8 and 11)
    • 3.1.2 (Java 8 and 11)
    • 3.2.2 (Java 8 and 11)
    • 3.4.1 (Java 11 only)

2025.02
  • Java 8: No
  • Java 11: No
  • Java 17: Yes
  • Spark versions: 3.5.3 only

2025.03
  • Java 8: No
  • Java 11: No
  • Java 17: Yes
  • Spark versions: 3.5.3 only

2025.04
  • Java 8: Yes
  • Java 11: Yes
  • Java 17: Yes
  • Spark versions:
    • 2.3.0 (Java 8 only)
    • 2.4.5 (Java 8 only)
    • 3.0.1 (Java 8 and 11)
    • 3.1.2 (Java 8 and 11)
    • 3.2.2 (Java 8 and 11)
    • 3.4.1 (Java 11 only)
    • 3.5.3 (Java 17 only)

Important
The Java 8 and 11 build profiles only contain the 2025.02 release and critical bug fixes addressed in 2025.03 and 2025.04. They do not contain any feature enhancements from the 2025.03 or 2025.04 releases.

Only the Java 17 build profile contains feature enhancements and bug fixes listed in the 2025.04 release notes.

2025.05, 2025.06, 2025.07, 2025.08, and 2025.09
  • Java 8: No
  • Java 11: No
  • Java 17: Yes
  • Spark versions: 3.5.3 only
  • Additional notes: Fixes for Java 8 and 11 build profiles will be available only for critical and high-priority defects.
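To confirm which Java and Spark versions are available on the host that runs DQ jobs, you can check from a shell. This assumes java and spark-submit are on the PATH; the Spark version is also shown in the Collibra DQ web UI, as noted under Other below.

    # Report the Java runtime version
    java -version

    # Report the Spark version, if spark-submit is on the PATH
    spark-submit --version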

User permissions

Log in with a user account that has privileges to:

  • Create directories
  • Launch scripts
  • Start Java, Spark, and Collibra DQ processes

SUDO is required if you are including the PostgreSQL metastore in the installation. SUDO is not required if you are using an external PostgreSQL metastore (recommended).

In addition, configure the ulimit setting to 4096 or higher. DQ services typically consume approximately 428 threads, and each DQ job consumes an additional 400 threads, so a ulimit of 4096 allows for approximately nine concurrent DQ jobs on a Standalone install.
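As a rough sketch of how you might check and raise these limits for the account that runs Collibra DQ (the owldq user name and the limits.conf entries are examples only; exact files and PAM configuration vary by distribution):

    # Show current limits for the logged-in user
    ulimit -u        # maximum user processes/threads
    ulimit -n        # maximum open file descriptors

    # Raise the limits for the current shell session
    ulimit -u 4096
    ulimit -n 4096

    # To persist the change, add entries like these to /etc/security/limits.conf
    # (assumes the DQ processes run as a hypothetical user named owldq):
    # owldq  soft  nproc   4096
    # owldq  hard  nproc   4096
    # owldq  soft  nofile  4096
    # owldq  hard  nofile  4096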

System requirements

Supported operating systems

  • Red Hat Enterprise Linux 8.x
  • Red Hat Enterprise Linux 9.x

Hardware requirements

Small Tier - 16 Core, 128G RAM (r5.4xlarge / E16s v3)

Component   RAM    Cores
Web         2g     2
Postgres    2g     2
Spark       100g   10
Overhead    10g    2

Medium Tier - 32 Core, 256G RAM (r5.8xlarge / E32s v3)

Component   RAM    Cores
Web         2g     2
Postgres    2g     2
Spark       250g   26
Overhead    10g    2

Large Tier - 64 Core, 512G RAM (r5.16xlarge / E64s v3)

Component   RAM    Cores
Web         4g     3
Postgres    4g     3
Spark       486g   54
Overhead    18g    4

Important Collibra DQ limits large tier jobs to 2 TB. For DQ jobs that exceed 2 TB, you must filter down columns or rows.

Estimates

Sizing should allow headroom and should be based on peak concurrency and peak volume requirements. If concurrency is not a requirement, size only for peak volume (your largest tables). As a best practice for efficient scans, scope the job by selecting only critical columns. See Scaling your DQ Job for more information.

Bytes per Cell   Rows            Columns   Gigabytes   Gigabytes for Spark (3x)
16               1,000,000       25        0.4         1.2
16               10,000,000      25        4           12
16               100,000,000     25        40          120
16               1,000,000       50        0.8         2.4
16               10,000,000      50        8           24
16               100,000,000     50        80          240
16               1,000,000       100       1.6         4.8
16               10,000,000      100       16          48
16               100,000,000     100       160         480
16               1,000,000,000   100       1600        4800
16               1,000,000       200       3.2         9.6
16               10,000,000      200       32          96
16               100,000,000     200       320         960
16               1,000,000,000   200       3200        9600
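As a worked example of how the table values are derived: 16 bytes per cell × 10,000,000 rows × 100 columns is roughly 16 GB of raw data, and about three times that, or roughly 48 GB, for Spark. The same arithmetic in a shell (a rough estimate only, not a substitute for testing with your own data):

    # Rough sizing estimate: bytes per cell x rows x columns, then a 3x Spark factor
    BYTES_PER_CELL=16
    ROWS=10000000
    COLS=100
    RAW=$(( BYTES_PER_CELL * ROWS * COLS ))
    echo "Raw data:   $RAW bytes (~$(( RAW / 1000000000 )) GB)"
    echo "Spark (3x): $(( RAW * 3 )) bytes (~$(( RAW * 3 / 1000000000 )) GB)"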

Cluster

If your program requires more horsepower or more (Spark) workers than the example tiers above, which is fairly common in Fortune 500 companies, consider the horizontal and ephemeral scale of a cluster. Common examples include Amazon EMR and Cloudera CDP. Collibra DQ is built to scale out horizontally and can scale to hundreds of nodes.

Network requirements

Default Ports used by Collibra DQ

  • 5432 – PostgreSQL
  • 9000 – DQ Web
  • 9101 – Exposes the Health Check API to check that the DQ Agent is running and stable.
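After installation, you can verify that these ports are listening and reachable with standard Linux tools; dq-host below is a placeholder for your server name:

    # Check which of the default ports are listening on the DQ host
    ss -ltn | grep -E ':(5432|9000|9101)'

    # Test reachability from another machine
    nc -zv dq-host 5432
    nc -zv dq-host 9000
    nc -zv dq-host 9101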

Other

  • If your current Spark version is 3.2.2 or older, Collibra strongly recommends upgrading to Spark 3.4.1 to address various critical vulnerabilities in the Spark core library, including Log4j. To determine which Spark version you are using, sign in to your Collibra DQ instance and click the Collibra DQ version information in the upper-right corner of any page. The Spark Version field lists your current Spark version.
    • If you are not already using Spark 3.4.1, follow the steps outlined in Upgrading Spark versions.

Prerequisites

  • Kubernetes cluster -- EKS, GKE, AKS, OpenShift, or Rancher
  • Helm (v3)
  • kubectl
  • Cloud command line SDK, such as the gcloud CLI, AWS CLI, or similar
  • External PostgreSQL database, version 11.9 or newer, with 100 GB of storage, 4 to 8 cores, and 4 to 8 GB of memory
  • Private container registry -- to store images
  • LoadBalancer -- Ingress controller -- Ingress
  • Egress networking access
  • Helm Chart
  • Images, image access key
  • Minimum pod requirement -- 2 cores, 2GB RAM
  • If you bring in your own Spark executor pod launch template, ensure that the service account used to launch Spark executor pods has the permission to do so. Refer to the executor launch template for more information.
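As a quick sanity check of the tooling prerequisites listed above (cluster access and namespace details are specific to your environment):

    # Confirm client tooling and cluster connectivity
    helm version          # Helm v3 client
    kubectl version       # client and server (cluster) versions
    kubectl get nodes     # verify access to the target Kubernetes cluster

The server version reported by kubectl should fall within the supported Kubernetes range listed below.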

System requirements

Supported Kubernetes versions

Collibra Data Quality & Observability supports Kubernetes versions 1.29 through 1.31.

Note As of February 2025, we recommend upgrading to Kubernetes version 1.29 or newer, as version 1.28 has reached its end of life.

Application system requirements

Component         Processor   Memory   Storage
Collibra DQ Web   1 core      2 GB     10 MB PVC
DQ Agent          1 core      1 GB     100 MB PVC
DQ Metastore      1 core      2 GB     10 GB PVC
Spark*            2 cores     2 GB     -

Note  * This is the minimum quantity of resources required to run a Spark job in Kubernetes. This amount of resources only provides the ability to scan a few megabytes of data with no more than a single job running at a given time. Proper sizing of the compute space must take into account the largest dataset that may be scanned, as well as the desired concurrency.

Network service considerations

DQ Web is the only required component that must be directly accessible from outside of Kubernetes. The History Server is the only other component that users can access directly; however, it is optional.

If the target Kubernetes platform supports a LoadBalancer service type, you can configure the Helm Chart to directly deploy the externally accessible endpoint.

Note  For testing purposes, you can also configure the Helm chart to deploy a NodePort service type.

For the Ingress service type, deploy Collibra DQ without an externally accessible service and then attach the Ingress service separately. This applies when you use a third-party Ingress controller such as NGINX, Contour, etc.

Note  The Helm Chart can deploy an Ingress on GKE and EKS platforms; however, there is a wide variety of possible Ingress configurations that have not been tested.
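After deployment, a quick way to confirm how the DQ Web endpoint is exposed is to inspect the services and Ingress with kubectl; the collibra-dq namespace below is a placeholder for the namespace you deploy into:

    # Check the service TYPE column (LoadBalancer with an EXTERNAL-IP, NodePort,
    # or ClusterIP fronted by an Ingress)
    kubectl get svc -n collibra-dq

    # If you attached an Ingress separately, confirm it points at the DQ Web service
    kubectl get ingress -n collibra-dq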