Deploy on Self-hosted Kubernetes
After you have performed the steps in Install on self-hosted Kubernetes, you can begin the deployment of Collibra DQ.
Download Helm Chart Files
After you have your Collibra DQ license, you will receive an email from Collibra that includes the Helm Charts as zip files. Unzip the files on a Linux-compatible deployment location.
The root directory, dq, contains the following directories and files:
drwxrwxr-x -- templates. Directory that contains a set of YAML files, including:
- k8s-enpoint.secret.yaml
- _helpers.tpl
- hadoop-conf.yaml
- serviceaccount.yaml
- krb5-conf.yaml
- rbac.yaml
- dq-secret.yaml
- keytab-secret.yaml
drwxrwxr-x -- charts. Directory that contains details for the core containers, including:
- metastore
- owl-livy
- owl-agent
- owl-web
- spark-history-server
-rw-rw-r-- Chart.yaml. File that contains the DQ Helm chart meta-information.
-rw-rw-r-- values.yaml. File that contains the DQ configuration values.
You may pass parameters using one of two methods:
- Using the values.yaml file
- Using the Helm --set commands

Note The Helm --set commands take precedence over the values.yaml file.
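As a sketch of the values.yaml method, the --set flags used in the examples below can be expressed as a file and passed with -f. The key paths mirror the --set arguments in this guide; the version strings and file name here are illustrative assumptions, not defaults shipped with the chart.

```shell
# Write an example values file whose keys mirror the --set flags used in this guide.
# Versions and license fields below are placeholders; substitute your own.
cat > my-values.yaml <<'EOF'
global:
  version:
    dq: "2023.11"             # example web image suffix
    spark: "3.4.1-2023.11"    # example Spark image suffix
  configMap:
    data:
      license_key: "<cdq-license-key>"
      license_name: "<your-license-name>"
  web:
    service:
      type: ClusterIP
EOF

# Pass the file with -f; any --set flag on the same command still overrides it:
# helm upgrade --install --namespace <namespace> -f my-values.yaml <deployment-name> /path/to/chart/dq
```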
Minimal Install Settings
The example provided in this section installs the DQ Web, DQ Agent, and DQ Metastore. Collibra DQ is inaccessible until you manually add an Ingress or another type of externally accessible service.
Warning All of the following examples pull containers directly from the Collibra DQ secured container registry. In most cases, InfoSec policies require that containers are sourced from a private container repository controlled by the local Cloud Ops team. Make sure to add --set global.image.repo=<url/of/private-repo> so that you use only approved containers.
Note The DQ Metastore container must start first as the other containers use it to write data. On your initial deployment, the other containers might start before the metastore and fail.
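If containers did fail on the initial deployment because the metastore was not yet ready, standard kubectl commands are enough to recover; the namespace and pod names below are placeholders.

```shell
# Watch startup order; the metastore pod should reach Running first.
kubectl get pods -n <namespace> -w

# If a pod failed before the metastore was up, confirm the cause in its logs...
kubectl logs -n <namespace> <pod-name>

# ...then delete the failed pod so its Deployment recreates it against the
# now-running metastore.
kubectl delete pod -n <namespace> <pod-name>
```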
helm upgrade --install --namespace <namespace> \
--set global.version.dq=<cdq-version> \
--set global.version.spark=<cdq-spark-version> \
--set global.configMap.data.license_key=<cdq-license-key> \
--set global.configMap.data.license_name=<your-license-name> \
--set global.web.admin.email=${email} \
--set global.web.admin.password=${password} \
--set global.web.service.type=ClusterIP \
--set global.image.repo=<pathToImageRepo> \
<deployment-name> \
/path/to/chart/dq
Value | Description |
---|---|
<namespace> | Enter the namespace that you created for this deployment. |
<cdq-version> | Enter the version from the web image suffix. For example, 2023.11 from the image, dq-web:2023.11. |
<cdq-spark-version> | Enter the Spark version from the Spark image suffix. For example, 3.4.1-2023.11 from the image, spark:3.4.1-2023.11. |
<cdq-license-key> | Enter the license key provided to you by Collibra. |
<your-license-name> | Enter the license name provided to you by Collibra. |
${email} | Enter the default admin user email associated with the admin account. |
${password} | Enter the default admin user password for the admin account. The password must adhere to the password policy. Note If a password that does not meet the password policy is entered, the install process proceeds as though the password is accepted, but the admin user becomes locked out. If this occurs, rerun the Helm command with a password that meets the password policy and restart the web pod. |
<pathToImageRepo> | Your private registry, where the Collibra images are available. When this is not provided, the images are pulled from the Collibra image registry, for which you should create a pull secret with the repo key provided by Collibra. See Install on Self-hosted Kubernetes for more details. |
<deployment-name> | Any name of your choice for this deployment. |
Note If you optionally pass credentials for the Postgres Metastore, ensure that you do not use the $ symbol in the global.metastore.pass variable, as it is an unsupported special character for Postgres Metastore passwords.
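A quick pre-flight check for this restriction can be done in plain shell; the candidate password below is a hypothetical example.

```shell
# Sketch: verify a candidate metastore password avoids the unsupported '$' character.
pass='Examp1e-Passw0rd'   # hypothetical candidate password
case "$pass" in
  *'$'*) echo "rejected: password contains \$" ;;
  *)     echo "ok" ;;
esac
```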
The number of possible customizations is extensive and provides a great deal of flexibility for a wide variety of platforms. However, when deploying on a known platform (such as EKS, GKE, or AKS), the number of required inputs is limited. In common cases, you can run a single CLI command with basic parameters, such as disabling the history server, configuring the storage bucket for logs, and specifying the image repository.
Including an Externally Accessible Service
The following examples install Collibra DQ with an externally accessible service.
Minimal Install with an Externally Accessible Service
The following example performs the minimal install and adds a preconfigured NodePort or LoadBalancer service to provide access to the DQ Web interface.
Warning A LoadBalancer service type requires that the Kubernetes platform is integrated with a Software Defined Network solution. This is generally true for the Kubernetes services offered by major cloud vendors. Private cloud platforms more commonly use Ingress controllers. Check with the infrastructure team before attempting to use the LoadBalancer service type.
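After the install, you can confirm how the web service was exposed; the namespace is a placeholder, and the exact service name depends on your deployment name.

```shell
# List services in the target namespace. A LoadBalancer service shows an
# EXTERNAL-IP once the platform provisions it (it reads <pending> until then);
# a NodePort service instead maps a high port on every node.
kubectl get svc -n <namespace>

# For NodePort, combine the reported port with any node's address to reach the web UI.
kubectl get nodes -o wide
```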
helm upgrade --install --namespace <namespace> \
--set global.version.dq=<cdq-version> \
--set global.version.spark=<cdq-spark-version> \
--set global.configMap.data.license_key=<cdq-license-key> \
--set global.configMap.data.license_name=<your-license-name> \
--set global.web.admin.email=${email} \
--set global.web.admin.password=${password} \
--set global.web.service.type=<NodePort || LoadBalancer> \
<deployment-name> \
/path/to/chart/dq
Install with SSL Enabled
The following example performs the install with an externally accessible service, but with SSL enabled.
Note Ensure you have already deployed a keystore containing a key to the target namespace, with a secret name that matches the global.web.tls.key.secretName argument (dq-ssl-secret by default). Also, ensure that the secret's key name matches the global.web.tls.key.store.name argument (dqkeystore.jks by default).
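As a sketch, the expected secret can be created from an existing keystore file with kubectl; the names below match the chart defaults described above, and the keystore path is a placeholder.

```shell
# Create the TLS secret the chart expects from an existing JKS keystore.
# The secret name must match global.web.tls.key.secretName (dq-ssl-secret by
# default); the key inside it must match global.web.tls.key.store.name
# (dqkeystore.jks by default).
kubectl create secret generic dq-ssl-secret \
  --namespace <namespace> \
  --from-file=dqkeystore.jks=/path/to/your/keystore.jks
```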
helm upgrade --install --namespace <namespace> \
--set global.version.dq=<cdq-version> \
--set global.version.spark=<cdq-spark-version> \
--set global.configMap.data.license_key=<cdq-license-key> \
--set global.configMap.data.license_name=<your-license-name> \
--set global.web.admin.email=${email} \
--set global.web.admin.password=${password} \
--set global.web.service.type=<NodePort || LoadBalancer> \
--set global.web.tls.enabled=true \
--set global.web.tls.key.secretName=dq-ssl-secret \
--set global.web.tls.key.alias=<key-alias> \
--set global.web.tls.key.type=<JKS || PKCS12> \
--set global.web.tls.key.pass=<keystore-pass> \
--set global.web.tls.key.store.name=keystore.jks \
<deployment-name> \
/path/to/chart/dq
Install on OpenShift 4.x with SSL Enabled
The following example performs the install with an externally accessible service on an OpenShift project with RunAsAny enforced, with SSL enabled.
helm upgrade --install --namespace <namespace> \
--set global.version.dq=<cdq-version> \
--set global.version.spark=<cdq-spark-version> \
--set global.configMap.data.license_key=<cdq-license-key> \
--set global.configMap.data.license_name=<your-license-name> \
--set global.web.admin.email=${email} \
--set global.web.admin.password=${password} \
--set global.web.service.type=<NodePort || LoadBalancer> \
--set global.web.tls.enabled=true \
--set global.web.tls.key.secretName=dq-ssl-secret \
--set global.web.tls.key.alias=<key-alias> \
--set global.web.tls.key.type=<JKS || PKCS12> \
--set global.web.tls.key.pass=<keystore-pass> \
--set global.web.tls.key.store.name=keystore.jks \
--set global.security.securityContextConstraint.runAsAny=true \
<deployment-name> \
/path/to/chart/dq
Install with History Server for GCS Log Storage
The following example performs the install with an externally accessible service and Spark History Server enabled. In this example, the target log storage system is GCS.
Note For Collibra DQ to be able to write Spark logs to GCS, create a secret from the JSON key file of a service account that has access to the log bucket. For more detailed information, see System Requirements.
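Creating that secret from the service account's JSON key file is a standard kubectl step; the secret name used here is a placeholder assumption, so check System Requirements for the name the chart actually expects.

```shell
# Assumption: the chart mounts a secret containing the GCS service account key.
# 'gcs-log-writer-key' is a placeholder secret name; see System Requirements
# for the exact name required by your chart version.
kubectl create secret generic gcs-log-writer-key \
  --namespace <namespace> \
  --from-file=key.json=/path/to/service-account-key.json
```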
helm upgrade --install --namespace <namespace> \
--set global.version.dq=<cdq-version> \
--set global.version.spark=<cdq-spark-version> \
--set global.configMap.data.license_key=<cdq-license-key> \
--set global.configMap.data.license_name=<your-license-name> \
--set global.web.admin.email=${email} \
--set global.web.admin.password=${password} \
--set global.web.service.type=<NodePort || LoadBalancer> \
--set global.spark_history.enabled=true \
--set global.spark_history.logDirectory=gs://logs/spark-history/ \
--set global.spark_history.service.type=<NodePort || LoadBalancer> \
--set global.cloudStorage.gcs.enableGCS=true \
<deployment-name> \
/path/to/chart/dq
Install with History Server for S3 Log Storage
The following example performs the install with an externally accessible service and Spark History Server enabled. In this example, the target log storage system is S3.
Note For Collibra DQ to be able to write Spark logs to S3, make sure that an Instance Profile IAM Role with access to the log bucket is attached to all nodes serving the target namespace. For more detailed information, see System Requirements.
Important We currently support only one set of S3 credentials configured in the Spark Server at a time within a single DQ Job. If you are using a Spark History Server bucket and an S3 connection, the S3 credentials must also have access to the S3 Spark History Server bucket.
helm upgrade --install --namespace <namespace> \
--set global.version.dq=<cdq-version> \
--set global.version.spark=<cdq-spark-version> \
--set global.configMap.data.license_key=<cdq-license-key> \
--set global.configMap.data.license_name=<your-license-name> \
--set global.web.admin.email=${email} \
--set global.web.admin.password=${password} \
--set global.web.service.type=<NodePort || LoadBalancer> \
--set global.spark_history.enabled=true \
--set global.spark_history.logDirectory=s3a://logs/spark-history/ \
--set global.spark_history.service.type=<NodePort || LoadBalancer> \
--set global.cloudStorage.s3.enableS3=true \
<deployment-name> \
/path/to/chart/dq
Install with External DQ Metastore
The following example performs the install with an externally accessible service and an external metastore, for example, AWS RDS, Google Cloud SQL, or PostgreSQL on its own instance.
Warning Collibra DQ currently supports PostgreSQL 9.6 and newer.
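Before running the install, the external instance needs an empty database and an owner for the metastore. A minimal sketch, assuming direct psql access to the instance; the database, user, and password names are illustrative, not required by the chart.

```shell
# Sketch: prepare an empty database and owner on the external PostgreSQL
# instance (any PostgreSQL 9.6 or newer works). Names are examples; the
# password must not contain the '$' character.
psql -h <host> -p <port> -U postgres <<'EOF'
CREATE USER dq_user WITH PASSWORD 'choose-a-password-without-dollar-signs';
CREATE DATABASE dq OWNER dq_user;
EOF
```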
helm upgrade --install --namespace <namespace> \
--set global.version.dq=<cdq-version> \
--set global.version.spark=<cdq-spark-version> \
--set global.configMap.data.license_key=<cdq-license-key> \
--set global.configMap.data.license_name=<your-license-name> \
--set global.web.admin.email=${email} \
--set global.web.admin.password=${password} \
--set global.web.service.type=<NodePort || LoadBalancer> \
--set global.metastore.enabled=false \
--set global.configMap.data.metastore_url=jdbc:postgresql://<host>:<port>/<database> \
--set global.configMap.data.metastore_user=<user> \
--set global.configMap.data.metastore_pass=<password> \
<deployment-name> \
/path/to/chart/dq
Warning The $ symbol is not a supported special character in your Postgres Metastore password.
Install with History Server for Windows Azure Storage Blob (WASB) Log Storage
The following example performs the install with an externally accessible service and Spark History Server enabled. In this example, the target log storage system is Windows Azure Storage Blob (WASB). However, wherever you see azureblobstorage, you can insert your own storage solution (for example, ADLS, S3, or GCS).
Note To access Azure Blob Storage, you must have the correct permissions to the Storage Account Key. For more information, go to Connecting to Azure Blob Storage.
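If you have the Azure CLI and rights on the storage account, the key to pass as global.cloudStorage.wasbs.storageAccountKey can be retrieved directly; the account and resource group names below are the example values from this section.

```shell
# Retrieve the Storage Account Key for the wasbs settings (requires the
# Azure CLI and permission on the storage account; names are examples).
az storage account keys list \
  --account-name azureblobstorage \
  --resource-group <resource-group> \
  --query '[0].value' -o tsv
```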
helm upgrade --install --namespace <namespace> \
--set global.version.dq=<cdq-version> \
--set global.version.spark=<cdq-spark-version> \
--set global.configMap.data.license_key=<cdq-license-key> \
--set global.configMap.data.license_name=<your-license-name> \
--set global.web.admin.email=${email} \
--set global.web.admin.password=${password} \
--set global.web.service.type=<NodePort || LoadBalancer> \
--set global.spark_history.enabled=true \
--set global.spark_history.logDirectory=wasbs://spark-history-logs@azureblobstorage.blob.core.windows.net/ \
--set global.spark_history.service.type=<NodePort || LoadBalancer> \
--set global.cloudStorage.wasbs.enableWASBS=true \
--set global.cloudStorage.wasbs.storageContainerName=spark-history-logs \
--set global.cloudStorage.wasbs.storageAccountName=azureblobstorage \
--set global.cloudStorage.wasbs.storageAccountKey=XXXXXXXXXXXXXXXXXXXXXXXXXXX \
<deployment-name> \
/path/to/chart/dq
Troubleshooting
Troubleshooting self-hosted Kubernetes Install