Before You Install on Self-hosted Kubernetes
Prerequisites
- Review the self-hosted Kubernetes architecture described in Installing Collibra Data Quality & Observability on Self-hosted Kubernetes.
- Verify that your system meets the System Requirements.
Obtain credentials
Kubernetes stores credentials as secrets. Secrets are base64-encoded files that you can mount into application containers and that application components can reference at runtime. You use pull secrets to access secured container registries and obtain application containers.
Note Deploying containers directly from the Collibra image repository is not recommended. Access the Collibra image registry only for the initial download and validation of Docker images. Afterward, upload and store the images in your private registry. This gives you control over when the images are updated and eliminates operational dependencies on Collibra's repository.
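As a small illustration of the base64 encoding that secrets use, the following shell sketch encodes and then decodes a sample value. The value is hypothetical, and the sketch assumes a base64 utility that accepts --decode, as the GNU coreutils and macOS versions do. Base64 is an encoding, not encryption, so anyone who can read a secret can recover its contents.

```shell
# Hypothetical sample value; not a real credential.
plain='example-password'

# Encode the value the way secret data is represented, then decode it again.
encoded=$(printf '%s' "$plain" | base64)
decoded=$(printf '%s' "$encoded" | base64 --decode)

echo "encoded: $encoded"
echo "decoded: $decoded"
```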
SSL certificates
To enable SSL for secure access to DQ Web, a keystore that contains a signed certificate, keychain, and private key is required. This keystore must be available in the target namespace before you deploy Collibra DQ.
Note By default, Collibra DQ looks for a secret called dq-ssl-secret to find the keystore.
Note Although it is possible to deploy with SSL disabled, it is not recommended.
Cloud storage credentials
If you enable History Server, a distributed filesystem is required. Currently, Collibra DQ supports S3 and GCS for Spark history log storage.
Note Azure Blob and HDFS are on the near-term roadmap.
Target storage system | Credentials requirements |
---|---|
S3 | An IAM Role with access to the target bucket needs to be attached to the Kubernetes nodes of the namespace where Collibra DQ is being deployed. |
GCS | You must create a secret from the JSON key file of a service account with access to the log bucket. The secret must be available in the namespace before you deploy Collibra DQ. By default, Collibra DQ looks for a secret called spark-gcs-secret if GCS is enabled for Spark history logs. You can change this via a Helm chart argument. |
Container pull secret
Collibra Data Quality & Observability containers are stored in a secured repository in Google Container Registry. For Collibra DQ to successfully pull the containers when deployed, a pull secret with access to the container registry must be available in the target namespace.
Note By default, Collibra DQ looks for a pull secret named dq-pull-secret. You can change this via a Helm chart argument.
Spark service account
To enable DQ Agent and the Spark driver to create and destroy compute containers, you must have a service account with a role that allows get/list/create/delete operations on pods/services/secrets/configMaps in the target namespace. By default, Collibra DQ attempts to create the required service account and the required RoleBinding to the default Edit role. Edit is a role that is generally available in a Kubernetes cluster. If the Edit role is not available, you can manually create it.
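If you need to set up the service account and its permissions manually, a minimal sketch might look like the following. The names dq-spark and dq-spark-role are hypothetical placeholders, not names that the Collibra DQ Helm chart is known to expect. The sketch also creates a namespaced Role limited to the operations listed above rather than recreating the Edit role itself, which is an assumption about what is sufficient.

```shell
# Create a service account, a Role with the operations listed above, and a
# RoleBinding that ties them together in the target namespace.
# Assumption: the account and role names are illustrative placeholders.
create_spark_rbac() {
  namespace="$1"
  account="${2:-dq-spark}"        # hypothetical service account name
  role="${3:-dq-spark-role}"      # hypothetical role name

  kubectl create serviceaccount "$account" --namespace "$namespace" &&
  kubectl create role "$role" \
    --verb=get,list,create,delete \
    --resource=pods,services,secrets,configmaps \
    --namespace "$namespace" &&
  kubectl create rolebinding "${role}-binding" \
    --role="$role" \
    --serviceaccount="${namespace}:${account}" \
    --namespace "$namespace"
}

# Usage: create_spark_rbac <namespace>
```

Check the Helm chart values for the service account name that the deployment actually uses before relying on names like these.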
Configure access to the target platform
To deploy anything to a Kubernetes cluster, the first step is to install the required client utilities and configure access:
- kubectl: The main method of communication with a Kubernetes cluster. All configuration and introspection tasks are performed using kubectl.
- helm v3: Used to deploy the Collibra DQ Helm chart without hand coding manifests.
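A quick way to confirm that both utilities are installed is to check that they are on your PATH. The following is a minimal sketch; the check_tool helper is a hypothetical name introduced only for this example.

```shell
# Return 0 if the named command is on PATH, non-zero otherwise.
check_tool() {
  command -v "$1" >/dev/null 2>&1
}

# Report each required client utility without stopping at the first miss.
missing=""
for tool in kubectl helm; do
  if check_tool "$tool"; then
    echo "found: $tool"
  else
    echo "missing: $tool" >&2
    missing="$missing $tool"
  fi
done
```

If both are found, kubectl version --client and helm version print the installed client versions, which lets you confirm that Helm is version 3.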
After you install the utilities, the next step is to configure a kube-context that points to and authenticates to the target platform. On cloud platforms like GKE and EKS, this process is completely automated through their respective CLI utilities.
aws eks --region <region-code> update-kubeconfig --name <cluster_name>
gcloud container clusters get-credentials <cluster-name>
In private clouds, this process varies from organization to organization. However, the platform infrastructure team should be able to provide the target kube-context entry.
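To confirm that the configured kube-context works, you can print the active context and ask the cluster whether you are allowed to create the kinds of objects used on this page. The following is a sketch, not a complete permission audit; verify_access is a hypothetical helper name, and the resource list is taken from the operations described in the Spark service account section.

```shell
# Print the active kube-context and check create permissions in a namespace.
# kubectl auth can-i exits with a non-zero status when the action is denied.
verify_access() {
  namespace="$1"
  echo "Active context: $(kubectl config current-context)"
  for resource in secrets pods services configmaps; do
    if kubectl auth can-i create "$resource" --namespace "$namespace" >/dev/null; then
      echo "ok: can create $resource in $namespace"
    else
      echo "denied: cannot create $resource in $namespace" >&2
      return 1
    fi
  done
}

# Usage: verify_access <namespace>
```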
Prepare secrets
Once access to the target platform is confirmed, you can prepare the namespace.
kubectl create namespace <namespace>
Note The namespace may already be allocated by the platform team.
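Because the namespace may already exist, you might prefer to create it only when it is missing. The following sketch shows one way to do that; ensure_namespace is a hypothetical helper name. It assumes that you are allowed to read namespaces, so a permission error on the lookup is treated the same as a missing namespace.

```shell
# Create the namespace only if it does not exist yet.
# Assumption: a failed lookup means the namespace is missing.
ensure_namespace() {
  namespace="$1"
  if kubectl get namespace "$namespace" >/dev/null 2>&1; then
    echo "namespace $namespace already exists"
  else
    kubectl create namespace "$namespace"
  fi
}

# Usage: ensure_namespace <namespace>
```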
Create a container pull secret
Note For details on how to install Collibra DQ on Kubernetes with Docker containers, see Install on Self-hosted Kubernetes.
JSON key file credential
kubectl create secret docker-registry dq-pull-secret \
--docker-server=<cdq-registry-server> \
--docker-username=_json_key \
--docker-email=<service-account-email> \
--docker-password="$(cat /path/to/key.json)" \
--namespace <namespace>
Short-lived access token
kubectl create secret docker-registry dq-pull-secret \
--docker-server=<cdq-registry-server> \
--docker-username=oauth2accesstoken \
--docker-email=<service-account-email> \
--docker-password="<access-token-text>" \
--namespace <namespace>
Warning GCP OAuth tokens are usually valid for only 1 hour. This type of credential works well if the goal is to pull containers into a private registry. It can also be used as the pull secret to access containers directly. However, the secret must then be recreated with a fresh token before you restart any of the Collibra DQ components.
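If you do use a short-lived token as the pull secret, recreating the secret can be scripted. The following is a sketch under stated assumptions: gcloud is installed and authenticated as an account that can read the registry, refresh_pull_secret is a hypothetical helper name, and the registry and email values are the same placeholders used in the commands above.

```shell
# Recreate the pull secret with a fresh short-lived token.
# Assumption: gcloud is already authenticated with registry read access.
refresh_pull_secret() {
  namespace="$1"
  registry="$2"       # the <cdq-registry-server> value
  email="$3"          # the <service-account-email> value

  token=$(gcloud auth print-access-token) || return 1

  # Remove the stale secret (if any) before creating the replacement.
  kubectl delete secret dq-pull-secret \
    --namespace "$namespace" --ignore-not-found &&
  kubectl create secret docker-registry dq-pull-secret \
    --docker-server="$registry" \
    --docker-username=oauth2accesstoken \
    --docker-email="$email" \
    --docker-password="$token" \
    --namespace "$namespace"
}

# Usage: refresh_pull_secret <namespace> <cdq-registry-server> <service-account-email>
```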
Create a GCS credential secret
kubectl create secret generic spark-gcs-secret \
--from-file /path/to/key.json \
--namespace <namespace>
Warning The file name that you use in the --from-file argument should be spark-gcs-secret. If the file name is anything else, you must include an additional argument specifying the GCS secret name in the Helm command.
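Before you deploy, it can help to confirm that the secrets you need are present in the target namespace. The following sketch reports each secret by name; check_secrets is a hypothetical helper name, and which secrets you pass depends on the features you enable (for example, spark-gcs-secret applies only when GCS is used for Spark history logs).

```shell
# Report which of the named secrets exist in the namespace.
# Returns non-zero if at least one secret is absent.
check_secrets() {
  namespace="$1"
  shift
  result=0
  for name in "$@"; do
    if kubectl get secret "$name" --namespace "$namespace" >/dev/null 2>&1; then
      echo "present: $name"
    else
      echo "absent: $name" >&2
      result=1
    fi
  done
  return "$result"
}

# Usage: check_secrets <namespace> dq-pull-secret dq-ssl-secret spark-gcs-secret
```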