Release 2023.11
Release Information
- Release date of Collibra Data Quality & Observability 2023.11: November 20, 2023
- Publication dates:
  - Release notes: November 8, 2023
  - Documentation Center: November 13, 2023
Highlights
- Pushdown
We're excited to announce that Pushdown for BigQuery is now generally available! Pushdown is an alternative compute method for running DQ jobs, where Collibra DQ submits all of the job's processing directly to a SQL data warehouse, such as BigQuery. When all of your data resides in BigQuery, Pushdown reduces the amount of data transfer, eliminates egress latency, and removes the Spark compute requirements of a DQ job.
- UI Redesign
New installs of Collibra DQ come with REACT_MUI and UX_REACT_ON admin flags set to TRUE by default. Additionally, if a pre-existing install of Collibra DQ had these flags set to FALSE, they are now set to TRUE. While you can still modify these flags from the Admin Console


- Spark Version Update
We’ve upgraded our out-of-the-box Apache Spark version from 3.2.0 to 3.4.1. We strongly encourage organizations on Standalone deployments of Collibra DQ to upgrade to the latest Spark package to take advantage of the new features and address major vulnerabilities in Spark 3.2 and earlier versions. Additionally, Collibra DQ support for Spark 2.x will be limited as of Collibra DQ 2024.01, as Spark 2.x has reached its end of life.
- Collibra AI
We're delighted to announce that Collibra AI is now available for private beta testing! Collibra AI introduces automated SQL rule writing capabilities on the Rule Workbench and Dataset Overview that help you accelerate the discovery, curation, and visualization of your data. Contact your Collibra CSM for more details about participating in this exciting private beta.
Important
Changes for Kubernetes Deployments
We've updated the Helm Chart name from owldq to dq. For Helm-based upgrades, point to the new Helm chart while maintaining the same release name. Please update your Helm install command by referring to the renamed parameters in the values.yaml file. It is also important to note that the pull secret has changed from owldq-pull-secret to dq-pull-secret.
Further, following deployment, your existing remote agent name will change. For example, if your agent name is owldq-owl-agent-collibra-dq, the new agent name will be dq-agent-collibra-dq. If your organization uses APIs for development, ensure that you update agent name configurations in your environments.
Lastly, when you deploy using the new Helm Charts, new service (Ingress/Load Balancer) names are created. This changes the IP address of the service and requires you to reconfigure your Load Balancer with the new IP.
The following tables list the renamed keys and the changed default values.
Old Key | Renamed Key |
---|---|
global.version.owl | global.version.dq |
global.image.owlweb | global.image.web |
global.image.owlagent | global.image.agent |
Parameter | Old Default Value | New Default Value |
---|---|---|
global.mainChart | owldq | dq |
global.image.pullSecret.name | owldq-pull-secret | dq-pull-secret |
global.web.key.secretName | owldq-ssl-secret | dq-ssl-secret |
global.image.web.name | owl-web | dq-web |
global.image.agent.name | owl-agent | dq-agent |
global.image.livy.name | owl-livy | dq-livy |
global.image.spark.name | spark | dq-spark |
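If your existing Helm install command sets any of the renamed keys with `--set`, a small sed filter can rewrite them according to the Old Key and Renamed Key table above. The following is a minimal sketch, not a Collibra-provided tool. The function name `rename_helm_keys` is hypothetical. It only handles dotted keys as they appear in `--set` arguments, not nested keys inside values.yaml. It also leaves values such as `owl-web` untouched, so review the default-value changes separately.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not shipped with Collibra DQ): rewrite the renamed
# dotted Helm keys in text read from stdin, for example a saved
# "helm upgrade ... --set key=value" command.
# Only key names change; values such as owl-web are left as-is.
rename_helm_keys() {
  # \([.=]\) captures the character after the key and puts it back (\1),
  # so a longer key such as global.version.owlx is not rewritten.
  sed -e 's/global\.version\.owl\([.=]\)/global.version.dq\1/g' \
      -e 's/global\.image\.owlweb\([.=]\)/global.image.web\1/g' \
      -e 's/global\.image\.owlagent\([.=]\)/global.image.agent\1/g'
}

# Example: rewrite a saved command, then upgrade the same release name
# against the new dq chart (release name and chart path are placeholders).
#   rename_helm_keys < helm-install.sh > helm-install-dq.sh
#   helm upgrade <release-name> <path-to-dq-chart> -f values.yaml ...
```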
Note
If your organization has a standalone deployment of Collibra DQ with SSL enabled for DQ Web, and both DQ Web and DQ Agent are on the same VM or server, we recommend upgrading directly to the Collibra DQ 2023.11.3 patch instead of 2023.11. For more information, see the Maintenance Updates section below.
Migration Updates
Important This section only applies if you are upgrading from a version older than Collibra DQ 2023.09 on Spark Standalone. If you have already followed these steps during a previous upgrade, you do not have to do this again.
We have migrated our code to a new repository for improved internal procedures and security. Because owl-env.sh jar files are now prefixed with dq-* instead of owl-*, if you have automation procedures in place to upgrade Collibra DQ versions, you can use the RegEx replace regex=r"owl-.*-202.*-SPARK.*\.jar|dq-.*-202.*-SPARK.*\.jar" to update the jars.
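As a sketch of how such automation might apply the pattern, the following Bash functions reuse the same regex to recognize jar files under either naming scheme. Because it only uses `.*`, `\.`, and `|`, the pattern works unchanged as a POSIX extended regular expression. The function names `is_dq_jar` and `list_dq_jars` are hypothetical, and the directory you pass in (for example, your owl/bin folder) is an assumption about your layout.

```shell
#!/usr/bin/env bash
# Same pattern as the RegEx above, usable with Bash's [[ =~ ]] (POSIX ERE).
JAR_PATTERN='owl-.*-202.*-SPARK.*\.jar|dq-.*-202.*-SPARK.*\.jar'

# is_dq_jar NAME: succeed if NAME looks like an owl-* or dq-* release jar.
is_dq_jar() {
  [[ $1 =~ $JAR_PATTERN ]]
}

# list_dq_jars DIR: print the names of matching jar files in DIR
# (hypothetical helper; DIR is whatever folder holds your jars).
list_dq_jars() {
  local dir="$1" f name
  for f in "$dir"/*.jar; do
    [[ -e $f ]] || continue   # skip the literal glob when no .jar files exist
    name="${f##*/}"           # strip the directory part
    if is_dq_jar "$name"; then
      printf '%s\n' "$name"
    fi
  done
}

# Example: list_dq_jars /opt/owl/bin
```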
Additionally, please note the following:
- Standalone Upgrade Steps: When upgrading from a Collibra DQ version before 2023.09 to a Collibra DQ version 2023.09 or later on Spark Standalone, the upgrade steps have changed.
  - Open a terminal session.
  - Move the old jars from the owl/bin folder with the following commands.
    mv owl-webapp-<oldversion>-<spark301>.jar /tmp
    mv owl-agent-<oldversion>-<spark301>.jar /tmp
    mv owl-core-<oldversion>-<spark301>.jar /tmp
  - Copy the new jars into the owl/bin folder from the extracted package.
    mv dq-webapp-<newversion>-<spark301>.jar /home/owldq/owl/bin
    mv dq-agent-<newversion>-<spark301>.jar /home/owldq/owl/bin
    mv dq-core-<newversion>-<spark301>.jar /home/owldq/owl/bin
  - Copy the latest owlcheck and owlmanage.sh to the /opt/owl/bin directory.
    Tip You may also need to run chmod +x owlcheck owlmanage.sh to add execute permission to owlcheck and owlmanage.sh.
  - Start the Collibra DQ Web application.
    ./owlmanage.sh start=owlweb
  - Start the Collibra DQ Agent.
    ./owlmanage.sh start=owlagent
  - Validate the number of active services.
    ps -ef | grep owl
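The file-handling part of the steps above can be scripted. The following Bash function is a minimal sketch under stated assumptions, not an official Collibra script. `upgrade_jars` is a hypothetical name, and it assumes the extracted package directory contains the new dq-* jars plus owlcheck and owlmanage.sh. It moves old owl-* jars, and any older dq-* jars for the same component, to a backup directory you choose instead of /tmp. It then copies the new jars and makes the scripts executable. Starting and validating the services is left as the manual steps shown in the comments. Passing only `agent` as the component covers the agent-only swap described under Liveness Probe Updates.

```shell
#!/usr/bin/env bash
# Sketch of the Standalone jar swap. All paths are assumptions: pass your own
# owl/bin directory, extracted package directory, and backup directory.
set -euo pipefail

# upgrade_jars OWL_BIN PKG_DIR BACKUP_DIR COMPONENT...
#   COMPONENT is webapp, agent, or core (hypothetical helper).
upgrade_jars() {
  local owl_bin="$1" pkg_dir="$2" backup_dir="$3"
  shift 3
  local component jar script
  mkdir -p "$backup_dir"
  shopt -s nullglob   # unmatched globs expand to nothing instead of themselves
  for component in "$@"; do
    # Move old jars (owl-* naming, or dq-* from a previous upgrade) out of owl/bin.
    for jar in "$owl_bin/owl-$component-"*.jar "$owl_bin/dq-$component-"*.jar; do
      mv "$jar" "$backup_dir/"
    done
    # Copy the new dq-* jar for this component from the extracted package.
    for jar in "$pkg_dir/dq-$component-"*.jar; do
      cp "$jar" "$owl_bin/"
    done
  done
  shopt -u nullglob
  # Copy the latest owlmanage.sh (and owlcheck, if present) and make them executable.
  for script in owlmanage.sh owlcheck; do
    if [[ -f "$pkg_dir/$script" ]]; then
      cp "$pkg_dir/$script" "$owl_bin/"
      chmod +x "$owl_bin/$script"
    fi
  done
}

# Example (hypothetical paths), followed by the manual start/validate steps:
#   upgrade_jars /home/owldq/owl/bin /home/owldq/dq-2023.11 /tmp/dq-backup webapp agent core
#   cd /home/owldq/owl/bin
#   ./owlmanage.sh start=owlweb
#   ./owlmanage.sh start=owlagent
#   ps -ef | grep owl
```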
Liveness Probe Updates
- Cloud Native
  - When a Kubernetes pod service becomes unstable, a new liveness probe automatically deletes the pod to ensure the DQ agent stays alive and running. No further action is necessary for Cloud Native deployments; this note is for informational purposes only.
- Standalone
  - Because the implementation of the liveness probe for Kubernetes required a change in the owlmanage.sh file for Standalone installations, you need to follow the steps below to upgrade a Standalone deployment.
  - Important If your organization has a Standalone installation of Collibra DQ, you must copy the latest owlmanage.sh to the /opt/owl/bin directory, as the file has changed.
  - Open a terminal session.
  - Move the old dq-agent jar from the owl/bin folder with the following command.
    mv dq-agent-<oldversion>-<spark301>.jar /tmp
  - Copy the new dq-agent jar into the owl/bin folder from the extracted package.
    mv dq-agent-<newversion>-<spark301>.jar /home/owldq/owl/bin
  - Copy the latest owlmanage.sh to the /opt/owl/bin directory.
    Tip You may also need to run chmod +x owlmanage.sh to add execute permission to owlmanage.sh.
  - Start the Collibra DQ Agent.
    ./owlmanage.sh start=owlagent
  - Validate the number of active services.
    ps -ef | grep owl
New Features
Pushdown
- When running Profiling on Pushdown jobs, advanced level profiling is now an opt-in feature and does not run by default. Advanced Profile determines whether a string field contains various string numerics, calculates TopN, BottomN, and TopN Shapes, and detects the scale and precision of double fields. We've also included the Profile String Length setting in the Advanced Profile option on the Explorer Settings modal.
Enhancements
Capabilities
- When using the Alert Builder, you can now create an alert for when a job run completes successfully. When you add an alert, you can now choose from two options:
- Dataset Run Alerts let you set an alert for when a job run meets a certain condition.
- Job Status lets you set an alert for when a job run completes.
- You can now configure arbitrary users as part of root user groups for cloud native deployments of Collibra DQ on Openshift.
- You can now use OAuth 2.0 to authenticate Trino connections.
- We've moved the /v2/getprofiledeltasbyrunid API to the V3 Profile API GET /v3/profile/deltas.
Pushdown
- You can now archive source break records from Redshift Pushdown jobs.
Integration
- The Unassigned DQ Job domain now has the Parent Community name appended to it. Given the unique community name constraint when using more than one tenant or instance, this change allows for the unique naming of Business Units. The new Community structure is as follows:
  - Parent Community
    - Business Unit Community OR Unassigned DQ Job + Parent Community
      - DQ Job Domain
        - DQ Job Asset
      - Rulebook (definitions)
        - DQ Rule Assets
      - Rulebook (scores)
        - DQ Metric Assets
- When on a dataset-level page, such as Findings or Profile, you can now select "Reset Integration" from the Integration dropdown menu on the metadata bar to realign mappings that may have changed in Collibra Platform. If you get an error message because your mappings are misaligned, this action can reset the integration and allow you to proceed with Collibra DQ metadata ingestion.
- When a dataset integration with Collibra Platform is enabled, you can now view the Community, Sub-Community, and Domain hierarchy to which the integration is mapped in Collibra Platform from the metadata bar on all dataset-level pages. This increases the transparency of where Assets are created in Collibra Platform without needing to navigate away from Collibra DQ.
- Additionally, you can click any of the breadcrumbs to open the Community, Sub-Community, or Domain in Collibra Platform.
- When running Pushdown jobs with integrations enabled, the results now integrate into Collibra Platform successfully. Previously, you had to manually enable or disable the integration each time a Pushdown job ran.
- After integrating a dataset from the Admin Console, Findings or Dataset Manager pages, you can now click the "Data Quality Job" link in the new View in DIC column to open its corresponding Asset page in Collibra Platform.
- The GET /dgcjson endpoint is now included in the main Integrations API as GET /dgc/integrations/getdgcjson. Previously, this was located under the "UI Internal" section of Swagger.
Tip When an integration error occurs, check that the Community, Domain, and Asset names in Collibra Platform don't already exist from a previous integration.
DQ Cloud
- Collibra Edge now reflects Collibra DQ's new default Spark version 3.4.1.
Fixes
Capabilities
- When running a job to check for outliers where the lookback value is set to something other than the default 5, the minhistory now updates to the correct value in the metastore. (ticket #124063)
Note Manual overrides of -dlminhist on the command line do not save in the metastore.
- When using the new Collibra DQ UI, run date information now displays correctly. Previously, the job configuration of manual DQ job runs was overridden with hardcoded dates, which caused the upcoming scheduled date to reflect only the hardcoded date. (ticket #126345)
- When changing the assignment for a dataset with the new UI turned on for an SSO instance of Collibra DQ, existing SAML assignments now load correctly on the Findings and Assignments pages. (ticket #128063)
- When assigning users to the following roles, they can now access all appropriate Admin screens (ticket #128115):
- ROLE_CONNECTION_MANAGER
- ROLE_DATA_GOVERNANCE_MANAGER
- ROLE_DATASET_MANAGER
- ROLE_OWL_ROLE_MANAGER
- ROLE_USER_MANAGER
Platform
- When editing the Batch Name field of a Dataset Run Alert, the batch name distribution list no longer updates if the Batch Name field is empty or blank (" "). (tickets #126763, 126796)
- When reviewing the Outlier tab on the Findings page, outlier findings now expand correctly when you drill down into them. (ticket #126065)
- When creating an alert, you can now enter the special characters ! # $ % & ' * + - / = ? ^ _ ` . { | } ~ in an email address for the alert recipient. (ticket #126763)
DQ Integration
- When viewing user-defined or adaptive rules, Passing Fraction now reflects the points deducted from an individual rule, rather than the total rule score. Previously, the total breaks for the rule type of user-defined rules were used to generate the Passing Fraction, rather than the individual rule breaks. (tickets #124217, 127700)
- When using the configuration wizard to map your single-tenant Collibra DQ environment, you can now link your integration to an existing community and create new communities as part of your integration. (tickets #122948, 123426, 126227, 127044, 128345)
- When integrating a Collibra DQ dataset and setting up dimensions in Collibra Platform, the columns now display correctly in Collibra Platform after running the dataset in Collibra DQ. (ticket #126379)
Pushdown
- When adding outliers to a Pushdown dataset and running a job, the outlier configurations now render properly. Previously, when editing a job, one or more of the outliers did not display as expected. (ticket #124736)
DQ Cloud
- Fixed an issue that caused the Collibra DGC MDL Proxy to run out of memory under certain conditions.
DQ Security
Note
We've removed all existing classic UI JS libraries and their references from the Updated UI to address and prevent any potential security vulnerabilities.
The following image shows a chart of Collibra DQ security vulnerabilities arranged by release version.
The following image shows a table of Collibra DQ security metrics arranged by release version.
Updated UI
In addition to broader user interface and user experience enhancements, we've also added some impressive new features! The following table showcases some of the highlights.
Component | Description | Available in Classic |
---|---|---|
Metadata Bar | The Metadata Bar is a dataset anchor that simplifies navigation to some of your most frequently used pages, such as Dataset Overview, Profile, Rules, and Findings. It also provides quick insight into your dataset, such as the number of active rules, the data source from which the dataset was created, and whether or not your job is scheduled to run automatically. When an integration is set up, the Metadata Bar also allows you to easily enable or disable dataset metadata integrations into Collibra Platform. You can access the Metadata Bar on any dataset-level page, including: | No |
Dataset Overview | Dataset Overview lets you query your dataset to discover key data points and insights, which you can convert to data quality rules entirely within the Dataset Overview modal. With the power to write SQL to query your dataset, you can accelerate the process of data discovery and reveal important insights in real time. Dataset Overview also allows private beta participants to leverage Collibra AI to automatically write and troubleshoot SQL for faster rule writing and advanced exploration of your dataset. See the Collibra AI private beta documentation to learn more. | No |
Explorer | The new workflow simplifies the process of creating a DQ job to run against a dataset. With just a few clicks, you can create a basic profile job in a matter of seconds instead of minutes. For a more advanced scan, the step-by-step guide walks you through the process, eliminating many of the more tedious elements of the classic Explorer. | Yes |
Findings | The new Findings page will feel similar to the classic page, but with a few important changes: | Yes |
Profile | While many of the same column- and dataset-level insights are unchanged from the classic UI, the presentation of information is now modernized for a crisper experience. | Yes |
Rules | Dataset Rules lists all previously saved rules for a given dataset and provides an overview of their details, such as the definitions of SQL conditions, rule types, and whether or not rules pass validation checks. From here, you can access the Rule Workbench to create or edit a rule. The Rule Workbench replaces the classic Rule Builder, fusing an elegant SQL command line interface with the preview and advanced setting capabilities you expect from a modern SQL builder. Like the Dataset Overview, you can also use Collibra AI-generated SQL to write and troubleshoot rules on the Rule Workbench. See the Collibra AI private beta documentation to learn more. We've also split Data Class and Template rules into their own pages to emphasize that they are independent of jobs and datasets and improve their overall organization. | No |
Alerts | The new Alert Builder gives you an at-a-glance overview of all alerts for a particular dataset and simplifies adding new alerts and editing existing ones. | Yes |
Dataset Manager | Dataset Manager provides a list of all datasets in your Collibra DQ environment, as well as a variety of management options, such as bulk actions, assigning datasets to data categories, and the ability to filter datasets by a variety of criteria. | Yes |
Column Manager | Column Manager is a detailed breakdown of all the columns in the datasets in your Collibra DQ environment that shows key data points like data type, various ratios, and the Pass/Fail status of a given column. You can also bulk apply rules, data classes, and sensitive labels to selected columns. | No |
Report Dashboards | The Reports section now has two new dashboards available in the updated UI: | No |
Connections | The updated Connections page in the Admin Console does away with the connection tiles of the classic page in favor of a highly searchable and sortable paginated table format. The new page also features two tabs for Connections and the Drivers stored in your Collibra DQ environment. Additionally, when you add or edit a connection, the connection template is now organized in three tabs for Connection Details, Driver Properties, and Connection Variables. With these sections now clearly delineated, the process of creating or updating a connection is now much cleaner. | Yes |
Admin Console | The Admin Console now lists each admin activity for better organization and simpler navigation than the tiles of the classic UI. | Yes |
Known Limitations
Explorer
- When using the SQL compiler on the Dataset Overview for remote files, the Compile button is disabled because the execution of data files at the Spark layer is unsupported.
- You cannot currently upload temp files from the new File Explorer page. This may be addressed in a future release.
- The Formatted view tab on the File Explorer page only supports CSV files.
- When creating a job, the Estimate Job step from the classic Explorer is no longer a required step. However, if incorrect parameters are set, the job may fail when you run it. If this is the case, return to the Sizing step and click Estimate next to Job Size before you Run the job.
Connections
- When adding a driver, if you enter the name of a folder that does not exist, a permission issue prevents the creation of a new folder.
- A workaround is to use an existing folder.
Admin
- When adding another external assignment queue from the Assignment Queue page, if an external assignment is already configured, the Test Connection and Submit buttons are disabled for the new connection. Only one external assignment queue can be configured at the same time.
- Due to security requirements, we've removed the ability for application administrators to add new local users from the User Management page in the Admin Console. All new users must use the Register link on the Collibra DQ sign-in screen.
- When auto-approve is not configured, admin users can still manually approve new user requests and add roles to the new user from the User Management page.
Profile
- When adding a distribution rule from the Profile page of a dataset, the Combined and Individual options incorrectly have "OR" and "AND" after them.
- When using the Profile page, Min Length and Max Length do not display the correct string length. This will be addressed in an upcoming release.
Rules
- When creating a quick rule from the Data Preview tab of the Findings, Profile, or Rules pages, the Preview Limit and Run Time Limit do not honor the application default limits of 6 and 30, respectively. Instead, the Preview Limit and Run Time Limit are both incorrectly set to 0.
- While this will be addressed in the January (2024.01) release, a workaround is to manually edit these fields from the Rule Workbench Settings modal.
Alerts
- Batch email updates are not currently working in the beta UI. This will be addressed in the January (2024.01) release.
- When editing the Batch Name of a job alert, there is a limitation that prevents you from editing the email address field associated with the batch alert.
Scorecards
- You cannot currently create a new scorecard from the Page dropdown menu because of a missing function.
- While a fix for this is planned for the September (2023.09) release, a workaround is to select the Create Scorecard workflow from the three dots menu instead.
Navigation
- The Dataset Overview function on the Metadata Bar is not available for remote files.
- The Dataset Overview modal throws errors for the following connection types:
- BigQuery (Pushdown and Pullup)
- Athena CDATA
- Oracle
- SAP HANA
- The Dataset Overview function throws errors when you run SQL queries on datasets from S3 and BigQuery connections.
Maintenance Updates
2023.11.3
- While configuring SSL after a fresh Standalone install or upgrade to Collibra DQ version 2023.11, the DQ Agent and Web now start as expected. (ticket #131078)
- With this fix, the DQ Agent now uses port 9101 by default to expose the Health Check API.
Note Ensure you select the latest corresponding Helm Chart when taking a maintenance update for Cloud Native deployments.
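To confirm that something is listening on the new default Health Check port after taking this update, a quick TCP check is enough. The sketch below assumes Bash's `/dev/tcp` redirection and the coreutils `timeout` command are available. `check_port` is a hypothetical name. Because the Health Check API's endpoint paths are not listed here, it only tests the port rather than calling the API.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report whether HOST accepts TCP connections on PORT.
# Defaults to localhost and 9101, the DQ Agent Health Check port as of 2023.11.3.
# This checks the port only; it does not call any Health Check endpoint.
check_port() {
  local host="${1:-localhost}" port="${2:-9101}"
  # Bash opens a TCP connection when redirecting to /dev/tcp/HOST/PORT;
  # timeout bounds the wait so an unreachable host does not hang the check.
  if timeout 3 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null; then
    echo "open"
  else
    echo "closed"
  fi
}

# Example: check_port localhost 9101
```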
2023.11.4
- When synchronizing DQ rules without business units configured in Collibra DQ, you can now synchronize them to both root and sub-communities. (tickets #127044, 128138)