Release Notes

Important 

Disclaimer - Failure to upgrade to the most recent release of the Collibra Service may adversely impact the security, reliability, availability, integrity, performance or support (including Collibra’s ability to meet its service levels) of the Service. Collibra hereby disclaims all liability, express or implied, for any reduction in the security, reliability, availability, integrity, performance or support of the Service to the extent the foregoing would have been avoided had you allowed Collibra to implement the most current release of the Service when scheduled by Collibra. Further, to the extent your failure to upgrade the Service impacts the security, reliability, availability, integrity or performance of the Service for other customers or users of the Service, Collibra may suspend your access to the Service until you have upgraded to the most recent release.

Release 2024.02

Release Information

  • Expected release date of Collibra Data Quality & Observability 2024.02: February 26, 2024
  • Publication dates:
    • Release notes: January 22, 2024
    • Documentation Center: February 4, 2024

Highlights

    Archive Break Records

    Pullup
    When rule breaks are stored in the PostgreSQL Metastore with link IDs assigned, you can now download a CSV file containing the details of the rule breaks and link ID columns via the Findings page > Rules tab > Actions > Rule Breaks modal.

    Pushdown
    To remove sensitive data from the PostgreSQL Metastore entirely, you can now enable Data Preview from Source in the Archive Break Records section of the Explorer Settings. When you enable Data Preview from Source, data preview records are not stored in the PostgreSQL Metastore.

    Previews of break records associated with Rules, Outliers, Dupes, and Shapes breaks on the Findings page reflect the current state of the records as they appear in your data source. With this option disabled, the preview records displayed in the web app are snapshots of the PostgreSQL Metastore records at runtime. This option is disabled by default.

    Additionally, with Archive Break Records enabled and a link ID column assigned, you can now download a CSV or JSON file containing the details of the breaks and link ID columns via the Findings page > Rules, Outliers, Dupes, or Shapes tab > Actions > Rule Breaks modal.

    Lastly, when Archive Break Records is enabled, you can now optionally enter an alternative dataset-level schema name to store source break records, instead of the schema provided in the connection.

    Integration
    You can now see the Overview DQ Score on an Asset when searching via Data Marketplace. This improves your ability to browse the data quality scores of Assets without opening their Asset Pages.

Important 
Changes for Kubernetes Deployments
As of Collibra DQ version 2023.11, we've updated the Helm Chart name from owldq to dq. For Helm-based upgrades, point to the new Helm Chart while maintaining the same release name. Please update your Helm install command by referring to the renamed parameters in the values.yaml file. Note also that the pull secret has changed from owldq-pull-secret to dq-pull-secret.

Further, following deployment, your existing remote agent name will change. For example, if your agent name is owldq-owl-agent-collibra-dq, the new agent name will be dq-agent-collibra-dq. If your organization uses APIs for development, ensure that you update the AGENT name configurations in your environments.

Lastly, when you deploy using the new Helm Charts, new service (Ingress/Load Balancer) names are created. This changes the IP address of the service and requires you to reconfigure your Load Balancer with the new IP.

Please see the expandable sections below for more details about specific changes.

Note 
If your organization has a standalone deployment of Collibra DQ with SSL enabled for DQ Web, and both DQ Web and DQ Agent are on the same VM or server, we recommend upgrading directly to Collibra DQ 2023.11.3 patch version instead of 2023.11. For more information, see the Maintenance Updates section below.

Migration Updates

Important This section only applies if you are upgrading from a version older than Collibra DQ 2023.09 on Spark Standalone. If you have already followed these steps during a previous upgrade, you do not have to do this again.

We have migrated our code to a new repository for improved internal procedures and security. Because owl-env.sh jar files are now prefixed with dq-* instead of owl-*, if you have automation procedures in place to upgrade Collibra DQ versions, you can use the RegEx replace regex=r"owl-.*-202.*-SPARK.*\.jar|dq-.*-202.*-SPARK.*\.jar" to update the jars.

Additionally, please note the following:

  • Standalone Upgrade Steps
  • When upgrading from a Collibra DQ version before 2023.09 to a Collibra DQ version 2023.09 or later on Spark Standalone, the upgrade steps have changed.

Enhancements

Capabilities

  • SQL assistant for data quality (beta) now lets you select from four new options to generate prompts for:
    • Categorical: Writes a SQL query to detect categorical outliers.
    • Dupe: Writes a SQL query to detect duplicate values.
    • Record: Writes a SQL query to find values that appear on one day but not the next.
    • Pattern: Writes a SQL query to find infrequent combinations that appear less than 5 percent of the time in the columns you specify (see the sketch after this list).
  • When using the Dataset Overview, you can now click the -q button to load the contents of the dataset source query into the SQL editor.
  • When using the Dataset Overview, you can now use Find and Replace to find any string in the SQL editor and replace it with another.
  • When a finding is assigned to a ServiceNow incident and the ServiceNow connection has Publish Only enabled on the ServiceNow Configuration modal in the Admin screens, the finding record is still pushed to ServiceNow as in previous versions, but the statuses are no longer linked. This means you can adjust the status of the ServiceNow incident and the status of the DQ finding independently. Previously, the ServiceNow incident had to be closed for the DQ finding to be resolved.
  • From the Settings page in Explorer, you can now select the Core Fetch Mode option to allow SQL queries with spaces to run successfully. When selected, this option adds -corefetchmode to the command line to enable the core to fetch the query from the load options table and override the -q.
  • When attempting to connect to NetApp or Amazon S3 endpoints in URI format with the HTTPS option selected, you can now add the following properties to the Properties tab on Amazon S3 connection templates to successfully create connections:
    • For Amazon S3 endpoint URI: s3-endpoint=s3
    • For NetApp: s3-endpoint=netapp
  • When using the Pulse View, you can now select new options from the Show Failed dropdown menu, including Failed Job Runs and Failing Scores. Previously, the Show Failed option displayed only failed job runs.
  • You can now use uppercase characters in secondary dataset names and rule references.
  • You can now configure arbitrary users as part of the root user group for DQ pod deployment.
  • Due to security concerns, we have removed the license key from the job logs.
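
    To make the Pattern option above concrete, the following is a minimal sketch of the kind of query it describes, assuming a hypothetical orders table with city and product columns; it is an illustration only, not output generated by the SQL assistant.

    -- Flag combinations of city and product that appear in fewer than
    -- 5 percent of all rows (table and column names are hypothetical).
    SELECT city, product, COUNT(*) AS combo_count
    FROM orders
    GROUP BY city, product
    HAVING COUNT(*) < 0.05 * (SELECT COUNT(*) FROM orders);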

Platform

  • We've upgraded the following drivers to their latest versions:
    Driver           Version
    Databricks       2.6.36
    Google BigQuery  1.5.2.1005
    Dremio           24.3.0
    Snowflake        3.14.4
  • You can now enable multi-tenancy for a notebook API.
  • We now apply the same Spark CVE fixes that are applied to Cloud Native deployments of Collibra DQ to Standalone deployments.

Pushdown

  • From the Settings page on Explorer, you can now select Date or DateTime (TimeStamp) from the Date Format dropdown menu to substitute the runDate and runDateEnd at runtime.
  • To conserve memory and processing resources, the results query now rolls up outliers and shapes, and the link IDs no longer persist to the Metastore.
  • All rules from the legacy Rule Library function correctly for Snowflake and Databricks Pushdown except for Having_Count_Greater_Than_One and Two_Decimal_Places when Link ID is enabled. See the Known Limitations section below for more information.
  • You can now use cross-dataset rules that traverse across connections on the same data source.
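
    As an illustration of the cross-dataset support above, a rule along these lines could compare datasets that live on two connections to the same data source. This is a minimal sketch assuming Collibra DQ's @dataset notation for referencing datasets; the dataset and column names are hypothetical.

    -- Break when an order references a customer_id that does not exist
    -- in the secondary dataset (names are illustrative).
    SELECT a.customer_id
    FROM @orders_dataset a
    LEFT JOIN @customers_dataset b ON a.customer_id = b.customer_id
    WHERE b.customer_id IS NULL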

Fixes

Capabilities

  • When editing the command line of a job containing an outlier to replace -by HOUR with -tbin HOUR, the command line no longer reverts to its original state after profiling completes. (ticket #126764)
  • When exporting the job log details to CSV, Excel, PDF, or Print from the Jobs page, the exported data now contains all rows of data. (ticket #129832)
    • Additionally, when exporting the job log details to PDF from the Jobs page, the PDF file now contains the correct column headers and data. (ticket #129832)
  • When working with the Alert Builder, you no longer see a “No Email Servers Configured” message despite having correctly configured SMTP settings. (ticket #127520)

DQ Integration

  • When integrating data from an Athena connection, you can now use the dropdown menu in rules to map an individual column to a Rule in Collibra Data Intelligence Cloud. (ticket #125152, 126150)

Pushdown

  • When Archive Break Records is enabled, statements containing backticks (`) or new lines are now properly inserted into the source system. (ticket #130122)
  • For Snowflake Pushdown jobs where many outlier records are either dropped or added, new limits on memory usage now prevent out-of-memory issues. (ticket #126284)

Known Limitations

  • When Link ID is enabled for a Snowflake or Databricks Pushdown job, Having_Count_Greater_Than_One and Two_Decimal_Places do not function properly.
    • The workaround for Having_Count_Greater_Than_One is to manually add the Link ID to the group by clause in the rule query.
    • The workaround for Two_Decimal_Places is to add a * to the inner query.
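
    The following is a minimal sketch of the two workarounds, assuming a hypothetical orders table with a link_id column; all table, column, and Link ID names are illustrative.

    -- Having_Count_Greater_Than_One: manually add the Link ID column
    -- to the GROUP BY clause of the rule query.
    SELECT customer_id, link_id, COUNT(*)
    FROM orders
    GROUP BY customer_id, link_id
    HAVING COUNT(*) > 1;

    -- Two_Decimal_Places: add a * to the inner query so the Link ID
    -- column is carried through to the outer query.
    SELECT *
    FROM (SELECT *, ROUND(price, 2) AS rounded_price FROM orders) t
    WHERE price <> rounded_price;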

DQ Security

Note If your current Spark version is 3.2.2 or older, we recommend upgrading to Spark 3.4.1 to address various critical vulnerabilities present in the Spark core library, including Log4j.

Release 2024.01

Release Information

  • Release date of Collibra Data Quality & Observability 2024.01: January 29, 2024
  • Publication dates:
    • Release notes: January 4, 2024
    • Documentation Center: January 29, 2024

Highlights

    Integration
    We’ve introduced several new features and enhancements to significantly improve the integration experience.

    • Aggregation paths on tables are now set by default, simplifying the configuration within the Collibra DQ Admin Console.
    • The new Quality tab is now available as part of the latest UI updates for Asset pages in Collibra Data Intelligence Cloud for private beta participants, giving you at-a-glance insights into the quality of your assets. These insights include:
      • Score and dimension roll-ups.
      • Column, data quality rule, data quality metric, and row overviews.
      • Details about the data elements of an asset.
    • When multiple jobs are attached to a table, the Quality tab on an Asset page shows an average similar to a scorecard in Collibra DQ.
      Note Table Assets roll up to the DQ Job Data Asset. Best practice is to roll up DQ Job to Table to align the dedication score and the Quality tab Asset score for a Table Asset.

    Spark Version Update
    As of Collibra DQ version 2023.11, we’ve upgraded our out-of-the-box Apache Spark version from 3.2.0 to 3.4.1. We strongly encourage organizations on Standalone deployments of Collibra DQ to upgrade to the latest Spark package to take advantage of the new features and address some of the major vulnerabilities in Spark 3.2 and earlier versions. Additionally, Collibra DQ support for Spark 2.x is limited as of Collibra DQ 2024.01, as Spark 2.x has reached its end of life.

    If you use Spark 3.2.2 or lower, we recommend upgrading to 3.4.1 to address various critical vulnerabilities present in the Spark core library, including Log4j.

Important 
Changes for Kubernetes Deployments
As of Collibra DQ version 2023.11, we've updated the Helm Chart name from owldq to dq. For Helm-based upgrades, point to the new Helm Chart while maintaining the same release name. Please update your Helm install command by referring to the renamed parameters in the values.yaml file. Note also that the pull secret has changed from owldq-pull-secret to dq-pull-secret.

Further, following deployment, your existing remote agent name will change. For example, if your agent name is owldq-owl-agent-collibra-dq, the new agent name will be dq-agent-collibra-dq. If your organization uses APIs for development, ensure that you update the AGENT name configurations in your environments.

Lastly, when you deploy using the new Helm Charts, new service (Ingress/Load Balancer) names are created. This changes the IP address of the service and requires you to reconfigure your Load Balancer with the new IP.

Please see the expandable sections below for more details about specific changes.

Note 
If your organization has a standalone deployment of Collibra DQ with SSL enabled for DQ Web, and both DQ Web and DQ Agent are on the same VM or server, we recommend upgrading directly to the Collibra DQ 2023.11.3 patch version or 2024.01.

Migration Updates

Important This section only applies if you are upgrading from a version older than Collibra DQ 2023.09 on Spark Standalone. If you have already followed these steps during a previous upgrade, you do not have to do this again.

We have migrated our code to a new repository for improved internal procedures and security. Because owl-env.sh jar files are now prefixed with dq-* instead of owl-*, if you have automation procedures in place to upgrade Collibra DQ versions, you can use the RegEx replace regex=r"owl-.*-202.*-SPARK.*\.jar|dq-.*-202.*-SPARK.*\.jar" to update the jars.

Additionally, please note the following:

  • Standalone Upgrade Steps
  • When upgrading from a Collibra DQ version before 2023.09 to a Collibra DQ version 2023.09 or later on Spark Standalone, the upgrade steps have changed.

New Features

Integration

  • You can now see the Overview DQ Score on an Asset when searching via Data Marketplace. This improves your ability to browse the data quality scores of Assets without opening their Asset Pages.

Enhancements

Capabilities

  • When setting up a connection to a Google BigQuery data source, you can now use Workload Identity to authenticate your connection. By using Workload Identity to authenticate your BigQuery connection, you can now access data stored in BigQuery across GCP projects without relying on JSON credential files or metadata obfuscation.
  • When using the Alert Builder, you can now create an alert for when a job run fails. You can configure this by selecting the Job Failure option from the Status Alert modal on the Alert Builder page.
    • Additionally, the alert types on the Alert Builder page have changed from Dataset Run Alerts and Job Status to Condition and Status, respectively.
  • When you receive an email alert, the body of the alert now includes the Alert Type as either Condition or Status. When the Alert Type is Condition, the query upon which the condition is based also displays.
  • When using the /v3/rules/{dataset}/ruleBreaksSelect API and selecting INTERNAL for the storageType parameter, the API generates a SQL query that returns the break records stored in the PostgreSQL Metastore.
  • The /v2/getbreakingrecords API is now filtered by runId and limited to 100 records by default.
  • When using multi-tenancy with SAML enabled, you can now set showsamltenantmetadatalabel=false in the ConfigMap to hide the tenant metadata labels of SAML-enabled tenants so that they match the names of non-SAML-authenticated tenants.
  • When signing into Collibra DQ, you are now required to select a tenant from the dropdown menu before proceeding.
  • The Scheduler page is now powered by the v2/getallscheduledjobs API.
  • What was previously the DQ Job button next to the search bar at the top of the Collibra DQ application is now called Findings.

Platform

  • Standalone installations now come packaged with the PostgreSQL 12 installer instead of PostgreSQL 11.
  • Helm Charts are now versioned according to their corresponding release version. You can find version details in the Chart.yaml file.
  • You can now use the default truststore by setting the global.web.tls.trust.default flag to true.
  • With SAML configured, you can now use the default keystore without the need for a custom keystore.
  • We've introduced pattern validation for some commonly used IDs and names in Collibra DQ. You can override these patterns in the owl-env.sh file for Standalone or the Web ConfigMap for Cloud Native by replacing the default values using a RegEx for the following env variables:
    Env variable Variable name Default Usage
    VALIDATION_PATTERN_NAME_ID Name ID pattern Alphanumeric characters (letters and digits) and underscores (_). For overriding defaults of Name IDs in Collibra DQ.
    VALIDATION_PATTERN_COMMON_NAME Common name Alphanumeric characters (letters and digits) and underscores (_). For overriding defaults of common names used in Collibra DQ.
    VALIDATION_PATTERN_SCHEMA_NAME Schema name Alphanumeric characters (letters and digits) and underscores (_). For overriding defaults of connection schema names in Collibra DQ.
    VALIDATION_PATTERN_CONN_NAME Connection name Alphanumeric characters (letters and digits), underscores (_), and hyphens (-). For overriding defaults of connection names in Collibra DQ.
    VALIDATION_PATTERN_DATASET_NAME Dataset name Alphanumeric characters (letters and digits), underscores (_), and hyphens (-). For overriding defaults of dataset names in Collibra DQ.
    VALIDATION_PATTERN_FILE_NAME File name Alphanumeric characters (letters and digits), underscores (_), hyphens (-), periods (.), backslashes (/), and spaces ( ). For overriding defaults of file name usages in Collibra DQ.
    VALIDATION_PATTERN_LDAP_DN DN Comma separated key value pair with both the key and value consisting of lowercase letters, digits, and hyphens. For overriding defaults of LDAP DN usages in Collibra DQ.

    Example If you want to override the default that does not allow spaces in the connection name, set the connection name variable to the following RegEx: VALIDATION_PATTERN_CONN_NAME=^[a-zA-Z0-9_ -]+$ (the hyphen is placed last in the character class so that it is treated as a literal).

  • In order to promote compatibility with various software delivery platforms, the leading ‘0’ in the defaultMode value of 0493 has been removed from four YAML files for Kubernetes deployments. The new defaultMode is now 493 for the following YAML files:
    • k8s/charts/dq/charts/dq-agent/templates/dq-agent-statefulset.yaml
    • k8s/charts/dq/charts/dq-livy/templates/dq-livy-deployment.yaml
    • k8s/charts/dq/charts/dq-web/templates/dq-web-statefulset.yaml
    • k8s/charts/dq/charts/spark-history-server/templates/deployment.yaml
  • Configurable PBE encryption is now supported for Kubernetes and Standalone deployments of Collibra DQ.
  • When validating a finding and assigning it to a ServiceNow user or group, you can now reassign this finding from ServiceNow to a Collibra DQ user. (ticket #122476, 133165)

Pushdown

  • When changes are made to the schema based on the source query, the dataset_schema table now reflects those changes.

Fixes

Capabilities

  • When creating a rule on a Pullup dataset that uses “in” or “rlike” where the condition ends with wrapped parentheses (that is, “))”), the string replace logic now works as expected. Previously, rules that contained rlike() or in() would throw exceptions when the job ran. (ticket #128759)
  • Important In order to keep rules as close to the original input as possible, some of the padding that was originally appended to certain characters has been removed.

  • When archiving rule breaks, break records now export to S3 buckets as expected. (ticket #125411)
  • When changing the business unit on the Metadata Bar or Dataset Manager, the updated business unit now replaces the old one in the Metastore instead of creating an additional entry. (ticket #127823, 130086)
  • When editing a dataset from Dataset Manager in the new UI, the Schema/Parent Folder and Table/File Name fields are no longer swapped. (ticket #129103)
  • When running a job against a BigQuery dataset where the underlying query uses “from” before the schema/table, Collibra DQ now parses the “from” correctly. (ticket #129798)
  • When attempting to create a Pullup job against a Trino schema whose name contains the escape character \, you can now view the table details in Explorer. (ticket #131065)
  • The configs dupelimit and dupelimitui now work as expected where dupelimit restricts the number of dupe findings stored in the Metastore and of those findings, dupelimitui restricts the number of dupes shown on the Dupes tab on the Findings page. (ticket #127748)
  • When using the Rule Builder in the new UI, freeform rules using RLIKE now successfully pass validation. (ticket #127779)
  • When using the Alert Builder in the new UI, you can now update alert batch names as expected. (ticket #129079)
  • When a scheduled job runs with zero rows and Parallel JDBC enabled, it no longer fails. (ticket #128385)
  • When using Kerberos-based authentication for PostgreSQL or CockroachDB data sources, jobs no longer time out before authenticating against KDC. (ticket #128540)

Platform

  • While configuring SSL after a fresh Standalone install of or upgrade to Collibra DQ version 2023.11, the DQ agent now starts as expected. (ticket #131078)
  • Note With this update, any system that has export SERVER_PORT set overrides the default value of -1. To define the port, comment out export SERVER_PORT=9000 and set PORT=9000 (the owl-web port number) instead.

  • We fixed an issue that caused unexpected license name requests. (ticket #126731)

DQ Integration

  • When mapping columns from rules in Collibra DQ, column names are now parsed correctly in Collibra Data Intelligence Cloud.
  • When a User Defined Rule is deactivated or deleted from a dataset in Collibra DQ, the Rule and Score status in Collibra Data Intelligence Cloud is set to “Suppressed” when the dataset runs again in Collibra DQ. (ticket #124414)

Pushdown

  • When running a Redshift Pushdown job to create a profile of a Redshift dataset, you can now view TopN Shapes for columns containing NULL values. (ticket #129221)
  • When casting from one data type to another on a Redshift dataset, the job now runs successfully without throwing an exception message. (ticket #128718)
  • When creating a Pushdown job on a BigQuery dataset that contains a DATE column time slice, you can now create the job without receiving a command line error. (ticket #129283)
  • When using the SQL Query option on a Snowflake table with a mixed case name, you can now create the job without receiving an error. (ticket #126118)

DQ Cloud

  • When running a job on a Snowflake table containing a timestamp column, timestamps now translate correctly as date, time, and timestamp types. (ticket #120975)
  • Note While this fix was originally intended to be limited to DQ Cloud, timestamps also translate correctly on Standalone deployments.

  • When using the Dupes tab on the Findings page, dupe observations now display correctly when more dupes are discovered than the preview limit of 30. (ticket #127625)

Known Limitations

  • When configuring dupelimitui, values above 2000 result in some UI lag. Because of this, we recommend using dupelimitui values under 2000.

DQ Security

Release 2023.11

Release Information

  • Release date of Collibra Data Quality & Observability 2023.11: November 20, 2023
  • Publication dates:
    • Release notes: November 8, 2023
    • Documentation Center: November 13, 2023

Highlights

    Pushdown
    We're excited to announce that Pushdown for BigQuery is now generally available! Pushdown is an alternative compute method for running DQ jobs, where Collibra DQ submits all of the job's processing directly to a SQL data warehouse, such as BigQuery. When all of your data resides in BigQuery, Pushdown reduces the amount of data transfer, eliminates egress latency, and removes the Spark compute requirements of a DQ job.
    UI Redesign
    New installs of Collibra DQ come with REACT_MUI and UX_REACT_ON admin flags set to TRUE by default. Additionally, if a pre-existing install of Collibra DQ had these flags set to FALSE, they are now set to TRUE. While you can still modify these flags from the Admin Console > Configuration Settings > Application Config page, we recommend keeping them set to TRUE to allow for full feature functionality and an elevated user experience. See the Updated UI table below for more details on what's changed.
    Spark Version Update
    We’ve upgraded our out-of-the-box Apache Spark version from 3.2.0 to 3.4.1. We strongly encourage organizations on Standalone deployments of Collibra DQ to upgrade to the latest Spark package to take advantage of the new features and address some of the major vulnerabilities in Spark 3.2 and earlier versions. Additionally, Collibra DQ support for Spark 2.x will be limited as of Collibra DQ 2024.01, as Spark 2.x has reached its end of life.
    Collibra AI
    We're delighted to announce that Collibra AI is now available for private beta testing! Collibra AI introduces automated SQL rule writing capabilities on the Rule Workbench and Dataset Overview that help you accelerate the discovery, curation, and visualization of your data. Contact your Collibra CSM for more details about participating in this exciting private beta.

Important 
Changes for Kubernetes Deployments
We've updated the Helm Chart name from owldq to dq. For Helm-based upgrades, point to the new Helm Chart while maintaining the same release name. Please update your Helm install command by referring to the renamed parameters in the values.yaml file. Note also that the pull secret has changed from owldq-pull-secret to dq-pull-secret.

Further, following deployment, your existing remote agent name will change. For example, if your agent name is owldq-owl-agent-collibra-dq, the new agent name will be dq-agent-collibra-dq. If your organization uses APIs for development, ensure that you update the AGENT name configurations in your environments.

Lastly, when you deploy using the new Helm Charts, new service (Ingress/Load Balancer) names are created. This changes the IP address of the service and requires you to reconfigure your Load Balancer with the new IP.

Please see the expandable sections below for more details about specific changes.

Note 
If your organization has a standalone deployment of Collibra DQ with SSL enabled for DQ Web, and both DQ Web and DQ Agent are on the same VM or server, we recommend upgrading directly to Collibra DQ 2023.11.3 patch version instead of 2023.11. For more information, see the Maintenance Updates section below.

Migration Updates

Important This section only applies if you are upgrading from a version older than Collibra DQ 2023.09 on Spark Standalone. If you have already followed these steps during a previous upgrade, you do not have to do this again.

We have migrated our code to a new repository for improved internal procedures and security. Because owl-env.sh jar files are now prefixed with dq-* instead of owl-*, if you have automation procedures in place to upgrade Collibra DQ versions, you can use the RegEx replace regex=r"owl-.*-202.*-SPARK.*\.jar|dq-.*-202.*-SPARK.*\.jar" to update the jars.

Additionally, please note the following:

  • Standalone Upgrade Steps
  • When upgrading from a Collibra DQ version before 2023.09 to a Collibra DQ version 2023.09 or later on Spark Standalone, the upgrade steps have changed.

Liveness Probe Updates

  • Cloud Native
    • When a Kubernetes pod service becomes unstable, a new liveness probe automatically deletes the pod to ensure the DQ agent stays alive and running. No further action is necessary for Cloud Native deployments; this note is strictly for informational purposes only.
  • Standalone
    • Because the implementation of the liveness probe for Kubernetes required a change in the owlmanage.sh file for Standalone installations, you need to follow the steps below to upgrade a Standalone deployment.
        Important 
        If your organization has a Standalone installation of Collibra DQ, you must copy the latest owlmanage.sh to the /opt/owl/bin directory, as the file has changed.

New Features

Pushdown

  • When running Profiling on Pushdown jobs, advanced level profiling is now an opt-in feature and does not run by default. Advanced Profile determines whether a string field contains various string numerics, calculates TopN, BottomN, and TopN Shapes, and detects the scale and precision of double fields. We've also included the Profile String Length setting in the Advanced Profile option on the Explorer Settings modal.

Enhancements

Capabilities

  • When using the Alert Builder, you can now create an alert for when a job run completes successfully. When you add an alert, you can now choose from two options:
    • Dataset Run Alerts let you set an alert for when a job run meets a certain condition.
    • Job Status lets you set an alert for when a job run completes.
  • You can now configure arbitrary users as part of root user groups for cloud native deployments of Collibra DQ on OpenShift.
  • You can now use OAuth 2.0 to authenticate Trino connections.
  • We've moved the /v2/getprofiledeltasbyrunid API to the V3 Profile API GET /v3/profile/deltas.

Pushdown

  • You can now archive source break records from Redshift Pushdown jobs.

Integration

  • The Unassigned DQ Job domain now has the Parent Community name appended to it. Given the unique community name constraint when using more than one tenant or instance, this change allows for the unique naming of Business Units. The new Community structure is as follows:
    • Parent Community
      • Business Unit Community
        OR
      • Unassigned DQ Job + Parent Community
        • DQ Job Domain
          • DQ Job Asset
        • Rulebook (definitions)
          • DQ Rule Assets
        • Rulebook (scores)
          • DQ Metric Assets
  • When on a dataset-level page, such as Findings or Profile, you can now select "Reset Integration" from the Integration dropdown menu on the metadata bar to realign mappings that may have changed in Collibra Data Intelligence Cloud. If you get an error message when your mappings are aligned, this action can reset the integration and allow you to proceed with Collibra DQ metadata ingestion.
  • When a dataset integration with Collibra Data Intelligence Cloud is enabled, you can now view the Community, Sub-Community, and Domain hierarchy to which the integration is mapped in Collibra Data Intelligence Cloud from the metadata bar on all dataset-level pages. This increases the transparency of where Assets are created in Collibra Data Intelligence Cloud without needing to navigate away from Collibra DQ.
    • Additionally, you can click any of the breadcrumbs to open the Community, Sub-Community, or Domain in Collibra Data Intelligence Cloud.
  • When running Pushdown jobs with integrations enabled, the results now integrate into Collibra Data Intelligence Cloud successfully. Previously, you had to manually enable or disable the integration each time a Pushdown job ran.
  • After integrating a dataset from the Admin Console, Findings or Dataset Manager pages, you can now click the "Data Quality Job" link in the new View in DIC column to open its corresponding Asset page in Collibra Data Intelligence Cloud.
  • The GET /dgcjson endpoint is now included in the main Integrations API as GET /dgc/integrations/getdgcjson. Previously, this was located under the "UI Internal" section of Swagger.
  • Tip When an integration error occurs, check that the Community, Domain, and Asset names in Collibra Data Intelligence Cloud don't already exist from a previous integration.

DQ Cloud

  • Collibra Edge now reflects Collibra DQ's new default Spark version 3.4.1.

Fixes

Capabilities

  • When running a job to check for outliers where the lookback value is set to something other than the default 5, the minhistory now updates to the correct value in the metastore. (ticket #124063)

      Note Manual overrides of -dlminhist on the command line do not save in the metastore.

  • When using the new Collibra DQ UI, run date information now displays correctly. Previously, the job configuration of manual DQ job runs would override with hardcoded dates, which caused the upcoming scheduling date to only reflect the hardcoded date. (ticket #126345)
  • When changing the assignment for a dataset with the new UI turned on for an SSO instance of Collibra DQ, existing SAML assignments now load correctly on the Findings and Assignments pages. (ticket #128063)
  • When assigning users to the following roles, they can now access all appropriate Admin screens (ticket #128115):
    • ROLE_CONNECTION_MANAGER
    • ROLE_DATA_GOVERNANCE_MANAGER
    • ROLE_DATASET_MANAGER
    • ROLE_OWL_ROLE_MANAGER
    • ROLE_USER_MANAGER

Platform

  • When editing the Batch Name field of a Dataset Run Alert, the batch name distribution list no longer updates if the Batch Name field is empty or blank " ". (ticket #126763, 126796)
  • When reviewing the Outlier tab on the Findings page, outlier findings now expand correctly when you drill down into them. (ticket #126065)
  • When creating an alert, you can now enter the special characters ! # $ % & ' * + - / = ? ^ _ ` . { | } ~ in an email address for the alert recipient. (ticket #126763)

DQ Integration

  • When viewing user-defined or adaptive rules, Passing Fraction now reflects the points deducted from an individual rule, rather than the total rule score. Previously, the total breaks for the rule type of user-defined rules were used to generate the Passing Fraction, rather than the individual rule breaks. (tickets #124217, 127700)
  • When using the configuration wizard to map your single-tenant Collibra DQ environment, you can now link your integration to an existing community and create new communities as part of your integration. (tickets #122948, 123426, 126227, 127044, 128345)
  • When integrating a Collibra DQ dataset and setting up dimensions in Collibra Data Intelligence Cloud, the columns now display correctly in Collibra Data Intelligence Cloud after running the dataset in Collibra DQ. (ticket #126379)

Pushdown

  • When adding outliers to a Pushdown dataset and running a job, the outlier configurations now render properly. Previously, when editing a job, one or more of the outliers did not display as expected. (ticket #124736)

DQ Cloud

  • Fixed an issue that caused the Collibra DGC MDL Proxy to run out of memory under certain conditions.

DQ Security

Note 
We've removed all existing classic UI JS libraries and their references from the Updated UI to address and prevent any potential security vulnerabilities.

Updated UI

In addition to broader user interface and user experience enhancements, we've also added some impressive new features! The following table showcases some of the highlights.

Component Description Available in Classic
Metadata Bar

The Metadata Bar is a dataset anchor that simplifies the navigation to some of your most frequently used pages, such as Dataset Overview, Profile, Rules, and Findings. It also provides quick insight into your dataset, such as the number of active rules, the data source from which the dataset was created, and whether or not your job is scheduled to run automatically. When an integration is set up, the Metadata Bar also allows you to easily enable or disable dataset metadata integrations into Collibra Data Intelligence Cloud.

You can access the Metadata Bar on any dataset-level page, including:

  • Findings
  • Profile
  • Dataset Rules
  • Alert Builder
  • Dataset Overview
No
Dataset Overview

Dataset Overview lets you query your dataset to discover key data points and insights, which you can convert to data quality rules entirely within the Dataset Overview modal. With the power to write SQL to query your dataset, you can accelerate the process of data discovery and reveal important insights in real time.

Dataset Overview also allows private beta participants to leverage Collibra AI to automatically write and troubleshoot SQL for faster rule writing and advanced exploration of your dataset. See the Collibra AI private beta documentation to learn more.

No
Explorer

The new workflow simplifies the process of creating a DQ job to run against a dataset. With just a few clicks, you can create a basic profile job in a matter of seconds instead of minutes. For a more advanced scan, the step-by-step guide walks you through the process, eliminating many of the more tedious elements of the classic Explorer.

Yes
Findings

The new Findings page will feel similar to the classic page, but with a few important changes:

  • The daily score, pulse view, row count, and pass/fail charts are now broken out into individual tabs for an improved display.
  • The dataset metadata that previously resided next to the chart views is now anchored to the top of the page within the Metadata Bar for quicker analysis of your dataset’s key data points.
  • The findings tabs of the various data quality dimensions now have enhanced readability and more clearly displayed actions.
Yes
Profile While many of the same column- and dataset-level insights are unchanged from the classic UI, the presentation of information is now modernized for a crisper experience. Yes
Rules

Dataset Rules lists all previously saved rules for a given dataset and provides an overview of their details, such as the definitions of SQL conditions, rule types, and whether or not rules pass validation checks. From here, you can access the Rule Workbench to create or edit a rule.

The Rule Workbench replaces the classic Rule Builder, fusing an elegant SQL command line interface with the preview and advanced setting capabilities you expect from a modern SQL builder. As with the Dataset Overview, you can also use Collibra AI-generated SQL to write and troubleshoot rules on the Rule Workbench. See the Collibra AI private beta documentation to learn more.

We've also split Data Class and Template rules into their own pages to emphasize that they are independent of jobs and datasets and improve their overall organization.

No
Alerts The new Alert Builder gives you an at-a-glance overview of all alerts for a particular dataset and simplifies adding new alerts and editing existing ones. Yes
Dataset Manager

Dataset Manager provides a list of all datasets in your Collibra DQ environment, as well as a variety of management options, such as bulk actions, assigning datasets to data categories, and the ability to filter datasets by a variety of criteria.

Yes
Column Manager Column Manager is a detailed breakdown of all the columns in the datasets in your Collibra DQ environment that shows key data points like data type, various ratios, and the Pass/Fail status of a given column. You can also bulk apply rules, data classes, and sensitive labels to selected columns. No
Report Dashboards

The Reports section now has two new dashboards available in the updated UI:

  • Column Dimension is an overview to view the data quality dimensions of the columns of all or specific datasets. By filtering by business unit, dataset, column, and monthly periods, you can make more informed decisions and increase the value of your data. Column Dimension also provides insight into the total current dimension scores, dimension scores across time, and DQ scores (along with other metadata) for each column.
  • Dataset Dimension is an overview to view data quality dimensions with different filters to make more informed decisions and increase the value of your data. You can filter by specific datasets or all datasets in your Collibra DQ environment, by business unit, and by monthly periods.
No
Connections

The updated Connections page in the Admin Console does away with the connection tiles of the classic page in favor of a highly searchable and sortable paginated table format. The new page also features two tabs for Connections and the Drivers stored in your Collibra DQ environment.

Additionally, when you add or edit a connection, the connection template is now organized in three tabs for Connection Details, Driver Properties, and Connection Variables. With these sections now clearly delineated, the process of creating or updating a connection is now much cleaner.

Yes
Admin Console The Admin Console now lists each admin activity for better organization and simpler navigation than the tiles of the classic UI. Yes

Updated UI Limitations

Explorer

  • When using the SQL compiler on the dataset overview for remote files, the Compile button is disabled because the execution of data files at the Spark layer is unsupported.
  • You cannot currently upload temp files from the new File Explorer page. This may be addressed in a future release.
  • The Formatted view tab on the File Explorer page only supports CSV files.
  • When creating a job, the Estimate Job step from the classic Explorer is no longer a required step. However, if incorrect parameters are set, the job may fail when you run it. If this is the case, return to the Sizing step and click Estimate next to Job Size before you Run the job.

Connections

  • When adding a driver, if you enter the name of a folder that does not exist, a permission issue prevents the creation of a new folder.
    • A workaround is to use an existing folder.

Admin

  • When adding another external assignment queue from the Assignment Queue page, if an external assignment is already configured, the Test Connection and Submit buttons are disabled for the new connection. Only one external assignment queue can be configured at the same time.
  • Due to security requirements, we've removed the ability for application administrators to add new local users from the User Management page in the Admin Console. All new users must use the Register link on the Collibra DQ sign in screen.
    • When auto-approve is not configured, admin users can still manually approve new user requests and add roles to the new user from the User Management page.

Profile

  • When adding a distribution rule from the Profile page of a dataset, the Combined and Individual options incorrectly have "OR" and "AND" after them.
  • When using the Profile page, Min Length and Max Length do not display the correct string length. This will be addressed in an upcoming release.

Rules

  • When creating a quick rule from the Data Preview tab of the Findings, Profile, or Rules pages, the Preview Limit and Run Time Limit do not honor the application default limits of 6 and 30, respectively. Instead, the Preview Limit and Run Time Limit are both incorrectly set to 0.
    • While this will be addressed in the January (2024.01) release, a workaround is to manually edit these fields from the Rule Workbench > Settings modal.

Alerts

  • Batch email updates are not currently working in the beta UI. This will be addressed in the January (2024.01) release.
  • When editing the Batch Name of a job alert, there is a limitation that prevents you from editing the email address field associated with the batch alert.

Scorecards

  • When creating a new scorecard from the Page dropdown menu, a missing function currently prevents the scorecard from being created.
    • While a fix for this is planned for the September (2023.09) release, a workaround is to select the Create Scorecard workflow from the three dots menu instead.

Navigation

  • The Dataset Overview function on the Metadata Bar is not available for remote files.
  • The Dataset Overview modal throws errors for the following connection types:
    • BigQuery (Pushdown and Pullup)
    • Athena CDATA
    • Oracle
    • SAP HANA
  • The Dataset Overview function throws errors when you run SQL queries on datasets from S3 and BigQuery connections.

Maintenance Updates

2023.11.3

  • While configuring SSL after a fresh Standalone install or upgrade to Collibra DQ version 2023.11, the DQ Agent and Web now start as expected. (ticket #131078)
    • With this fix, the DQ Agent now uses port 9101 by default to expose the Health Check API.
  • Note Ensure you select the latest corresponding Helm Chart when taking a maintenance update for Cloud Native deployments.

2023.11.4

  • When synchronizing DQ rules without business units configured in Collibra DQ, you can now synchronize them to both root and sub-communities. (ticket #127044, 128138)

Release 2023.10

Release Information

  • Expected release date of Collibra Data Quality & Observability 2023.10: October 29, 2023
  • Publication dates:
    • Release notes: October 22, 2023
    • Documentation Center: October 27, 2023

Highlights

    Pushdown
    We're excited to announce that Pushdown for Trino is now available as a public beta offering! Pushdown is an alternative compute method for running DQ jobs, where Collibra DQ submits all of the job's processing directly to a SQL data warehouse, such as Trino. When all of your data resides in Trino, Pushdown reduces the amount of data transfer, eliminates egress latency, and removes the Spark compute requirements of a DQ job.

Migration Updates

Important This section only applies if you are upgrading from a version older than Collibra DQ 2023.09 on Spark Standalone. If you have already followed these steps during a previous upgrade, you do not have to do this again.

We have migrated our code to a new repository for improved internal procedures and security. Because owl-env.sh jar files are now prefixed with dq-* instead of owl-*, if you have automation procedures in place to upgrade Collibra DQ versions, you can use the RegEx replace regex=r"owl-.*-202.*-SPARK.*\.jar|dq-.*-202.*-SPARK.*\.jar" to update the jars.

Additionally, please note the following:

  • Standalone Upgrade Steps
  • When upgrading from a Collibra DQ version before 2023.09 to a Collibra DQ version 2023.09 or later on Spark Standalone, the upgrade steps have changed.

Enhancements

Capabilities

  • When using the Mapping (Validate Source in classic mode) activity, we've introduced the following options:
    • The Skip Lines (srcskiplines from the command line) option instructs Collibra DQ to skip the number of lines you specify in CSV source datasets.
    • The Multi Lines (srcmultilines from the command line) option instructs Collibra DQ to read JSON source files formatted in multi-line mode.
  • When using the Dataset Manager, you can now filter by Pushdown and/or Pullup connection type.
  • When using Pulse View, the Lookback column is now called Last X Days.

Pushdown

  • Profiling on Pushdown jobs now uses a tiered approach to determine if a string field contains various string numerics, calculate TopN, BottomN, and TopN Shapes, and detect the scale and precision of double fields.
  • When running a job with profiling and other layers enabled, the entire allocated connection pool is now used from the beginning to the end of the job to extract the maximum allowed parallelism. Previously, profiling ran first and had to finish before the activities of any other layers began.
  • You can now use the Alert Builder to set up alerts for Athena Pushdown jobs.

DQ Cloud

  • DQ Cloud now supports the Collibra DQ to Collibra Data Intelligence Cloud API-based integration with the same functionality as on-premises deployments of Collibra DQ.

Fixes

Capabilities

  • Native SQL rules on connections using Password Manager authentication types now run successfully on Cloud Native deployments of Collibra DQ. (ticket #111493)
  • When running a job with a join of two datasets, the job no longer incorrectly shows a "Finished" status when the secondary dataset is still in the Spark loading process. (ticket #116004)
  • When using Calibrate to modify the start and end dates for jobs with outliers, the dates now save properly. Previously, the calibration dates did not save in the calibration modal. (ticket #120283)
  • When creating a job on an Oracle connection, Collibra DQ now blocks SDO_GEOMETRY and other geospatial data types from processing, allowing you to create the job. Previously, these data types prevented the creation of jobs. (ticket #122200)

Platform

  • When using a SAML sign-in in a multi-tenant environment, user authentication is now successful when RelayState is unavailable. (ticket #123238)
  • When signing into Collibra DQ using the SAML SSO option, you now see all configured tenants. Previously, only one tenant displayed on the SAML SSO dropdown menu. (ticket #124323)
  • When specifying database properties containing spaces (' ') in the connection string field of a JDBC connection, the connection URL now properly transcribes string spaces. (ticket #124397)
  • When you select the Completeness Report from the Reports page, it now opens as intended. (ticket #123602)

DQ Cloud

  • When deleting datasets from Catalog, there is no longer a discrepancy between the number of datasets displayed in the Collibra DQ UI and in the PostgreSQL Metastore. (ticket #125069)
  • Eliminated a spurious error message logged during some normal operations when in Standalone mode.
  • You can now use the GET tenant/v3/edges/agents API to retrieve information about your Edge site agents and their statuses.
  • OwlCheckQ now synchronizes without error.

Pushdown

  • When creating a rule against a Snowflake Pushdown job that references the same dataset twice, the syntax now passes validation. Previously, a Snowflake Pullup rule that referenced the same dataset twice and used the same syntax passed validation, but the Snowflake Pushdown rule did not. (tickets #122364, 122923, 123748)
  • When archiving break records to Snowflake, the break records are now properly archived and you no longer receive an error. (tickets #123760, 123987)

Known Limitations

Capabilities

  • The Assignments page does not currently filter by date range. However, this functionality is planned for an upcoming release.

Platform

  • When using parquet or txt files on the Mapping activity, you must select "parquet" or ".txt," respectively, as the extension. If you use the default "auto" extension option, these file types cannot be used.

DQ Security

Important 
We've modified the OS image by removing some of the OS utils, such as curl, to address major vulnerabilities. If you use any of these OS utils in your custom scripts within containers, you need to modify them to use different mechanisms, such as /dev/tcp socket, for the same functions.

Beta UI

Beta UI Status

The following table shows the status of the Beta redesign of Collibra DQ pages as of this release.

Page Location Status
Homepage Homepage Done
Sidebar navigation Sidebar navigation Done
User Profile User Profile Done
List View Views Done
Assignments Views Done
Pulse View Views Done
Catalog by Column (Column Manager) Catalog (Column Manager) Done
Dataset Manager Dataset Manager Done
Alert Definition Alerts Done
Alert Notification Alerts Done
View Alerts Alerts Done
Jobs Jobs Done
Jobs Schedule Jobs Schedule Done
Rule Definitions Rules Done
Rule Summary Rules Done
Rule Templates Rules Done
Rule Workbench Rules Done
Data Classes Rules Done
Explorer Explorer Done
Reports Reports Done
Dataset Profile Profile Done
Dataset Findings Findings Done
Sign-in Page Sign-in Page Done

Note Admin pages are not yet fully available with the new Beta UI.

Beta UI Limitations

Explorer

  • When using the SQL compiler on the dataset overview for remote files, the Compile button is disabled because the execution of data files at the Spark layer is unsupported.
  • You cannot currently upload temp files from the new File Explorer page. This may be addressed in a future release.
  • The Formatted view tab on the File Explorer page only supports CSV files.
  • When creating a job, the Estimate Job step from the classic Explorer is no longer a required step. However, if incorrect parameters are set, the job may fail when you run it. If this is the case, return to the Sizing step and click Estimate next to Job Size before you Run the job.

Connections

  • When adding a driver, if you enter the name of a folder that does not exist, a permission issue prevents the creation of a new folder.
    • A workaround is to use an existing folder.

Admin

  • When adding another external assignment queue from the Assignment Queue page, if an external assignment is already configured, the Test Connection and Submit buttons are disabled for the new connection. Only one external assignment queue can be configured at the same time.

Profile

  • When adding a distribution rule from the Profile page of a dataset, the Combined and Individual options incorrectly have "OR" and "AND" after them.
  • When using the Profile page, Min Length and Max Length do not display the correct string length. This will be addressed in an upcoming release.

Navigation

  • The Dataset Overview function on the Metadata Bar is not available for remote files.
  • The Dataset Overview modal throws errors for the following connection types:
    • BigQuery (Pushdown and Pullup)
    • Athena CDATA
    • Oracle
    • SAP HANA
  • The Dataset Overview function throws errors when you run SQL queries on datasets from S3 and BigQuery connections.

Release 2023.09

Release Information

  • Expected release date of Collibra Data Quality & Observability 2023.09: October 8, 2023
  • Publication dates:
    • Release notes: September 24, 2023
    • Documentation Center: September 29, 2023

Highlights

    Pushdown
    We're delighted to announce that Pushdown processing for Amazon Athena and Redshift is now available as public betas! Pushdown is an alternative compute method for running DQ jobs, where Collibra DQ submits all of the job's processing directly to a SQL data warehouse, such as Athena and Redshift. When all of your data resides in Athena or Redshift, Pushdown reduces the amount of data transfer, eliminates egress latency, and removes the Spark compute requirements of a DQ job.
    Job Estimator
    Collibra DQ utilizes Spark's ability to break large datasets into smaller, more manageable segments called partitions. When you run large Pullup jobs, you can now leverage the job estimator to automatically calculate and update the number of partition columns required to optimally run and write rules against them. Previously, the only way to know when a job required the scaling of resources was when it failed.

Important 
We have migrated our code to a new repository. Consequently, Collibra DQ owl-env.sh jar files are no longer prefixed with owl-*. Instead, they are now prefixed with dq-*. For more details, it's crucial that you review the Migration Updates section below.

Migration Updates

We have migrated our code to a new repository for improved internal procedures and security. Because owl-env.sh jar files are now prefixed with dq-* instead of owl-*, if you have automation procedures in place to upgrade Collibra DQ versions, you can use the RegEx replace regex=r"owl-.*-202.*-SPARK.*\.jar|dq-.*-202.*-SPARK.*\.jar" to update the jars.

Additionally, please note the following:

  • Standalone Upgrade Steps
  • When upgrading to Collibra DQ 2023.09 on Spark Standalone, the upgrade steps have changed.

Enhancements

Capabilities

  • When running rules that reference secondary datasets, you now have the option to use serial rule processing to reduce operational costs.
    • Set -serialrule to true to leverage the Spark cache for the secondary dataset.
  • When authenticating your connection to CockroachDB with a PostgreSQL driver, you can now leverage Kerberos TGT without errors.
  • When creating a DQ job to run against a remote file data source, you can now select BEL as a delimiter.
  • When adding a name to a rule on the Rule Workbench, a helpful message displays if you use an invalid special character.
    • Rule names can only contain alphanumeric characters, underscores, and hyphens.
  • When reviewing Rules findings, the default number of rows available to preview is now 6. Previously, the Rules tab only displayed 5 preview rows.
  • When creating a Pullup job from Explorer, the Mapping step now automatically maps source columns to target columns.
  • We've updated the connection icons on the Explorer, Pulse View, and Admin Connections pages.
    • When you add a new connection from the Admin Connections page, the icon will also update accordingly.
  • When monitoring the Jobs page with the new React-based UI enabled, you can now right-click to open a dataset in a new tab.
  • When assigning or validating a finding to an external user whose first name, last name, and external user ID cannot be found or do not exist, you can now set a backup display name in the ConfigMap to ensure you can still validate or assign that finding to the external user.
    • Set SAML_USE_EXTERNAL_USER_ID_FOR_DISPLAY to true.
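
For example, a Pullup job's command-line options might include the new flag as follows. The dataset name, run date, and query are hypothetical; only the -serialrule option comes from this note.

    -ds public.sales -rd 2023-09-01 -q "select * from public.sales" -serialrule true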

Platform

  • When deleting a user, the user is now removed from both the user and user_profile metastore tables.
  • When loading a large remote file into Explorer, a progress bar now tracks its loading status.

DQ Integration

  • When using the configuration wizard in Collibra DQ to set up an integration, your Collibra Data Intelligence Cloud credentials are now encrypted in the metastore to ensure that your information is always secure.

DQ Cloud

  • We've introduced a new endpoint to retrieve aggregated write-ahead log (WAL) statistics.
  • When deploying a new Edge site, the TenantAlignmentService no longer stops checking for new tenants in DQ Cloud after 100 attempts.

Pushdown

  • When using Archive Break Records for Databricks Pushdown, the 'seqno' column for all break records tables created in Databricks is no longer designated as an identity column. Instead, its default value is now NULL. We've made this adjustment because Databricks does not support concurrent transactions for Delta tables with identity columns.
    • If you already created these tables in your Databricks environment, you need to delete them. Subsequently, allow the Collibra DQ application to re-create these tables for you, ensuring compatibility with the latest changes. To do this, you can run the following SQL commands on your Databricks target schema dedicated to maintaining records of source breaks:
    • drop table collibra_dq_outliers;
      drop table collibra_dq_duplicates;
      drop table collibra_dq_rules;
      drop table collibra_dq_breaks;
      drop table collibra_dq_shapes;
    • After you run a DQ job, the tables will be re-created on your Databricks schema.
  • We’ve improved memory usage to prevent large quantities of rule break records from causing out-of-memory errors.
  • When running a Pushdown job, the entire allocated connection pool is now used to achieve the maximum allowed parallelism, allowing profiling to run in parallel with other layers and reducing job latency.
    • Only the required number of connection threads are used for an activity.
  • When creating rules to run against Pushdown datasets, you can now use cross-join queries (see the example after this list).
  • We've added a Pendo tracking event to track the number of Pushdown jobs and columns in an environment.
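
As an illustration, a cross-join rule query might look like the following. The columns are hypothetical; the @dataset placeholder follows the rule query convention used elsewhere in these notes.

    SELECT a.sales_id, a.cost
    FROM @dataset a
    CROSS JOIN @dataset b
    WHERE a.cost > b.cost * 100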

Fixes

Capabilities

  • When editing DQ jobs for KDB (PostgreSQL) connections, you can now successfully execute a query with a large number of records. (ticket #113493, #116740)
  • When creating a BigQuery job, you can now create a dataset for a destination table without throwing an error. (ticket #118534, #122761)
  • When archiving break records from Pullup jobs, you can again write break records to S3 storage buckets. Previously, an invalid rule error returned which stated "Exception while inserting break records into S3: No FileSystem for scheme s3". (ticket #121509)
  • When you open the Oversized Job Report, you can again see the reports without any errors. (ticket #121752)

Platform

  • When reviewing the configuration after running a Validate Source job, you no longer receive a validation error due to lost database, schema, table, field, and query information. (ticket #113977)
  • Oracle dataset host strings no longer parse incorrectly. Previously, Oracle dataset host strings were parsed as "jdbc" instead of displaying the correct host string. To see the updated and correct host string for Oracle datasets, rerun your jobs manually via the scheduler or API. (ticket #124846)

DQ Integration

  • When completing the connection mapping for your Collibra DQ to Collibra Data Intelligence Cloud integration, database views from Collibra DQ now correctly map to the tables and columns to which they relate in Collibra Data Intelligence Cloud. (ticket #124191, #124213, #125676)

DQ Cloud

  • When upgrading to Collibra DQ version 2023.06, you can now see entries in your List View scorecards. Previously, there was a discrepancy between Edge and the Cloud metastore. (ticket #121624)

Pushdown

  • When running a Pushdown job with the /v3/jobs/run API, the username now correctly updates to the authenticated user. (ticket #121192)
  • When upgrading to Collibra DQ version 2023.07.2, you can now see the Data Preview of the breaking record count for a freeform SQL rule against a Snowflake Pushdown dataset. (ticket #122585)

Known Limitations

Capabilities

  • There is a limitation with Validate Source where source columns containing white spaces do not map properly to the target columns.
    • A workaround is to remove the white spaces from the command line and then copy/paste the command line into a new DQ job.
  • After you add a new connection, its icon does not automatically appear on the Pulse View page; a generic JDBC icon appears instead.

DQ Security Metrics

Note The medium, high, and critical vulnerabilities of the DQ Connector are now resolved.

Warning We found 1 critical and 1 high CVE in our JFrog scan. Upon investigation, these CVEs are disputed by Red Hat and no fix is available. For more information, see the official statements from Red Hat:
https://access.redhat.com/security/cve/cve-2023-0687 (Critical)
https://access.redhat.com/security/cve/cve-2023-27534 (High)

Beta UI

Beta UI Status

The following table shows the status of the Beta redesign of Collibra DQ pages as of this release.

Page Location Status
Homepage Homepage Done
Sidebar navigation Sidebar navigation Done
User Profile User Profile Done
List View Views Done
Assignments Views Done
Pulse View Views Done
Catalog by Column (Column Manager) Catalog (Column Manager) Done
Dataset Manager Dataset Manager Done
Alert Definition Alerts Done
Alert Notification Alerts Done
View Alerts Alerts Done
Jobs Jobs Done
Jobs Schedule Jobs Schedule Done
Rule Definitions Rules Done
Rule Summary Rules Done
Rule Templates Rules Done
Rule Workbench Rules Done
Data Classes Rules Done
Explorer Explorer Done
Reports Reports Done
Dataset Profile Profile Done
Dataset Findings Findings Done
Sign-in Page Sign-in Page Done

Note Admin pages are not yet fully available with the new Beta UI.

Beta UI Limitations

Explorer

  • When using the SQL compiler on the dataset overview for remote files, the Compile button is disabled because the execution of data files at the Spark layer is unsupported.
  • You cannot currently upload temp files from the new File Explorer page. This may be addressed in a future release.
  • The Formatted view tab on the File Explorer page only supports CSV files.

Connections

  • When adding a driver, if you enter the name of a folder that does not exist, a permission issue prevents the creation of a new folder.
    • A workaround is to use an existing folder.

Admin

  • When adding another external assignment queue from the Assignment Queue page, if an external assignment is already configured, the Test Connection and Submit buttons are disabled for the new connection. Only one external assignment queue can be configured at the same time.

Profile

  • When adding a distribution rule from the Profile page of a dataset, the Combined and Individual options incorrectly have "OR" and "AND" after them.
  • When using the Profile page, Min Length and Max Length do not display the correct string length. This will be addressed in an upcoming release.

Scorecards

  • When creating a new scorecard from the Page dropdown menu, because of a missing function, you cannot currently create a scorecard.
    • While a fix for this is planned for the September (2023.09) release, a workaround is to select the Create Scorecard workflow from the three dots menu instead.

Navigation

  • The Dataset Overview function on the Metadata Bar is not available for remote files.
  • The Dataset Overview modal throws errors for the following connection types:
    • BigQuery (Pushdown and Pullup)
    • Athena CDATA
    • Oracle
    • SAP HANA
  • The Dataset Overview function throws errors when you run SQL queries on datasets from S3 and BigQuery connections.

Release 2023.08

Highlights

    Pushdown
    We're delighted to announce that Pushdown processing for Databricks is now generally available! Pushdown is an alternative compute method for running DQ jobs, where Collibra DQ submits all of the job's processing directly to a SQL data warehouse, such as Databricks. When all of your data resides in Databricks, Pushdown reduces the amount of data transfer, eliminates egress latency, and removes the Spark compute requirements of a DQ job.

Note The legacy documentation hosted at dq-docs.collibra.com has reached its end-of-life period and now has a redirect link to the official Collibra Data Quality & Observability documentation.

New Features

Capabilities

  • When reviewing outlier findings, you can now use the Invalidate All option to invalidate all outliers from a given job run in bulk.
  • When configuring rule details on the Rule Workbench, you can now define the Scoring Type as either Percent, the default scoring type, or Absolute, which deducts points for breaking rules whenever the break percentage is greater than 0.
  • When reviewing rule break findings, you can now select Rule Breaks from the Actions dropdown menu to preview the rule break export file and copy a signed link to the external storage location, giving you more control over how you use and share break records.

DQ Cloud

  • When upgrades of DQ Edge sites are required, you can now leverage a utility script to update the Edge DQ version without reinstalling the Edge site.
  • We've added the config parameter licenseSource to the Collibra DQ Helm chart to make it easier for our internal teams to update DQ Cloud licenses.
    • "config" is the default value for DQ Cloud deployments.

Pushdown

  • When archiving break records from Databricks Pushdown jobs, you can now write them directly to a database or schema in Databricks.
  • When you archive break records to the source warehouse, records are now pruned according to specified parameters to prevent tables or schemas from growing to unreasonable sizes. When a job is pruned from the jobs table, its source break records are pruned from the data source as well.

Enhancements

Capabilities

  • When scheduling jobs to run automatically, the runId of a scheduled job now reflects the time and timezone you set. Previously, the runId of scheduled jobs reflected the default UTC server time, irrespective of the timezone you set.
  • When setting up a connection to a Google BigQuery data source, you can now use the Service Account Credential option to upload a GCP JSON credential file to authenticate your connection. This enhancement means you no longer need to use the workaround of uploading a JSON credential file as a base64 encoded Kerberos secret.

Platform

  • The endpoints for the controller-email API have changed. The following endpoints are now available:
    • GET /v2/email/server
    • POST /v2/email/server
    • POST /v2/email/server/validate
    • POST /v2/email/group
    • GET /v2/email/server/status
    • GET /v2/email/groups
    • Note For more information about the new endpoints, refer to the UI Internal option in Swagger.
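
For example, a hypothetical call to the new status endpoint; the host and authentication header are placeholders, so refer to Swagger for the exact contract:

    curl -X GET "https://<dq-host>/v2/email/server/status" \
      -H "Authorization: Bearer <token>"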

Integration

  • When using either the Update Integration Credentials or Add New Integration modal to map a connection, the Connections tab now only displays the full database mapping when you click Show Full Mapping, which improves the loading time and enhances the overall experience of the Connections tab.
    • Additionally, there is now a Save and Continue button on the Connections tab to ensure your mappings save before proceeding to the next step.

DQ Cloud

  • Pendo tracking events no longer contain license_key information when log files are sent to Collibra Console.
  • We've improved the performance and resilience of Collibra DQ on Edge sites.

Pushdown

  • If Archive Break Records Location is not selected when setting up archive break records from the Connections page, the default schema is now the one defined in the database platform's .yaml file. Previously, when a break records output location was not specified, the default location, PUBLIC, was used.
  • When writing custom rules with rlike (regex) operators against Pushdown datasets, exceptions are no longer thrown when the job runs.
  • When running a Pushdown job from the Collibra DQ app, rather than via the API, the correct column count now displays on the Findings page. Previously, the v2/run-job-json endpoint returned empty columns, which caused the total number of columns on the Findings page to display as 0.

Fixes

Capabilities

  • When adding a rule with a string match to a dataset where the string contains parentheses, extra spaces around the parentheses are no longer mistakenly added. (ticket #117055, #118319)
  • When selecting a region for an Amazon S3 connection, you can now use AP_SOUTHEAST_3 and AP_SOUTHEAST_4. (ticket #119535)
  • When assessing outlier percent change calculations on Findings, the percentage now displays correctly. (ticket #114045)
  • When using the out-of-the-box Template rule, Not_In_Current_Run as a dataset rule, an exception no longer throws when the job runs. (ticket #118401)

Platform

  • When your Collibra DQ session expires, you are now redirected to the sign-in page. (ticket #111578)
  • When migrating DQ jobs from one environment to another, columns that were selected to be included in an outlier check in the source environment now persist to the target environment. Previously, some columns that were selected in the source environment did not persist to the target. (ticket #115224)
  • When attempting to edit a completed job on a Redshift connection, the preview limit is now set to 30 rows. Previously, larger datasets experienced long load times or timed out, which prevented you from editing them from Explorer. (ticket #119831, #120245)
  • Fixed the critical CVE-2023-34034 by upgrading the Spring library. (ticket #122280)
  • When running a job, you no longer receive SQL grammar errors. (ticket #120691)

Known Limitations

Capabilities

  • When using the rule breaks capability on the classic Rules Findings tab and rule break records from native rules do not exist in the metastore, the preview modal displays a blank preview and sample file.
  • When using the rule breaks capability and the remote archive location does not have write permissions, the exception details of the rule being archived are only visible on the Rules Findings tab.
  • After updating the timezone from the default UTC timezone to a different one on a dataset with multiple days of data, the dates on the Findings page charts and Metadata Bar still reflect the default UTC timezone.
    • A fix will be included in Collibra DQ version 2023.11.

Pushdown

  • The archive break records capability cannot be configured from the settings modal on the Explorer page for BigQuery Pushdown connections.
  • When using the archive break records capability, BigQuery Pushdown currently only supports rule break records.
    • Additional support is planned for an upcoming release.
  • When using the archive break records capability to archive rule breaks generated from freeform rules with explicitly selected columns, and not SELECT *, you must include the Link ID column in the rule query for break records to archive correctly.
    • Example: A rule query that includes the Link ID column is SELECT sales_id, cost FROM @dataset WHERE cost < 2000, where "sales_id" represents the Link ID column.
  • When you select a date column as the column of reference in the time slice filter of a BigQuery dataset, an unsupported data type message displays. While this will be resolved in an upcoming release, a temporary workaround is to use the SQL View option to manually update the source query to reference a date column. For example, select * from example.nyse where trade_date = safe_cast('${rd}' as DATE)


DQ Security Metrics

Note The medium, high, and critical vulnerabilities of the DQ Connector are now resolved.

Warning We found 1 critical and 1 high CVE in our JFrog scan. Upon investigation, these CVEs are disputed by Red Hat and no fix is available. For more information, see the official statements from Red Hat:
https://access.redhat.com/security/cve/cve-2023-0687 (Critical)
https://access.redhat.com/security/cve/cve-2023-27534 (High)

The following image shows a chart of Collibra DQ security vulnerabilities arranged by release version.

[Image: chart of critical security vulnerabilities over the last 5 releases]

The following image shows a table of Collibra DQ security metrics arranged by release version.

[Image: table of security metrics over the last 5 releases]

Beta UI Redesign

The following table shows the status of the Beta redesign of Collibra DQ pages as of this release. Because the status of these pages only reflects Collibra DQ's internal test environment and completed engineering work, pages marked as "Done" are not necessarily available externally. Full availability of the new Beta pages is planned for an upcoming release.

Page Location Status
Homepage Homepage Done
Sidebar navigation Sidebar navigation Done
User Profile User Profile Done
List View Views Done
Assignments Views Done
Pulse View Views Done
Catalog by Column (Column Manager) Catalog (Column Manager) Done
Dataset Manager Dataset Manager Done
Alert Definition Alerts Done
Alert Notification Alerts Done
View Alerts Alerts Done
Jobs Jobs Done
Jobs Schedule Jobs Schedule Done
Rule Definitions Rules Done
Rule Summary Rules Done
Rule Templates Rules Done
Rule Workbench Rules Done
Data Classes Rules Done
Explorer Explorer In Progress
Reports Reports Done
Dataset Profile Profile Done
Dataset Findings Findings Done
Sign-in Page Sign-in Page Done

Note Admin pages are not yet fully available with the new Beta UI.

Beta UI Limitations

Explorer

  • When using the SQL compiler on the dataset overview for remote files, the Compile button is disabled because the execution of data files at the Spark layer is unsupported.
  • You cannot currently upload temp files from the new File Explorer page. This may be addressed in a future release.
  • The Formatted view tab on the File Explorer page only supports CSV files.

Connections

  • When adding a driver, if you enter the name of a folder that does not exist, a permission issue prevents the creation of a new folder.
    • A workaround is to use an existing folder.

Admin

  • When adding another external assignment queue from the Assignment Queue page, if an external assignment is already configured, the Test Connection and Submit buttons are disabled for the new connection. Only one external assignment queue can be configured at the same time.

Scorecards

  • When creating a new scorecard from the Page dropdown menu, because of a missing function, you cannot currently create a scorecard.
    • While a fix for this is planned for the September (2023.09) release, a workaround is to select the Create Scorecard workflow from the three dots menu instead.

Navigation

  • The Dataset Overview function on the Metadata Bar is not available for remote files.