Release 2024.01

Release Information

  • Release date of Data Quality & Observability Classic 2024.01: January 29, 2024
  • Publication dates:
    • Release notes: January 4, 2024
    • Documentation Center: January 29, 2024

Highlights

    Integration
    We’ve introduced several new features and enhancements to significantly improve the integration experience.

    • Aggregation paths on tables are now set by default, simplifying the configuration within the Collibra DQ Admin Console.
    • The new Quality tab is now available as part of the latest UI updates for Asset pages in Collibra Platform for private preview participants, giving you at-a-glance insights into the quality of your assets. These insights include:
      • Score and dimension roll-ups.
      • Column, data quality rule, data quality metric, and row overviews.
      • Details about the data elements of an asset.
    • When multiple jobs are attached to a table, the Quality tab on an Asset page shows an average similar to a scorecard in Collibra DQ.
    Note: Table Assets roll up to the DQ Job Data Asset. As a best practice, roll up the DQ Job to the Table to align the dedication score and the Quality tab Asset score for a Table Asset.

    Spark Version Update
    As of Collibra DQ version 2023.11, we’ve upgraded our out-of-the-box Apache Spark version from 3.2.0 to 3.4.1. We strongly encourage organizations on Standalone deployments of Collibra DQ to upgrade to the latest Spark package to take advantage of the new features and address major vulnerabilities in Spark 3.2 and earlier. Additionally, as of Collibra DQ 2024.01, support for Spark 2.x is limited because Spark 2.x has reached its end of life.

    If you use Spark 3.2.2 or lower, we recommend upgrading to 3.4.1 to address various critical vulnerabilities in the Spark core library, including Log4j.
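    As an illustration of the version guidance above, the following Python sketch flags Spark versions at or below 3.2.2, which also covers Spark 2.x. The helper names are hypothetical, and the sketch assumes plain numeric version strings such as "3.4.1".

```python
def parse_version(version: str) -> tuple:
    """Turn a dotted version such as "3.2.2" into (3, 2, 2).

    Assumes purely numeric components; suffixes such as "-SNAPSHOT" are not handled.
    """
    return tuple(int(part) for part in version.split("."))


def should_upgrade_spark(version: str) -> bool:
    """Hypothetical helper: True if the notes recommend moving to Spark 3.4.1."""
    # Upgrading is recommended from 3.2.2 or lower; this also catches Spark 2.x,
    # which has reached end of life and has only limited support.
    return parse_version(version) <= (3, 2, 2)


for v in ("2.4.8", "3.2.0", "3.2.2", "3.4.1"):
    print(v, should_upgrade_spark(v))
```

    Tuple comparison handles versions of different lengths, so "3.2" is treated as older than "3.2.2".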

Important: Changes for Kubernetes Deployments
As of Collibra DQ version 2023.11, we've updated the Helm Chart name from owldq to dq. For Helm-based upgrades, point to the new Helm Chart while keeping the same release name. Update your Helm install command to use the renamed parameters in the values.yaml file. Note also that the pull secret has changed from owldq-pull-secret to dq-pull-secret.

After deployment, your existing remote agent name also changes. For example, if your agent name is owldq-owl-agent-collibra-dq, the new agent name will be dq-agent-collibra-dq. If your organization uses APIs for development, make sure to update the agent name configurations in your environments.

Lastly, when you deploy using the new Helm Charts, new service (Ingress/Load Balancer) names are created. This changes the IP address of the service and requires you to reconfigure your Load Balancer with the new IP.
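If you keep configuration files or API scripts that hard-code the old names, a simple text substitution can update them. The Python sketch below is hypothetical: the mapping holds only the example names from these notes, so extend it with your own agent names. The chart name itself (owldq to dq) is deliberately left out, because replacing every "owldq" substring could alter unrelated text.

```python
# Hypothetical rename map built from the example names in these notes.
RENAMES = {
    "owldq-owl-agent-collibra-dq": "dq-agent-collibra-dq",
    "owldq-pull-secret": "dq-pull-secret",
}


def apply_renames(text: str) -> str:
    """Replace each old name in text with its new name."""
    # Longest keys first, so a shorter key can never pre-empt a longer
    # overlapping one if more names are added to the map later.
    for old in sorted(RENAMES, key=len, reverse=True):
        text = text.replace(old, RENAMES[old])
    return text


config = "agentName: owldq-owl-agent-collibra-dq\nimagePullSecrets: owldq-pull-secret\n"
print(apply_renames(config))
```

Because the new names do not contain the old ones, running the function twice gives the same result as running it once.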


Note: If your organization has a standalone deployment of Collibra DQ with SSL enabled for DQ Web, and both DQ Web and DQ Agent are on the same VM or server, we recommend upgrading directly to Collibra DQ 2023.11.3 patch version or 2024.01.

Migration Updates

Important: This section only applies if you are upgrading from a version older than Collibra DQ 2023.09 on Spark Standalone. If you have already followed these steps during a previous upgrade, you do not have to do this again.

We have migrated our code to a new repository for improved internal procedures and security. The jar files used by owl-env.sh are now prefixed with dq-* instead of owl-*. If you have automation procedures in place to upgrade Collibra DQ versions, you can update the jar names with the following RegEx replace: regex=r"owl-.*-202.*-SPARK.*\.jar|dq-.*-202.*-SPARK.*\.jar"
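For example, an upgrade script could apply this RegEx to the lines of owl-env.sh. The Python sketch below assumes one jar name per line (the greedy .* in the pattern would otherwise span from the first jar to the last one on the same line); the variable name, path, and jar names are hypothetical.

```python
import re

# Pattern quoted from the release notes: matches jar names that start with
# either the old owl- prefix or the new dq- prefix.
JAR_PATTERN = re.compile(r"owl-.*-202.*-SPARK.*\.jar|dq-.*-202.*-SPARK.*\.jar")


def update_jar_refs(text: str, new_jar: str) -> str:
    """Replace each matching jar name in text with new_jar."""
    # A function replacement avoids backslash-escape processing of new_jar.
    return JAR_PATTERN.sub(lambda _match: new_jar, text)


# Hypothetical owl-env.sh line and jar names.
old_line = "export JAR=/opt/owl/bin/owl-core-2023.07-SPARK341-jar-with-dependencies.jar"
new_jar = "dq-core-2024.01-SPARK341-jar-with-dependencies.jar"
print(update_jar_refs(old_line, new_jar))
```

Because the pattern also matches dq-* names, rerunning the script on an already-upgraded file simply swaps in the next version's jar name.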

Additionally, please note the following:

  • Standalone Upgrade Steps: When upgrading from a Collibra DQ version before 2023.09 to version 2023.09 or later on Spark Standalone, the upgrade steps have changed.

New Features

Integration

  • You can now see the Overview DQ Score of an Asset when searching in Data Marketplace, so you can browse the data quality scores of Assets without opening their Asset pages.

Enhancements

Capabilities

  • When setting up a connection to a Google BigQuery data source, you can now use Workload Identity to authenticate your connection. By using Workload Identity to authenticate your BigQuery connection, you can now access data stored in BigQuery across GCP projects without relying on JSON credential files or metadata obfuscation.
  • When using the Alert Builder, you can now create an alert for when a job run fails. You can configure this by selecting the Job Failure option from the Status Alert modal on the Alert Builder page.
    • Additionally, the alert types on the Alert Builder page have changed from Dataset Run Alerts and Job Status to Condition and Status, respectively.
  • When you receive an email alert, the body of the alert now includes the Alert Type as either Condition or Status. When the Alert Type is Condition, the query upon which the condition is based also displays.
  • When using the /v3/rules/{dataset}/ruleBreaksSelect API with INTERNAL selected for the storageType parameter, the API generates a SQL query that returns the break records stored in the PostgreSQL Metastore.
  • The /v2/getbreakingrecords API is now filtered by runId and limited to 100 records by default.
  • When using multi-tenancy with SAML enabled, you can now set showsamltenantmetadatalabel=false in the ConfigMap to hide tenant metadata labels on SAML-enabled tenants, so that their names match those of non-SAML-authenticated tenants.
  • When signing into Collibra DQ, you are now required to select a tenant from the dropdown menu before proceeding.
  • The Scheduler page is now powered by the v2/getallscheduledjobs API.
  • What was previously the DQ Job button next to the search bar at the top of the Collibra DQ application is now called Findings.

Platform

  • Standalone installations now come packaged with the PostgreSQL 12 installer instead of PostgreSQL 11.
  • Helm Charts are now versioned according to their corresponding release version. You can find version details in the Charts.yaml file.
  • You can now use the default truststore by setting the global.web.tls.trust.default flag to true.
  • With SAML configured, you can now use the default keystore without the need for a custom keystore.
  • We've introduced pattern validation for some commonly used IDs and names in Collibra DQ. You can override these patterns by setting the following environment variables to a custom RegEx in the owl-env.sh file (Standalone) or the Web ConfigMap (Cloud Native):
  • Environment variables, with their defaults and usage:
    • VALIDATION_PATTERN_NAME_ID (Name ID pattern). Default: alphanumeric characters (letters and digits) and underscores (_). Overrides the default pattern for Name IDs in Collibra DQ.
    • VALIDATION_PATTERN_COMMON_NAME (Common name). Default: alphanumeric characters (letters and digits) and underscores (_). Overrides the default pattern for common names used in Collibra DQ.
    • VALIDATION_PATTERN_SCHEMA_NAME (Schema name). Default: alphanumeric characters (letters and digits) and underscores (_). Overrides the default pattern for connection schema names in Collibra DQ.
    • VALIDATION_PATTERN_CONN_NAME (Connection name). Default: alphanumeric characters (letters and digits), underscores (_), and hyphens (-). Overrides the default pattern for connection names in Collibra DQ.
    • VALIDATION_PATTERN_DATASET_NAME (Dataset name). Default: alphanumeric characters (letters and digits), underscores (_), and hyphens (-). Overrides the default pattern for dataset names in Collibra DQ.
    • VALIDATION_PATTERN_FILE_NAME (File name). Default: alphanumeric characters (letters and digits), underscores (_), hyphens (-), periods (.), forward slashes (/), and spaces ( ). Overrides the default pattern for file names in Collibra DQ.
    • VALIDATION_PATTERN_LDAP_DN (DN). Default: comma-separated key-value pairs, where both keys and values consist of lowercase letters, digits, and hyphens. Overrides the default pattern for LDAP DNs in Collibra DQ.

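    Example: The default connection name pattern does not allow spaces. To allow them, set the connection name variable to the following RegEx, with the hyphen placed last in the character class so that it is read as a literal hyphen rather than as a range:
    VALIDATION_PATTERN_CONN_NAME=^[a-zA-Z0-9_ -]+$
    In owl-env.sh, wrap the value in single quotes because it contains a space.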

  • To improve compatibility with various software delivery platforms, the leading ‘0’ has been removed from the defaultMode value of 0493 in four YAML files for Kubernetes deployments. The defaultMode is now 493 in the following YAML files:
    • k8s/charts/dq/charts/dq-agent/templates/dq-agent-statefulset.yaml
    • k8s/charts/dq/charts/dq-livy/templates/dq-livy-deployment.yaml
    • k8s/charts/dq/charts/dq-web/templates/dq-web-statefulset.yaml
    • k8s/charts/dq/charts/spark-history-server/templates/deployment.yaml
  • Configurable PBE encryption is now supported for Kubernetes and Standalone deployments of Collibra DQ.
  • When validating a finding and assigning it to a ServiceNow user or group, you can now reassign this finding from ServiceNow to a Collibra DQ user. (ticket #122476, 133165)
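Before deploying a validation-pattern override such as the connection name example above, you can test it against sample names. The Python sketch below does that; DEFAULT_CONN_NAME is a hypothetical regex translation of the documented default (the exact shipped pattern is not given in these notes), and the sample names are illustrative. Python's re module is used only for illustration; these simple character classes behave the same way in most regex engines, including Java's.

```python
import re

# Hypothetical translation of the documented default for connection names:
# letters, digits, underscores, and hyphens.
DEFAULT_CONN_NAME = r"^[a-zA-Z0-9_-]+$"

# Override from the example: also allows spaces. The hyphen is last in the
# character class, so it is a literal "-" rather than a range operator.
CUSTOM_CONN_NAME = r"^[a-zA-Z0-9_ -]+$"


def is_valid(name: str, pattern: str) -> bool:
    """Return True if the whole name matches the pattern."""
    return re.fullmatch(pattern, name) is not None


for name in ("sales_db-prod", "sales db", "sales.db"):
    print(name, is_valid(name, DEFAULT_CONN_NAME), is_valid(name, CUSTOM_CONN_NAME))
```

Checking a few names that should pass and a few that should fail catches both overly strict and overly loose patterns.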

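The defaultMode change for Kubernetes deployments is easier to read as file permissions. The unprefixed value 493 is a decimal integer, which equals octal 0755 (rwxr-xr-x); 0493, by contrast, is not a valid octal number because 9 is not an octal digit. The following Python sketch checks that arithmetic; the helper name is illustrative.

```python
def mode_to_rwx(mode: int) -> str:
    """Render the low nine permission bits of a file mode as an rwx string."""
    parts = []
    for shift in (6, 3, 0):  # owner, group, others
        bits = (mode >> shift) & 0o7
        parts.append(
            ("r" if bits & 4 else "-")
            + ("w" if bits & 2 else "-")
            + ("x" if bits & 1 else "-")
        )
    return "".join(parts)


# The new defaultMode, 493, is a decimal integer; in octal it is 0755.
print(493, oct(493), mode_to_rwx(493))

# "0493" cannot be parsed as octal because 9 is not an octal digit.
try:
    int("0493", 8)
except ValueError as err:
    print("not valid octal:", err)
```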
Pushdown

  • When the schema changes based on the source query, the dataset_schema table now reflects those changes.

Fixes

Capabilities

  • When creating a rule on a Pullup dataset that uses “in” or “rlike” where the condition ends with nested closing parentheses (for example, a condition ending in ))), the string replace logic now works as expected. Previously, rules that contained rlike() or in() threw exceptions when the job ran. (ticket #128759)
    Important: To keep rules as close to the original input as possible, some of the padding previously appended to certain characters has been removed.

  • When archiving rule breaks, break records now export to S3 buckets as expected. (ticket #125411)
  • When changing the business unit on the Metadata Bar or Dataset Manager, the updated business unit now replaces the old one in the Metastore instead of creating an additional entry. (ticket #127823, 130086)
  • When editing a dataset from Dataset Manager in the new UI, the Schema/Parent Folder and Table/File Name fields are no longer swapped. (ticket #129103)
  • When running a job against a BigQuery dataset where the underlying query uses “from” before the schema/table, Collibra DQ now parses the “from” correctly. (ticket #129798)
  • When attempting to create a Pullup job against a Trino schema whose name contains the escape character \, you can now view the table details in Explorer. (ticket #131065)
  • The dupelimit and dupelimitui configs now work as expected: dupelimit limits the number of dupe findings stored in the Metastore, and dupelimitui limits how many of those stored findings are shown on the Dupes tab of the Findings page. (ticket #127748)
  • When using the Rule Builder in the new UI, freeform rules using RLIKE now successfully pass validation. (ticket #127779)
  • When using the Alert Builder in the new UI, you can now update alert batch names as expected. (ticket #129079)
  • When a scheduled job runs with zero rows and Parallel JDBC enabled, it no longer fails. (ticket #128385)
  • When using Kerberos-based authentication for PostgreSQL or CockroachDB data sources, jobs no longer time out before authenticating against KDC. (ticket #128540)
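The relationship between dupelimit and dupelimitui described above can be modeled as two successive caps. This Python sketch is a simplified, hypothetical model for reasoning about the two settings, not Collibra DQ's implementation; the function name and numbers are illustrative.

```python
def dupe_counts(found: int, dupelimit: int, dupelimitui: int) -> tuple:
    """Hypothetical model of the two dupe limits.

    Returns (stored, shown): dupelimit caps how many dupe findings are stored
    in the Metastore, and dupelimitui caps how many of those stored findings
    appear on the Dupes tab.
    """
    stored = min(found, dupelimit)
    shown = min(stored, dupelimitui)
    return stored, shown


print(dupe_counts(5000, 1000, 300))  # both limits apply
print(dupe_counts(50, 1000, 300))    # fewer dupes than either limit
```

In this model, setting dupelimitui above dupelimit has no effect, since the UI can only show findings that were stored.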

Platform

  • When SSL is configured after a fresh Standalone installation of, or upgrade to, Collibra DQ 2023.11, the DQ agent now starts as expected. (ticket #131078)
    Note: With this update, setting export SERVER_PORT overrides the default value of -1. Comment out export SERVER_PORT=9000 and define the port with PORT=9000 #owl-web port NUMBER instead.

  • We fixed an issue that caused unexpected license name requests. (ticket #126731)

DQ Integration

  • When mapping columns from rules in Collibra DQ, column names are now parsed correctly in Collibra Platform.
  • When a User Defined Rule is deactivated or deleted from a dataset in Collibra DQ, the Rule and Score status in Collibra Platform is set to “Suppressed” the next time the dataset runs in Collibra DQ. (ticket #124414)

Pushdown

  • When running a Redshift Pushdown job to create a profile of a Redshift dataset, you can now view TopN Shapes for columns containing NULL values. (ticket #129221)
  • When casting from one data type to another on a Redshift dataset, the job now runs successfully without throwing an exception message. (ticket #128718)
  • When creating a Pushdown job on a BigQuery dataset that contains a DATE column time slice, you can now create the job without receiving a command line error. (ticket #129283)
  • When using the SQL Query option on a Snowflake table with a mixed case name, you can now create the job without receiving an error. (ticket #126118)

DQ Cloud

  • When running a job on a Snowflake table containing a timestamp column, timestamps now translate correctly as date, time, and timestamp types. (ticket #120975)
    Note: While this fix was originally intended to be limited to DQ Cloud, timestamps also translate correctly on Standalone deployments.

  • When using the Dupes tab on the Findings page, dupe observations now display correctly when more dupes are discovered than the preview limit of 30. (ticket #127625)

Known Limitations

  • When configuring dupelimitui, values above 2000 result in some UI lag. Because of this, we recommend using dupelimitui values under 2000.

DQ Security