Release Notes

Important 

Disclaimer - Failure to upgrade to the most recent release of the Collibra Service may adversely impact the security, reliability, availability, integrity, performance or support (including Collibra’s ability to meet its service levels) of the Service. Collibra hereby disclaims all liability, express or implied, for any reduction in the security, reliability, availability, integrity, performance or support of the Service to the extent the foregoing would have been avoided had you allowed Collibra to implement the most current release of the Service when scheduled by Collibra. Further, to the extent your failure to upgrade the Service impacts the security, reliability, availability, integrity or performance of the Service for other customers or users of the Service, Collibra may suspend your access to the Service until you have upgraded to the most recent release.

Release 2024.04

Release Information

  • Expected release date of Collibra Data Quality & Observability 2024.04: April 29, 2024
  • Publication dates:
    • Release notes: April 4, 2024
    • Documentation Center: April 4, 2024

Enhancements

Capabilities

  • You can now authenticate SQL Server connections using Active Directory Password and Active Directory Service Principal. This enhancement, available in both Java 11/Spark 3.4.1 and Java 8/Spark 3.2.2, allows you to use Azure AD-based Synapse SQL and Azure Service Principal authentication in Collibra Data Quality & Observability, to further enable your team to follow Azure authentication best practices and InfoSec policies. For more information on configuration details, see the Authentication documentation for SQL Server.
  • When using scorecards to manage datasets by owner, we’ve added a Rename Page button in the upper right corner of the Scorecards page to allow you to rename your scorecard.
  • Archive Break Records for both Pushdown and Pullup jobs now parses linkId values in the Sample File Preview and Rule Breaks Preview on the Findings page and in the downloadable CSV file.
  • Running a job with a SQLF rule that references a secondary dataset or uses @t1 no longer flashes an extra record in the job log or on the Jobs page with an Agent Id of 0. With this enhancement, only a single job record displays on the Jobs page. (idea #DCC-I-2413)
  • When using the Dataset Overview, you can now click Download Results to download a CSV file containing the contents of the Results table.

Platform

  • We improved some of the ways you can create or edit alerts:
    • To enhance the fluidity of batch alerting from the Status Alert and Condition Alert modals, you can now select the new Batch option to display the Batch Name field, where you can search, select, or create alert batches.
    • With the Batch option selected, the Alert Recipient field locks. To edit this field, click the lock icon to unlock it. When the Batch option is not selected, the Alert Recipient field remains unlocked and editable.

Fixes

Capabilities

  • We improved the Data Shape Granular setting to include more string length values in Shapes findings. (ticket #135351)
  • When joining datasets created from two different data source connections with Kerberos keytab as the authentication type, we fixed an issue that prevented the secondary dataset from loading because its Kerberos authentication was incorrectly passed. (ticket #131309)
    • To ensure that the secondary dataset is passed correctly during Kerberos authentication, add -jdbcprinc and -jdbckeytab to the Agent configuration's Free Form (Appended) section. For example, -jdbckeytab /tmp/keytab/dq-user.keytab -jdbcprinc [email protected]
  • We added the ability to export all job schedule records from the Jobs Schedule page, using the Export All option. Previously, the ability to export job schedule records was limited to up to 20 records per page. (ticket #136970)
  • We added the following enhancements to the Findings page when data quality findings exceed certain thresholds. (ticket #136922)
    • When there are more than 9,999 findings of any data quality layer, the value displayed in the badge on the corresponding findings tab will round to the nearest thousand with a +. For example, 12,345 will display as 12K+.
    • When there are more than 999,000 findings of any data quality layer, the value displayed in the badge on the corresponding findings tab will always display as 1M+. For example, 1,234,567 will display as 1M+.
    • When a value is truncated, you can hover your cursor over the badge to display the exact number of findings.
  • When using SAML SSO to sign into multi-tenant Collibra Data Quality & Observability environments, the SAML Enabled Tenants dropdown menu no longer shows the Tenant Name. Instead, the dropdown menu now shows the Tenant Display Name. (ticket #137865)
  • We removed the runId from the “Findings - Most Recent Run” link in alert emails to correctly take you to the most recent run of your job when you click the link.
  • When the “Findings - Run which Produced the Alert” link in alert emails contains a runId and a timestamp in the URL, you will be taken to that specific job runId and timestamp when you click the link.
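The badge truncation rules described above can be sketched in a few lines. This is an illustrative helper only (the function name and exact rounding behavior are assumptions, not Collibra's implementation):

```python
def format_findings_badge(count: int) -> str:
    """Illustrative sketch of the findings-badge display rules (hypothetical helper)."""
    if count > 999_000:
        return "1M+"  # very large counts always display as 1M+
    if count > 9_999:
        # round to the nearest thousand and append K+
        return f"{round(count / 1000)}K+"
    return f"{count:,}"  # exact count below the truncation threshold
```

Hovering over the badge still reveals the exact count; this sketch only covers the displayed label.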

Platform

  • We introduced pagination to the Rule Summary page, limiting each page to a maximum of 25 records. Previously, the Rule Summary page displayed only 25 records on a single page, even when the total number of records exceeded 25. (ticket #140229)
  • When an admin sets a limit for the datashapelimitui setting from Admin Limits, the Findings page no longer displays Shapes findings beyond that limit. (ticket #129091)
  • The Role Management page in the latest UI now allows admins to set the access rights of roles and users when associating them with specific datasets. (ticket #138489)
  • The Dataset Manager page now loads correctly when the alias name of a dataset is null. (ticket #133400)
  • When filtering by row count on the Dataset Manager page, the results included in the filter no longer include daily row counts that exceed the range you select. (ticket #133453)
  • JSON files from Amazon S3 connections no longer fail to load when Livy is not enabled.
  • When large datasets (for example, those with more than 100 million records) time out with a 504 error response while loading the table in Explorer, an error message now appears in the Explorer UI with details about the error. (ticket #133530)

Pushdown

  • Databricks Pushdown now supports ANSI-compliant SQL on the server side. (ticket #136562)
  • The out-of-the-box Data Category, Currency_CD, no longer counts the number of null and empty values as part of the underlying SQL query. (ticket #133578)

Release 2024.03

Release Information

  • Release date of Collibra Data Quality & Observability 2024.03: April 1, 2024
  • Publication dates:
    • Release notes: March 8, 2024
    • Documentation Center: March 11, 2024

Enhancements

Capabilities

  • Admins can now view monthly snapshots of the total number of active datasets in the new Datasets column on the Admin Console Usage page. Additionally, Columns statistics are no longer counted twice when you edit and re-run a dataset with different columns.
  • Admins can now optionally remove “Collibra Data Quality & Observability” from alert email subjects. When removing it from the subject, you must fill in all the alert SMTP details to use the alert configuration checkboxes on the screen. By removing “Collibra Data Quality & Observability” from the alert subject, you can now set up your own services to automatically crawl Collibra DQ email alerts.
  • Dataset-level alerts, such as Job Completion, Job Failure, and Condition Alerts, as well as global-level alerts for both Pullup and Pushdown Job Failure now send incrementally from the auto-enabled alert queue.
  • Important As a result of this enhancement, following an upgrade to Collibra Data Quality & Observability 2024.03, any old, unsent alerts in the alert queue will be sent automatically. This is a one-time event and these alerts can be safely ignored.

    • We've also added new functionality where, when an alert fails to send, it is still marked as email_sent = true in the alert_q Metastore table; however, no email alert is sent as a result of this marking. An enhancement to automatically clean the alert_q table of stale alerts marked as email_sent = true is scheduled for the upcoming Collibra Data Quality & Observability 2024.05 release.
  • We've optimized internal processing when querying Trino connections by passing the catalog name in the Connection URL. The catalog name can be set by creating a Trino connection from Admin Console Connections and adding ConnCatalog=$catalogName to the Connection URL.
  • We've added a generic placeholder in the Collibra DQ Helm Charts to allow you to bring additional external mount volumes into the DQ Web pod. Additionally, the persistent volumes provisioned for DQ check logs and external mount volumes now include a placeholder value that lets you specify the storage class type of the persistent volume claims for Collibra DQ. This provides an option to bring Azure vault secrets into the DQ Web pod as external mount volumes.
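For illustration, the Trino catalog property mentioned above might be appended to a Connection URL as follows. The host, port, and catalog name are hypothetical, and the exact placement depends on your existing Connection URL:

```
jdbc:trino://trino.example.com:8443?ConnCatalog=my_catalog
```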

Platform

  • We now support Cloud Native Collibra DQ deployments on OpenShift Container Platform 4.x.
  • When using a proxy, SAML does not support URL-based metadata; it supports only file-based metadata. To ensure this works properly, set the property SAML_METADATA_USER_URL=false in the owl-env.sh file for Standalone deployments or in the DQ Web ConfigMap for Cloud Native deployments.

Fixes

Capabilities

  • We fixed an issue that caused Redshift datasets that referenced secondary Amazon S3 datasets to fail with a 403 error. (ticket #132975)
  • On the Profile page of a dataset, the number of TopN shapes now matches the total number of occurrences of such patterns. (ticket #133817)
  • When deleting and renaming datasets from the Dataset Manager, you can now rename a dataset using the name of a previously deleted one. (ticket #132799)
  • When renaming a dataset from the Dataset Manager and configuring it to run on a schedule, the scheduled job no longer combines the old and new names of the dataset when it runs. (ticket #132798)
  • On the Role Management page in the latest UI, roles must now pass validation to prevent ones that do not adhere to the naming requirements from being created. (ticket #133497)
  • When creating a Trino dataset with Pattern detection enabled, date type columns are no longer cast incorrectly as varchar type columns during the historical data load process. (ticket #132478)
  • On the Connections page in the latest UI, the input field now automatically populates when you click Assume Role for Amazon S3 connections. (ticket #132323, 132423)
  • On the Findings page, when reviewing the Outliers tab, the Conf column now has a tooltip to clarify the purpose of the confidence score. (ticket #129768)
  • When DQ Job Security is enabled and a user does not have ROLE_OWL_CHECK assigned to them, both Pullup and Pushdown jobs now show an error message “Failed to load job to the queue: DQ Job Security is enabled. You do not have the authority to run a DQ Job, you must be an Admin or have the role Role_Owl_Check and be mapped to this dataset.” (ticket #133623)
  • When creating a DQ job on a root or nested Amazon S3 folder without any files in it, the system now returns a more elegant error message. (ticket #134187)
  • On the Profile page in the latest UI, the WHERE query is now correctly formed when adding a valid values quick rule. (ticket #133455)

Platform

  • When viewing TopN values, the previously encrypted valid values now decrypt correctly. (ticket #131951)
  • When a user with ROLE_DATA_GOVERNANCE_MANAGER edits a dataset from the Dataset Manager, the Metatags field is the only field such a user can edit and have their updates pushed to the Metastore. (ticket #132468, 135889)
  • We fixed an issue with the /v3/datasetDefs/{dataset} and /v3/datasetDefs/{dataset}/cmdLine APIs that caused the -lib, -srclib, and -addlib parameters to revert in the command line. (ticket #131281)

Pushdown

  • When casting from one data type to another on a Redshift dataset, the job now runs successfully without returning an exception message. (ticket #128718)
  • When running a job with outliers enabled, we now parse the source query for new line characters to prevent carriage returns from causing the job to fail. (ticket #132322)
  • The character limit for varchar columns is now 256, which prevents jobs from failing when a varchar column exceeds the 256-character limit. (ticket #131355)
  • BigQuery Pushdown jobs on huge datasets of more than 10GB no longer fail with a “Response too large” error. (ticket #134643, 135504)
  • When running a Pushdown job with archive break records enabled without a link ID assigned to a column, a helpful warning message now highlights the requirements for proper break record archival. (ticket #132545)
  • The Source Name parameter on the Connections template for Pushdown connections now persists to the /v2/getcatalogandconnsrcnamebydataset API call as intended. (ticket #132334)

Limitations

  • TopN values from jobs that ran before enabling encryption on the Collibra DQ instance are not decrypted. To decrypt TopN values after enabling encryption, re-run the job once encryption is enabled on your Collibra DQ instance.

DQ Security

Release 2024.02

Release Information

  • Release date of Collibra Data Quality & Observability 2024.02: February 26, 2024
  • Publication dates:
    • Release notes: January 22, 2024
    • Documentation Center: February 4, 2024

Highlights

    Archive Break Records

    Pullup
    When rule breaks are stored in the PostgreSQL Metastore with link IDs assigned, you can now download a CSV file containing the details of the rule breaks and link ID columns via the Rule Breaks modal (Findings page > Rules tab > Actions).

    Pushdown
    In order to completely remove sensitive data from the PostgreSQL Metastore, you can now enable Data Preview from Source in the Archive Break Records section of the Explorer Settings. When you enable Data Preview from Source, data preview records are not stored in the PostgreSQL Metastore.

    Previews of break records associated with Rules, Outliers, Dupes, and Shapes breaks on the Findings page reflect the current state of the records as they appear in your data source. With this option disabled, the preview records that display in the web app are snapshots of the PostgreSQL Metastore records at runtime. This option is disabled by default.

    Additionally, with Archive Break Records enabled and a link ID column assigned, you can now download a CSV or JSON file containing the details of the breaks and link ID columns via the Rule Breaks modal (Findings page > Rules, Outliers, Dupes, or Shapes tab > Actions).

    Lastly, when Archive Break Records is enabled, you can now optionally enter an alternative dataset-level schema name to store source break records, instead of the schema provided in the connection.

Important 
Changes for Kubernetes Deployments
As of Collibra DQ version 2023.11, we've updated the Helm Chart name from owldq to dq. For Helm-based upgrades, point to the new Helm Chart while maintaining the same release name. Please update your Helm install command by referring to the renamed parameters in the values.yaml file. It is also important to note that the pull secret has changed from owldq-pull-secret to dq-pull-secret.

Further, following deployment, your existing remote agent name will change. For example, if your agent name is owldq-owl-agent-collibra-dq, the new agent name will be dq-agent-collibra-dq. If your organization uses APIs for development, ensure that you upgrade AGENT name configurations in your environments.

Lastly, when you deploy using the new Helm Charts, new service (Ingress/Load Balancer) names are created. This changes the IP address of the service and requires you to reconfigure your Load Balancer with the new IP.

Please see the expandable sections below for more details about specific changes.

Note 
If your organization has a standalone deployment of Collibra DQ with SSL enabled for DQ Web, and both DQ Web and DQ Agent are on the same VM or server, we recommend upgrading directly to Collibra DQ 2023.11.3 patch version instead of 2023.11. For more information, see the Maintenance Updates section below.

Migration Updates

Important This section only applies if you are upgrading from a version older than Collibra DQ 2023.09 on Spark Standalone. If you have already followed these steps during a previous upgrade, you do not have to do this again.

We have migrated our code to a new repository for improved internal procedures and security. Because owl-env.sh jar files are now prepended with dq-* instead of owl-*, if you have automation procedures in place to upgrade Collibra DQ versions, you can use the RegEx replace regex=r"owl-.*-202.*-SPARK.*\.jar|dq-.*-202.*-SPARK.*\.jar" to update the jars.
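The RegEx above can be exercised as follows; the jar file names below are hypothetical examples for illustration, not actual artifact names:

```python
import re

# Pattern from the migration note above: matches both old owl-* and new dq-* jars.
JAR_PATTERN = re.compile(r"owl-.*-202.*-SPARK.*\.jar|dq-.*-202.*-SPARK.*\.jar")

# Hypothetical file names for illustration only.
print(bool(JAR_PATTERN.fullmatch("owl-core-2023.01-SPARK330.jar")))  # True
print(bool(JAR_PATTERN.fullmatch("dq-core-2024.02-SPARK341.jar")))   # True
```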

Additionally, please note the following:

  • Standalone Upgrade Steps
  • When upgrading from a Collibra DQ version before 2023.09 to a Collibra DQ version 2023.09 or later on Spark Standalone, the upgrade steps have changed.

Enhancements

Capabilities

  • When using the Dataset Overview, you can now click the -q button to load the contents of the dataset source query into the SQL editor.
  • When using the Dataset Overview, you can now use Find and Replace to find any string in the SQL editor and replace it with another.
  • When a finding is assigned to a ServiceNow incident and the ServiceNow connection has Publish Only enabled on the ServiceNow Configuration modal in the Admin screens, the finding record is pushed to ServiceNow as in previous versions, but the status is no longer linked. This means that you can adjust the statuses of the ServiceNow Incident and the DQ finding independently, whereas previously the ServiceNow Incident had to be closed in order for the DQ finding to be resolved.
  • From the Settings page in Explorer, you can now select the Core Fetch Mode option to allow SQL queries with spaces to run successfully. When selected, this option adds -corefetchmode to the command line to enable the core to fetch the query from the load options table and override the -q.
  • When attempting to connect to NetApp or Amazon S3 endpoints in URI format with the HTTPS option selected, you can now add the following properties to the Properties tab on Amazon S3 connection templates to successfully create connections:
    • For Amazon S3 endpoint URI: s3-endpoint=s3
    • For NetApp: s3-endpoint=netapp
  • When using the Pulse View, you can now select new options from the Show Failed dropdown menu, including Failed Job Runs and Failing Scores. Previously, the Show Failed option displayed only failed job runs.
  • You can now use uppercasing in secondary datasets and rule references.
  • You can now configure arbitrary users as part of the root user group for DQ pod deployment.
  • Due to security concerns, we have removed the license key from the job logs.

Platform

  • We've upgraded the following drivers to their latest versions:
    • Databricks: 2.6.36
    • Google BigQuery: 1.5.2.1005
    • Dremio: 24.3.0
    • Snowflake: 3.14.4
  • You can now enable multi-tenancy for a notebook API.
  • We now apply the same Spark CVE fixes that are applied to Cloud Native deployments of Collibra DQ to Standalone deployments.

Pushdown

  • From the Settings page on Explorer, you can now select Date or DateTime (TimeStamp) from the Date Format dropdown menu to substitute the runDate and runDateEnd at runtime.
  • To conserve memory and processing resources, the results query now rolls up outliers and shapes, and the link IDs no longer persist to the Metastore.
  • All rules from the legacy Rule Library function correctly for Snowflake and Databricks Pushdown except for Having_Count_Greater_Than_One and Two_Decimal_Places when Link ID is enabled. See the Known Limitations section below for more information.
  • You can now use cross-dataset rules that traverse across connections on the same data source.

Beta Features

Collibra AI

  • SQL assistant for data quality (beta) now allows you to select between four new options to generate prompts for:
    • Categorical: Writes a SQL query to detect categorical outliers.
    • Dupe: Writes a SQL query to detect duplicate values.
    • Record: Writes a SQL query to find values that appear on a previous day but not on the next day.
    • Pattern: Writes a SQL query to find infrequent combinations that appear less than 5 percent of the time in the columns you specify.

DQ Integration

  • The new Quality tab is now available as part of the latest UI updates for Asset pages in Collibra Data Intelligence Platform for private beta participants, giving you at-a-glance insights into the quality of your assets. These insights include:
    • Score and dimension roll-ups.
    • Column, data quality rule, data quality metric, and row overviews.
    • Details about the data elements of an asset.
  • You can now see the Overview DQ Score on an Asset when searching via Data Marketplace. This improves your ability to browse the data quality scores of Assets without opening their Asset Pages.

Fixes

Capabilities

  • While editing the command line of a job containing an outlier by replacing -by HOUR with -tbin HOUR, the command line no longer reverts to its original state after profiling completes. (ticket #126764)
  • When exporting the job log details to CSV, Excel, PDF, or Print from the Jobs page, the exported data now contains all rows of data. (ticket #129832)
    • Additionally, when exporting the job log details to PDF from the Jobs page, the PDF file now contains the correct column headers and data. (ticket #129832)
  • When working with the Alert Builder, you no longer see a “No Email Servers Configured” message despite having correctly configured SMTP settings. (ticket #127520)

DQ Integration

  • When integrating data from an Athena connection, you can now use the dropdown menu in rules to map an individual column to a Rule in Collibra Data Intelligence Platform. (ticket #125152, 126150)

Pushdown

  • When archive breaking records is enabled, statements containing backticks ` or new lines are properly inserted into the source system. (ticket #130122)
  • For Snowflake Pushdown jobs with many outlier records either dropped or added, new memory usage limits now prevent out-of-memory issues. (ticket #126284)

Known Limitations

  • When Link ID is enabled for a Snowflake or Databricks Pushdown job, Having_Count_Greater_Than_One and Two_Decimal_Places do not function properly.
    • The workaround for Having_Count_Greater_Than_One is to manually add the Link ID to the group by clause in the rule query.
    • The workaround for Two_Decimal_Places is to add a * to the inner query.

DQ Security

Note If your current Spark version is 3.2.2 or older, we recommend upgrading to Spark 3.4.1 to address various critical vulnerabilities present in the Spark core library, including Log4j.

Release 2024.01

Release Information

  • Release date of Collibra Data Quality & Observability 2024.01: January 29, 2024
  • Publication dates:
    • Release notes: January 4, 2024
    • Documentation Center: January 29, 2024

Highlights

    Integration
    We’ve introduced several new features and enhancements to significantly improve the integration experience.

    • Aggregation paths on tables are now set by default, simplifying the configuration within the Collibra DQ Admin Console.
    • The new Quality tab is now available as part of the latest UI updates for Asset pages in Collibra Data Intelligence Platform for private beta participants, giving you at-a-glance insights into the quality of your assets. These insights include:
      • Score and dimension roll-ups.
      • Column, data quality rule, data quality metric, and row overviews.
      • Details about the data elements of an asset.
    • When multiple jobs are attached to a table, the Quality tab on an Asset page shows an average similar to a scorecard in Collibra DQ.
  • Note Table Assets roll up to the DQ Job Data Asset. Best practice is to roll up DQ Job to Table to align the dedication score and the Quality tab Asset score for a Table Asset.

    Spark Version Update
    As of Collibra DQ version 2023.11, we’ve upgraded our out-of-the-box Apache Spark version from 3.2.0 to 3.4.1. We strongly encourage organizations on Standalone deployments of Collibra DQ to upgrade to the latest Spark package to utilize the new features and address some of the major vulnerabilities in Spark 3.2 and earlier versions. Additionally, Collibra DQ support for Spark 2.x is limited as of Collibra DQ 2024.01, as Spark 2.x has reached its end of life.

    If you use Spark 3.2.2 or lower, we recommend upgrading to 3.4.1 to address various critical vulnerabilities present in the Spark core library, including Log4j.

Important 
Changes for Kubernetes Deployments
As of Collibra DQ version 2023.11, we've updated the Helm Chart name from owldq to dq. For Helm-based upgrades, point to the new Helm Chart while maintaining the same release name. Please update your Helm install command by referring to the renamed parameters in the values.yaml file. It is also important to note that the pull secret has changed from owldq-pull-secret to dq-pull-secret.

Further, following deployment, your existing remote agent name will change. For example, if your agent name is owldq-owl-agent-collibra-dq, the new agent name will be dq-agent-collibra-dq. If your organization uses APIs for development, ensure that you upgrade AGENT name configurations in your environments.

Lastly, when you deploy using the new Helm Charts, new service (Ingress/Load Balancer) names are created. This changes the IP address of the service and requires you to reconfigure your Load Balancer with the new IP.

Please see the expandable sections below for more details about specific changes.

Note 
If your organization has a standalone deployment of Collibra DQ with SSL enabled for DQ Web, and both DQ Web and DQ Agent are on the same VM or server, we recommend upgrading directly to Collibra DQ 2023.11.3 patch version or 2024.01.

Migration Updates

Important This section only applies if you are upgrading from a version older than Collibra DQ 2023.09 on Spark Standalone. If you have already followed these steps during a previous upgrade, you do not have to do this again.

We have migrated our code to a new repository for improved internal procedures and security. Because owl-env.sh jar files are now prepended with dq-* instead of owl-*, if you have automation procedures in place to upgrade Collibra DQ versions, you can use the RegEx replace regex=r"owl-.*-202.*-SPARK.*\.jar|dq-.*-202.*-SPARK.*\.jar" to update the jars.

Additionally, please note the following:

  • Standalone Upgrade Steps
  • When upgrading from a Collibra DQ version before 2023.09 to a Collibra DQ version 2023.09 or later on Spark Standalone, the upgrade steps have changed.

New Features

Integration

  • You can now see the Overview DQ Score on an Asset when searching via Data Marketplace. This improves your ability to browse the data quality scores of Assets without opening their Asset Pages.

Enhancements

Capabilities

  • When setting up a connection to a Google BigQuery data source, you can now use Workload Identity to authenticate your connection. By using Workload Identity to authenticate your BigQuery connection, you can now access data stored in BigQuery across GCP projects without relying on JSON credential files or metadata obfuscation.
  • When using the Alert Builder, you can now create an alert for when a job run fails. You can configure this by selecting the Job Failure option from the Status Alert modal on the Alert Builder page.
    • Additionally, the alert types on the Alert Builder page have changed from Dataset Run Alerts and Job Status to Condition and Status, respectively.
  • When you receive an email alert, the body of the alert now includes the Alert Type as either Condition or Status. When the Alert Type is Condition, the query upon which the condition is based also displays.
  • When using the /v3/rules/{dataset}/ruleBreaksSelect and you select INTERNAL for the storageType parameter, the API generates a SQL query that returns the break records included in the PostgreSQL Metastore.
  • The /v2/getbreakingrecords API is now filtered by runId and limited to 100 records by default.
  • When using multi-tenancy with SAML enabled, you can now set showsamltenantmetadatalabel=false from the ConfigMap to hide tenant metadata labels from SAML-enabled tenants to match the names of non-SAML-authenticated tenants.
  • When signing into Collibra DQ, you are now required to select a tenant from the dropdown menu before proceeding.
  • The Scheduler page is now powered by the v2/getallscheduledjobs API.
  • What was previously the DQ Job button next to the search bar at the top of the Collibra DQ application is now called Findings.
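As a hedged sketch, requests to the two break-record APIs mentioned above could be constructed as follows. The host, dataset name, runId value, and the dataset parameter name for /v2/getbreakingrecords are hypothetical:

```python
from urllib.parse import urlencode

# Hypothetical host, dataset, and runId for illustration only.
base = "https://dq.example.com"
dataset = "my_dataset"

# /v3/rules/{dataset}/ruleBreaksSelect with storageType=INTERNAL generates a SQL
# query that returns break records stored in the PostgreSQL Metastore.
rule_breaks_url = (
    f"{base}/v3/rules/{dataset}/ruleBreaksSelect?"
    + urlencode({"storageType": "INTERNAL"})
)

# /v2/getbreakingrecords is filtered by runId and limited to 100 records by default.
breaking_records_url = (
    f"{base}/v2/getbreakingrecords?"
    + urlencode({"dataset": dataset, "runId": "2024-01-29", "limit": 100})
)
print(rule_breaks_url)
```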

Platform

  • Standalone installations now come packaged with the PostgreSQL 12 installer instead of PostgreSQL 11.
  • Helm Charts are now versioned according to their corresponding release version. You can find version details in the Charts.yaml file.
  • You can now use the default truststore by setting the global.web.tls.trust.default flag to true.
  • With SAML configured, you can now use the default keystore without the need for a custom keystore.
  • We've introduced pattern validation for some commonly used IDs and names in Collibra DQ. You can override these patterns in the owl-env.sh file for Standalone or the Web ConfigMap for Cloud Native by replacing the default values using a RegEx for the following env variables:
    • VALIDATION_PATTERN_NAME_ID (Name ID pattern): Default is alphanumeric characters (letters and digits) and underscores (_). Overrides the defaults of Name IDs in Collibra DQ.
    • VALIDATION_PATTERN_COMMON_NAME (Common name): Default is alphanumeric characters (letters and digits) and underscores (_). Overrides the defaults of common names used in Collibra DQ.
    • VALIDATION_PATTERN_SCHEMA_NAME (Schema name): Default is alphanumeric characters (letters and digits) and underscores (_). Overrides the defaults of connection schema names in Collibra DQ.
    • VALIDATION_PATTERN_CONN_NAME (Connection name): Default is alphanumeric characters (letters and digits), underscores (_), and hyphens (-). Overrides the defaults of connection names in Collibra DQ.
    • VALIDATION_PATTERN_DATASET_NAME (Dataset name): Default is alphanumeric characters (letters and digits), underscores (_), and hyphens (-). Overrides the defaults of dataset names in Collibra DQ.
    • VALIDATION_PATTERN_FILE_NAME (File name): Default is alphanumeric characters (letters and digits), underscores (_), hyphens (-), periods (.), slashes (/), and spaces ( ). Overrides the defaults of file names in Collibra DQ.
    • VALIDATION_PATTERN_LDAP_DN (DN): Default is comma-separated key-value pairs, with both the key and value consisting of lowercase letters, digits, and hyphens. Overrides the defaults of LDAP DNs in Collibra DQ.

    Example If you want to override the default that does not allow spaces in the connection name, set the connection name variable to the following RegEx: VALIDATION_PATTERN_CONN_NAME=^[a-zA-Z0-9_- ]+$
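As a sketch, the default and overridden connection-name patterns can be checked with a RegEx engine. The connection names below are hypothetical, and the hyphen is placed last inside the character class so that it stays literal:

```python
import re

# Default connection-name pattern from the table above (sketch):
# letters, digits, underscores, and hyphens only.
DEFAULT_CONN_NAME = re.compile(r"^[A-Za-z0-9_-]+$")
# Hypothetical override that also permits spaces, as in the example.
OVERRIDE_CONN_NAME = re.compile(r"^[A-Za-z0-9_ -]+$")

print(bool(DEFAULT_CONN_NAME.match("prod snowflake")))   # False: spaces rejected
print(bool(OVERRIDE_CONN_NAME.match("prod snowflake")))  # True: override allows spaces
```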

  • In order to promote compatibility with various software delivery platforms, the leading ‘0’ in the defaultMode value of 0493 has been removed from 4 YAML files for Kubernetes deployments. The new defaultMode is now 493 for the following YAML files:
    • k8s/charts/dq/charts/dq-agent/templates/dq-agent-statefulset.yaml
    • k8s/charts/dq/charts/dq-livy/templates/dq-livy-deployment.yaml
    • k8s/charts/dq/charts/dq-web/templates/dq-web-statefulset.yaml
    • k8s/charts/dq/charts/spark-history-server/templates/deployment.yaml
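As a quick sanity check on the new value: Kubernetes reads defaultMode as a decimal integer, and 493 is simply the decimal form of the familiar octal permission mode 0755 (rwxr-xr-x). A leading zero can cause some YAML tools to attempt an octal parse, which fails for 0493 because 9 is not an octal digit:

```python
# Kubernetes interprets defaultMode as decimal; 493 decimal == 0o755 octal,
# i.e. the rwxr-xr-x file mode commonly written as 0755.
default_mode = 493
print(oct(default_mode))      # 0o755
print(0o755 == default_mode)  # True
```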
  • Configurable PBE encryption is now supported for Kubernetes and Standalone deployments of Collibra DQ.
  • When validating a finding and assigning it to a ServiceNow user or group, you can now reassign this finding from ServiceNow to a Collibra DQ user. (ticket #122476, 133165)

Pushdown

  • When the source query changes the dataset schema, the dataset_schema table now reflects those changes.

Fixes

Capabilities

  • When creating a rule on a Pullup dataset that uses “in” or “rlike” where the condition ends with wrapped parentheses (that is, “))”), the string replace logic now works as expected. Previously, rules that contained rlike() or in() would throw exceptions when the job ran. (ticket #128759)
  • Important In order to keep rules as close to the original input as possible, some of the padding that was originally appended to certain characters has been removed.

  • When archiving rule breaks, break records now export to S3 buckets as expected. (ticket #125411)
  • When changing the business unit on the Metadata Bar or Dataset Manager, the updated business unit now replaces the old one in the Metastore instead of creating an additional entry. (ticket #127823, 130086)
  • When editing a dataset from Dataset Manager in the new UI, the Schema/Parent Folder and Table/File Name fields are no longer swapped. (ticket #129103)
  • When running a job against a BigQuery dataset where the underlying query uses “from” before the schema/table, Collibra DQ now parses the “from” correctly. (ticket #129798)
  • When attempting to create a Pullup job against a Trino schema whose name contains the escape character \, you can now view the table details in Explorer. (ticket #131065)
  • The configs dupelimit and dupelimitui now work as expected where dupelimit restricts the number of dupe findings stored in the Metastore and of those findings, dupelimitui restricts the number of dupes shown on the Dupes tab on the Findings page. (ticket #127748)
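The relationship between the two configs can be sketched as follows; the function and variable names are illustrative only, not actual Collibra DQ code:

```python
def apply_dupe_limits(findings, dupelimit, dupelimitui):
    """Illustrative sketch: dupelimit caps how many dupe findings are stored
    in the Metastore; dupelimitui caps how many of those stored findings the
    Dupes tab on the Findings page displays."""
    stored = findings[:dupelimit]   # persisted to the Metastore
    shown = stored[:dupelimitui]    # rendered on the Dupes tab
    return stored, shown

# 500 dupes found: 100 are stored, and 30 of those are shown in the UI.
stored, shown = apply_dupe_limits(list(range(500)), dupelimit=100, dupelimitui=30)
print(len(stored), len(shown))  # 100 30
```

Because dupelimitui only selects from what dupelimit has stored, it can never surface more findings than the Metastore holds.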
  • When using the Rule Builder in the new UI, freeform rules using RLIKE now successfully pass validation. (ticket #127779)
  • When using the Alert Builder in the new UI, you can now update alert batch names as expected. (ticket #129079)
  • When a scheduled job runs with zero rows and Parallel JDBC enabled, it no longer fails. (ticket #128385)
  • When using Kerberos-based authentication for PostgreSQL or CockroachDB data sources, jobs no longer time out before authenticating against KDC. (ticket #128540)

Platform

  • While configuring SSL after a fresh Standalone install of or upgrade to Collibra DQ version 2023.11, the DQ agent now starts as expected. (ticket #131078)
  • Note With this update, any system that has export SERVER_PORT set overrides the default value of -1. To define the port, comment out export SERVER_PORT=9000 and set it as PORT=9000 #owl-web port NUMBER instead.

  • We fixed an issue that caused unexpected license name requests. (ticket #126731)

DQ Integration

  • When mapping columns from rules in Collibra DQ, column names are now parsed correctly in Collibra Data Intelligence Platform.
  • When a User Defined Rule is deactivated or deleted from a dataset in Collibra DQ, the Rule and Score status in Collibra Data Intelligence Platform is set to “Suppressed” when the dataset runs again in Collibra DQ. (ticket #124414)

Pushdown

  • When running a Redshift Pushdown job to create a profile of a Redshift dataset, you can now view TopN Shapes for columns containing NULL values. (ticket #129221)
  • When casting from one data type to another on a Redshift dataset, the job now runs successfully without throwing an exception message. (ticket #128718)
  • When creating a Pushdown job on a BigQuery dataset that contains a DATE column time slice, you can now create the job without receiving a command line error. (ticket #129283)
  • When using the SQL Query option on a Snowflake table with a mixed case name, you can now create the job without receiving an error. (ticket #126118)

DQ Cloud

  • When running a job on a Snowflake table containing a timestamp column, timestamps now translate correctly as date, time, and timestamp types. (ticket #120975)
  • Note While this fix was originally intended to be limited to DQ Cloud, timestamps also translate correctly on Standalone deployments.

  • When using the Dupes tab on the Findings page, dupe observations now display correctly when more dupes are discovered than the preview limit of 30. (ticket #127625)

Known Limitations

  • When configuring dupelimitui, values above 2000 result in some UI lag. Because of this, we recommend using dupelimitui values under 2000.

DQ Security

Release 2023.11

Release Information

  • Release date of Collibra Data Quality & Observability 2023.11: November 20, 2023
  • Publication dates:
    • Release notes: November 8, 2023
    • Documentation Center: November 13, 2023

Highlights

    Pushdown
    We're excited to announce that Pushdown for BigQuery is now generally available! Pushdown is an alternative compute method for running DQ jobs, where Collibra DQ submits all of the job's processing directly to a SQL data warehouse, such as BigQuery. When all of your data resides in BigQuery, Pushdown reduces the amount of data transfer, eliminates egress latency, and removes the Spark compute requirements of a DQ job.
    UI Redesign
    New installs of Collibra DQ come with REACT_MUI and UX_REACT_ON admin flags set to TRUE by default. Additionally, if a pre-existing install of Collibra DQ had these flags set to FALSE, they are now set to TRUE. While you can still modify these flags from the Admin Console > Configuration Settings > Application Config page, we recommend keeping them set to TRUE to allow for full feature functionality and an elevated user experience. See the Updated UI table below for more details on what's changed.
    Spark Version Update
    We’ve upgraded our out-of-the-box Apache Spark version from 3.2.0 to 3.4.1. We strongly encourage organizations on Standalone deployments of Collibra DQ to upgrade to the latest Spark package to take advantage of the new features and address some of the major vulnerabilities in Spark 3.2 and earlier versions. Additionally, Collibra DQ support for Spark 2.x will be limited as of Collibra DQ 2024.01, as Spark 2.x has reached its end of life.
    Collibra AI
    We're delighted to announce that Collibra AI is now available for private beta testing! Collibra AI introduces automated SQL rule writing capabilities on the Rule Workbench and Dataset Overview that help you accelerate the discovery, curation, and visualization of your data. Contact your Collibra CSM for more details about participating in this exciting private beta.

Important 
Changes for Kubernetes Deployments
We've updated the Helm Chart name from owldq to dq. For Helm-based upgrades, point to the new Helm chart while maintaining the same release name. Please update your Helm install command by referring to the renamed parameters in the values.yaml file. It is also important to note that the pull secret has changed from owldq-pull-secret to dq-pull-secret.

Further, following deployment, your existing remote agent name will change. For example, if your agent name is owldq-owl-agent-collibra-dq, the new agent name will be dq-agent-collibra-dq. If your organization uses APIs for development, ensure that you upgrade AGENT name configurations in your environments.

Lastly, when you deploy using the new Helm Charts, new service (Ingress/Load Balancer) names are created. This changes the IP address of the service and requires you to reconfigure your Load Balancer with the new IP.

Please see the expandable sections below for more details about specific changes.

Note 
If your organization has a standalone deployment of Collibra DQ with SSL enabled for DQ Web, and both DQ Web and DQ Agent are on the same VM or server, we recommend upgrading directly to Collibra DQ 2023.11.3 patch version instead of 2023.11. For more information, see the Maintenance Updates section below.

Migration Updates

Important This section only applies if you are upgrading from a version older than Collibra DQ 2023.09 on Spark Standalone. If you have already followed these steps during a previous upgrade, you do not have to do this again.

We have migrated our code to a new repository for improved internal procedures and security. Because the jar files referenced in owl-env.sh are now prefixed with dq-* instead of owl-*, if you have automation procedures in place to upgrade Collibra DQ versions, you can use the RegEx replace regex=r"owl-.*-202.*-SPARK.*\.jar|dq-.*-202.*-SPARK.*\.jar" to update the jars.
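The replace pattern above matches both the old and new jar naming schemes, as a quick Python check illustrates (the jar file names below are hypothetical examples, not actual release artifacts):

```python
import re

# RegEx from the migration note: matches both owl-* and dq-* style jar names.
JAR_PATTERN = re.compile(r"owl-.*-202.*-SPARK.*\.jar|dq-.*-202.*-SPARK.*\.jar")

# Hypothetical jar file names, for illustration only.
old_jar = "owl-core-2023.09-SPARK321.jar"
new_jar = "dq-core-2023.11-SPARK341.jar"
other = "some-other-lib-1.0.jar"

print(bool(JAR_PATTERN.fullmatch(old_jar)))  # True
print(bool(JAR_PATTERN.fullmatch(new_jar)))  # True
print(bool(JAR_PATTERN.fullmatch(other)))    # False
```

Because the pattern covers both prefixes, the same automation step works before and after the repository migration.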

Additionally, please note the following:

  • Standalone Upgrade Steps
  • When upgrading from a Collibra DQ version before 2023.09 to a Collibra DQ version 2023.09 or later on Spark Standalone, the upgrade steps have changed.

Liveness Probe Updates

  • Cloud Native
    • When a Kubernetes pod service becomes unstable, a new liveness probe automatically deletes the pod to ensure the DQ agent stays alive and running. No further action is necessary for Cloud Native deployments; this note is informational only.
  • Standalone
    • Because the implementation of the liveness probe for Kubernetes required a change in the owlmanage.sh file for Standalone installations, you need to follow the steps below to upgrade a Standalone deployment.
        Important 
        If your organization has a Standalone installation of Collibra DQ, you must copy the latest owlmanage.sh to /opt/owl/bin directory, as the file has changed.

New Features

Pushdown

  • When running Profiling on Pushdown jobs, advanced level profiling is now an opt-in feature and does not run by default. Advanced Profile determines whether a string field contains various string numerics, calculates TopN, BottomN, and TopN Shapes, and detects the scale and precision of double fields. We've also included the Profile String Length setting in the Advanced Profile option on the Explorer Settings modal.

Enhancements

Capabilities

  • When using the Alert Builder, you can now create an alert for when a job run completes successfully. When you add an alert, you can now choose from two options:
    • Dataset Run Alerts let you set an alert for when a job run meets a certain condition.
    • Job Status lets you set an alert for when a job run completes.
  • You can now configure arbitrary users as part of root user groups for cloud native deployments of Collibra DQ on OpenShift.
  • You can now use OAuth 2.0 to authenticate Trino connections.
  • We've moved the /v2/getprofiledeltasbyrunid API to the V3 Profile API GET /v3/profile/deltas.

Pushdown

  • You can now archive source break records from Redshift Pushdown jobs.

Integration

  • The Unassigned DQ Job domain now has the Parent Community name appended to it. Given the unique community name constraint when using more than one tenant or instance, this change allows for the unique naming of Business Units. The new Community structure is as follows:
    • Parent Community
      • Business Unit Community
        OR
      • Unassigned DQ Job + Parent Community
        • DQ Job Domain
          • DQ Job Asset
        • Rulebook (definitions)
          • DQ Rule Assets
        • Rulebook (scores)
          • DQ Metric Assets
  • When on a dataset-level page, such as Findings or Profile, you can now select "Reset Integration" from the Integration dropdown menu on the metadata bar to realign mappings that may have changed in Collibra Data Intelligence Platform. If you get an error message even though your mappings appear aligned, this action can reset the integration and allow you to proceed with Collibra DQ metadata ingestion.
  • When a dataset integration with Collibra Data Intelligence Platform is enabled, you can now view the Community, Sub-Community, and Domain hierarchy to which the integration is mapped in Collibra Data Intelligence Platform from the metadata bar on all dataset-level pages. This increases the transparency of where Assets are created in Collibra Data Intelligence Platform without needing to navigate away from Collibra DQ.
    • Additionally, you can click any of the breadcrumbs to open the Community, Sub-Community, or Domain in Collibra Data Intelligence Platform.
  • When running Pushdown jobs with integrations enabled, the results now integrate into Collibra Data Intelligence Platform successfully. Previously, you had to manually enable or disable the integration each time a Pushdown job ran.
  • After integrating a dataset from the Admin Console, Findings or Dataset Manager pages, you can now click the "Data Quality Job" link in the new View in DIC column to open its corresponding Asset page in Collibra Data Intelligence Platform.
  • The GET /dgcjson endpoint is now included in the main Integrations API as GET /dgc/integrations/getdgcjson. Previously, this was located under the "UI Internal" section of Swagger.
  • Tip When an integration error occurs, check that the Community, Domain, and Asset names in Collibra Data Intelligence Platform don't already exist from a previous integration.

DQ Cloud

  • Collibra Edge now reflects Collibra DQ's new default Spark version 3.4.1.

Fixes

Capabilities

  • When running a job to check for outliers where the lookback value is set to something other than the default 5, the minhistory now updates to the correct value in the metastore. (ticket #124063)

      Note Manual overrides of -dlminhist on the command line do not save in the metastore.

  • When using the new Collibra DQ UI, run date information now displays correctly. Previously, the job configuration of manual DQ job runs would override with hardcoded dates, which caused the upcoming scheduling date to only reflect the hardcoded date. (ticket #126345)
  • When changing the assignment for a dataset with the new UI turned on for an SSO instance of Collibra DQ, existing SAML assignments now load correctly on the Findings and Assignments pages. (ticket #128063)
  • When assigning users to the following roles, they can now access all appropriate Admin screens (ticket #128115):
    • ROLE_CONNECTION_MANAGER
    • ROLE_DATA_GOVERNANCE_MANAGER
    • ROLE_DATASET_MANAGER
    • ROLE_OWL_ROLE_MANAGER
    • ROLE_USER_MANAGER

Platform

  • When editing the Batch Name field of a Dataset Run Alert, the batch name distribution list no longer updates if the Batch Name field is empty or blank " ". (ticket #126763, 126796)
  • When reviewing the Outlier tab on the Findings page, outlier findings now expand correctly when you drill down into them. (ticket #126065)
  • When creating an alert, you can now enter the special characters ! # $ % & ' * + - / = ? ^ _ ` . { | } ~ in an email address for the alert recipient. (ticket #126763)
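For reference, these characters, together with letters and digits, correspond to the set RFC 5322 permits in the local part of an address. The following is a minimal, illustrative Python check of the local part only, not actual Collibra DQ validation logic:

```python
import re

# The special characters now accepted, per the fix above; with letters and
# digits these form the RFC 5322 "atext" set (plus the dot).
LOCAL_PART = re.compile(r"^[A-Za-z0-9!#$%&'*+\-/=?^_`.{|}~]+$")

def local_part_ok(address: str) -> bool:
    """Illustrative check: validates only the part before the @."""
    local, _, domain = address.partition("@")
    return bool(local) and bool(domain) and LOCAL_PART.fullmatch(local) is not None

print(local_part_ok("data+quality_alerts@example.com"))  # True
print(local_part_ok("alerts@example.com"))               # True
```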

DQ Integration

  • When viewing user-defined or adaptive rules, Passing Fraction now reflects the points deducted from an individual rule, rather than the total rule score. Previously, the total breaks for the rule type of user-defined rules were used to generate the Passing Fraction, rather than the individual rule breaks. (tickets #124217, 127700)
  • When using the configuration wizard to map your single-tenant Collibra DQ environment, you can now link your integration to an existing community and create new communities as part of your integration. (tickets #122948, 123426, 126227, 127044, 128345)
  • When integrating a Collibra DQ dataset and setting up dimensions in Collibra Data Intelligence Platform, the columns now display correctly in Collibra Data Intelligence Platform after running the dataset in Collibra DQ. (ticket #126379)

Pushdown

  • When adding outliers to a Pushdown dataset and running a job, the outlier configurations now render properly. Previously, when editing a job, one or more of the outliers did not display as expected. (ticket #124736)

DQ Cloud

  • Fixed an issue that caused the Collibra DGC MDL Proxy to run out of memory under certain conditions.

DQ Security

Note 
We've removed all existing classic UI JS libraries and their references from the Updated UI to address and prevent any potential security vulnerabilities.

Updated UI

In addition to broader user interface and user experience enhancements, we've also added some impressive new features! The following table showcases some of the highlights.

Component Description Available in Classic
Metadata Bar

The Metadata Bar is a dataset anchor that simplifies the navigation to some of your most frequently used pages, such as Dataset Overview, Profile, Rules, and Findings. It also provides quick insight into your dataset, such as the number of active rules, the data source from which the dataset was created, and whether or not your job is scheduled to run automatically. When an integration is set up, the metabar also allows you to easily enable or disable dataset metadata integrations into Collibra Data Intelligence Platform.

You can access the Metadata Bar on any dataset-level page, including:

  • Findings
  • Profile
  • Dataset Rules
  • Alert Builder
  • Dataset Overview
Available in Classic: No
Dataset Overview

Dataset Overview lets you query your dataset to discover key data points and insights, which you can convert to data quality rules entirely within the Dataset Overview modal. With the power to write SQL to query your dataset, you can accelerate the process of data discovery and reveal important insights in real time.

Dataset Overview also allows private beta participants to leverage Collibra AI to automatically write and troubleshoot SQL for faster rule writing and advanced exploration of your dataset. See the Collibra AI private beta documentation to learn more.

Available in Classic: No
Explorer

The new workflow simplifies the process of creating a DQ job to run against a dataset. With just a few clicks, you can create a basic profile job in a matter of seconds instead of minutes. For a more advanced scan, the step-by-step guide walks you through the process, eliminating many of the more tedious elements of the classic Explorer.

Available in Classic: Yes
Findings

The new Findings page will feel similar to the classic page, but with a few important changes:

  • The daily score, pulse view, row count, and pass/fail charts are now broken out into individual tabs for an improved display.
  • The dataset metadata that previously resided next to the chart views is now anchored to the top of the page within the Metadata Bar for quicker analysis of your dataset’s key data points.
  • The findings tabs of the various data quality dimensions now have enhanced readability and more clearly displayed actions.
Available in Classic: Yes
Profile

While many of the same column- and dataset-level insights are unchanged from the classic UI, the presentation of information is now modernized for a crisper experience.

Available in Classic: Yes
Rules

Dataset Rules lists all previously saved rules for a given dataset and provides an overview of their details, such as the definitions of SQL conditions, rule types, and whether or not rules pass validation checks. From here, you can access the Rule Workbench to create or edit a rule.

The Rule Workbench replaces the classic Rule Builder, fusing an elegant SQL command line interface with the preview and advanced setting capabilities you expect from a modern SQL builder. Like the Dataset Overview, you can also use Collibra AI generated SQL to write and troubleshoot rules on the Rule Workbench. See the Collibra AI private beta documentation to learn more.

We've also split Data Class and Template rules into their own pages to emphasize that they are independent of jobs and datasets and improve their overall organization.

Available in Classic: No
Alerts

The new Alert Builder gives you an at-a-glance overview of all alerts for a particular dataset and simplifies adding new alerts and editing existing ones.

Available in Classic: Yes
Dataset Manager

Dataset Manager provides a list of all datasets in your Collibra DQ environment, as well as a variety of management options, such as bulk actions, assigning datasets to data categories, and the ability to filter datasets by a variety of criteria.

Available in Classic: Yes
Column Manager

Column Manager is a detailed breakdown of all the columns in the datasets in your Collibra DQ environment that shows key data points like data type, various ratios, and the Pass/Fail status of a given column. You can also bulk apply rules, data classes, and sensitive labels to selected columns.

Available in Classic: No
Report Dashboards

The Reports section now has two new dashboards available in the updated UI:

  • Column Dimension is an overview to view the data quality dimensions of the columns of all or specific datasets. By filtering by business unit, dataset, column, and monthly periods, you can make more informed decisions and increase the value of your data. Column Dimension also provides insight into the total current dimension scores, dimension scores across time, and DQ scores (along with other metadata) for each column.
  • Dataset Dimension is an overview to view data quality dimensions with different filters to make more informed decisions and increase the value of your data. You can filter by specific datasets or all datasets in your Collibra DQ environment, by business unit, and by monthly periods.
Available in Classic: No
Connections

The updated Connections page in the Admin Console does away with the connection tiles of the classic page in favor of a highly searchable and sortable paginated table format. The new page also features two tabs for Connections and the Drivers stored in your Collibra DQ environment.

Additionally, when you add or edit a connection, the connection template is now organized in three tabs for Connection Details, Driver Properties, and Connection Variables. With these sections now clearly delineated, the process of creating or updating a connection is now much cleaner.

Available in Classic: Yes
Admin Console

The Admin Console now lists each admin activity for better organization and simpler navigation than the tiles of the classic UI.

Available in Classic: Yes

Known Limitations

Explorer

  • When using the SQL compiler on the dataset overview for remote files, the Compile button is disabled because the execution of data files at the Spark layer is unsupported.
  • You cannot currently upload temp files from the new File Explorer page. This may be addressed in a future release.
  • The Formatted view tab on the File Explorer page only supports CSV files.
  • When creating a job, the Estimate Job step from the classic Explorer is no longer a required step. However, if incorrect parameters are set, the job may fail when you run it. If this is the case, return to the Sizing step and click Estimate next to Job Size before you Run the job.

Connections

  • When adding a driver, if you enter the name of a folder that does not exist, a permission issue prevents the creation of a new folder.
    • A workaround is to use an existing folder.

Admin

  • When adding another external assignment queue from the Assignment Queue page, if an external assignment is already configured, the Test Connection and Submit buttons are disabled for the new connection. Only one external assignment queue can be configured at the same time.
  • Due to security requirements, we've removed the ability for application administrators to add new local users from the User Management page in the Admin Console. All new users must use the Register link on the Collibra DQ sign in screen.
    • When auto-approve is not configured, admin users can still manually approve new user requests and add roles to the new user from the User Management page.

Profile

  • When adding a distribution rule from the Profile page of a dataset, the Combined and Individual options incorrectly have "OR" and "AND" after them.
  • When using the Profile page, Min Length and Max Length do not display the correct string length. This will be addressed in an upcoming release.

Rules

  • When creating a quick rule from the Data Preview tab of the Findings, Profile, or Rules pages, the Preview Limit and Run Time Limit do not honor the application default limits of 6 and 30, respectively. Instead, the Preview Limit and Run Time Limit are both incorrectly set to 0.
    • While this will be addressed in the January (2024.01) release, a workaround is to manually edit these fields from the Rule Workbench Settings modal.

Alerts

  • Batch email updates are not currently working in the beta UI. This will be addressed in the January (2024.01) release.
  • When editing the Batch Name of a job alert, there is a limitation that prevents you from editing the email address field associated with the batch alert.

Scorecards

  • When creating a new scorecard from the Page dropdown menu, because of a missing function, you cannot currently create a scorecard.
    • While a fix for this is planned for the September (2023.09) release, a workaround is to select the Create Scorecard workflow from the three dots menu instead.

Navigation

  • The Dataset Overview function on the Metadata Bar is not available for remote files.
  • The Dataset Overview modal throws errors for the following connection types:
    • BigQuery (Pushdown and Pullup)
    • Athena CDATA
    • Oracle
    • SAP HANA
  • The Dataset Overview function throws errors when you run SQL queries on datasets from S3 and BigQuery connections.

Maintenance Updates

2023.11.3

  • While configuring SSL after a fresh Standalone install or upgrade to Collibra DQ version 2023.11, the DQ Agent and Web now start as expected. (ticket #131078)
    • With this fix, the DQ Agent now uses port 9101 by default to expose the Health Check API.
  • Note Ensure you select the latest corresponding Helm Chart when taking a maintenance update for Cloud Native deployments.

2023.11.4

  • When synchronizing DQ rules without business units configured in Collibra DQ, you can now synchronize them to both root and sub-communities. (ticket #127044, 128138)