Release 2023.09
Release Information
- Expected release date of Collibra Data Quality & Observability 2023.09: October 8, 2023
- Publication dates:
- Release notes: September 24, 2023
- Documentation Center: September 29, 2023
Highlights
- Pushdown
We're delighted to announce that Pushdown processing for Amazon Athena and Redshift is now available in public beta! Pushdown is an alternative compute method for running DQ jobs, where Collibra DQ submits all of the job's processing directly to a SQL data warehouse, such as Athena or Redshift. When all of your data resides in Athena or Redshift, Pushdown reduces the amount of data transfer, eliminates egress latency, and removes the Spark compute requirements of a DQ job.
- Job Estimator
Collibra DQ utilizes Spark's ability to break large datasets into smaller, more manageable segments called partitions. When you run large Pullup jobs, you can now leverage the job estimator to automatically calculate and update the number of partition columns required to optimally run and write rules against them. Previously, the only way to know when a job required the scaling of resources was when it failed.
Important
We have migrated our code to a new repository. As a result, the Collibra DQ jar files referenced in owl-env.sh are no longer prefixed with owl-*; they are now prefixed with dq-*. For more details, review the Migration Updates section below.
Migration Updates
We have migrated our code to a new repository to improve our internal procedures and security. Because the jar files referenced in owl-env.sh are now prefixed with dq-* instead of owl-*, any automation you use to upgrade Collibra DQ versions must handle both naming patterns. You can use the regular expression owl-.*-202.*-SPARK.*\.jar|dq-.*-202.*-SPARK.*\.jar to match and update the jar references.
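A minimal sketch of how this pattern could be used, assuming a standalone install under /home/owldq/owl/bin (the path is an assumption; adjust it to your environment):
# List the jars in the Collibra DQ bin folder that match either the old owl-*
# or the new dq-* naming pattern, so an upgrade script can locate the files to replace.
ls /home/owldq/owl/bin | grep -E 'owl-.*-202.*-SPARK.*\.jar|dq-.*-202.*-SPARK.*\.jar'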
Additionally, please note the following:
- Standalone Upgrade Steps: When upgrading to Collibra DQ 2023.09 on Spark Standalone, the upgrade steps have changed.
- Open a terminal session.
- Move the old jars from the owl/bin folder with the following commands.
mv owl-webapp-<oldversion>-<spark301>.jar /tmp
mv owl-agent-<oldversion>-<spark301>.jar /tmp
mv owl-core-<oldversion>-<spark301>.jar /tmp
- Copy the new jars from the extracted package into the owl/bin folder.
mv dq-webapp-<newversion>-<spark301>.jar /home/owldq/owl/bin
mv dq-agent-<newversion>-<spark301>.jar /home/owldq/owl/bin
mv dq-core-<newversion>-<spark301>.jar /home/owldq/owl/bin
- Copy the latest owlcheck and owlmanage.sh to the /opt/owl/bin directory.
Tip You may also need to run chmod +x owlcheck owlmanage.sh to add execute permission to owlcheck and owlmanage.sh.
- Start the Collibra DQ Web application.
./owlmanage.sh start=owlweb
- Start the Collibra DQ Agent.
./owlmanage.sh start=owlagent
- Validate the number of active services.
ps -ef | grep owl
Enhancements
Capabilities
- When running rules that reference secondary datasets, you now have the option to use serial rule processing to reduce operational costs.
- Set -serialrule to true to leverage the Spark cache for the secondary dataset (a hedged command-line sketch follows this list).
- When authenticating your connection to CockroachDB with a PostgreSQL driver, you can now leverage Kerberos TGT without errors.
- When creating a DQ job to run against a remote file data source, you can now select BEL as a delimiter.
- When adding a name to a rule on the Rule Workbench, a helpful message displays if you use an invalid special character.
- Rule names can only contain alphanumeric characters, underscores, and hyphens.
- When reviewing Rules findings, the default number of rows available to preview is now 6. Previously, the Rules tab only displayed 5 preview rows.
- When creating a Pullup job from Explorer, the Mapping step now automatically maps source columns to target columns.
- We've updated the connection icons on the Explorer, Pulse View, and Admin Connections pages.
- When you add a new connection from the Admin Connections page, the icon will also update accordingly.
- When monitoring the Jobs page with the React-based Beta UI enabled, you can now right-click to open a dataset in a new tab.
- When assigning or validating a finding to an external user whose first name, last name, and external user ID cannot be found or do not exist, you can now set a backup display name in the ConfigMap to ensure you can still validate or assign that finding to the external user.
- Set SAML_USE_EXTERNAL_USER_ID_FOR_DISPLAY to true (a hedged configuration sketch follows this list).
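For the serial rule processing option above, a minimal sketch of how the flag might be passed on a job's command line; the dataset name, run date, and query are placeholders, and appending -serialrule true to an owlcheck command is an assumption:
# Hypothetical example: enable serial rule processing so that rules
# referencing a secondary dataset reuse the Spark cache.
# Dataset name, run date, and query are placeholders; the way the
# -serialrule flag is passed is an assumption for illustration.
./owlcheck -ds my_dataset -rd "2023-09-01" \
  -q "select * from my_table" \
  -serialrule true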
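For the backup display name setting above, a minimal sketch assuming a standalone install where environment variables are exported in owl-env.sh; on Kubernetes deployments, add the same key to the web ConfigMap instead:
# Assumption: environment variables are set in owl-env.sh (standalone) or in
# the web ConfigMap (Kubernetes). Enables the backup display name for
# external users whose details cannot be resolved.
export SAML_USE_EXTERNAL_USER_ID_FOR_DISPLAY=true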
Platform
- When deleting a user, the user is now removed from both the user and user_profile metastore tables.
- When loading a large remote file into Explorer, a progress bar now tracks its loading status.
DQ Integration
- When using the configuration wizard in Collibra DQ to set up an integration, your Collibra Platform credentials are now encrypted in the metastore to ensure that your information is always secure.
DQ Cloud
- We've introduced a new endpoint to retrieve aggregated write-ahead log (WAL) statistics.
- When deploying a new Edge site, the TenantAlignmentService no longer stops checking for new tenants in DQ Cloud after 100 attempts.
Pushdown
- When using Archive Break Records for Databricks Pushdown, the 'seqno' column for all break records tables created in Databricks is no longer designated as an identity column. Instead, its default value is now NULL. We've made this adjustment because Databricks does not support concurrent transactions for Delta tables with identity columns.
- If you already created these tables in your Databricks environment, delete them and allow the Collibra DQ application to re-create them for you to ensure compatibility with the latest changes. To do this, run the following SQL commands on the Databricks target schema dedicated to maintaining records of source breaks:
drop table collibra_dq_outliers
drop table collibra_dq_duplicates
drop table collibra_dq_rules
drop table collibra_dq_breaks
drop table collibra_dq_shapes
- After you run a DQ job, the tables will be re-created on your Databricks schema.
- We've improved memory usage to prevent large quantities of rule break records from causing out-of-memory errors.
- When running a Pushdown job, the entire allocated connection pool is now used to achieve the maximum allowed parallelism, which lets profiling run in parallel with other layers and reduces job latency.
- Only the required number of connection threads is used for each activity.
- When creating rules to run against Pushdown datasets, you can now use cross-join queries.
- We've added a Pendo tracking event to track the number of Pushdown jobs and columns in an environment.
Fixes
Capabilities
- When editing DQ jobs for KDB (PostgreSQL) connections, you can now successfully execute a query with a large number of records. (ticket #113493, #116740)
- When creating a BigQuery job, you can now create a dataset for a destination table without throwing an error. (ticket #118534, #122761)
- When archiving break records from Pullup jobs, you can again write break records to S3 storage buckets. Previously, an invalid rule error returned which stated "Exception while inserting break records into S3: No FileSystem for scheme s3". (ticket #121509)
- When you open the Oversized Job Report, you can again see the reports without any errors. (ticket #121752)
Platform
- When reviewing the configuration after running a Validate Source job, you no longer receive a validation error due to lost database, schema, table, field, and query information. (ticket #113977)
- Oracle dataset host strings no longer parse incorrectly. Previously, Oracle dataset host strings were parsed as "jdbc" instead of displaying the correct host string. To see the updated and correct host string for Oracle datasets, rerun your jobs manually via the scheduler or API. (ticket #124846)
DQ Integration
- When completing the connection mapping for your Collibra DQ to Collibra Platform integration, database views from Collibra DQ now correctly map to the tables and columns to which they relate in Collibra Platform. (ticket #124191, #124213, #125676)
DQ Cloud
- When upgrading to Collibra DQ version 2023.06, you can now see entries in your List View scorecards. Previously, there was a discrepancy between Edge and the Cloud metastore. (ticket #121624)
Pushdown
- When running a Pushdown job with the /v3/jobs/run API, the username now correctly updates to the authenticated user (a hedged request sketch follows this list). (ticket #121192)
- When upgrading to Collibra DQ version 2023.07.2, you can now see the Data Preview for breaking record count for a freeform SQL rule against a Snowflake Pushdown dataset. (ticket #122585)
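For reference, a hedged sketch of triggering a job through the /v3/jobs/run endpoint with curl; the host, token, HTTP method, and query parameter names are illustrative assumptions and may differ in your environment:
# Hedged sketch: the host, authentication token, HTTP method, and query
# parameter names (dataset, runDate) are assumptions for illustration;
# consult the API documentation for the exact contract.
curl -X POST "https://<your-dq-host>/v3/jobs/run?dataset=<dataset_name>&runDate=2023-09-01" \
  -H "Authorization: Bearer <your_api_token>"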
Known Limitations
Capabilities
- There is a limitation with Validate Source where source columns containing white spaces do not map properly to the target columns.
- A workaround is to remove the white spaces from the command line and then copy/paste the command line into a new DQ job.
- When using the Pulse View page after adding a new connection, there is a limitation where the icon of the connection does not automatically appear on the Pulse View page. Instead, it appears as a generic JDBC icon.
DQ Security Metrics
Note The medium, high, and critical vulnerabilities of the DQ Connector are now resolved.
Warning We found 1 critical and 1 high CVE in our JFrog scan. Upon investigation, these CVEs are disputed by Red Hat and no fix is available. For more information, see the official statements from Red Hat:
https://access.redhat.com/security/cve/cve-2023-0687 (Critical)
https://access.redhat.com/security/cve/cve-2023-27534 (High)
The following image shows a chart of Collibra DQ security vulnerabilities arranged by release version.
The following image shows a table of Collibra DQ security metrics arranged by release version.
Beta UI
Beta UI Status
The following table lists the Collibra DQ pages included in the Beta redesign as of this release and where each page is located in the navigation.
Page | Location
---|---
Homepage | Homepage
Sidebar navigation | Sidebar navigation
User Profile | User Profile
List View | Views
Assignments | Views
Pulse View | Views
Catalog by Column (Column Manager) | Catalog (Column Manager)
Dataset Manager | Dataset Manager
Alert Definition | Alerts
Alert Notification | Alerts
View Alerts | Alerts
Jobs | Jobs
Jobs Schedule | Jobs Schedule
Rule Definitions | Rules
Rule Summary | Rules
Rule Templates | Rules
Rule Workbench | Rules
Data Classes | Rules
Explorer | Explorer
Reports | Reports
Dataset Profile | Profile
Dataset Findings | Findings
Sign-in Page | Sign-in Page
Note Admin pages are not yet fully available with the new Beta UI.
Beta UI Limitations
Explorer
- When using the SQL compiler on the dataset overview for remote files, the Compile button is disabled because the execution of data files at the Spark layer is unsupported.
- You cannot currently upload temp files from the new File Explorer page. This may be addressed in a future release.
- The Formatted view tab on the File Explorer page only supports CSV files.
Connections
- When adding a driver, if you enter the name of a folder that does not exist, a permission issue prevents the creation of a new folder.
- A workaround is to use an existing folder.
Admin
- When adding another external assignment queue from the Assignment Queue page, if an external assignment is already configured, the Test Connection and Submit buttons are disabled for the new connection. Only one external assignment queue can be configured at the same time.
Profile
- When adding a distribution rule from the Profile page of a dataset, the Combined and Individual options incorrectly have "OR" and "AND" after them.
- When using the Profile page, Min Length and Max Length do not display the correct string length. This will be addressed in an upcoming release.
Scorecards
- Because of a missing function, you cannot currently create a new scorecard from the Page dropdown menu.
- While a fix for this is planned for the September (2023.09) release, a workaround is to select the Create Scorecard workflow from the three dots menu instead.
Navigation
- The Dataset Overview function on the Metadata Bar is not available for remote files.
- The Dataset Overview modal throws errors for the following connection types:
- BigQuery (Pushdown and Pullup)
- Athena CDATA
- Oracle
- SAP HANA
- The Dataset Overview function throws errors when you run SQL queries on datasets from S3 and BigQuery connections.