Release 2022.10
New Features
Warning For the Collibra Data Quality 2022.10 release, all Docker images run on JDK11. Standalone packages contain JDK8 and JDK11 options. If you are an existing customer who requires JDK11, please upgrade your runtime before upgrading to 2022.10. Most Hadoop environment versions (EMR/HDP/CDH) still run on JDK8, so customers using these environments can upgrade with the JDK8 packages. If you prefer to upgrade to JDK11, you must follow the documentation of your respective Hadoop environment to upgrade to JDK11 before deploying the 2022.10 release.
The MSSQL driver that comes with the JDK11 standalone packages does not currently work in the JDK11 environment; MSSQL requires a separate JAR for JDK11. Please contact your Customer Success Manager for the compatible driver.
Dremio is not currently supported for JDK11 standalone packages. If you plan to run JDK11, add -Dcdjd.io.netty.tryReflectionSetAccessible=true to owlmanage.sh as a JVM option for your Web and Spark instances. Please contact your Customer Success Manager for assistance.
As of October 18, 2022, all images for the 2022.10 release contain a Critical CVE (CVE-2022-42889). If you picked up the 2022.10 release before October 18, 2022, your scans should not be affected; if issues persist, please contact your Customer Success Manager for a new build.
Rules
- You can now define a rule to detect the number of days a job runs without data by using $daysWithoutData.
- You can now define a rule to detect the number of days a job runs with 0 rows by using $runsWithoutData.
- You can now define a rule to detect the number of days since a job last ran by using $daysSinceLastRun.
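As an illustrative sketch (the variable names come from this release, but the comparison operators and thresholds below are assumptions), simple stat rule conditions using these variables might look like the following:

```sql
-- Hypothetical stat rule conditions; thresholds are examples only.
-- Flag the data set when a job has run without data for more than 3 days.
$daysWithoutData > 3

-- Flag the data set when a job has run with 0 rows for more than 2 days.
$runsWithoutData > 2

-- Flag the data set when more than 7 days have passed since the job last ran.
$daysSinceLastRun > 7
```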
Profile
- You can now use a string length feature by toggling the Profile String Length checkbox when you create a data set.
- When Profile String Length is checked, the minimum and maximum length of each string column are saved to the dataset_field table.
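As a rough sketch of where these values land (the dataset_field table name comes from this note, but the metastore schema and the filter column below are assumptions), they can be inspected with a query such as:

```sql
-- Illustrative only: inspect the persisted string-length statistics in the
-- DQ metastore. The filter column name is an assumption, not confirmed here.
SELECT *
FROM dataset_field
WHERE dataset = 'my_dataset';
```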
Validate Source
- You can now write rules against a loaded source data frame when -postclearcache is configured in the agent.
Note The DQ UI will be converted to the React MUI framework with the 2022.11 release. Prior to the 2022.11 release, you can turn the React flag on, but note that some features may be temporarily limited.
Enhancements
DQ Job
- Start Time and Update Time are now based on the server time zone of the DQ Web App.
Scheduler
- The Job Schedule page now has pagination.
Scorecards
- From Pulse View, you can now view missing runs, runs with 0 rows, and runs with failed scores.
Admin/Catalog
- Database connection details are now masked when non-admin users attempt to view or modify them from the Catalog page. Only users with role_admin or role_connection_manager can view connection details on this page. (ticket #94430)
API
- The /v2/getRunIdDetailsByDataset endpoint now provides the following:
- The RunIDs for a given data set.
- All completed DQ Jobs for a given data set.
Snowflake Pushdown (beta)
- You can now detect shapes that do not conform to a data field. Pushdown jobs scan all columns for shapes by default.
- You can now view Histogram and Data Preview details for the Profile activity.
Connections
- The Snowflake JDBC driver is now updated to 3.13.14.
Fixes
Rules
- Fixed an issue with the Rule Validator that resulted in missing table errors. The Validator now correctly detects columns. (ticket #93430)
DQ Job
- Fixed an issue that caused queries with joins to fail on the load activity when Full Profile Pushdown was enabled. Pushdown profiling now supports SQL joins. (ticket #92409)
- Fixed an issue that caused jobs to fail at the load activity when using a CTE query. Please note that CTE support is currently limited to Postgres connections. (tickets #88287, #89150)
- Fixed an issue that caused inconsistencies between the time zones represented in the Start Time and Update Time columns.
Agent
- Fixed the loadBalancerSourceRanges for web and spark_history services in EKS environments. (ticket #95398)
- The Helm property global.ingress.* has been removed to separate the configuration for web and spark_history. Please update the property as follows: global.web.ingress.* and global.spark_history.ingress.*
- Added support to specify the inbound CIDRs for the Ingress using the property global.web.service.loadBalancerSourceRanges. (ticket #95398)
- Though Ingress is supported as part of the Helm charts, we recommend attaching your own Ingress to the deployment if you need further customization.
- This requires a new Helm chart.
- Fixed an issue that caused Livy file estimates to fail for GCS on K8s deployments.
- Fixed an issue that caused jobs to fail for GCS on K8s deployments.
Validate Source
- The Add Column Names feature is scheduled for removal with the upcoming 2022.11 release. (ticket #96066)
- This functionality predates the ability to limit the query directly (srcq) and the addition of Update Scope.
- Use the query to edit/limit columns, and also use Update Scope.
- Fixed an issue that caused the incorrect message to display for [VALUE_THRESHOLD] when validate source was specified for a matched case. (ticket #94435)
Dupes
- The Advanced Filter is scheduled for removal from the Dupes page with the upcoming 2022.11 release. (ticket #96065)
Explorer
- Fixed an issue that caused BigQuery connections to incorrectly update the library (-lib) path when a subset of columns was selected. (ticket #96768)
Scheduler
- Fixed an issue that prevented the scheduler from running certain scheduled jobs in multi-tenancy setups. Email server information is now captured from the correct tenant. (ticket #92898)
Known Limitations
Rules
- When a data set returns 0 rows, stat rules applied to the data set are not executed. A full fix is planned for a future release; as of 2022.10, this limitation is only partially addressed.
DQ Job
- CTE query support is currently limited to Postgres connections. DB2 and MSSQL are currently unsupported.
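For reference, a CTE query here means a query that begins with a WITH clause; a minimal sketch (table and column names are illustrative) looks like this:

```sql
-- A minimal common table expression (CTE). Per the limitation above, queries
-- of this shape currently run only against Postgres connections.
WITH recent_orders AS (
    SELECT order_id, order_amount, order_date
    FROM orders
    WHERE order_date >= DATE '2022-01-01'
)
SELECT *
FROM recent_orders;
```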
Catalog
- When using the new bulk actions feature, updates to your job are not immediately visible in the UI. After you apply a rule, run a DQ Job against that data set; the newly applied rule then appears as a row on the Rules tab.
Snowflake Pushdown (beta)
- Freeform (SQLF) rules cannot use a data set name but instead must use @dataset, because Snowflake does not explicitly understand data set names.
- When using the SQL Query workflow, any subset of columns selected in your SQL query must be enclosed in double quotes to prevent the job from running indefinitely without failing.
- Min/Max precision and scale are only calculated for double data types. All other data types are currently out of scope.
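To illustrate the first limitation, a freeform (SQLF) rule on a Pushdown data set might be written as follows; the column name is hypothetical, and this assumes the usual convention that rows returned by the rule query are counted as breaks:

```sql
-- Hypothetical SQLF rule: reference the data set as @dataset rather than by
-- its name. The column name below is an example only.
SELECT *
FROM @dataset
WHERE order_amount < 0
```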
DQ Security Metrics