Release 2023.05
- Highlights
- New Features
- Enhancements
- Fixes
- Known Limitations
- DQ Security Metrics
- MUI Redesign
- Maintenance Updates
Highlights
- We are excited to announce that Pushdown processing for Snowflake is now generally available! Pushdown is an alternative compute method for running DQ jobs, where Collibra DQ submits all of the job's processing directly to a SQL data warehouse, such as Snowflake. When all of your data resides in Snowflake, Pushdown reduces the amount of data transfer, eliminates egress latency, and removes the Spark compute requirement of a DQ Job.
- We are also excited to announce that Pushdown processing for Databricks is now available in public preview! When you use Pushdown for Databricks, ensure that PUSHDOWN_FOR_DATABRICKS is set to TRUE on the Application Configuration page of the Admin Console.
- You can now use Swagger to leverage the new Integration API to send Collibra DQ rules, ML layer findings, and associated data quality scores to Collibra Platform. With data quality details in Collibra Platform, you can develop a more robust understanding of the health and performance of your organization's data.
New Features
Capabilities
- When you create or edit a connection with REACT_MUI off, you can now define variables from the Connections template to securely enter sensitive properties, such as user credentials.
- When you enable Include Links for recent job runs, the email alert now includes multiple contextual links to help you access the affected areas of the application.
Platform
- You can now use a serverless agent to submit jobs from the UI. This lets you have a Collibra DQ installation without a native DQ agent, and allows for extensive agent customization and lightweight deployment options against new compute engines. This is currently only supported for Dataproc.
Pushdown
- You can now set the maxconcurrentjobs config on the Admin Limits page to specify the maximum number of Pushdown jobs that can run concurrently, which prevents the Job service from becoming overloaded.
- You can now use the new /v3/rules/{dataset}/validatePushdown endpoint to validate the syntax of Pushdown rules.
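As a rough sketch of how a client might address the new validation endpoint, the Python function below only builds the request URL from a base URL and a dataset name, percent-encoding the dataset name so that spaces or slashes do not break the path. The base URL and the function name are hypothetical, and these notes do not describe the HTTP method, authentication, or request body; check the endpoint's definition in Swagger before sending a request.

```python
from urllib.parse import quote


def build_validate_pushdown_url(base_url: str, dataset: str) -> str:
    """Build the /v3/rules/{dataset}/validatePushdown URL for a dataset.

    base_url is a hypothetical Collibra DQ address such as
    "https://dq.example.com". The HTTP method, headers, and request body
    for this endpoint are not covered by this sketch.
    """
    # Encode every reserved character, including "/", so the dataset name
    # stays a single path segment.
    encoded = quote(dataset, safe="")
    return f"{base_url.rstrip('/')}/v3/rules/{encoded}/validatePushdown"


if __name__ == "__main__":
    # Prints https://dq.example.com/v3/rules/sales%20orders/validatePushdown
    print(build_validate_pushdown_url("https://dq.example.com/", "sales orders"))
```

Encoding the dataset name with safe="" is a deliberate choice: a dataset name containing "/" would otherwise be read as extra path segments.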
DQ Cloud
- You can now edit and delete DQ Cloud connections.
- When you delete a connection, all of its links to its agent are also deleted. When you create a new connection in DQ Cloud, ensure that the Connection URL in the connection template is unique, and select an agent from the Target Agent dropdown menu. Selecting an agent is required, and you cannot change the agent after the connection is created.
Enhancements
Capabilities
- When you use a key comparison for source validation of decimal values that include scales of 0, you can now select the new Ignore Precision option on the Source configuration tab, or set the -validatevaluesignoreprecision flag to true on the command line, to ignore scales of 0 for integers before the decimal point and for fractional values after the decimal point. Note that -validatevaluesignoreprecision is only available for Validate Source when you include a key.
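To illustrate the kind of difference the Ignore Precision option targets, the hypothetical Python sketch below compares two decimal strings either strictly, treating the scale (the number of digits after the decimal point) as significant, or numerically, so that values such as 10 and 10.0 match. This is only an approximation of the behavior described above under those assumptions, not Collibra DQ's implementation, and values_match is an illustrative name.

```python
from decimal import Decimal


def values_match(source: str, target: str, ignore_precision: bool) -> bool:
    """Compare a source and a target decimal value (hypothetical helper)."""
    src, tgt = Decimal(source), Decimal(target)
    if ignore_precision:
        # Numeric comparison: Decimal("10") == Decimal("10.00") is True,
        # so trailing zeros in the scale are ignored.
        return src == tgt
    # Strict comparison: as_tuple() includes the digits and the exponent,
    # so 10 and 10.0 are treated as different values.
    return src.as_tuple() == tgt.as_tuple()


if __name__ == "__main__":
    print(values_match("10", "10.0", ignore_precision=False))  # False
    print(values_match("10", "10.0", ignore_precision=True))   # True
```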
Platform
- The connection template now uses the correct Impala JDBC driver, com.cloudera.impala.jdbc.Driver.
- The Kafka Streaming Connection is no longer available from the Connections Management page because it has finished its preview cycle and was not promoted to general availability.
- The reactor-core package now supports Spark245 for successful Azure connections.
- When you configure AdaptiveRules, Data Type Check is now included in the Schema detection activity.
- When you select the Include Links option on the Alerts Configuration page, you must also enter the host address from your application's URL in the HOST_NAME field on the Application Configuration page. For example, if your Collibra DQ URL is http://dq.collibra.com, enter http://dq.collibra.com in the HOST_NAME field.
- The Include Links feature for alerts now has logging to show when the Include Links option is selected but HOST_NAME is not set.
- Informational text is now available on the Alerts Configuration page to help with the configuration of alerts.
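If you are unsure which part of your application URL belongs in HOST_NAME, the small Python sketch below keeps only the scheme and host (plus a port, if one is present) and drops any path or fragment. It follows the http://dq.collibra.com example above; whether your deployment needs a port in HOST_NAME is an assumption to confirm for your environment, and host_name_value is a hypothetical helper.

```python
from urllib.parse import urlsplit


def host_name_value(app_url: str) -> str:
    """Return scheme://host[:port] from a Collibra DQ URL (hypothetical helper)."""
    parts = urlsplit(app_url)
    if not parts.scheme or not parts.netloc:
        raise ValueError(f"expected an absolute URL, got {app_url!r}")
    # netloc keeps the port if the URL has one, e.g. "dq.example.com:9000";
    # the path, query, and fragment are dropped.
    return f"{parts.scheme}://{parts.netloc}"


if __name__ == "__main__":
    # Prints http://dq.collibra.com
    print(host_name_value("http://dq.collibra.com/#/dashboard"))
```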
Pushdown
- When you run a Pushdown Job with Replay on, the start times of child Jobs in the Jobs queue now reflect their actual start times instead of the start time of the parent job from which the Replay was initiated.
- The Outliers option of the Add Layers step is now disabled for Databricks Pushdown job configuration.
DQ Cloud
- DQ Cloud is now upgraded to the Collibra DQ 2023.05 version.
Fixes
Capabilities
- Fixed an issue with the email lookup query of password reset requests that prevented users from resetting passwords. (ticket #110075)
- Added a new admin limit flag, valsrcdisableaq, to fix slow performance when inserting records into the assignment_q metastore table. (ticket #108547) valsrcdisableaq lets you turn the connection between the assignmentQ feature and the Source activity on or off. When you set it to true, you cannot invalidate or retrain any Source findings.
- Removed the scan button from Explorer > Connections because it is no longer supported from the UI. You can still profile multiple tables at the same time with the /v3/datasetDefs/ or /v2/run-catalog-scan-json/ endpoints. (ticket #107687)
- Fixed an issue where, when DQ jobs were created from a BigQuery source and a Hive source was added, the suffix ‘/core’ was appended to the end of the driver name of the secondary dataset driver path, which resulted in long run times and an unavailable agent. (ticket #109796, #111432)
- Fixed an issue where loading a CSV file that uses custom delimiters on S3 remote file connections caused the file to hang in the load activity. (ticket #108890)
- Fixed an issue where Validate Source results showed an incorrect mismatch when both the source and target had values of zero. (ticket #106788)
- Fixed an issue where attempts to invalidate findings in bulk hung on the "Updating Dataset" message because of an unsupported request method. (ticket #106826, 113881)
- Fixed a permission issue on SQL Server Kerberos-based connections for queries run in Collibra DQ on system tables in SQL Server. (ticket #110773)
Platform
- Fixed issues where external users could not successfully run jobs. (ticket #110773, 111988, 112417, 112857)
- Fixed an issue with Collibra DQ on AWS Cloud where Instance Profile authentication caused scheduled S3 jobs to fail. (ticket #110570, 110574)
- Addressed all vulnerabilities in the optional drivers from Collibra DQ version 2023.03. Images that include the optional drivers now pass security scans, so you can now use them. (ticket #112617)
DQ Cloud
- Fixed an issue where scheduled jobs did not run because they became stuck in "Staged" status. (ticket #112802)
Known Limitations
Capabilities
- When S3 remote file connections use an escape character as a delimiter, such as /a, columns parse incorrectly.
- When you edit an existing connection and its details, another connection is created, and any sensitive properties from the old connection do not persist in the new connection. As a workaround, recreate the sensitive properties from the old connection on the new connection.
- When you create or edit connections, the new sensitive properties feature is only available with React turned off.
- Permalinks to specific Job runs in alert emails only work for datasets that run after this feature was introduced in Collibra DQ version 2023.05. If a dataset was created before 2023.05 and runs on a schedule, the permalinks do not work.
- Native SQL is not currently supported on the Rules Workbench.
Platform
- If you are an external user of Collibra DQ version 2023.03 or 2023.04, or their patches, you may experience issues where completed jobs do not send alerts. If this occurs, set ALERT_SCHEDULE_ENABLED=false, and then restart web and agent.
- If a license key value is damaged or deleted in a current app session, you may experience job failures with license key errors. If you know that your Collibra DQ deployment has a valid license key, try refreshing the page, signing out of the app, and then signing back in. This syncs your license key with the metastore, which allows you to run jobs again without license key expiration failures.
DQ Cloud
- Instance Profile is not supported for S3 connections.
- When using the Completeness Report, data only appears after upgrading to 2023.06 or later.
- When using the Findings page, you currently cannot drill into a rule break record. While there is no workaround for this limitation, a fix is planned for the 2023.06 release.
- When using the Findings page, you currently cannot tag job runs as off-peak. This will be fixed in the 2023.07 release.
Pushdown
- When scanning for dupes, fuzzy match duplicates display incorrectly as "NULL" values on the Findings page. This will be fixed in the 2023.06 release.
DQ Connector
- When navigating the Admin Console menu with React turned on, an "Integrations" option leads to a non-functional page. This will be fixed in the 2023.06 release.
DQ Security Metrics
The following image shows a chart of Collibra DQ security vulnerabilities arranged by release version.
The following image shows a table of Collibra DQ security metrics arranged by release version.
MUI Redesign
The following table shows the status of the MUI redesign of Collibra DQ pages as of this release. Because the status of these pages only reflects Collibra DQ's internal test environment and completed engineering work, pages marked as "Done" are not necessarily available externally. Full availability of the new MUI pages is planned for an upcoming release.
| Page | Location | Status |
|---|---|---|
| Homepage | Homepage | Done |
| Sidebar navigation | Sidebar navigation | Done |
| User Profile | User Profile | Done |
| List View | Views | Done |
| Assignments | Views | Done |
| Pulse View | Views | Done |
| Catalog by Column (Column Manager) | Catalog (Column Manager) | Done |
| Dataset Manager | Dataset Manager | Done |
| Alert Definition | Alerts | Done |
| Alert Notification | Alerts | Done |
| View Alerts | Alerts | Done |
| Jobs | Jobs | Done |
| Jobs Schedule | Jobs Schedule | Done |
| Rule Definitions | Rules | Done |
| Rule Summary | Rules | Done |
| Rule Templates | Rules | Done |
| Rule Workbench | Rules | In Progress |
| Rule Builder | Rules | In Progress |
| Data Classes | Rules | Done |
| Explorer | Explorer | In Progress |
| Reports | Reports | In Progress |
| Dataset Profile | Profile | In Progress |
| Dataset Findings | Findings | Done |
| Sign-in Page | Sign-in Page | Done |
Maintenance Updates
2023.05.1
- Fixed the builds for the 2023.05 Collibra DQ version. Previously, jobs failed with an "Unknown" status during the Alerts activity and showed a "NoClassDefFoundError" in the job log. (ticket #115956, 116052)