Release Notes
Disclaimer - Failure to upgrade to the most recent release of the Collibra Service may adversely impact the security, reliability, availability, integrity, performance or support (including Collibra’s ability to meet its service levels) of the Service. Collibra hereby disclaims all liability, express or implied, for any reduction in the security, reliability, availability, integrity, performance or support of the Service to the extent the foregoing would have been avoided had you allowed Collibra to implement the most current release of the Service when scheduled by Collibra. Further, to the extent your failure to upgrade the Service impacts the security, reliability, availability, integrity or performance of the Service for other customers or users of the Service, Collibra may suspend your access to the Service until you have upgraded to the most recent release.
- 2024.09 (upcoming)
- 2024.08 (latest)
- 2024.07
- 2024.06
- 2024.05
Release 2024.09
Release Information
- Expected Release date of Collibra Data Quality & Observability 2024.09: September 30, 2024
- Release notes publication date: September 5, 2024
Enhancements
Integration
- Users of the Quality tab in Collibra Data Intelligence Platform who do not have a Collibra Data Quality & Observability account can now view the 7-day history of data quality scores, allowing them to monitor the health of their data over time.
- When running a DQ Job with back run and an active integration, the results of the back run are now sent to Collibra Data Intelligence Platform where they are stored in the DQ services history table.
Pushdown
- You can now scan for shapes in Trino Pushdown Jobs.
Jobs
- You can now create DQ Jobs on SAP HANA tables that contain the special characters ::$/-;@#%^&*?!{}~+=
Note The special characters .() are not supported.
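For example, a source query against such a table might look like the following minimal sketch, assuming standard SAP HANA double-quoted identifiers; the schema, table, and column names are invented:
```sql
-- Minimal sketch: double-quoted identifiers let SAP HANA accept the
-- now-supported special characters in table and column names.
SELECT "ORDER::ID", "NET-AMOUNT@EUR"
FROM "MY_SCHEMA"."SALES/2024"
```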
Findings
- When the dupelimit and dupelimitui limits on the Admin Limits page are both set to 30, the Findings page now limits the number of duplicate findings marked on the Dupes tab to 30.
Alerts
- If your organization uses multiple web pods on a Cloud Native deployment of Collibra Data Quality & Observability, you now receive only one alert email when an alert condition is met.
APIs
- When dataset security is enabled on a tenant and a user whose roles meet the requirements of the Dataset Def API has their role assigned to the dataset, the API returns the expected results.
Platform
- We added support for FIPS-compliant algorithms.
Fixes
Integration
- When setting up an integration, the Connections step now has a search component and pagination to prevent table load failure when the tables per schema size exceeds browser memory.
- The dimension type of Adaptive Rules (NULL, EMPTY, MIN, and so on) now correctly maps to the type and sub-type from the DQ dimension table.
- The predicate of custom rules is again included in rule asset attributes.
Pushdown
- You can again download CSV and JSON files containing rule break records of DQ Jobs created on Pushdown connections.
- The rule name no longer appears in the data preview and downloaded rule breaks of rule findings for DQ Jobs created on Pushdown connections.
- DQ Jobs created on Pushdown connections where columns contain shape values with $ or ' now run successfully. Previously, such Jobs failed with an unexpected exception message.
- When running DQ Jobs created on Pushdown connections that scan multiple columns for duplicate values, the data of the columns now appears under the correct column name.
- When a DQ Job created on a Pushdown connection contains 0 rows of data and one or more rules are enabled, the rules are now included in the Job run and displayed on the Findings page.
Note For consistency with rule break downloads in Pullup mode, we plan to separate rule breaks by rule in a future release. As of this release, Pullup mode still includes the rule name and runId in rule break downloads.
Jobs
- The Explorer connection tree now loads successfully when a schema contains tables that contain unsupported column types.
- Dataset Overview on the Explorer page can now process the select * part of the SQL statement if there is an unsupported column type.
- We fixed an issue on the Jobs page where Collibra Data Quality & Observability was unable to retrieve the Yarn Job ID.
Findings
- After retraining a behavioral finding to pass a value for a blindspot, the score now correctly reflects the retrained scoring model.
Profile
- When adding a stat rule for distribution from the +Add Rule option on the Profile page, the computed boundaries of categorical variables in the distribution rule now display correctly.
Rules
- When rules on Pullup datasets time out, the rule output record now displays the out-of-memory (OOM) exception message on the Findings page.
- Run Result Preview on the Rule Workbench now works as expected for custom rules that use the simple rule template.
- When creating a rule for an existing DQ Job created on a Pushdown connection, Run Result Preview now runs without errors.
- You can now use rules that contain a $ character (excluding stat rules) with profiling turned off for DQ Jobs created on Pushdown connections, as sketched below.
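For illustration, a hypothetical rule of this kind might look like the following sketch, which reuses the @dataset reference syntax shown elsewhere in these notes; the column name and pattern are invented:
```sql
-- Hypothetical rule containing a literal $ (not a stat rule); flags rows
-- whose price_text value does not look like a dollar amount.
SELECT * FROM @dataset WHERE price_text NOT RLIKE '\$[0-9]+'
```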
Admin Console
- Admin Limits now require values to be -1, 0, or positive numbers. An inline message appears below the Value field when a limit does not meet the allowed values.
Latest UI
- We improved the performance of the Scorecards page when there are many datasets to load.
- We reduced the number of backend calls on the Profile and Findings pages to improve the load time performance.
- You can now create, edit, and delete alerts for datasets with 0 rows. When you run a job on a dataset with 0 rows, the alerts function as expected.
- Dataset Manager no longer crashes due to slow network calls when you click the Filter icon as the page loads.
- We resolved multiple scenarios where the metadata bar did not display any rows or columns.
- When hovering over a row on the Scheduler page, the row has a gray highlight, but the days of week cells remain green or white, depending on whether a DQ Job is scheduled to run on a given day.
- The date picker on the Job tab of the Findings page is now available for DQ Jobs created on Pushdown connections and you can successfully run DQ Jobs with the dates you select.
- The Findings and other pages now load correctly in Safari browsers.
- DQ Jobs created on Pushdown connections no longer generate duplicate user-defined job failure alerts.
- The donut charts in the database and schema reports from Explorer now consistently display the correct stats.
Release 2024.08
Release Information
- Release date of Collibra Data Quality & Observability 2024.08: August 26, 2024
- Publication dates:
- Release notes: August 5, 2024
- Documentation Center: August 9, 2024
Enhancements
Integration
- When a job with an active integration with Collibra Data Intelligence Platform runs, the Job Log on the Jobs page in Collibra Data Quality & Observability reflects the details of each step of the integration.
Pushdown
- You can now scan for both fuzzy and exact match duplicate records in Trino Pushdown jobs (a conceptual sketch of an exact match scan follows below).
- All Pushdown-compatible data sources now support the use of temporal datasets in stat rule statements, for example, SELECT @t1.$rowcount AS yesterday, @dataset.$rowcount AS today WHERE yesterday <> today.
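Conceptually, an exact match duplicate scan pushed down to the warehouse resembles the following sketch; the table and column names are invented, and the SQL Collibra DQ actually generates may differ:
```sql
-- Conceptual exact match duplicate scan executed in the warehouse itself.
SELECT first_name, last_name, COUNT(*) AS occurrences
FROM customers                 -- hypothetical table
GROUP BY first_name, last_name
HAVING COUNT(*) > 1            -- rows sharing every scanned column are exact dupes
```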
Connections
- You can now create BigQuery Pullup jobs on a cross-project connection without manually updating the command line to prepend the projectId in the source query.
- NetApp now has a dedicated connection tile under the Remote File Connections tab of the Add New Connection modal. Previously, to connect to a NetApp data source, you had to follow the Amazon S3 path and add the NetApp connection properties to the Properties tab.
- You can now archive breaking records to NetApp locations.
Jobs
- You can now execute command line queries that end with double quotes around the table name, for example, select * from "<SCHEMA>"."<TABLE>".
Profile
- When you hover your cursor over the histogram on a dataset profile page, the upper quartile, median, and lower quartile statistics now display.
Findings
- NULL values are now excluded from the calculation of duplicate values for Pullup jobs (see the sketch after this list).
- The indicator representing the number of findings for a given layer is now in the upper right corner of the associated layer.
- The Actions button is now always visible at the far right side of the Adaptive Rules modal. Additionally, the Adaptive Rule types are now color-coded.
- The chips in the Observations column of the Records tab are now color-coded.
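Conceptually, the updated Pullup calculation behaves like the following sketch, where NULLs are filtered out before duplicates are counted; the table and column names are invented:
```sql
-- Illustration of duplicate counting that ignores NULLs, as the Pullup
-- dupes calculation now does.
SELECT email, COUNT(*) AS occurrences
FROM customers
WHERE email IS NOT NULL        -- NULL values no longer count toward duplicates
GROUP BY email
HAVING COUNT(*) > 1
```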
Rules
- The Copy Results and Download Results buttons from the Dataset Overview are now available on the Rule Workbench.
Alerts
- We cleaned up the mailTemplate.html file within the dq-webapp to improve user experience.
Dataset Overview
- The ability to preview data on the Dataset Overview now requires access to the connection upon which your job is based and at least one of the following roles:
- ROLE_VIEW_DATA
- ROLE_ADMIN
- ROLE_CONNECTION_MANAGER
Scorecards
- We shortened the height of the scorecard blocks to reduce the amount of time it takes to scroll the Scorecards page when multiple scorecard blocks are present.
Assignments
- You can now use the Date Range filter on the Assignments Queue to sort and define a range of dates in the Update Ts (timestamps) column.
Dataset Manager
- Admins can now bulk update the agent and Spark settings of Pullup datasets.
SQL Assistant for Data Quality
- You can now see details of the Vertex AI model in the ‘About’ modal in the upper right corner of your Collibra Data Quality & Observability instance.
APIs
- We aligned the role requirements for the Jobs and Alerts V3 APIs.
- When dataset security and DB connection security are disabled, users have full access to the Jobs and Alerts endpoints.
- When dataset security is enabled and DB connection security is disabled, users without a role assignment to a dataset cannot use the following endpoints referencing that dataset:
- GET v3/jobs/{dataset}/{rundate}/logs
- GET v3/jobs/{jobId}
- GET v3/jobs/{jobId}/waitForCompletion
- GET v3/jobs/{jobId}/logs
- GET v3/jobs/{jobId}/findings
- GET v3/jobs/{jobId}/breaks/shapes
- GET v3/jobs/{jobId}/breaks/rules
- GET v3/jobs/{jobId}/breaks/outliers
- GET v3/jobs/{jobId}/breaks/dupes
- GET v3/alerts/{dataset}
- GET v3/alerts/{dataset}/{alertname}
- GET v3/alerts/notifications
- DELETE v3/alerts/{dataset}/{alertname}
- When dataset security is enabled and DB connection security is disabled, users without a role assignment to a dataset can use the following endpoints:
- GET v3/jobs
- GET v3/alerts
- The Job and Alert APIs honor dataset security by preventing access to the alert or job details when the user making the request does not have role access to the related dataset.
- By design, the Job APIs only honor connection security related to job creation or execution actions.
- Currently, connection security is not enforced for the Alert APIs.
Fixes
Connections
- You can again edit datasets created on Remote File Connections.
Jobs
- When the Case Sensitive and Exact Match options are not selected in the Dupes layer, jobs that run in Pullup mode now scan for all case-insensitive fuzzy match duplicates.
- When using the Partial Scan option in the latest UI, you can now use the ‘select all’ checkbox option in the column header to select all columns when they contain unsupported data types, such as CLOB.
- After setting up a partial scan of an Oracle dataset in the latest UI, the job now runs without error.
- You can again run jobs on Oracle datasets where source-to-target mappings to Databricks connections are configured.
- Columns now display on the Profile page when the Profile activity fails.
- When adding a distribution rule on a column from the Profile page, the percentages are now correctly calculated based on the total number of rows (see the sketch after this list).
- You can again edit jobs based on temp files in Standalone deployments of Collibra Data Quality & Observability.
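For reference, a percentage-of-total calculation of this kind can be expressed as the following sketch; the table and column names are invented:
```sql
-- Sketch of a distribution percentage computed against the total row count.
SELECT category,
       100.0 * COUNT(*) / SUM(COUNT(*)) OVER () AS pct_of_total
FROM orders
GROUP BY category
```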
Reports
- The Coverage Report now returns a maximum of 1 calendar year of statistics for database-, schema-, and table-level jobs. If a level does not have an existing structure, the report returns a helpful error message.
Agent
- When you select an option from the Master Default dropdown menu on the Edit Agent dialog of the Agent Configuration page, the correct value now displays based on your selection.
Integration
- Schemas and tables now correctly map to Collibra Data Intelligence Platform assets when you automap them from the connection mapping step of the Integration Admin Console page.
- The total row count now correctly displays in the Loaded Rows field on the asset page after the integration of a Pushdown dataset.
- The Run Job Again option is no longer visible on the View Monitoring modal of the At a glance sidebar for table assets of scheduled and non-scheduled jobs.
- The scheduler in Collibra Data Quality & Observability that previously monitored for triggers sent from Collibra Data Intelligence Platform to run a job in Collibra Data Quality & Observability is now disabled and no longer scans for these inputs.
Pushdown
- You can now see the histogram of BigQuery Pushdown jobs.
- Snowflake Pushdown jobs with rules that reference secondary datasets with identical column names no longer return exceptions.
- The Profile page now shows TopN and BottomN shape results for Snowflake Pushdown jobs.
- The timestamp portion of the Run ID is now supported in Pushdown jobs that are configured to run on a schedule.
- When Archive Break Records for Rules is enabled and the rules of a Pullup job include at least one DATATYPECHECK rule, the Rules page now shows the correct statuses when the rules are copied to a new Pushdown job. Additionally, the DATATYPECHECK rules from the Pullup job do not copy to the Pushdown job, as DATATYPECHECK rules are not supported in Pushdown mode.
APIs
- When using the POST v3/rules endpoint to add an inactive rule (isActive option set to 0) to a dataset, the rule is now added to the dataset correctly.
Latest UI
- We improved the performance of the Explorer page so that schemas with many tables no longer lock the page in an unresponsive state while they load.
- The errors that occurred when compiling a dataset source query using a date variable in Explorer are now resolved.
- Dataset pages now load correctly when an invalid run ID is applied to a Pushdown job.
- When editing a DQ Job on a Remote File Connection, the Compile button is now disabled and includes a note instructing you to edit the query from the command line instead.
- The issues that prevented some users from editing certain datasets created from Remote File Connections are now resolved.
- You can again edit DQ Jobs created on Temp File connections.
- The Agent Master Default option on the Agent Configuration page now displays correctly.
- You can now deselect unsupported column types when performing a partial scan in Explorer.
- The row count in the job estimate now reflects the source query row count.
- When using Validate Source, JDBC source connections mapped to Databricks connections no longer return errors.
- We added labels to the Histogram portion of the Profile page that are available when you hover your cursor over the histogram.
- Percentages for Quick Rules (Distribution) on the Profile page now display correctly.
- Observations on the Record tab of the Findings page are now color-coded.
- We fixed an issue with the Rule Workbench where the rule body became uneditable after loading a data type rule.
- Unintended changes to out-of-the-box Sensitive Labels are now prevented.
- File names of exports from the Schedule page now include date and timestamps.
- The Coverage Report now returns connection-level metrics from the latest UI and APIs.
- We cleaned up typos on the Completeness Report page.
DQ Security
The following image shows a chart of Collibra DQ security vulnerabilities arranged by release version.
The following image shows a table of Collibra DQ security metrics arranged by release version.
Release 2024.07
Release Information
- Release date of Collibra Data Quality & Observability 2024.07: July 30, 2024
- Publication dates:
- Release notes: June 24, 2024
- Documentation Center: July 4, 2024
Highlights
Important
As of this release, the classic UI is no longer available.
Important
To improve the security of Collibra Data Quality & Observability, we removed the default keystore password from the installation packages in this release. In general, if your organization uses a SAML or SSL keystore, we recommend that you provide a custom keystore file. However, if your organization has a Standalone installation of Collibra Data Quality & Observability and plans to continue using the default keystore, please contact Support to receive the default password to allow you to install and successfully use Collibra Data Quality & Observability versions 2024.07 and newer. If your organization has a containerized (Cloud Native) installation of Collibra Data Quality & Observability, you can continue to leverage the default keystore file, as the latest Helm Charts have the default password set within the values.yaml file.
- Integration
- You can now select the data quality layers that will have corresponding assets created in Collibra Data Intelligence Platform, either automatically upon a successful integration or only when a layer contains breaking records. Selecting individual layers instead of including all of them by default helps prevent an overwhelming number of assets from being created automatically. Admins can configure this on the Integration Setup wizard of the Integrations page in the Admin Console.
- Admins can also map file and database views from Collibra Data Quality & Observability to corresponding assets in Collibra Data Intelligence Platform Catalog. This allows for out-of-the-box relations to be created between their file- or view-based Collibra Data Quality & Observability datasets and the file table and database view assets (and their columns) in Collibra Data Intelligence Platform.
Note Collibra Data Intelligence Platform 2024.07 or newer is required for view support.
- Pushdown
- We're delighted to announce that Pushdown for SQL Server is now generally available!
Pushdown is an alternative compute method for running DQ Jobs, where Collibra Data Quality & Observability submits all of the job's processing directly to a SQL data warehouse. When all of your data resides in the SQL data warehouse, Pushdown reduces the amount of data transfer, eliminates egress latency, and removes the Spark compute requirements of a DQ Job.
Enhancements
Explorer
- From the Sizing step in Explorer, you can now change the agent of a Pullup job by clicking the Agent Details field and selecting from the available agents listed in the Agent Status modal.
- You can now use custom delimiters for remote files in the latest UI.
- When using the Mapping layer for remote file connections, you can now add -srcmultiline to the command line to remove empty rows from the source to target analysis.
- We added a Schema tab to the Dataset Overview to display table schema details after clicking Run in the query box. (ticket #145266)
- When the number of rows in a dataset exceeds 999, you can now hover your cursor over the abbreviated number to view the precise number of rows. (ticket #125753)
Rules
- Break record previews for out-of-the-box Data Type rules now display under the Rules tab on the Findings page when present. (idea #DCC-I-2155)
- When a dataset is renamed from the Dataset Manager, any explicit references to the dataset name for all primary and secondary rules are updated. (idea #DCC-I-2624)
- You can now rename rules from the Actions dropdown menu on the Dataset Rules page. Updated rule names cannot match the name of an existing rule and must contain only alphanumeric characters without spaces.
Alerts
- We added a new Rule Status alert to allow you to track whether your rule condition is breaking, throwing an exception, or passing.
- When configuring Condition-type alerts, you now have the option to Add Rule Details, which includes the following details in the email alert (idea #DCC-I-732):
- Rule name
- Rule condition
- Total number of points to deduct from the quality score when breaking
- Percentage of records that are breaking
- Number of breaking records
- Assignment Status
- Admins can now set a custom alert email signature from the Alerts page in the Admin Console. (idea #DCC-I-2400)
Profile
- Completeness percentage is now listed on the dataset Profile page in the latest UI.
Findings
- We added a Passing Records column to the rule break table under the Rules tab on the Findings page to show the number of records that passed a rule. This enhancement simplifies the calculation of the total number of records, both passing and breaking. Additionally, we renamed the Records column to Breaking Records. (idea #DCC-I-2223)
- You can now download CSV files and copy signed links of rule break records on secure S3 connections, including NetApp, MinIO, and Amazon S3.
Scorecards
- To improve page navigation, we added a dedicated Add Page button to the top right corner of the Scorecards page and moved the search field from the bottom of the scorecards dropdown menu to the top.
Reports
- When dataset security is enabled, the only datasets and their associated data that display on the Dataset Dimension and Column Dimension dashboards are the ones to which users have explicit access.
Jobs
- When exporting job logs, the names of the export files of job logs exported in bulk from the Jobs page begin with a timestamp representing when the file was downloaded. Additionally, the names of the export files of job logs of individual jobs begin with a timestamp representing the UpdateTimestamp and include the first 25 characters of the dataset name.
Dataset Manager
- Admins can now update dataset hosts in bulk by selecting Bulk Manage Host from the Bulk Actions dropdown menu and specifying a new host URL.
Integration
- When datasets do not have an active Collibra Data Intelligence Platform integration, you can now enable dataset integrations in bulk from the Dataset Manager page by clicking the checkbox option next to the datasets you wish to integrate, then selecting Bulk Enable Integration from the Bulk Actions dropdown menu.
- Additionally, when datasets have an active integration, you can now submit multiple jobs to run by selecting Bulk Submit Integration Jobs from the Bulk Actions dropdown menu.
- We improved the score calculation logic of Data Quality Rule assets.
- When viewing Rule assets and assets of data quality layers in Collibra Data Intelligence Platform, the Rule Status now displays either Passing, Breaking, Learning, or Suppressed. Previously, rules and data quality layers without any breaks displayed an Active status, but that is now listed as Passing. (ticket #137526)
Pushdown
- You can now scan for exact match duplicates in BigQuery Pushdown jobs.
- You can now scan for shapes and exact match duplicates in SAP HANA Pushdown jobs. Additionally, we’ve added the ability to archive duplicates and rules break records to the source SAP HANA database to allow you to easily identify and take action on data that requires remediation.
Note An enhancement to enable scanning for fuzzy match duplicates in BigQuery Pushdown jobs is planned for Collibra Data Quality & Observability 2024.10.
SQL Assistant for Data Quality
- We added AI_PLATFORM_PATH to the Application Configuration Settings to allow Collibra Data Quality & Observability users who do not have a Collibra Data Intelligence Platform integration to bypass the integration path when this flag is set to FALSE.
- When set to TRUE (default), code will hit the integration or public proxy layer endpoint.
- When set to FALSE, code will bypass the integration path.
Identity Management
- We removed the Add Mapping button from the AD Security Settings page in the Admin Console.
- When dataset security is enabled and a dataset does not have a previous successful job run, users without explicit access to it will not see it when they use the global search to look it up.
APIs
- You must have ROLE_ADMIN to update the host of one or many datasets using the PATCH /v3/datasetDefs/batch/host call. Any updates to the hosts of datasets are logged in the Dataset Audit Trail.
- By using the Dataset Definitions API, admins can now manage the following in bulk:
- Agent
- PATCH /v3/datasetDefs/batch/agent
- Host
- PATCH /v3/datasetDefs/batch/host
- Spark settings
- PATCH /v3/datasetDefs/batch/spark
Platform
- To ensure security compliance for Collibra Data Quality & Observability deployments on Azure Kubernetes Service (AKS), we now support the ability to pass sensitive PostgreSQL Metastore credentials to the Helm application through a Kubernetes secret, using --set global.configMap.data.metastore_secret_name. Further, you can also pass sensitive PostgreSQL Metastore credentials to the Helm application through the Azure Key Vault secret provider class object, using --set global.vault.enabled=true --set global.vault.provider=akv --set global.vault.metastore_secret_name.
Fixes
Connections
- You can now create a Snowflake JDBC connection with a connection URL containing either a double quote or curly bracket, or a URL encoded version of the same, for example, {"tenant":"foo","product":"bar","application":"baz"}. (ticket #148348)
Findings
- We fixed an issue which prevented sorting in the Records column for datasets with multiple rule outputs and different row count values. (ticket #142362)
- We resolved a misalignment of Outlier column values in the latest UI. (ticket #146163)
Reports
- The Column Dimension and Dataset Dimension reports now display an error message when users who do not have ROLE_ADMIN attempt to access them. (ticket #141397)
Integration
- We improved the error handling when retrieving RunId for datasets. (ticket #145040, 146724, 148232)
Latest UI
- When a user does not have any assigned roles, an empty chip no longer displays in the Roles column on the User Management page in the Admin Console.
- We removed the Enabled and Locked columns from the AD Mapping page in the Admin Console, because they do not apply to AD users.
- A warning message now appears when you attempt to load a temp file and the Temp File Upload option is disabled on the Security Settings page in the Admin Console.
- You can now sort the Records column of the Rules tab on the Findings page.
- Outlier findings now display in the correct column.
- The section of the application containing Scorecards, List View, Assignments, and Pulse View is now called Views.
- The links on the metadata bar now appear in a new order to better reflect the progression of how and when they are used.
- You can now change the agent of a dataset during the edit and clone process.
- If schemas or tables fail to load in Explorer, an error message now appears.
DQ Security
The following image shows a chart of Collibra DQ security vulnerabilities arranged by release version.
The following image shows a table of Collibra DQ security metrics arranged by release version.
Release 2024.06
Release Information
- Release date of Collibra Data Quality & Observability 2024.06: July 1, 2024
- Publication dates:
- Release notes: June 6, 2024
- Documentation Center: June 14, 2024
Highlights
Important
In the upcoming Collibra Data Quality & Observability 2024.07 (July 2024) release, the classic UI will no longer be available.
- Integration
- Users without a Collibra Data Quality & Observability license can now use the Quality tab on asset pages in the latest UI of Collibra Data Intelligence Platform. Before Collibra Data Quality & Observability 2024.06, unless you created data quality rules in Collibra Data Intelligence Platform using the Collibra Data Quality & Observability integration, the Quality tab would not populate and you could not aggregate data quality across any assets.
Note To enable this Collibra Data Intelligence Platform functionality, contact a Collibra Customer Success Manager or open a support ticket.
Additionally, we improved our security standards by adding support for OAuth 2.0 authentication when setting up an integration with Collibra Data Intelligence Platform.
Important
Default SAML and SSL keystores are not supported. If you use a SAML or SSL keystore to manage and store keys and certificates, you must provide your own keystore file for both. When using both a SAML and an SSL keystore, you only need to provide a single keystore file.
Enhancements
Connections
- When configuring an Amazon S3 connection and setting it as an Archive Breaking Records location, you can now use Instance Profile to authenticate it.
- When setting up a MongoDB connection, you can now use Kerberos TGT Cache to authenticate it.
- You can now use EntraID Service Principal to authenticate Databricks connections.
- Trino Pushdown connections now support Access Token Manager authentication.
- We upgraded the Teradata driver to version 20.0.0.20.
Explorer
- Explorer now fetches a new authentication token after the previous token expires to ensure seamless connectivity to your data source when using Access Token Manager or Password Manager to authenticate Pullup or Pushdown connections.
Jobs
- When using DB connection security and DQ job security, we added the security setting Require Connection Access, which requires users with ROLE_OWL_CHECK to have access to the connection they intend to run jobs on. When DB connection security and DQ job security are enabled, but Require Connection Access is not, users with ROLE_OWL_CHECK can run jobs to which they have dataset access.
Findings
- When exporting Outlier break records containing large values that were previously represented with scientific notation, the file generated from the Export with Details option now exports the true format of these values to match the unshortened, raw source data.
APIs
- Admins and user managers can now leverage the POST /v2/deleteexternaluser call to remove external users.
- You can now add template and data class rules to a dataset with the POST /v3/rules/{dataset} call.
- When you add template and data class rules to a dataset, the templates and data classes must already exist.
- You can use the GET /v3/rules/{dataset} call to return all rules from a dataset, then use the POST /v3/rules/{dataset} call to add them to the dataset you specify. If you add these rules to a different dataset, you must update the dataset name in the POST call and any references to the dataset name in the rules.
Integration
- You can now map Collibra Data Quality & Observability connections containing database views to their corresponding database view assets in Collibra Data Intelligence Platform.
- The Details table on the Quality tab of asset pages is now keyboard navigable.
Pushdown
- You can now archive rule break records for Trino Pushdown connections.
- You can now download a CSV file from the Findings page containing break records of rule breaks in the Metastore.
Platform
- Collibra Data Quality & Observability now supports CentOS 9 and RedHat Enterprise Linux 9.
Latest UI
- All pages within the Collibra Data Quality & Observability application with a blue banner are now set to the latest UI by default. Upon upgrade to this version, any REACT application configuration settings from previous versions will be overridden. The following pages are now set to the latest UI by default:
- Login
- Registration
- Tenant Manager Login
- Explorer
- Profile
- Findings
- Rule Builder
- Admin Connections
Fixes
Connections
- We enhanced the -conf spark.driver.extraClassPath={driver jar} Spark config to allow you to run jobs against Sybase datasets that reference secondary Oracle datasets. (ticket #129397)
Explorer
- When using Temp Files in the latest UI, you can now load table entries. (ticket #144174)
- Encrypted data columns with the BYTES datatype are now deselected and disabled in the Select Columns step, and all data displays correctly in Dataset Overview. (ticket #137738)
- When mapping source to target, we fixed an issue with the data type comparison, which previously caused incorrect Column Order Passing results. (ticket #139814, 140349)
- Preview data for remote file connections now displays throughout the application as expected. (ticket #139582, 142538, 143876)
- We aligned the /v3/getsqlresult and /v2/getlistdataschemapreviewdbtablebycols endpoints so that Google BigQuery jobs with large numbers of rows do not throw errors when they are queried in Dataset Overview. (ticket #140730, 140915, 141515)
Rules
- We fixed an issue where rules did not display break records because extra spaces were added around the parentheses.
- When the file path of the S3 bucket used for break record archival has a timestamp from a previous run, the second run with the same runId no longer fails with an “exception while inserting break records” error message. (ticket #145702)
- We fixed an issue which resulted in an exception message when "limit" was present in a column name included in the query, for example, select column_limit_test from public.abc. (ticket #138356)
Alerts
- Dataset-level alerts with multiple conditions no longer send multiple alerts when only one of the conditions is met. (ticket #144655, 146177)
Scheduling
- After scheduling a job to run monthly in the latest UI, the new job schedule now saves correctly. (ticket #143484)
Jobs
- We fixed an issue where only the first page of job logs with multiple pages sorted in ascending or descending order. (ticket #139876)
- The Update Ts (update timestamp) on the Dataset Manager and Jobs page now match after rerunning a job. (ticket #141511)
Agent
- We fixed an issue that caused the agent to fail upon start-up when the SSL keystore password was encrypted. (ticket #140899)
Integration
- We fixed an issue where renaming an integrated dataset, then re-integrating it, caused the integration to fail because an additional job asset was incorrectly added to the object table. (ticket #140286, 140667, 140936, 143281, 143697, 144857)
- After editing an integration with a custom dimension that was previously inserted into the dq_dimension Metastore table, you can now select the custom dimension from the dropdown menu of the Dimensions tab of the Integrations page of the Admin Console. (ticket #137450, 145377)
- The Quality tab is now available for standard assets irrespective of the language. Previously, Collibra Data Intelligence Platform instances in other languages, such as French, did not support the Quality tab. (ticket #140433)
- We fixed an issue where rules that reference a column with a name that partially matches another column, for example, "cell" and "cell_phone", were incorrectly mapped to both columns in Collibra Data Intelligence Platform. (ticket #84983)
- The integration URL sent to Collibra Data Intelligence Platform no longer references legacy Collibra Data Quality & Observability URLs. (ticket #139764)
Pushdown
- When the SELECT statement of rules created on Snowflake Pushdown datasets uses mixed casing (for example, Select) instead of uppercasing, breaking records now generate in the rule break tables as expected. (ticket #143619, 147953)
- We fixed an issue where the username and password credentials for authenticating Azure Blob Storage connections did not properly save in the Metastore, resulting in job failure at runtime. (ticket #131026, 138844, 140793, 142635, 145201)
- When a rule includes an @ symbol in its query without referring to a dataset, for example, select * from @dataset where column rlike '@', the rule now passes syntax validation and no longer returns an error. (ticket #139670)
APIs
- When dataset security is enabled, users cannot call GET /v3/datasetdef or POST /v3/datasetdef. (ticket #138684)
- When -profoff is added to the command line and the job executes, -datashapeoff is no longer removed from the command line flags when -profoff is removed later. (ticket #140424)
Identity Management
- Users who have dataset access but not connection access can no longer access any dataset Explorer pages. (ticket #138684)
Latest UI
- We resolved an error when creating jobs with Patterns and Outlier checks with custom column references.
- When editing Dupes, columns are no longer deselected when you select a new one.
- Scorecards now support text wrapping so that scorecards with long names fit within UI elements in the latest UI. Additionally, Scorecards now have a character limit of 60 and an error message will display if a scorecard name exceeds it. (ticket #139208)
- Long meta tag names that exceed the width of the column on the Dataset Manager page now have a tooltip to display the full name when you hover your cursor over them.
- We resolved errors modifying existing mapping settings.
- We resolved an error when saving the Data Class when the Column Type is Timestamp.
Limitations
Platform
- Due to a change to the datashapelimitui admin limit in Collibra Data Quality & Observability 2024.04, you might notice significant changes to the number of Shapes marked on the Shapes tab of the Findings page. While this will be fixed in Collibra Data Quality & Observability 2024.06, if you observe this issue in your Collibra Data Quality & Observability environment, a temporary workaround is to set the datashapelimit admin limit on the Admin Console > Admin Limits page to a significantly higher value, such as 1000. This will allow all Shapes findings to appear on the Shapes tab.
- When Archive Break Records is enabled for Azure Databricks Pushdown connections authenticated over EntraID, the data preview does not display column names correctly and shows 0 columns in the metadata bar. Therefore, Archive Break Records is not supported for Azure Databricks Pushdown connections that use EntraID authentication.
Integration
- With the latest enhancement to column mapping, you can now successfully map columns containing uppercase letters and special characters, but columns containing periods cannot be mapped.
DQ Security
Important A high vulnerability, CVE-2024-2961, was recently reported and is still under analysis by NVD. A fix is not available as of now. However, after investigating this vulnerability internally and confirming that we were impacted, we have removed the vulnerable character set, ISO-2022-CN-EXT, from our images so that it cannot be exploited using the iconv function. Therefore, we are releasing Collibra Data Quality & Observability 2024.06 with this known CVE without an available fix, and we have confirmed that Collibra Data Quality & Observability 2024.06 is not vulnerable.
Additionally, a new vulnerability, CVE-2024-33599, was recently reported and is still under analysis by NVD. Name Service Cache Daemon (nscd) is a daemon that caches name service lookups, such as hostnames, user and group names, and other information obtained through services like DNS, NIS, and LDAP. Because nscd inherently relies on glibc to provide the necessary system calls, data structures, and functions required for its operation, our scanning tool reported this CVE under glibc vulnerabilities. Since this vulnerability is only possible when nscd is present, and nscd is neither enabled nor available in our base image, we consider this vulnerability a false positive that cannot be exploited.
The following image shows a chart of Collibra DQ security vulnerabilities arranged by release version.
The following image shows a table of Collibra DQ security metrics arranged by release version.
Release 2024.05
Release Information
- Release date of Collibra Data Quality & Observability 2024.05: June 3, 2024
- Publication dates:
- Release notes: April 22, 2024
- Documentation Center: May 2, 2024
Highlights
Important
In the upcoming Collibra Data Quality & Observability 2024.07 (July 2024) release, the classic UI will no longer be available.
- Integration
- For a more comprehensive bi-directional integration of Collibra Data Quality & Observability and Collibra Data Intelligence Platform, you can now view data quality scores and run jobs from the Data Quality Jobs modal on asset pages. You can find this modal via the View Monitoring link located on the At a glance pane to the right of the Quality tab on asset pages.
This significant enhancement strengthens the connection between Collibra Data Quality & Observability and Collibra Data Intelligence Platform, allowing you to compare data quality relations seamlessly without leaving the asset page. Whether you are a data steward, data engineer, or another role in between, this enhanced integration breaks down barriers, empowering you with the ability to unlock data quality and observability insights directly within Collibra Data Intelligence Platform.
Note A fix for the issue that has prevented the use of the Quality tab on asset pages for users who do not have a Collibra Data Quality & Observability license is scheduled for the third quarter (Q3) of 2024.
- Pushdown
- We are delighted to announce that Pushdown is now generally available for three new data sources. Additionally, Pushdown for SAP HANA and Microsoft SQL Server are now available for beta testing. Contact a Collibra CSM or apply directly to participate in private beta testing for SAP HANA Pushdown.
Pushdown is an alternative compute method for running DQ Jobs, where Collibra Data Quality & Observability submits all of the job's processing directly to a SQL data warehouse. When all of your data resides in the SQL data warehouse, Pushdown reduces the amount of data transfer, eliminates egress latency, and removes the Spark compute requirements of a DQ Job.
- SQL Assistant for Data Quality
- SQL Assistant for Data Quality is now generally available! This exciting tool allows you to automate SQL rule writing and troubleshooting to help you accelerate the discovery, curation, and visualization of your data. By leveraging SQL Assistant for Data Quality powered by Collibra AI, beginner and advanced SQL users alike can quickly discover key data points and insights and then convert them into rules.
Anywhere Dataset Overview is, so is SQL Assistant for Data Quality. This means you can unlock the power of Collibra AI from the following pages:
- Explorer
- Profile
- Findings
- Alert Builder
For the most robust SQL rule building experience, you can also find SQL Assistant for Data Quality when adding or editing a rule on the Rule Workbench page.
- Further, we've added the ability to create an AI prompt for the frequency distribution of all values within a column. From the Collibra AI dropdown menu, select Frequency Distribution in the Advanced section and specify a column for Collibra AI to create a frequency distribution query; a sketch of such a query follows below. (idea #DCC-I-2639)
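For reference, the kind of query such a prompt generates might look like the following minimal sketch; the table and column names are invented, and this is not necessarily the exact SQL Collibra AI produces:
```sql
-- Frequency distribution of all values within a chosen column.
SELECT status, COUNT(*) AS frequency
FROM orders
GROUP BY status
ORDER BY frequency DESC
```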
Enhancements
Capabilities
- We added a JSON tab to the Review step in Explorer and the Findings page to allow you to analyze, copy, and run the JSON payload of jobs.
- When exporting rule breaks with details, the column order in the .xlsx file now matches the arrangement in the Data Preview table on the Findings page. (idea #DCC-I-1656, DCC-I-2400)
- Specifically, the columns in the export file are organized from left to right, following the same sequence as in the Data Preview table. The columns are sorted with the following priority:
- Column names starting with numbers.
- Column names starting with letters.
- Column names starting with special characters.
- To improve user experience when using the Rules table on the Findings page, we’ve locked the column headers and added a horizontal scrollbar to the rule breaks sub-table.
- You can now configure your user account to receive email notifications for your assignments by clicking your user avatar in the upper right corner of the Collibra Data Quality & Observability application and selecting "Send me email notifications for my assignments" in the Notifications section.
- When a user who is not the dataset owner and does not have ROLE_ADMIN or ROLE_DATASET_MANAGER attempts to delete one or more datasets, the deletion is blocked, and an error message displays to inform them of the role requirements needed to delete datasets. (idea #DCC-I-1938)
- We added a new Attributes section with two filter options to the Dataset Manager. (idea #DCC-I-2155)
- The Rules Defined filter option displays the datasets in your environment that contain rules only (not alerts).
- The Alerts Defined filter option displays the datasets in your environment that contain alerts only (not rules).
- When both filter options are selected, datasets that contain both rules and alerts display.
- With this release, we made several additional enhancements to SQL Assistant for Data Quality:
- The Collibra AI dropdown menu now has improved organization. We split the available options into two sections:
- Basic: For standard rule generation and troubleshooting suggestions.
- Advanced: For targeted or otherwise more complex SQL operations.
- You can now click and drag your cursor to highlight and copy specific rows and columns of the results table, click column headers to sort or highlight the entire column, and access multiple results pages through new pagination.
- We improved the UI and error handling.
- You can now authenticate SQL Server connections using an Active Directory MSI client ID. This enhancement, available in both Java 11/Spark 3.4.1 and Java 8/Spark 3.2.2, better enables your team to follow Azure authentication best practices and InfoSec policies. For more information about configuration details, see the Authentication documentation for SQL Server.
- We added an automatic cleaner to clear the alert_q table of stale alerts marked as email_sent = true in the Metastore (a conceptual sketch follows below).
- We removed the license key from job logs.
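Conceptually, the cleaner performs the equivalent of the following Metastore statement; this is an illustration only, as the actual cleanup runs internally:
```sql
-- Conceptual equivalent of the automatic alert_q cleanup:
-- remove stale alerts whose emails have already been sent.
DELETE FROM alert_q WHERE email_sent = true
```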
Platform
- We now support multiple GCP projects in Google BigQuery connections: by specifying additional projects (AdditionalProjects) in the Connection URL, you can include 1 additional project ID. With this enhancement, you no longer need to append the project ID in the command line.
- When running Google BigQuery jobs via the /v3/jobs/run API, the dataDef updates with the correct -lib and -srclib parameters, and the jobs run successfully.
- The names of all out-of-the-box sensitive labels now begin with "OOTB_". This enhancement allows you to define your own sensitive labels with names that were previously reserved, such as PII, PHI, and CUI.
Important If you upgrade to Collibra Data Quality & Observability 2024.05 and then roll back to a previous version, you will receive a unique constraint conflict error, as the sensitive label enhancement required a change to the Metastore.
- We've updated or enhanced the following API endpoints:
- POST /v3/rules/{dataset} (rule-api): After using GET /v3/rules to return all rules in your environment, you can now use POST /v3/rules/{dataset} to migrate them to another environment. When settings are changed and you use POST /v3/rules/{dataset} again, those rules (with the same name) are updated.
- GET /v3/datasetDefs/{dataset} (dataset-def-api): We've made the following enhancements:
- We've added job scheduling information to the dataset def to allow you to GET and POST this information along with the rest of the dataset definition.
- We've added the outlier weight configs to the dataset def.
- You can now use the GET /v3/datasetDefs/{dataset} API to return a dataset's meta tags.
- We've restructured the JobScheduleDTO to make job scheduling more intuitive when using the /v3/datasetDefs/{dataset} API.
- POST /v3/datasetDefs/find (dataset-def-api): We've updated the following parameter names for consistency with the latest Collibra DQ UI:
- connectiontype is now connectiontypes
- dataclass is now dataClasses
- datacategory no longer displays
- businessUnitIds is now businessUnitNames
- dataConceptIds is now dataCategoryNames
- sensitivityIds is now sensitivityLabels
Additionally, this API returns specific filtered arrays of datasetDefs. Parameter descriptions:
- "limit": 0 specifies the maximum number of records returned.
- "offset": 0 specifies the number of records to skip from the beginning; it can be used to return the next 'pages' of results when calling the API in sequence.
- POST /v3/datasetDefs (dataset-def-api): You can now use the POST /v3/datasetDefs/{dataset} API to add meta tags to a dataset.
- DELETE /v3/datasetDefs (dataset-def-api): When removing a dataset using the DELETE /v3/datasetdef API, you can now successfully rename another dataset to the name of the deleted dataset.
- POST /v2/datasetDefs/migrate (controller-dataset): You can now add a dataset def to create a dataset record in the Dataset Manager without running the job or setting a job schedule. This is useful when migrating from a source environment to a target environment.
- GET /v2/assignment-q/find-all-paging-datatables (controller-assignment-q): We've added an updateTimestampRange parameter to allow for the filtering of assignment records based on timestamp updates.
Integration
- We improved the connection mapping when configuring the integration by introducing pagination for tables, columns, and schemas.
- For improved security when sharing data between applications, we have temporarily removed the Score Details attribute from the Collibra Data Intelligence Platform integration and the JSON payload.
Pushdown
- When rule breaks are stored in the PostgreSQL Metastore with link IDs assigned, you can now download a CSV file containing the details of the rule breaks and link ID columns via the Rule Breaks modal (Findings page > Rules tab > Actions).
- Additionally, the following Jobs APIs now return the source rule breaks file containing the SQL statement for the break records of Pushdown jobs in JSON, CSV, or SQL format:
- /v3/jobs/{dataset}/{runDate}/breaks/rules
- /v3/jobs/{dataset}/{runDate}/breaks/outliers
- /v3/jobs/{dataset}/{runDate}/breaks/dupes
- /v3/jobs/{dataset}/{runDate}/breaks/shapes
- /v3/jobs/{jobId}/breaks/rules
- /v3/jobs/{jobId}/breaks/outliers
- /v3/jobs/{jobId}/breaks/dupes
- /v3/jobs/{jobId}/breaks/shapes
Fixes
Capabilities
- The Dataset Overview, Findings, Profile, and Rules pages in the latest UI now correctly display the number of rows in your dataset. Previously, the rows displayed correctly in the job logs but did not appear on the aforementioned pages. (ticket #137230, 137979, 140203)
- When using remote file connections with Livy enabled in the latest UI, files with the same name load data content correctly. We fixed an issue where data from the first file persisted in the second file of the same name.
- We fixed an issue where renaming a dataset using the same characters with different casing returned a success message upon saving, but still reflected the old dataset name. For example, an existing dataset renamed "EXAMPLE_DATASET" from "example_dataset" now updates correctly. (ticket #139384)
- When creating jobs on S3 datasets based on data from CSV files with pipe delimited values, the delimiter no longer reverts from Pipe (|) to Comma (,) when you run the job. (ticket #132097)
- We fixed an issue with the Edit Schedule modal on the latest UI where both Enabled and Disabled displayed at once. (ticket #139207)
Platform
- We fixed an issue where the username and password credentials for authenticating Azure Blob Storage connections did not properly save in the Metastore, resulting in job failure at runtime. (ticket #131026, 138844, 140793, 142635, 145201)
- When a rule includes an @ symbol in its query without referring to a dataset, for example, select * from @dataset where column rlike '@', the rule now passes syntax validation and no longer returns an error. (ticket #139670)
Integration
- You can now map columns containing uppercase letters or special characters from Google BigQuery, Amazon Athena, Amazon Redshift, Snowflake, and PostgreSQL datasets created in Collibra Data Quality & Observability to column relations in Collibra Data Intelligence Platform. (ticket #133280)
- We fixed an issue where integrated datasets did not load correctly on the Dataset Manager page. Instead, a generic error message appeared on the Dataset Manager without loading any datasets. (ticket #136303, 140286)
- We fixed an issue where the dimension cards did not display when using the Quality tab on Column and Rule Asset pages. (ticket #122949)
Pushdown
- We updated some of the backend logic to allow the Archive Break Records option in the Connections modal to disable the Archive Break Records options on the Settings modal on the Explorer page. (ticket #137396)
- We added support for special characters in column names. (ticket #135383)
Latest UI
- We added upper and lower bound columns to the export with details file for Outliers.
- We fixed the ability to clear values in the Sizing step when manually updating job estimation fields during job creation.
- We improved the ability to update configuration settings for specific layers in the job creation process.
- We fixed intermittent errors when loading text and Parquet files in the job creation process.
- We added the correct values to the Day of Month dropdown menu in the Scheduler modal.
Limitations
Platform
- Due to a change to the datashapelimitui admin limit in Collibra Data Quality & Observability 2024.04, you might notice significant changes to the number of Shapes marked on the Shapes tab of the Findings page. While this will be fixed in Collibra Data Quality & Observability 2024.06, if you observe this issue in your Collibra Data Quality & Observability environment, a temporary workaround is to set the datashapelimit admin limit on the Admin Console > Admin Limits page to a significantly higher value, such as 1000. This will allow all Shapes findings to appear on the Shapes tab.
Integration
- With the latest enhancement to column mapping, you can now successfully map columns containing uppercase letters and special characters, but columns containing periods cannot be mapped.
DQ Security
Important A high vulnerability, CVE-2024-2961, was recently reported and is still under analysis by NVD. A fix is not available as of now. However, after investigating this vulnerability internally and confirming that we were impacted, we have removed the vulnerable character set, ISO-2022-CN-EXT, from our images so that it cannot be exploited using the iconv function. Therefore, we are releasing Collibra Data Quality & Observability 2024.05 with this known CVE without an available fix, and we have confirmed that Collibra Data Quality & Observability 2024.05 is not vulnerable.
Additionally, a new vulnerability, CVE-2024-33599, was recently reported and is still under analysis by NVD. Name Service Cache Daemon (nscd) is a daemon that caches name service lookups, such as hostnames, user and group names, and other information obtained through services like DNS, NIS, and LDAP. Because nscd inherently relies on glibc to provide the necessary system calls, data structures, and functions required for its operation, our scanning tool reported this CVE under glibc vulnerabilities. Since this vulnerability is only possible when nscd is present, and nscd is neither enabled nor available in our base image, we consider this vulnerability a false positive that cannot be exploited.
The following image shows a chart of Collibra DQ security vulnerabilities arranged by release version.
The following image shows a table of Collibra DQ security metrics arranged by release version.
Maintenance Updates
2024.05.1
- When editing an existing scheduled dataset and re-running it from Explorer, the job no longer fails with an "Invalid timeslot selected" error. (ticket #149549)
- Additionally, when using the GET /v3/datasetDefs/{dataset} call to return a dataset with a scheduled run, then update it with the POST /v3/datasetDefs call or modify the name of the dataset in the same POST call, you no longer need to manually remove the "jobSchedule": {} element, and the API calls are successful.