Release Notes

Important 

Disclaimer - Failure to upgrade to the most recent release of the Collibra Service may adversely impact the security, reliability, availability, integrity, performance or support (including Collibra’s ability to meet its service levels) of the Service. Collibra hereby disclaims all liability, express or implied, for any reduction in the security, reliability, availability, integrity, performance or support of the Service to the extent the foregoing would have been avoided had you allowed Collibra to implement the most current release of the Service when scheduled by Collibra. Further, to the extent your failure to upgrade the Service impacts the security, reliability, availability, integrity or performance of the Service for other customers or users of the Service, Collibra may suspend your access to the Service until you have upgraded to the most recent release.

Release 2024.07

Release Information

  • Expected release date of Collibra Data Quality & Observability 2024.07: July 29, 2024
  • Publication dates:
    • Release notes: June 24, 2024
    • Documentation Center: July 4, 2024

Highlights

Important 
As of this release, the classic UI is no longer available.

  • Integration
  • You can now select which data quality layers will have corresponding assets created in Collibra Data Intelligence Platform, either automatically upon a successful integration or only when a layer contains breaking records. Selecting individual layers instead of including all of them by default helps prevent an overwhelming number of assets from being created automatically.

    Admins can configure this on the Integration Setup wizard of the Integrations page in the Admin Console.
  • Admins can also map file and database views from Collibra Data Quality & Observability to corresponding assets in Collibra Data Intelligence Platform Catalog. This allows for out-of-the-box relations to be created between their file- or view-based Collibra Data Quality & Observability datasets and the file table and database view assets (and their columns) in Collibra Data Intelligence Platform.

    Note Collibra Data Intelligence Platform 2024.07 or newer is required for view support.

  • Pushdown
  • We're delighted to announce that Pushdown for SQL Server is now generally available!

    Pushdown is an alternative compute method for running DQ Jobs, where Collibra Data Quality & Observability submits all of the job's processing directly to a SQL data warehouse. When all of your data resides in the SQL data warehouse, Pushdown reduces the amount of data transfer, eliminates egress latency, and removes the Spark compute requirements of a DQ Job.
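
    As an illustration of the compute model, a simple profiling check such as a NULL count can be expressed as a single aggregation that runs entirely inside the warehouse, so only the one-row result travels back to Collibra Data Quality & Observability. The table and column names below are hypothetical, and the actual SQL Collibra generates may differ:

        -- Hypothetical pushdown-style profiling query (illustrative only).
        -- The full table never leaves SQL Server; only the aggregate returns.
        SELECT
            COUNT(*) AS row_count,
            SUM(CASE WHEN email IS NULL THEN 1 ELSE 0 END) AS email_null_count
        FROM dbo.customers;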

Enhancements

Explorer

  • From the Sizing step in Explorer, you can now change the agent of a Pullup job by clicking the Agent Details field and selecting from the available agents listed in the Agent Status modal.
  • You can now use custom delimiters for remote files in the latest UI.
  • When using the Mapping layer for remote file connections, you can now add -srcmultiline to the command line to remove empty rows from the source to target analysis.
  • We added a Schema tab to the Dataset Overview to display table schema details after clicking Run in the query box. (ticket #145266)
  • When the number of rows in a dataset exceeds 999, you can now hover your cursor over the abbreviated number to view the precise number of rows. (ticket #125753)

Rules

  • Break record previews for out-of-the-box Data Type rules now display under the Rules tab on the Findings page when present. (idea #DCC-I-2155)
  • When a dataset is renamed from the Dataset Manager, any explicit references to the dataset name for all primary and secondary rules are updated. (idea #DCC-I-2624)
  • You can now rename rules from the Actions dropdown menu on the Dataset Rules page. Updated rule names cannot match the name of an existing rule and must contain only alphanumeric characters without spaces.

Alerts

  • We added a new Rule Status alert to allow you to track whether your rule condition is breaking, throwing an exception, or passing.
  • When configuring Condition-type alerts, you now have the option to Add Rule Details, which includes the following details in the email alert (idea #DCC-I-732):
    • Rule name
    • Rule condition
    • Total number of points to deduct from the quality score when breaking
    • Percentage of records that are breaking
    • Number of breaking records
    • Assignment Status
  • Admins can now set a custom alert email signature from the Alerts page in the Admin Console. (idea #DCC-I-2400)

Profile

  • Completeness percentage is now listed on the dataset Profile page in the latest UI.

Findings

  • We added a Passing Records column to the rule break table under the Rules tab on the Findings page to show the number of records that passed a rule. This enhancement simplifies the calculation of the total number of records, both passing and breaking. Additionally, we renamed the Records column to Breaking Records. (idea #DCC-I-2223)
  • You can now download CSV files and copy signed links of rule break records on secure S3 connections, including NetApp, MinIO, and Amazon S3.

Scorecards

  • To improve page navigation, we added a dedicated Add Page button to the top right corner of the Scorecards page and moved the search field from the bottom of the scorecards dropdown menu to the top.

Reports

  • When dataset security is enabled, only the datasets to which users have explicit access, and their associated data, display on the Dataset Dimension and Column Dimension dashboards.

Jobs

  • When exporting job logs in bulk from the Jobs page, the names of the export files begin with a timestamp representing when the file was downloaded. The names of export files for individual jobs begin with a timestamp representing the UpdateTimestamp and include the first 25 characters of the dataset name.

Dataset Manager

  • Admins can now update dataset hosts in bulk by selecting Bulk Manage Host from the Bulk Actions dropdown menu and specifying a new host URL.

Integration

  • When datasets do not have an active Collibra Data Intelligence Platform integration, you can now enable dataset integrations in bulk from the Dataset Manager page by clicking the checkbox option next to the datasets you wish to integrate, then selecting Bulk Enable Integration from the Bulk Actions dropdown menu.
    • Additionally, when datasets have an active integration, you can now submit multiple jobs to run by selecting Bulk Submit Integration Jobs from the Bulk Actions dropdown menu.
  • We improved the score calculation logic of Data Quality Rule assets.
  • When viewing Rule assets and assets of data quality layers in Collibra Data Intelligence Platform, the Rule Status now displays either Passing, Breaking, Learning, or Suppressed. Previously, rules and data quality layers without any breaks displayed an Active status, but that is now listed as Passing. (ticket #137526)

Pushdown

  • You can now scan for exact match duplicates in BigQuery Pushdown jobs.
  • Note An enhancement to enable scanning for fuzzy match duplicates in BigQuery Pushdown jobs is planned for Collibra Data Quality & Observability 2024.10.

  • You can now scan for shapes and exact match duplicates in SAP HANA Pushdown jobs. Additionally, we’ve added the ability to archive duplicates and shapes break records to the source SAP HANA database to allow you to easily identify and take action on data that requires remediation.
  • All Pushdown-compatible data sources now support the use of temporal datasets in stat rule statements, for example, SELECT @t1.$rowcount AS yesterday, @dataset.$rowcount AS today WHERE yesterday <> today

SQL Assistant for Data Quality

  • We added AI_PLATFORM_PATH to the Application Configuration Settings to allow Collibra Data Quality & Observability users who do not have a Collibra Data Intelligence Platform integration to bypass the integration path when this flag is set to FALSE.
    • When set to TRUE (default), code will hit the integration or public proxy layer endpoint.
    • When set to FALSE, code will bypass the integration path.

Identity Management

  • We removed the Add Mapping button from the AD Security Settings page in the Admin Console.
  • When dataset security is enabled and a dataset does not have a previous successful job run, users without explicit access to it will not see it when they use the global search to look it up.

APIs

  • You must have ROLE_ADMIN to update the host of one or many datasets using the PATCH /v3/datasetDefs/batch/host call. Any updates to the hosts of datasets are logged in the Dataset Audit Trail.
  • By using the Dataset Definitions API, admins can now manage the following in bulk (see the sketch after this list):
    • Agent
      • PATCH /v3/datasetDefs/batch/agent
    • Host
      • PATCH /v3/datasetDefs/batch/host
    • Spark settings
      • PATCH /v3/datasetDefs/batch/spark
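
    For example, a minimal Python sketch of a bulk host update, assuming a bearer token with ROLE_ADMIN and an illustrative payload shape (the endpoint names come from these release notes, but the exact request fields are assumptions; consult the API reference):

        # Hypothetical sketch: bulk-update the host of several datasets.
        import requests

        BASE = "https://dq.example.com"                # placeholder Collibra DQ URL
        HEADERS = {"Authorization": "Bearer <token>"}  # ROLE_ADMIN required

        resp = requests.patch(
            f"{BASE}/v3/datasetDefs/batch/host",
            headers=HEADERS,
            # Payload fields are assumed for illustration only.
            json={"datasets": ["dataset_a", "dataset_b"],
                  "host": "jdbc:postgresql://new-host:5432/db"},
        )
        resp.raise_for_status()  # host changes are logged in the Dataset Audit Trail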

Platform

  • To ensure security compliance for Collibra Data Quality & Observability deployments on Azure Kubernetes Service (AKS), we now support the ability to pass sensitive PostgreSQL Metastore credentials to the Helm application through the Kubernetes secret, --set global.configMap.data.metastore_secret_name. Further, you can also pass sensitive PostgreSQL Metastore credentials to the Helm application through the Azure Key Vault secret provider class object, --set global.vault.enabled=true --set global.vault.provider=akv --set global.vault.metastore_secret_name.
  • Note If you require assistance, please contact a Collibra representative.
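
    For example, the two options above might look like the following Helm fragments, where the release name, chart reference, and secret names are placeholders:

        # Option 1: pass the Metastore credentials through a Kubernetes secret.
        helm upgrade <release> <collibra-dq-chart> \
          --set global.configMap.data.metastore_secret_name=<k8s-secret-name>

        # Option 2: pass them through an Azure Key Vault secret provider class.
        helm upgrade <release> <collibra-dq-chart> \
          --set global.vault.enabled=true \
          --set global.vault.provider=akv \
          --set global.vault.metastore_secret_name=<akv-secret-name>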

Fixes

Connections

  • You can now create a Snowflake JDBC connection with a connection URL containing either a double quote or curly bracket, or a URL encoded version of the same, for example, {"tenant":"foo","product":"bar","application":"baz"}. (ticket #148348)

Findings

  • We fixed an issue which prevented sorting in the Records column for datasets with multiple rule outputs and different row count values. (ticket #142362)
  • We resolved a misalignment of Outlier column values in the latest UI. (ticket #146163)

Reports

  • The Column Dimension and Dataset Dimension reports now display an error message when users who do not have ROLE_ADMIN attempt to access them. (ticket #141397)

Integration

  • We improved the error handling when retrieving RunId for datasets. (ticket #145040, 146724, 148232)

Release 2024.06

Release Information

  • Release date of Collibra Data Quality & Observability 2024.06: July 1, 2024
  • Publication dates:
    • Release notes: June 6, 2024
    • Documentation Center: June 14, 2024

Highlights

Important 
In the upcoming Collibra Data Quality & Observability 2024.07 (July 2024) release, the classic UI will no longer be available.

  • Integration
  • Users without a Collibra Data Quality & Observability license can now use the Quality tab on asset pages in the latest UI of Collibra Data Intelligence Platform. Before Collibra Data Quality & Observability 2024.06, unless you created data quality rules in Collibra Data Intelligence Platform using the Collibra Data Quality & Observability integration, the Quality tab would not populate and you could not aggregate data quality across any assets.

    Note To enable this Collibra Data Intelligence Platform functionality, contact a Collibra Customer Success Manager or open a support ticket.

    Additionally, we improved our security standards by adding support for OAuth 2.0 authentication when setting up an integration with Collibra Data Intelligence Platform.

Important 
Default SAML and SSL keystores are not supported. If you use a SAML or SSL keystore to manage and store keys and certificates, you must provide your own keystore file for both. When using both a SAML and an SSL keystore, you only need to provide a single keystore file.

Enhancements

Connections

  • When configuring an Amazon S3 connection and setting it as an Archive Breaking Records location, you can now use Instance Profile to authenticate it.
  • When setting up a MongoDB connection, you can now use Kerberos TGT Cache to authenticate it.
  • You can now use EntraID Service Principal to authenticate Databricks connections.
  • Trino Pushdown connections now support Access Token Manager authentication.
  • We upgraded the Teradata driver to version 20.0.0.20.

Explorer

  • Explorer now fetches a new authentication token after the previous token expires to ensure seamless connectivity to your data source when using Access Token Manager or Password Manager to authenticate Pullup or Pushdown connections.

Jobs

  • When using DB connection security and DQ job security, we added the security setting Require Connection Access, which requires users with ROLE_OWL_CHECK to have access to the connection they intend to run jobs on. When DB connection security and DQ job security are enabled, but Require Connection Access is not, users with ROLE_OWL_CHECK can run jobs to which they have dataset access.

Findings

  • When exporting Outlier break records containing large values that were previously represented with scientific notation, the file generated from the Export with Details option now exports the true format of these values to match the unshortened, raw source data.

APIs

  • Admins and user managers can now leverage the POST /v2/deleteexternaluser call to remove external users.
  • You can now add template and data class rules to a dataset with the POST /v3/rules/{dataset} call.
    • When you add template and data class rules to a dataset, the templates and data classes must already exist.
    • You can use the GET /v3/rules/{dataset} call to return all rules from a dataset, then use the POST /v3/rules/{dataset} call to add them to the dataset you specify. If you add these rules to a different dataset, you must update the dataset name in the POST call and any references to the dataset name in the rules.
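
    For example, a minimal Python sketch of this copy flow, assuming a bearer token and that POST /v3/rules/{dataset} accepts the array returned by GET (the dataset and ruleValue field names are illustrative assumptions):

        # Hypothetical sketch: copy rules from one dataset to another.
        import requests

        BASE = "https://dq.example.com"                # placeholder DQ URL
        HEADERS = {"Authorization": "Bearer <token>"}

        rules = requests.get(f"{BASE}/v3/rules/source_dataset",
                             headers=HEADERS).json()

        for rule in rules:
            # Repoint each rule at the target dataset, including explicit
            # references to the old dataset name inside the rule condition.
            rule["dataset"] = "target_dataset"          # assumed field name
            if rule.get("ruleValue"):                   # assumed field name
                rule["ruleValue"] = rule["ruleValue"].replace(
                    "source_dataset", "target_dataset")

        requests.post(f"{BASE}/v3/rules/target_dataset",
                      headers=HEADERS, json=rules).raise_for_status()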

Integration

  • You can now map Collibra Data Quality & Observability connections containing database views to their corresponding database view assets in Collibra Data Intelligence Platform.
  • The Details table on the Quality tab of asset pages is now keyboard navigable.

Pushdown

  • You can now archive rule break records for Trino Pushdown connections.
  • You can now download a CSV file from the Findings page containing break records of rule breaks in the Metastore.

Platform

  • Collibra Data Quality & Observability now supports CentOS 9 and Red Hat Enterprise Linux 9.

Latest UI

  • All pages within the Collibra Data Quality & Observability application with a blue banner are now set to the latest UI by default. Upon upgrade to this version, any REACT application configuration settings from previous versions will be overridden. The following pages are now set to the latest UI by default:
    • Login
    • Registration
    • Tenant Manager Login
    • Explorer
    • Profile
    • Findings
    • Rule Builder
    • Admin Connections

Fixes

Connections

  • We enhanced the -conf spark.driver.extraClassPath={driver jar} Spark config to allow you to run jobs against Sybase datasets that reference secondary Oracle datasets. (ticket #129397)

Explorer

  • When using Temp Files in the latest UI, you can now load table entries. (ticket #144174)
  • Encrypted data columns with the BYTES datatype are now deselected and disabled in the Select Columns step, and all data displays correctly in Dataset Overview. (ticket #137738)
  • When mapping source to target, we fixed an issue with the data type comparison, which previously caused incorrect Column Order Passing results. (ticket #139814, 140349)
  • Preview data for remote file connections now displays throughout the application as expected. (ticket #139582, 142538, 143876)
  • We aligned the /v3/getsqlresult and /v2/getlistdataschemapreviewdbtablebycols endpoints so that Google BigQuery jobs with large numbers of rows do not throw errors when they are queried in Dataset Overview. (ticket #140730, 140915, 141515)

Rules

  • We fixed an issue where rules did not display break records because extra spaces were added around the parentheses.
  • When the file path of the S3 bucket used for break record archival has a timestamp from a previous run, the second run with the same runId no longer fails with an “exception while inserting break records” error message. (ticket #145702)
  • We fixed an issue which resulted in an exception message when “limit” was present in a column name included in the query, for example, select column_limit_test from public.abc. (ticket #138356)

Alerts

  • Dataset-level alerts with multiple conditions no longer send multiple alerts when only one of the conditions is met. (ticket #144655, 146177)

Scheduling

  • After scheduling a job to run monthly in the latest UI, the new job schedule now saves correctly. (ticket #143484)

Jobs

  • We fixed an issue where only the first page of job logs with multiple pages sorted in ascending or descending order. (ticket #139876)
  • The Update Ts (update timestamp) on the Dataset Manager and Jobs page now match after rerunning a job. (ticket #141511)

Agent

  • We fixed an issue that caused the agent to fail upon start-up when the SSL keystore password was encrypted. (ticket #140899)

Integration

  • We fixed an issue where renaming an integrated dataset, then re-integrating it, caused the integration to fail because an additional job asset was incorrectly added to the object table. (ticket #140286, 140667, 140936, 143281, 143697, 144857)
  • After editing an integration with a custom dimension that was previously inserted into the dq_dimension Metastore table, you can now select the custom dimension from the dropdown menu of the Dimensions tab of the Integrations page of the Admin Console. (ticket #137450, 145377)
  • The Quality tab is now available for standard assets irrespective of the language. Previously, Collibra Data Intelligence Platform instances in other languages, such as French, did not support the Quality tab. (ticket #140433)
  • We fixed an issue where rules that reference a column with a name that partially matches another column, for example, "cell" and "cell_phone", were incorrectly mapped to both columns in Collibra Data Intelligence Platform. (ticket #84983)
  • The integration URL sent to Collibra Data Intelligence Platform no longer references legacy Collibra Data Quality & Observability URLs. (ticket #139764)

Pushdown

  • When the SELECT statement of rules created on Snowflake Pushdown datasets uses mixed casing (for example, Select) instead of uppercasing, breaking records now generate in the rule break tables as expected. (ticket #143619, 147953)
  • We fixed an issue where the username and password credentials for authenticating Azure Blob Storage connections did not properly save in the Metastore, resulting in job failure at runtime. (ticket #131026, 138844, 140793, 142635, 145201)
  • When a rule includes an @ symbol in its query without referring to a dataset, for example, select * from @dataset where column rlike ‘@’, the rule now passes syntax validation and no longer returns an error. (ticket #139670)

APIs

  • When dataset security is enabled, users cannot call GET /v3/datasetdef or POST /v3/datasetdef. (ticket #138684)
  • When -profoff is added to the command line and the job executes, -datashapeoff is no longer removed from the command line flags when -profoff is removed later. (ticket #140424)

Identity Management

  • Users who have dataset access but not connection access can no longer access any dataset Explorer pages. (ticket #138684)

Latest UI

  • We resolved an error when creating jobs with Patterns and Outlier checks with custom column references.
  • When editing Dupes, columns are no longer deselected when you select a new one.
  • Scorecards now support text wrapping so that scorecards with long names fit within UI elements in the latest UI. Additionally, Scorecards now have a character limit of 60 and an error message will display if a scorecard name exceeds it. (ticket #139208)
  • Long meta tag names that exceed the width of the column on the Dataset Manager page now have a tooltip to display the full name when you hover your cursor over them.
  • We resolved errors when modifying existing mapping settings.
  • We resolved an error when saving the Data Class when the Column Type is Timestamp.

Limitations

Platform

  • Due to a change to the datashapelimitui admin limit in Collibra Data Quality & Observability 2024.04, you might notice significant changes to the number of Shapes marked on the Shapes tab of the Findings page. While this will be fixed in Collibra Data Quality & Observability 2024.06, if you observe this issue in your Collibra Data Quality & Observability environment, a temporary workaround is to set the datashapelimit admin limit on the Admin Console > Admin Limits page to a significantly higher value, such as 1000. This will allow all Shapes findings to appear on the Shapes tab.
  • When Archive Break Records is enabled for Azure Databricks Pushdown connections authenticated over EntraID, the data preview does not display column names correctly and shows 0 columns in the metadata bar. Therefore, Archive Break Records is not supported for Azure Databricks Pushdown connections that use EntraID authentication.

Integration

  • With the latest enhancement to column mapping, you can now successfully map columns containing uppercase letters and special characters, but columns containing periods cannot be mapped.

DQ Security

Important A high vulnerability, CVE-2024-2961, was recently reported and is still under analysis by NVD. A fix is not available as of now. However, after investigating this vulnerability internally and confirming that we are impacted, we have removed the vulnerable character set, ISO-2022-CN-EXT, from our images so that it cannot be exploited using the iconv function. Therefore, we are releasing Collibra Data Quality & Observability 2024.06 with this known CVE without an available fix, and we have confirmed that Collibra Data Quality & Observability 2024.06 is not vulnerable.

Additionally, a new vulnerability, CVE-2024-33599, was recently reported and is still under analysis by NVD. Name Service Cache Daemon (nscd) is a daemon that caches name service lookups, such as hostnames, user and group names, and other information obtained through services like DNS, NIS, and LDAP. Because nscd inherently relies on glibc to provide the necessary system calls, data structures, and functions required for its operation, our scanning tool reported this CVE under glibc vulnerabilities. Since this vulnerability is only possible when nscd is present, and nscd is neither enabled nor available in our base image, we consider this vulnerability a false positive that cannot be exploited.

Maintenance Updates

2024.06.1

  • The following functions are now tenant-agnostic:
    • The Rule Definitions page now loads correctly in Cloud Native and Standalone deployments.
    • The Rule Definitions page only shows the rules from the tenant you are signed into.

Release 2024.05

Release Information

  • Release date of Collibra Data Quality & Observability 2024.05: June 3, 2024
  • Publication dates:
    • Release notes: April 22, 2024
    • Documentation Center: May 2, 2024

Highlights

Important 
In the upcoming Collibra Data Quality & Observability 2024.07 (July 2024) release, the classic UI will no longer be available.

  • Integration
  • For a more comprehensive bi-directional integration of Collibra Data Quality & Observability and Collibra Data Intelligence Platform, you can now view data quality scores and run jobs from the Data Quality Jobs modal on asset pages. You can find this modal via the View Monitoring link located on the At a glance pane to the right of the Quality tab on asset pages.

    This significant enhancement strengthens the connection between Collibra Data Quality & Observability and Collibra Data Intelligence Platform, allowing you to compare data quality relations seamlessly without leaving the asset page. Whether you are a data steward, data engineer, or another role in between, this enhanced integration breaks down barriers, empowering you with the ability to unlock data quality and observability insights directly within Collibra Data Intelligence Platform.
  • Note A fix for the issue that has prevented the use of the Quality tab on asset pages for users who do not have a Collibra Data Quality & Observability license is scheduled for the third quarter (Q3) of 2024.
  • Pushdown
  • We are delighted to announce that Pushdown is now generally available for three new data sources:

  • Additionally, Pushdown for SAP HANA and Microsoft SQL Server are now available for beta testing. Contact a Collibra CSM or apply directly to participate in private beta testing for SAP HANA Pushdown.
  • Pushdown is an alternative compute method for running DQ Jobs, where Collibra Data Quality & Observability submits all of the job's processing directly to a SQL data warehouse. When all of your data resides in the SQL data warehouse, Pushdown reduces the amount of data transfer, eliminates egress latency, and removes the Spark compute requirements of a DQ Job.

  • SQL assistant for data quality
  • SQL assistant for data quality is now generally available! This exciting tool allows you to automate SQL rule writing and troubleshooting to help you accelerate the discovery, curation, and visualization of your data. By leveraging SQL assistant for data quality powered by Collibra AI, beginner and advanced SQL users alike can quickly discover key data points and insights and then convert them into rules.
  • Anywhere Dataset Overview is, so is SQL assistant for data quality. This means you can unlock the power of Collibra AI from the following pages:

    • Explorer
    • Profile
    • Findings
    • Alert Builder

    For the most robust SQL rule building experience, you can also find SQL assistant for data quality when adding or editing a rule on the Rule Workbench page.

  • Further, we've added the ability to create an AI prompt for the frequency distribution of all values within a column. From the Collibra AI dropdown menu, select Frequency Distribution in the Advanced section and specify a column for Collibra AI to create a frequency distribution query. (idea #DCC-I-2639)

Enhancements

Capabilities

  • We added a JSON tab to the Review step in Explorer and the Findings page to allow you to analyze, copy, and run the JSON payload of jobs.
  • When exporting rule breaks with details, the column order in the .xlsx file now matches the arrangement in the Data Preview table on the Findings page. (idea #DCC-I-1656, DCC-I-2400)
    • Specifically, the columns in the export file are organized from left to right, following the same sequence as in the Data Preview table. The columns are sorted with the following priority:
      1. Column names starting with numbers.
      2. Column names starting with letters.
      3. Column names starting with special characters.
  • To improve user experience when using the Rules table on the Findings page, we’ve locked the column headers and added a horizontal scrollbar to the rule breaks sub-table.
  • You can now configure your user account to receive email notifications for your assignments by clicking your user avatar in the upper right corner of the Collibra Data Quality & Observability application and selecting "Send me email notifications for my assignments" in the Notifications section.
  • When a user who is not the dataset owner and does not have ROLE_ADMIN or ROLE_DATASET_MANAGER attempts to delete one or more datasets, the deletion is prevented and an error message displays to inform them of the role requirements needed to delete datasets. (idea #DCC-I-1938)
  • We added a new Attributes section with two filter options to the Dataset Manager. (idea #DCC-I-2155)
    • The Rules Defined filter option displays the datasets in your environment that contain rules only (not alerts).
    • The Alerts Defined filter option displays the datasets in your environment that contain alerts only (not rules).
    • When both filter options are selected, datasets that contain both rules and alerts display.
  • With this release, we made several additional enhancements to SQL assistant for data quality:
    • The Collibra AI dropdown menu now has improved organization. We split the available options into two sections:
      • Basic: For standard rule generation and troubleshooting suggestions.
      • Advanced: For targeted or otherwise more complex SQL operations.
    • You can now click and drag your cursor to highlight and copy specific rows and columns of the results table, click column headers to sort or highlight the entire column, and access multiple results pages through new pagination.
    • We improved the UI and error handling.
  • You can now authenticate SQL Server connections using an Active Directory MSI client ID. This enhancement, available in both Java 11/Spark 3.4.1 and Java 8/Spark 3.2.2, better enables your team to follow Azure authentication best practices and InfoSec policies. For more information about configuration details, see the Authentication documentation for SQL Server.
  • We added an automatic cleaner to clear the alert_q table of stale alerts marked as email_sent = true in the Metastore.
  • We removed the license key from job logs.

Platform

  • We now support multiple GCP projects in Google BigQuery connections by specifying an additional project (AdditionalProjects) in the Connection URL, for example, AdditionalProjects=<additional-project-id> (limited to 1 additional project ID). With this enhancement, you no longer need to append the project ID in the command line.
  • When running Google BigQuery jobs via the /v3/jobs/run API, the dataDef updates with the correct -lib and -srclib parameters, and the jobs run successfully.
  • The names of all out-of-the-box sensitive labels now begin with "OOTB_". This enhancement allows you to define your own sensitive labels with names that were previously reserved, such as PII, PHI, and CUI.
  • Important If you upgrade to Collibra Data Quality & Observability 2024.05 and then roll back to a previous version, you will receive a unique constraint conflict error, as the sensitive label enhancement required a change to the Metastore.

  • We've updated or enhanced the following API endpoints:
    • POST /v3/rules/{dataset} (rule-api)
      After using GET /v3/rules to return all rules in your environment, you can now use POST /v3/rules/{dataset} to migrate them to another environment. When settings are changed and you use POST /v3/rules/{dataset} again, those rules (with the same name) are updated.
    • GET /v3/datasetDefs/{dataset} (dataset-def-api)
      We've made the following enhancements to the GET /v3/datasetDefs/{dataset} API:
      • We've added job scheduling information to the dataset def to allow you to GET and POST this information along with the rest of the dataset definition.
      • We've added the outlier weight configs to the dataset def.
      • You can now use the GET /v3/datasetDefs/{dataset} API to return a dataset's meta tags.
      • We've restructured the JobScheduleDTO to make job scheduling more intuitive when using the /v3/datasetDefs/{dataset} API.
    • POST /v3/datasetDefs/find (dataset-def-api)
      We've updated the following parameter names for consistency with the latest Collibra DQ UI:
      • connectiontype is now connectiontypes
      • dataclass is now dataClasses
      • datacategory no longer displays
      • businessUnitIds is now businessUnitNames
      • dataConceptIds is now dataCategoryNames
      • sensitivityIds is now sensitivityLabels
      Additionally, this API returns specific filtered arrays of datasetDefs. Parameter descriptions (see the example after this list):
      • "limit": 0 = The maximum number of records returned.
      • "offset": 0 = The number of records to skip from the beginning; use it to return the next 'page' of results when calling the API in sequence.
    • POST /v3/datasetDefs (dataset-def-api)
      You can now use the POST /v3/datasetDefs API to add meta tags to a dataset.
    • DELETE /v3/datasetDefs (dataset-def-api)
      When removing a dataset using the DELETE /v3/datasetdef API, you can now successfully rename another dataset to the name of the deleted dataset.
    • POST /v2/datasetDefs/migrate (controller-dataset)
      You can now add a dataset def to create a dataset record in the Dataset Manager without running the job or setting a job schedule. This is useful when migrating from a source environment to a target environment.
    • GET /v2/assignment-q/find-all-paging-datatables (controller-assignment-q)
      We've added an updateTimestampRange parameter to the GET /v2/assignment-q/find-all-paging-datatables API to allow for the filtering of assignment records based on timestamp updates.
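
    For example, a paging loop over POST /v3/datasetDefs/find using the limit and offset parameters above; the filter value and the assumption that the response body is a JSON array are illustrative:

        # Hypothetical sketch: page through dataset definitions.
        import requests

        BASE = "https://dq.example.com"
        HEADERS = {"Authorization": "Bearer <token>"}

        limit, offset = 50, 0
        while True:
            page = requests.post(
                f"{BASE}/v3/datasetDefs/find",
                headers=HEADERS,
                json={"limit": limit, "offset": offset,
                      "connectiontypes": ["SNOWFLAKE"]},  # example filter
            ).json()
            if not page:
                break
            for dataset_def in page:
                print(dataset_def.get("dataset"))
            offset += limit  # skip the records already returned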

Integration

  • We improved the connection mapping when configuring the integration by introducing pagination for tables, columns, and schemas.
  • For improved security when sharing data between applications, we have temporarily removed the Score Details attribute from the Collibra Data Intelligence Platform integration and the JSON payload.

Pushdown

  • When rule breaks are stored in the PostgreSQL Metastore with link IDs assigned, you can now download a CSV file containing the details of the rule breaks and link ID columns via the Rule Breaks modal (Findings page > Rules tab > Actions).
    • Additionally, the following Jobs APIs now return the source rule breaks file containing the SQL statement for the breaks of Pushdown jobs in JSON, CSV, or SQL (see the sketch after this list):
      • /v3/jobs/{dataset}/{runDate}/breaks/rules
      • /v3/jobs/{dataset}/{runDate}/breaks/outliers
      • /v3/jobs/{dataset}/{runDate}/breaks/dupes
      • /v3/jobs/{dataset}/{runDate}/breaks/shapes
      • /v3/jobs/{jobId}/breaks/rules
      • /v3/jobs/{jobId}/breaks/outliers
      • /v3/jobs/{jobId}/breaks/dupes
      • /v3/jobs/{jobId}/breaks/shapes
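
      For example, a minimal Python sketch of downloading rule break records for a run; the format query parameter is an assumption (the notes above state that JSON, CSV, or SQL output is available, but not how it is selected):

          # Hypothetical sketch: fetch rule break records for a Pushdown run.
          import requests

          BASE = "https://dq.example.com"
          HEADERS = {"Authorization": "Bearer <token>"}

          resp = requests.get(
              f"{BASE}/v3/jobs/my_dataset/2024-05-01/breaks/rules",
              headers=HEADERS,
              params={"format": "csv"},  # assumed parameter name
          )
          resp.raise_for_status()
          with open("rule_breaks.csv", "wb") as f:
              f.write(resp.content)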

Fixes

Capabilities

  • The Dataset Overview, Findings, Profile, and Rules pages in the latest UI now correctly display the number of rows in your dataset. Previously, the rows displayed correctly in the job logs but did not appear on the aforementioned pages. (ticket #137230, 137979, 140203)
  • When using remote file connections with Livy enabled in the latest UI, files with the same name load data content correctly. We fixed an issue where data from the first file persisted in the second file of the same name.
  • We fixed an issue where renaming a dataset using the same characters with different casing returned a success message upon saving, but still reflected the old dataset name. For example, an existing dataset renamed "EXAMPLE_DATASET" from "example_dataset" now updates correctly. (ticket #139384)
  • When creating jobs on S3 datasets based on data from CSV files with pipe delimited values, the delimiter no longer reverts from Pipe (|) to Comma (,) when you run the job. (ticket #132097)
  • We fixed an issue with the Edit Schedule modal on the latest UI where both Enabled and Disabled displayed at once. (ticket #139207)

Platform

  • We fixed an issue where the username and password credentials for authenticating Azure Blob Storage connections did not properly save in the Metastore, resulting in job failure at runtime. (ticket #131026, 138844, 140793, 142635, 145201)
  • When a rule includes an @ symbol in its query without referring to a dataset, for example, select * from @dataset where column rlike ‘@’, the rule now passes syntax validation and no longer returns an error. (ticket #139670)

Integration

  • You can now map columns containing uppercase letters or special characters from Google BigQuery, Amazon Athena, Amazon Redshift, Snowflake, and PostgreSQL datasets created in Collibra Data Quality & Observability to column relations in Collibra Data Intelligence Platform. (ticket #133280)
  • We fixed an issue where integrated datasets did not load correctly on the Dataset Manager page. Instead, a generic error message appeared on the Dataset Manager without loading any datasets. (ticket #136303, 140286)
  • We fixed an issue where the dimension cards did not display when using the Quality tab on Column and Rule Asset pages. (ticket #122949)

Pushdown

  • We updated some of the backend logic to allow the Archive Break Records option in the Connections modal to disable the Archive Break Records options on the Settings modal on the Explorer page. (ticket #137396)
  • We added support for special characters in column names. (ticket #135383)

Latest UI

  • We added upper and lower bound columns to the export with details file for Outliers.
  • We fixed the ability to clear values in the Sizing step when manually updating job estimation fields during job creation.
  • We improved the ability to update configuration settings for specific layers in the job creation process.
  • We fixed intermittent errors when loading text and Parquet files in the job creation process.
  • We added the correct values to the Day of Month dropdown menu in the Scheduler modal.

Limitations

Platform

  • Due to a change to the datashapelimitui admin limit in Collibra Data Quality & Observability 2024.04, you might notice significant changes to the number of Shapes marked on the Shapes tab of the Findings page. While this will be fixed in Collibra Data Quality & Observability 2024.06, if you observe this issue in your Collibra Data Quality & Observability environment, a temporary workaround is to set the datashapelimit admin limit on the Admin Console > Admin Limits page to a significantly higher value, such as 1000. This will allow all Shapes findings to appear on the Shapes tab.

Integration

  • With the latest enhancement to column mapping, you can now successfully map columns containing uppercase letters and special characters, but columns containing periods cannot be mapped.

DQ Security

Important A high vulnerability, CVE-2024-2961, was recently reported and is still under analysis by NVD. A fix is not available as of now. However, after investigating this vulnerability internally and confirming that we are impacted, we have removed the vulnerable character set, ISO-2022-CN-EXT, from our images so that it cannot be exploited using the iconv function. Therefore, we are releasing Collibra Data Quality & Observability 2024.05 with this known CVE without an available fix, and we have confirmed that Collibra Data Quality & Observability 2024.05 is not vulnerable.

Additionally, a new vulnerability, CVE-2024-33599, was recently reported and is still under analysis by NVD. Name Service Cache Daemon (nscd) is a daemon that caches name service lookups, such as hostnames, user and group names, and other information obtained through services like DNS, NIS, and LDAP. Because nscd inherently relies on glibc to provide the necessary system calls, data structures, and functions required for its operation, our scanning tool reported this CVE under glibc vulnerabilities. Since this vulnerability is only possible when nscd is present, and nscd is neither enabled nor available in our base image, we consider this vulnerability a false positive that cannot be exploited.

Maintenance Updates

2024.05.1

  • When editing an existing scheduled dataset and re-running it from Explorer, the job no longer fails with an "Invalid timeslot selected" error. (ticket #149549)
    • Additionally, when using the GET /v3/datasetDefs/{dataset} call to return a dataset with a scheduled run, then update it with the POST /v3/datasetDefs call or modify the name of the dataset in the same POST call, you no longer need to manually remove the "jobSchedule": {} element and the API calls are successful.
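
    For example, a minimal Python sketch of that round trip, assuming a bearer token and that the definition's dataset name lives in a "dataset" field (an illustrative assumption):

        # Hypothetical sketch: rename a scheduled dataset via the datasetDefs APIs.
        import requests

        BASE = "https://dq.example.com"
        HEADERS = {"Authorization": "Bearer <token>"}

        dataset_def = requests.get(
            f"{BASE}/v3/datasetDefs/example_dataset", headers=HEADERS).json()
        dataset_def["dataset"] = "example_dataset_v2"   # rename in the same POST

        # As of 2024.05.1, the "jobSchedule": {} element no longer needs to be
        # removed manually before posting the definition back.
        requests.post(f"{BASE}/v3/datasetDefs",
                      headers=HEADERS, json=dataset_def).raise_for_status()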

Release 2024.04

Release Information

  • Release date of Collibra Data Quality & Observability 2024.04: April 29, 2024
  • Publication dates:
    • Release notes: April 4, 2024
    • Documentation Center: April 4, 2024

Enhancements

Capabilities

  • You can now authenticate SQL Server connections using Active Directory Password and Active Directory Service Principal. This enhancement, available in both Java 11/Spark 3.4.1 and Java 8/Spark 3.2.2, allows you to use Azure AD-based Synapse SQL and Azure Service Principal authentication in Collibra Data Quality & Observability, to further enable your team to follow Azure authentication best practices and InfoSec policies.
  • We improved some of the ways you can work with the Scorecards page:
    • The Page dropdown menu is now sorted alphabetically.
    • You can now rename scorecard pages without the need to recreate them from scratch.
    • You can now change the order in which scorecards are displayed on a scorecards page by clicking the up or down arrows to the left of a scorecard.
  • Archive Break Records for both Pushdown and Pullup jobs now parses linkId values in the Sample File Preview and Rule Breaks Preview on the Findings page and in the downloadable CSV file.
  • Running a job with a SQLF rule that references a secondary dataset or uses @t1 no longer flashes an extra record in the job log or on the Jobs page with an Agent Id of 0. With this enhancement, only a single job record displays on the Jobs page. (idea #DCC-I-2413)
  • When using the Dataset Overview, you can now click Download Results to download a CSV file containing the contents of the Results table.

Platform

  • We improved some of the ways you can create or edit alerts:
    • To enhance the fluidity of batch alerting from the Status Alert and Condition Alert modals, you can now select the new Batch option to display the Batch Name field, where you can search, select, or create alert batches.
    • With the Batch option selected, the Alert Recipient field locks. To edit this field, you can click the lock icon to unlock it. When the Batch option is not selected, the Alert Recipient field remains unlocked and editable.

Fixes

Capabilities

  • We improved the Data Shape Granular setting to include more string length values in Shapes findings. (ticket #135351)
  • When joining datasets created from two different data source connections with Kerberos keytab as the authentication type, we fixed an issue that prevented the secondary dataset from loading because its Kerberos authentication was incorrectly passed. (ticket #131309)
    • To ensure that the secondary dataset is passed correctly during Kerberos authentication, add -jdbcprinc and -jdbckeytab to the Agent configuration's Free Form (Appended) section. For example, -jdbckeytab /tmp/keytab/dq-user.keytab -jdbcprinc [email protected]
  • We added the ability to export all job schedule records from the Jobs Schedule page, using the Export All option. Previously, the ability to export job schedule records was limited to up to 20 records per page. (ticket #136970)
  • We added the following enhancements to the Findings page when data quality findings exceed certain thresholds. (ticket #136922)
    • When there are more than 9,999 findings of any data quality layer, the value displayed in the badge on the corresponding findings tab will round to the nearest thousand with a +. For example, 12,345 will display as 12K+.
    • When there are more than 999,000 findings of any data quality layer, the value displayed in the badge on the corresponding findings tab will always display as 1M+. For example, 1,234,567 will display as 1M+.
    • When a value is truncated, you can hover your cursor over the badge to display the exact number of findings.
  • When using SAML SSO to sign into multi-tenant Collibra Data Quality & Observability environments, the SAML Enabled Tenants dropdown menu no longer shows the Tenant Name. Instead, the dropdown menu now shows the Tenant Display Name. (ticket #137865)
  • We removed the runId from the “Findings - Most Recent Run” link in alert emails to correctly take you to the most recent run of your job when you click the link.
  • When the “Findings - Run which Produced the Alert” link in alert emails contains a runId and a timestamp in the URL, you will be taken to that specific job runId and timestamp when you click the link.

Platform

  • We introduced pagination to the Rule Summary page, limiting each page to a maximum of 25 records. Previously, the Rule Summary page displayed only 25 records on a single page, even when there were more than 25 records. (ticket #140229)
  • When an admin sets a limit for the datashapelimitui setting from Admin Limits, the Findings page no longer displays Shapes findings beyond that limit. (ticket #129091)
  • The Role Management page in the latest UI now allows admins to set the access rights of roles and users when associating them with specific datasets. (ticket #138489)
  • The Dataset Manager page now loads correctly when the alias name of a dataset is null. (ticket #133400)
  • When filtering by row count on the Dataset Manager page, the results included in the filter no longer include daily row counts that exceed the range you select. (ticket #133453)
  • JSON files from Amazon S3 connections no longer fail to load when Livy is not enabled.
  • When a large dataset (for example, one with more than 100 million records) times out with a 504 error response while loading the table in Explorer, an error message now appears in the Explorer UI with details about the error. (ticket #133530)

Pushdown

  • Databricks Pushdown now supports ANSI-compliant SQL on the server side. (ticket #136562)
  • The out-of-the-box Data Category, Currency_CD, no longer counts the number of null and empty values as part of the underlying SQL query. (ticket #133578)

Latest UI

  • We added support for multiple pages of results on the Rule Summary page.
  • We improved the performance of the Pulse View when loading large amounts of data.
  • We updated the display of the Pulse View when scrolling.
  • The Command Line input field on the Findings page now supports vertical scrolling.
  • The Source tab on the Findings page now displays all labels for Cell results.
  • We updated the Job Schedule export function to include all jobs instead of limiting them to 20.
  • We added the ability to assign ACL to datasets from the Role Management page in the Admin Console.
  • We added Copy and Export buttons to the View AR modal on the Findings page for Behaviors.
  • We fixed the ability to change the upper and lower bound values for AdaptiveRules.
  • We now display the exact number of results on the tabs for each of the layers on the Findings page instead of "99+" for all values over 99.
  • The Rules tab now updates automatically when you add Quick Rules from the Data Preview section.
  • Success messages now appear on the Findings page when you use the Validate, Invalidate, and Resolve functions.

Limitations

Platform

  • Due to a change to the datashapelimitui admin limit in this release, you might notice significant changes to the number of Shapes marked on the Shapes tab of the Findings page. While this will be fixed in Collibra Data Quality & Observability 2024.06, if you observe this issue in your Collibra Data Quality & Observability environment, a temporary workaround is to set the datashapelimit admin limit on the Admin Console > Admin Limits page to a significantly higher value, such as 1000. This will allow all Shapes findings to appear on the Shapes tab.

DQ Security

Important A new high vulnerability, CVE-2024-2961, was recently reported and is still under analysis by NVD. A fix is not available as of now. However, after investigating this vulnerability internally and confirming that we are impacted, we have removed the vulnerable character set, ISO-2022-CN-EXT, from our images so that it cannot be exploited using the iconv function. Therefore, we are releasing Collibra Data Quality & Observability 2024.04 with this known CVE without an available fix, and we have confirmed that Collibra Data Quality & Observability 2024.04 is not vulnerable.

Release 2024.03

Release Information

  • Release date of Collibra Data Quality & Observability 2024.03: April 1, 2024
  • Publication dates:
    • Release notes: March 8, 2024
    • Documentation Center: March 11, 2024

Enhancements

Capabilities

  • Admins can now view monthly snapshots of the total number of active datasets in the new Datasets column on the Admin Console > Usage page. Additionally, column statistics are no longer counted twice when you edit and re-run a dataset with different columns.
  • Admins can now optionally remove "Collibra Data Quality & Observability" from alert email subjects. When removing Collibra Data Quality & Observability from the subject, you must fill in all the alert SMTP details to use the alert configuration checkboxes on the screen. By removing Collibra Data Quality & Observability from the alert subject, you can set up your own services to automatically crawl Collibra DQ email alerts.
  • Dataset-level alerts, such as Job Completion, Job Failure, and Condition Alerts, as well as global-level alerts for both Pullup and Pushdown Job Failure now send incrementally from the auto-enabled alert queue.
  • Important As a result of this enhancement, following an upgrade to Collibra Data Quality & Observability 2024.03, any old, unsent alerts in the alert queue will be sent automatically. This is a one-time event and these alerts can be safely ignored.

    • We've also added new functionality where, when an alert fails to send, it is still marked as email_sent = true in the alert_q Metastore table; however, no email alert is sent as a result. An enhancement to automatically clean the alert_q table of stale alerts marked as email_sent = true is scheduled for the upcoming Collibra Data Quality & Observability 2024.05 release.
  • We've optimized internal processing when querying Trino connections by passing the catalog name in the Connection URL. The catalog name can be set by creating a Trino connection from Admin Console > Connections and adding ConnCatalog=$catalogName to the Connection URL.
  • We've added a generic placeholder in the Collibra DQ Helm Charts to allow you to bring additional external mount volumes into the DQ Web pod. Additionally, when DQ check logs and external mount volumes are enabled, the persistent volume claims provisioned for Collibra DQ now include a placeholder value that allows you to specify the storage class type. This provides an option to bring Azure vault secrets as external mount volumes into the DQ Web pod.

Platform

  • We now support Cloud Native Collibra DQ deployments on OpenShift Container Platform 4.x.
  • When using a proxy, SAML does not support URL-based metadata. It can only support file-based metadata. To ensure this works properly, set the property SAML_METADATA_USER_URL=false in the owl-env.sh file for Standalone deployments or DQ-web ConfigMap for Cloud Native.
  • We removed the Profile Report, as all of its information is also available on the Dataset Findings Report.

Fixes

Capabilities

  • We fixed an issue that caused Redshift datasets that referenced secondary Amazon S3 datasets to fail with a 403 error. (ticket #132975)
  • On the Profile page of a dataset, the number of TopN shapes now matches the total number of occurrences of such patterns. (ticket #133817)
  • When deleting and renaming datasets from the Dataset Manager, you can now rename a dataset using the name of a previously deleted one. (ticket #132799)
  • When renaming a dataset from the Dataset Manager and configuring it to run on a schedule, the scheduled job no longer combines the old and new names of the dataset when it runs. (ticket #132798)
  • On the Role Management page in the latest UI, roles must now pass validation to prevent ones that do not adhere to the naming requirements from being created. (ticket #133497)
  • When creating a Trino dataset with Pattern detection enabled, date type columns are no longer cast incorrectly as varchar type columns during the historical data load process. (ticket #132478)
  • On the Connections page in the latest UI, the input field now automatically populates when you click Assume Role for Amazon S3 connections. (ticket #132323, 132423)
  • On the Findings page, when reviewing the Outliers tab, the Conf column now has a tooltip to clarify the purpose of the confidence score. (ticket #129768)
  • When DQ Job Security is enabled and a user does not have ROLE_OWL_CHECK assigned to them, both Pullup and Pushdown jobs now show an error message “Failed to load job to the queue: DQ Job Security is enabled. You do not have the authority to run a DQ Job, you must be an Admin or have the role Role_Owl_Check and be mapped to this dataset.” (ticket #133623)
  • When creating a DQ job on a root or nested Amazon S3 folder without any files in it, the system now returns a more elegant error message. (ticket #134187)
  • On the Profile page in the latest UI, the WHERE query is now correctly formed when adding a valid values quick rule. (ticket #133455)

Platform

  • When viewing TopN values, the previously encrypted valid values now decrypt correctly. (ticket #131951)
  • When a user with ROLE_DATA_GOVERNANCE_MANAGER edits a dataset from the Dataset Manager, the Metatags field is the only field such a user can edit and have their updates pushed to the Metastore. (ticket #132468, 135889)
  • We fixed an issue with the /v3/datasetDefs/{dataset} and /v3/datasetDefs/{dataset}/cmdLine APIs that caused the -lib, -srclib, and -addlib parameters to revert in the command line. (ticket #131281)

Pushdown

  • When casting from one data type to another on a Redshift dataset, the job now runs successfully without returning an exception message. (ticket #128718)
  • When running a job with outliers enabled, we now parse the source query for new line characters to prevent carriage returns from causing the job to fail. (ticket #132322)
  • The character limit for varchar columns is now 256, which prevents jobs from failing when a varchar column exceeds the 256-character limit. (ticket #131355)
  • BigQuery Pushdown jobs on huge datasets of more than 10GB no longer fail with a “Response too large” error. (ticket #134643, 135504)
  • When running a Pushdown job with archive break records enabled without a link ID assigned to a column, a helpful warning message now highlights the requirements for proper break record archival. (ticket #132545)
  • The Source Name parameter on the Connections template for Pushdown connections now persists to the /v2/getcatalogandconnsrcnamebydataset API call as intended. (ticket #132334)

Limitations

  • TopN values from jobs that ran before enabling encryption on the Collibra DQ instance are not decrypted. To decrypt TopN values after enabling encryption, re-run the job once encryption is enabled on your Collibra DQ instance.

DQ Security