Release Notes

Important 

Disclaimer - Failure to upgrade to the most recent release of the Collibra Service may adversely impact the security, reliability, availability, integrity, performance or support (including Collibra’s ability to meet its service levels) of the Service. Collibra hereby disclaims all liability, express or implied, for any reduction in the security, reliability, availability, integrity, performance or support of the Service to the extent the foregoing would have been avoided had you allowed Collibra to implement the most current release of the Service when scheduled by Collibra. Further, to the extent your failure to upgrade the Service impacts the security, reliability, availability, integrity or performance of the Service for other customers or users of the Service, Collibra may suspend your access to the Service until you have upgraded to the most recent release.

Release 2024.06

Release Information

  • Expected release date of Collibra Data Quality & Observability 2024.06: July 1, 2024
  • Publication dates:
    • Release notes: June 6, 2024
    • Documentation Center: June 14, 2024

Highlights

Important 
In the upcoming Collibra Data Quality & Observability 2024.07 (July 2024) release, the classic UI will no longer be available.

  • Integration
  • Users without a Collibra Data Quality & Observability license can now use the Quality tab on asset pages in the latest UI of Collibra Data Intelligence Platform. Before Collibra Data Quality & Observability 2024.06, unless you created data quality rules in Collibra Data Intelligence Platform using the Collibra Data Quality & Observability integration, the Quality tab would not populate and you could not aggregate data quality across any assets.

    Note To enable this functionality, contact a Collibra Customer Success Manager or open a support ticket.

Important 
Default SAML and SSL keystores are not supported. If you use a SAML or SSL keystore to manage and store keys and certificates, you must provide your own keystore file. If you use both a SAML and an SSL keystore, a single keystore file can serve both.

Enhancements

Connections

  • When configuring an Amazon S3 connection and setting it as an Archive Breaking Records location, you can now use Instance Profile to authenticate it.
  • When setting up a MongoDB connection, you can now use Kerberos TGT Cache to authenticate it.
  • You can now use EntraID Service Principal to authenticate Databricks connections.
  • Trino Pushdown connections now support Access Token Manager authentication.

Explorer

  • Explorer now fetches a new authentication token after the previous token expires to ensure seamless connectivity to your data source when using Access Token Manager or Password Manager to authenticate Pullup or Pushdown connections.

Jobs

  • When using DB connection security and DQ job security, we added the security setting Require Connection Access, which requires users with ROLE_OWL_CHECK to have access to the connection they intend to run jobs on. When DB connection security and DQ job security are enabled, but Require Connection Access is not, users with ROLE_OWL_CHECK can run jobs to which they have dataset access.

Findings

  • When exporting Outlier break records containing large values that were previously represented with scientific notation, the file generated from the Export with Details option now exports the true format of these values to match the unshortened, raw source data.

APIs

  • Admins and user managers can now leverage the POST /v2/deleteexternaluser call to remove external users.
  • You can now add template and data class rules to a dataset with the POST /v3/rules/{dataset} call.
    • When you add template and data class rules to a dataset, the templates and data classes must already exist.
    • You can use the GET /v3/rules/{dataset} call to return all rules from a dataset, then use the POST /v3/rules/{dataset} call to add them to the dataset you specify, as sketched below. If you add these rules to a different dataset, you must update the dataset name in the POST call and any references to the dataset name in the rules.
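
    For illustration, here is a minimal sketch of that copy flow in Python. The host, token, dataset names, and the payload's dataset field name are assumptions; check the API reference for the exact request and response shapes.

      import requests

      BASE_URL = "https://dq.example.com"            # hypothetical Collibra DQ host
      HEADERS = {"Authorization": "Bearer <token>"}  # hypothetical API token

      # Return all rules defined on the source dataset.
      rules = requests.get(f"{BASE_URL}/v3/rules/source_dataset", headers=HEADERS).json()

      # Repoint each rule at the target dataset; any references to the old
      # dataset name inside the rule bodies must be updated as well.
      for rule in rules:
          rule["dataset"] = "target_dataset"         # assumed payload field name

      # Add the rules to the target dataset. Any templates and data classes
      # the rules reference must already exist.
      resp = requests.post(f"{BASE_URL}/v3/rules/target_dataset", headers=HEADERS, json=rules)
      resp.raise_for_status()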

Integration

  • You can now map Collibra Data Quality & Observability connections containing database views to their corresponding database view assets in Collibra Data Intelligence Platform.
  • The Details table on the Quality tab of asset pages is now keyboard navigable.

Pushdown

  • You can now archive rule break records for Trino Pushdown connections.
  • You can now download a CSV file from the Findings page containing the break records of rule breaks stored in the Metastore.

Latest UI

  • All pages within the Collibra Data Quality & Observability application with a blue banner are now set to the latest UI by default. Upon upgrade to this version, any REACT application configuration settings from previous versions will be overridden. The following pages are now set to the latest UI by default:
    • Login
    • Registration
    • Tenant Manager Login
    • Explorer
    • Profile
    • Findings
    • Rule Builder
    • Admin Connections

Fixes

Connections

  • We enhanced the -conf spark.driver.extraClassPath={driver jar} Spark config to allow you to run jobs against Sybase datasets that reference secondary Oracle datasets. (ticket #129397)

Explorer

  • When using Temp Files in the latest UI, you can now load table entries. (ticket #144174)
  • Encrypted data columns with the BYTES datatype are now deselected and disabled in the Select Columns step, and all data displays correctly in Dataset Overview. (ticket #137738)
  • When mapping source to target, we fixed an issue with the data type comparison, which previously caused incorrect Column Order Passing results. (ticket #139814, 140349)
  • Preview data for remote file connections now displays throughout the application as expected. (ticket #139582, 142538, 143876)
  • We aligned the /v3/getsqlresult and /v2/getlistdataschemapreviewdbtablebycols endpoints so that Google BigQuery jobs with large numbers of rows do not throw errors when they are queried in Dataset Overview. (ticket #140730, 140915, 141515)

Rules

  • We fixed an issue where rules did not display break records because extra spaces were added around the parentheses.
  • When the file path of the S3 bucket used for break record archival has a timestamp from a previous run, the second run with the same runId no longer fails with an “exception while inserting break records” error message. (ticket #145702)
  • We fixed an issue which resulted in an exception message when “limit” was present in a column name included in the query, for example, select column_limit_test from public.abc. (ticket #138356)

Alerts

  • Dataset-level alerts with multiple conditions no longer send multiple alerts when only one of the conditions is met. (ticket #144655, 146177)

Scheduling

  • After scheduling a job to run monthly in the latest UI, the new job schedule now saves correctly. (ticket #143484)

Jobs

  • We fixed an issue where, for job logs with multiple pages, only the first page was sorted in ascending or descending order. (ticket #139876)
  • The Update Ts (update timestamp) on the Dataset Manager and Jobs page now match after rerunning a job. (ticket #141511)

Agent

  • We fixed an issue that caused the agent to fail upon start-up when the SSL keystore password was encrypted. (ticket #140899)

Integration

  • We fixed an issue where renaming an integrated dataset, then re-integrating it, caused the integration to fail because an additional job asset was incorrectly added to the object table. (ticket #140286, 140667, 140936, 143281, 143697, 144857)
  • After editing an integration with a custom dimension that was previously inserted into the dq_dimension Metastore table, you can now select the custom dimension from the dropdown menu of the Dimensions tab of the Integrations page of the Admin Console. (ticket #137450, 145377)
  • The Quality tab is now available for standard assets irrespective of the language. Previously, Collibra Data Intelligence Platform instances in other languages, such as French, did not support the Quality tab. (ticket #140433)
  • We fixed an issue where rules that reference a column with a name that partially matches another column, for example, "cell" and "cell_phone", were incorrectly mapped to both columns in Collibra Data Intelligence Platform. (ticket #84983)
  • The integration URL sent to Collibra Data Intelligence Platform no longer references legacy Collibra Data Quality & Observability URLs. (ticket #139764)

Pushdown

  • When the SELECT statement of rules created on Snowflake Pushdown datasets uses mixed casing (for example, Select) instead of uppercase, break records are now generated in the rule break tables as expected. (ticket #143619, 147953)
  • We fixed an issue where the username and password credentials for authenticating Azure Blob Storage connections did not properly save in the Metastore, resulting in job failure at runtime. (ticket #131026, 138844, 140793, 142635, 145201)
  • When a rule includes an @ symbol in its query without referring to a dataset, for example, select * from @dataset where column rlike ‘@’, the rule now passes syntax validation and no longer returns an error. (ticket #139670)

APIs

  • When dataset security is enabled, users cannot call GET /v3/datasetdef or POST /v3/datasetdef. (ticket #138684)
  • When -profoff is added to the command line and the job executes, -datashapeoff is no longer removed from the command line flags when -profoff is removed later. (ticket #140424)

Identity Management

  • Users who have dataset access but not connection access can no longer access any dataset Explorer pages. (ticket #138684)

Latest UI

  • We resolved an error when creating jobs with Patterns and Outlier checks with custom column references.
  • When editing Dupes, columns are no longer deselected when you select a new one.
  • Scorecards now support text wrapping so that scorecards with long names fit within UI elements in the latest UI. Additionally, scorecard names now have a 60-character limit, and an error message displays if a name exceeds it. (ticket #139208)
  • Long meta tag names that exceed the width of the column on the Dataset Manager page now have a tooltip to display the full name when you hover your cursor over them.
  • We resolved errors when modifying existing mapping settings.
  • We resolved an error when saving the Data Class when the Column Type is Timestamp.

Limitations

Platform

  • Due to a change to the datashapelimitui admin limit in Collibra Data Quality & Observability 2024.04, you might notice significant changes to the number of Shapes marked on the Shapes tab of the Findings page. While this will be fixed in Collibra Data Quality & Observability 2024.06, if you observe this issue in your Collibra Data Quality & Observability environment, a temporary workaround is to set the datashapelimit admin limit on the Admin Console > Admin Limits page to a significantly higher value, such as 1000. This will allow all Shapes findings to appear on the Shapes tab.

Integration

  • With the latest enhancement to column mapping, you can now successfully map columns containing uppercase letters and special characters, but columns containing periods cannot be mapped.

DQ Security

Important A high vulnerability, CVE-2024-2961, was recently reported and is still under analysis by NVD. A fix is not available as of now. However, after investigating this vulnerability internally and confirming that we are impacted, we have removed the vulnerable character set, ISO-2022-CN-EXT, from our images so that it cannot be exploited using the iconv function. Therefore, we are releasing Collibra Data Quality & Observability 2024.06 with this known CVE without an available fix, and we have confirmed that Collibra Data Quality & Observability 2024.06 is not vulnerable.

Additionally, a new vulnerability, CVE-2024-33599, was recently reported and is still under analysis by NVD. Name Service Cache Daemon (nscd) is a daemon that caches name service lookups, such as hostnames, user and group names, and other information obtained through services like DNS, NIS, and LDAP. Because nscd inherently relies on glibc to provide the necessary system calls, data structures, and functions required for its operation, our scanning tool reported this CVE under glibc vulnerabilities. Because this vulnerability is only exploitable when nscd is present, and nscd is neither enabled nor available in our base image, we consider it a false positive that cannot be exploited.

Release 2024.05

Release Information

  • Release date of Collibra Data Quality & Observability 2024.05: June 3, 2024
  • Publication dates:
    • Release notes: April 22, 2024
    • Documentation Center: May 2, 2024

Highlights

Important 
In the upcoming Collibra Data Quality & Observability 2024.07 (July 2024) release, the classic UI will no longer be available.

  • Integration
  • For a more comprehensive bi-directional integration of Collibra Data Quality & Observability and Collibra Data Intelligence Platform, you can now view data quality scores and run jobs from the Data Quality Jobs modal on asset pages. You can find this modal via the View Monitoring link located on the At a glance pane to the right of the Quality tab on asset pages.

    This significant enhancement strengthens the connection between Collibra Data Quality & Observability and Collibra Data Intelligence Platform, allowing you to compare data quality relations seamlessly without leaving the asset page. Whether you are a data steward, data engineer, or another role in between, this enhanced integration breaks down barriers, empowering you with the ability to unlock data quality and observability insights directly within Collibra Data Intelligence Platform.
  • Note A fix for the issue that has prevented the use of the Quality tab on asset pages for users who do not have a Collibra Data Quality & Observability license is scheduled for the third quarter (Q3) of 2024.
  • Pushdown
  • We are delighted to announce that Pushdown is now generally available for three new data sources, including:

  • Additionally, Pushdown for SAP HANA and Microsoft SQL Server are now available for beta testing. Contact a Collibra CSM or apply directly to participate in private beta testing for SAP HANA Pushdown.
  • Pushdown is an alternative compute method for running DQ Jobs, where Collibra Data Quality & Observability submits all of the job's processing directly to a SQL data warehouse. When all of your data resides in the SQL data warehouse, Pushdown reduces the amount of data transfer, eliminates egress latency, and removes the Spark compute requirements of a DQ Job.

  • SQL assistant for data quality
  • SQL assistant for data quality is now generally available! This exciting tool allows you to automate SQL rule writing and troubleshooting to help you accelerate the discovery, curation, and visualization of your data. By leveraging SQL assistant for data quality powered by Collibra AI, beginner and advanced SQL users alike can quickly discover key data points and insights and then convert them into rules.
  • Anywhere Dataset Overview is, so is SQL assistant for data quality. This means you can unlock the power of Collibra AI from the following pages:

    • Explorer
    • Profile
    • Findings
    • Alert Builder

    For the most robust SQL rule building experience, you can also find SQL assistant for data quality when adding or editing a rule on the Rule Workbench page.

  • Further, we've added the ability to create an AI prompt for the frequency distribution of all values within a column. From the Collibra AI dropdown menu, select Frequency Distribution in the Advanced section and specify a column for Collibra AI to create a frequency distribution query. (idea #DCC-I-2639)

Enhancements

Capabilities

  • We added a JSON tab to the Review step in Explorer and the Findings page to allow you to analyze, copy, and run the JSON payload of jobs.
  • When exporting rule breaks with details, the column order in the .xlsx file now matches the arrangement in the Data Preview table on the Findings page. (idea #DCC-I-1656, DCC-I-2400)
    • Specifically, the columns in the export file are organized from left to right, following the same sequence as in the Data Preview table. The columns are sorted with the following priority (see the sketch after this list):
      1. Column names starting with numbers.
      2. Column names starting with letters.
      3. Column names starting with special characters.
  • To improve user experience when using the Rules table on the Findings page, we’ve locked the column headers and added a horizontal scrollbar to the rule breaks sub-table.
  • When a user who is not the dataset owner and does not have ROLE_ADMIN or ROLE_DATASET_MANAGER attempts to delete one or more datasets, the deletion is blocked, and an error message displays to inform them of the role requirements for deleting datasets. (idea #DCC-I-1938)
  • We added a new Attributes section with two filter options to the Dataset Manager. (idea #DCC-I-2155)
    • The Rules Defined filter option displays the datasets in your environment that contain rules only (not alerts).
    • The Alerts Defined filter option displays the datasets in your environment that contain alerts only (not rules).
    • When both filter options are selected, datasets that contain both rules and alerts display.
  • With this release, we made several additional enhancements to SQL assistant for data quality:
    • The Collibra AI dropdown menu now has improved organization. We split the available options into two sections:
      • Basic: For standard rule generation and troubleshooting suggestions.
      • Advanced: For targeted or otherwise more complex SQL operations.
    • You can now click and drag your cursor to highlight and copy specific rows and columns of the results table, click column headers to sort or highlight the entire column, and access multiple results pages through new pagination.
    • We improved the UI and error handling.
  • You can now authenticate SQL Server connections using an Active Directory MSI client ID. This enhancement, available in both Java 11/Spark 3.4.1 and Java 8/Spark 3.2.2, better enables your team to follow Azure authentication best practices and InfoSec policies. For more information about configuration details, see the Authentication documentation for SQL Server.
  • We added an automatic cleaner to clear the alert_q table of stale alerts marked as email_sent = true in the Metastore.
  • We removed the license key from job logs.
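
    As an illustration of the export column ordering described earlier in this list, here is a minimal sketch of such a sort key. The grouping logic is our reading of the stated priority, not Collibra's implementation.

      # Group columns by leading character: digits first, then letters, then
      # special characters, matching the export ordering described above.
      def column_sort_key(name: str):
          first = name[:1]
          if first.isdigit():
              group = 0
          elif first.isalpha():
              group = 1
          else:
              group = 2
          return (group, name.lower())

      print(sorted(["_id", "2fa", "amount"], key=column_sort_key))
      # ['2fa', 'amount', '_id']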

Platform

  • We now support multiple GCP projects in Google BigQuery connections: you can specify an additional project ID (AdditionalProjects) in the Connection URL, limited to one additional project ID. With this enhancement, you no longer need to append the project ID in the command line.
  • When running Google BigQuery jobs via the /v3/jobs/run API, the dataDef updates with the correct -lib and -srclib parameters, and the jobs run successfully.
  • The names of all out-of-the-box sensitive labels now begin with "OOTB_". This enhancement allows you to define your own sensitive labels with names that were previously reserved, such as PII, PHI, and CUI.
  • Important If you upgrade to Collibra Data Quality & Observability 2024.05 and then roll back to a previous version, you will receive a unique constraint conflict error, as the sensitive label enhancement required a change to the Metastore.

  • We've updated or enhanced the following API endpoints:
    • POST /v3/rules/{dataset} (rule-api)
      • After using GET /v3/rules to return all rules in your environment, you can now use POST /v3/rules/{dataset} to migrate them to another environment.
      • When settings are changed and you use POST /v3/rules/{dataset} again, those rules (with the same name) are updated.
    • GET /v3/datasetDefs/{dataset} (dataset-def-api)
      • We've added job scheduling information to the dataset def to allow you to GET and POST this information along with the rest of the dataset definition.
      • We've added the outlier weight configs to the dataset def.
      • You can now use the GET /v3/datasetDefs/{dataset} API to return a dataset's meta tags.
      • We've restructured the JobScheduleDTO to make job scheduling more intuitive when using the /v3/datasetDefs/{dataset} API.
    • POST /v3/datasetDefs/find (dataset-def-api)
      • We've updated the following parameter names for consistency with the latest Collibra DQ UI:
        • connectiontype is now connectiontypes
        • dataclass is now dataClasses
        • datacategory no longer displays
        • businessUnitIds is now businessUnitNames
        • dataConceptIds is now dataCategoryNames
        • sensitivityIds is now sensitivityLabels
      • Additionally, this API returns specific filtered arrays of datasetDefs.
      • Parameter descriptions (see the sketch after this list):
        • "limit": 0 = the maximum number of records returned.
        • "offset": 0 = the number of records to skip from the beginning; use it to return the next 'page' of results when calling the API in sequence.
    • POST /v3/datasetDefs (dataset-def-api)
      • You can now use the POST /v3/datasetDefs/{dataset} API to add meta tags to a dataset.
    • DELETE /v3/datasetDefs (dataset-def-api)
      • When removing a dataset using the DELETE /v3/datasetdef API, you can now successfully rename another dataset to the name of the deleted dataset.
    • POST /v2/datasetDefs/migrate (controller-dataset)
      • You can now add a dataset def to create a dataset record in the Dataset Manager without running the job or setting a job schedule. This is useful when migrating from a source environment to a target environment.
    • GET /v2/assignment-q/find-all-paging-datatables (controller-assignment-q)
      • We've added an updateTimestampRange parameter to the GET /v2/assignment-q/find-all-paging-datatables API to allow for the filtering of assignment records based on timestamp updates.
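
    For illustration, here is a minimal sketch of paging through POST /v3/datasetDefs/find with the limit and offset parameters. The host, token, and filter value are hypothetical, and we assume the response body is a JSON array of dataset defs.

      import requests

      BASE_URL = "https://dq.example.com"            # hypothetical Collibra DQ host
      HEADERS = {"Authorization": "Bearer <token>"}  # hypothetical API token
      PAGE_SIZE = 25

      offset = 0
      while True:
          body = {
              "limit": PAGE_SIZE,                # maximum records returned per call
              "offset": offset,                  # records to skip from the beginning
              "connectiontypes": ["SNOWFLAKE"],  # hypothetical filter value
          }
          page = requests.post(f"{BASE_URL}/v3/datasetDefs/find",
                               headers=HEADERS, json=body).json()
          if not page:
              break
          for dataset_def in page:
              print(dataset_def.get("dataset"))
          offset += PAGE_SIZE                    # advance to the next 'page'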

Integration

  • We improved the connection mapping when configuring the integration by introducing pagination for tables, columns, and schemas.
  • For improved security when sharing data between applications, we have temporarily removed the Score Details attribute from the Collibra Data Intelligence Platform integration and the JSON payload.

Pushdown

  • When rule breaks are stored in the PostgreSQL Metastore with link IDs assigned, you can now download a CSV file containing the details of the rule breaks and link ID columns via the Findings page > Rules tab > Actions > Rule Breaks modal.
    • Additionally, the following Jobs APIs now return the source break records, including the SQL statement for the breaks, for Pushdown jobs in JSON, CSV, or SQL format (see the sketch after this list):
      • /v3/jobs/{dataset}/{runDate}/breaks/rules
      • /v3/jobs/{dataset}/{runDate}/breaks/outliers
      • /v3/jobs/{dataset}/{runDate}/breaks/dupes
      • /v3/jobs/{dataset}/{runDate}/breaks/shapes
      • /v3/jobs/{jobId}/breaks/rules
      • /v3/jobs/{jobId}/breaks/outliers
      • /v3/jobs/{jobId}/breaks/dupes
      • /v3/jobs/{jobId}/breaks/shapes
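
    For illustration, here is a minimal sketch of pulling rule break records for one run through the first of these endpoints. The host, token, run date, and the format parameter are assumptions; check the API reference for how the output format is actually selected.

      import requests

      BASE_URL = "https://dq.example.com"            # hypothetical Collibra DQ host
      HEADERS = {"Authorization": "Bearer <token>"}  # hypothetical API token

      # Fetch the archived rule break records for one dataset run.
      resp = requests.get(
          f"{BASE_URL}/v3/jobs/my_dataset/2024-05-01/breaks/rules",
          headers=HEADERS,
          params={"format": "csv"},  # assumed way to choose JSON, CSV, or SQL
      )
      resp.raise_for_status()
      with open("rule_breaks.csv", "wb") as f:
          f.write(resp.content)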

Fixes

Capabilities

  • The Dataset Overview, Findings, Profile, and Rules pages in the latest UI now correctly display the number of rows in your dataset. Previously, the rows displayed correctly in the job logs but did not appear on the aforementioned pages. (ticket #137230, 137979, 140203)
  • When using remote file connections with Livy enabled in the latest UI, files with the same name load data content correctly. We fixed an issue where data from the first file persisted in the second file of the same name.
  • We fixed an issue where renaming a dataset using the same characters with different casing returned a success message upon saving, but still reflected the old dataset name. For example, an existing dataset renamed "EXAMPLE_DATASET" from "example_dataset" now updates correctly. (ticket #139384)
  • When creating jobs on S3 datasets based on data from CSV files with pipe delimited values, the delimiter no longer reverts from Pipe (|) to Comma (,) when you run the job. (ticket #132097)
  • We fixed an issue with the Edit Schedule modal on the latest UI where both Enabled and Disabled displayed at once. (ticket #139207)

Platform

  • We fixed an issue where the username and password credentials for authenticating Azure Blob Storage connections did not properly save in the Metastore, resulting in job failure at runtime. (ticket #131026, 138844, 140793, 142635, 145201)
  • When a rule includes an @ symbol in its query without referring to a dataset, for example, select * from @dataset where column rlike ‘@’, the rule now passes syntax validation and no longer returns an error. (ticket #139670)

Integration

  • You can now map columns containing uppercase letters or special characters from Google BigQuery, Amazon Athena, Amazon Redshift, Snowflake, and PostgreSQL datasets created in Collibra Data Quality & Observability to column relations in Collibra Data Intelligence Platform. (ticket #133280)
  • We fixed an issue where integrated datasets did not load correctly on the Dataset Manager page. Instead, a generic error message appeared on the Dataset Manager without loading any datasets. (ticket #136303, 140286)
  • We fixed an issue where the dimension cards did not display when using the Quality tab on Column and Rule Asset pages. (ticket #122949)

Pushdown

  • We updated some of the backend logic to allow the Archive Break Records option in the Connections modal to disable the Archive Break Records options on the Settings modal on the Explorer page. (ticket #137396)
  • We added support for special characters in column names. (ticket #135383)

Latest UI

  • We added upper and lower bound columns to the export with details file for Outliers.
  • We fixed the ability to clear values in the Sizing step when manually updating job estimation fields during job creation.
  • We improved the ability to update configuration settings for specific layers in the job creation process.
  • We fixed intermittent errors when loading text and Parquet files in the job creation process.
  • We added the correct values to the Day of Month dropdown menu in the Scheduler modal.

Limitations

Platform

  • Due to a change to the datashapelimitui admin limit in Collibra Data Quality & Observability 2024.04, you might notice significant changes to the number of Shapes marked on the Shapes tab of the Findings page. While this will be fixed in Collibra Data Quality & Observability 2024.06, if you observe this issue in your Collibra Data Quality & Observability environment, a temporary workaround is to set the datashapelimit admin limit on the Admin Console > Admin Limits page to a significantly higher value, such as 1000. This will allow all Shapes findings to appear on the Shapes tab.

Integration

  • With the latest enhancement to column mapping, you can now successfully map columns containing uppercase letters and special characters, but columns containing periods cannot be mapped.

DQ Security

Important A high vulnerability, CVE-2024-2961, was recently reported and is still under analysis by NVD. A fix is not available as of now. However, after investigating this vulnerability internally and confirming that we are impacted, we have removed the vulnerable character set, ISO-2022-CN-EXT, from our images so that it cannot be exploited using the iconv function. Therefore, we are releasing Collibra Data Quality & Observability 2024.05 with this known CVE without an available fix, and we have confirmed that Collibra Data Quality & Observability 2024.05 is not vulnerable.

Additionally, a new vulnerability, CVE-2024-33599, was recently reported and is still under analysis by NVD. Name Service Cache Daemon (nscd) is a daemon that caches name service lookups, such as hostnames, user and group names, and other information obtained through services like DNS, NIS, and LDAP. Because nscd inherently relies on glibc to provide the necessary system calls, data structures, and functions required for its operation, our scanning tool reported this CVE under glibc vulnerabilities. Because this vulnerability is only exploitable when nscd is present, and nscd is neither enabled nor available in our base image, we consider it a false positive that cannot be exploited.

Release 2024.04

Release Information

  • Release date of Collibra Data Quality & Observability 2024.04: April 29, 2024
  • Publication dates:
    • Release notes: April 4, 2024
    • Documentation Center: April 4, 2024

Enhancements

Capabilities

  • You can now authenticate SQL Server connections using Active Directory Password and Active Directory Service Principal. This enhancement, available in both Java 11/Spark 3.4.1 and Java 8/Spark 3.2.2, allows you to use Azure AD-based Synapse SQL and Azure Service Principal authentication in Collibra Data Quality & Observability, to further enable your team to follow Azure authentication best practices and InfoSec policies.
  • We improved some of the ways you can work with the Scorecards page:
    • The Page dropdown menu is now sorted alphabetically.
    • You can now rename scorecard pages without the need to recreate them from scratch.
    • You can now change the order in which scorecards are displayed on a scorecards page by clicking the up or down arrows to the left of a scorecard.
  • Archive Break Records for both Pushdown and Pullup jobs now parses linkId values in the Sample File Preview and Rule Breaks Preview on the Findings page and in the downloadable CSV file.
  • Running a job with a SQLF rule that references a secondary dataset or uses @t1 no longer flashes an extra record in the job log or on the Jobs page with an Agent Id of 0. With this enhancement, only a single job record displays on the Jobs page. (idea #DCC-I-2413)
  • When using the Dataset Overview, you can now click Download Results to download a CSV file containing the contents of the Results table.

Platform

  • We improved some of the ways you can create or edit alerts:
    • To enhance the fluidity of batch alerting from the Status Alert and Condition Alert modals, you can now select the new Batch option to display the Batch Name field, where you can search, select, or create alert batches.
    • With the Batch option selected, the Alert Recipient field locks; you can unlock it to make edits. When the Batch option is not selected, the Alert Recipient field remains unlocked and editable.

Fixes

Capabilities

  • We improved the Data Shape Granular setting to include more string length values in Shapes findings. (ticket #135351)
  • When joining datasets created from two different data source connections with Kerberos keytab as the authentication type, we fixed an issue that prevented the secondary dataset from loading because its Kerberos authentication was incorrectly passed. (ticket #131309)
    • To ensure that the secondary dataset is passed correctly during Kerberos authentication, add -jdbcprinc and -jdbckeytab to the Agent configuration's Free Form (Appended) section. For example, -jdbckeytab /tmp/keytab/dq-user.keytab -jdbcprinc <your Kerberos principal>
  • We added the ability to export all job schedule records from the Jobs Schedule page, using the Export All option. Previously, the ability to export job schedule records was limited to up to 20 records per page. (ticket #136970)
  • We added the following enhancements to the Findings page when data quality findings exceed certain thresholds; a minimal sketch of the badge rounding follows this list. (ticket #136922)
    • When there are more than 9,999 findings of any data quality layer, the value displayed in the badge on the corresponding findings tab will round to the nearest thousand with a +. For example, 12,345 will display as 12K+.
    • When there are more than 999,000 findings of any data quality layer, the value displayed in the badge on the corresponding findings tab will always display as 1M+. For example, 1,234,567 will display as 1M+.
    • When a value is truncated, you can hover your cursor over the badge to display the exact number of findings.
  • When using SAML SSO to sign into multi-tenant Collibra Data Quality & Observability environments, the SAML Enabled Tenants dropdown menu no longer shows the Tenant Name. Instead, the dropdown menu now shows the Tenant Display Name. (ticket #137865)
  • We removed the runId from the “Findings - Most Recent Run” link in alert emails to correctly take you to the most recent run of your job when you click the link.
  • When the “Findings - Run which Produced the Alert” link in alert emails contains a runId and a timestamp in the URL, you will be taken to that specific job runId and timestamp when you click the link.
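
    Here is a minimal sketch of the badge rounding described earlier in this list; the thresholds are taken from the text, but the code is our reading, not the product's implementation.

      def findings_badge(count: int) -> str:
          """Format a findings count for a Findings tab badge per the thresholds above."""
          if count > 999_000:
              return "1M+"                       # e.g., 1,234,567 -> 1M+
          if count > 9_999:
              return f"{round(count / 1000)}K+"  # e.g., 12,345 -> 12K+
          return str(count)

      assert findings_badge(12_345) == "12K+"
      assert findings_badge(1_234_567) == "1M+"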

Platform

  • We introduced pagination to the Rule Summary page while limiting each page to displaying a maximum of 25 records. Previously, the Rule Summary page only displayed 25 records on a single page, even when the number of records should have exceeded 25. (ticket #140229)
  • When an admin sets a limit for the datashapelimitui setting from Admin Limits, the Findings page no longer displays Shapes findings beyond that limit. (ticket #129091)
  • The Role Management page in the latest UI now allows admins to set the access rights of roles and users when associating them with specific datasets. (ticket #138489)
  • The Dataset Manager page now loads correctly when the alias name of a dataset is null. (ticket #133400)
  • When filtering by row count on the Dataset Manager page, the results included in the filter no longer include daily row counts that exceed the range you select. (ticket #133453)
  • JSON files from Amazon S3 connections no longer fail to load when Livy is not enabled.
  • When large datasets (for example, one with more than 100 million records) time out with a 504 error response while loading the table in Explorer, an error message now appears in the Explorer UI with details about the error. (ticket #133530)

Pushdown

  • Databricks Pushdown now supports ANSI-compliant SQL on the server side. (ticket #136562)
  • The out-of-the-box Data Category, Currency_CD, no longer counts the number of null and empty values as part of the underlying SQL query. (ticket #133578)

Latest UI

  • We added support for multiple pages of results on the Rule Summary page.
  • We improved the performance of the Pulse View when loading large amounts of data.
  • We updated the display of the Pulse View when scrolling.
  • The Command Line input field on the Findings page now supports vertical scrolling.
  • The Source tab on the Findings page now displays all labels for Cell results.
  • We updated the Job Schedule export function to include all jobs instead of limiting them to 20.
  • We added the ability to assign ACL to datasets from the Role Management page in the Admin Console.
  • We added Copy and Export buttons to the View AR modal on the Findings page for Behaviors.
  • We fixed the ability to change the upper and lower bound values for AdaptiveRules.
  • We now display the exact number of results on the tabs for each of the layers on the Findings page instead of "99+" for all values over 99.
  • The Rules tab now updates automatically when you add Quick Rules from the Data Preview section.
  • Success messages now appear on the Findings page when you use the Validate, Invalidate, and Resolve functions.

Limitations

Platform

  • Due to a change to the datashapelimitui admin limit in this release, you might notice significant changes to the number of Shapes marked on the Shapes tab of the Findings page. While this will be fixed in Collibra Data Quality & Observability 2024.06, if you observe this issue in your Collibra Data Quality & Observability environment, a temporary workaround is to set the datashapelimit admin limit on the Admin Console > Admin Limits page to a significantly higher value, such as 1000. This will allow all Shapes findings to appear on the Shapes tab.

DQ Security

Important A new high vulnerability, CVE-2024-2961, was recently reported and is still under analysis by NVD. A fix is not available as of now. However, after investigating this vulnerability internally and confirming that we are impacted, we have removed the vulnerable character set, ISO-2022-CN-EXT, from our images so that it cannot be exploited using the iconv function. Therefore, we are releasing Collibra Data Quality & Observability 2024.04 with this known CVE without an available fix, and we have confirmed that Collibra Data Quality & Observability 2024.04 is not vulnerable.

Release 2024.03

Release Information

  • Release date of Collibra Data Quality & Observability 2024.03: April 1, 2024
  • Publication dates:
    • Release notes: March 8, 2024
    • Documentation Center: March 11, 2024

Enhancements

Capabilities

  • Admins can now view monthly snapshots of the total number of active datasets in the new Datasets column on the Admin Console > Usage page. Additionally, Columns statistics are no longer counted twice when you edit and re-run a dataset with different columns.
  • Admins can now optionally remove “Collibra Data Quality & Observability” from alert email subjects. When removing it from the subject, you must fill in all the alert SMTP details to use the alert configuration checkboxes on the screen. By removing Collibra Data Quality & Observability from the alert subject, you can set up your own services to automatically crawl Collibra DQ email alerts.
  • Dataset-level alerts, such as Job Completion, Job Failure, and Condition Alerts, as well as global-level alerts for both Pullup and Pushdown Job Failure now send incrementally from the auto-enabled alert queue.
  • Important As a result of this enhancement, following an upgrade to Collibra Data Quality & Observability 2024.03, any old, unsent alerts in the alert queue will be sent automatically. This is a one-time event and these alerts can be safely ignored.

    • We've also added new functionality where, when an alert fails to send, it is still marked as email_sent = true in the alert_q Metastore table, even though no email is sent. An enhancement to automatically clean the alert_q table of stale alerts marked as email_sent = true is scheduled for the upcoming Collibra Data Quality & Observability 2024.05 release.
  • We've optimized internal processing when querying Trino connections by passing the catalog name in the Connection URL. The catalog name can be set by creating a Trino connection from Admin Console > Connections and adding ConnCatalog=$catalogName to the Connection URL.
  • We've added a generic placeholder in the Collibra DQ Helm Charts to allow you to bring additional external mount volumes into the DQ Web pod. Additionally, when DQ check logs and external mount volumes are enabled, the persistent volumes provisioned for Collibra DQ now include a placeholder value that lets you set the storage class of the persistent volume claims. This provides an option to bring Azure vault secrets into the DQ Web pod as external mount volumes.

Platform

  • We now support Cloud Native Collibra DQ deployments on OpenShift Container Platform 4.x.
  • When using a proxy, SAML does not support URL-based metadata. It can only support file-based metadata. To ensure this works properly, set the property SAML_METADATA_USER_URL=false in the owl-env.sh file for Standalone deployments or DQ-web ConfigMap for Cloud Native.
  • We removed the Profile Report, as all of its information is also available on the Dataset Findings Report.

Fixes

Capabilities

  • We fixed an issue that caused Redshift datasets that referenced secondary Amazon S3 datasets to fail with a 403 error. (ticket #132975)
  • On the Profile page of a dataset, the number of TopN shapes now matches the total number of occurrences of such patterns. (ticket #133817)
  • When deleting and renaming datasets from the Dataset Manager, you can now rename a dataset using the name of a previously deleted one. (ticket #132799)
  • When renaming a dataset from the Dataset Manager and configuring it to run on a schedule, the scheduled job no longer combines the old and new names of the dataset when it runs. (ticket #132798)
  • On the Role Management page in the latest UI, roles must now pass validation to prevent ones that do not adhere to the naming requirements from being created. (ticket #133497)
  • When creating a Trino dataset with Pattern detection enabled, date type columns are no longer cast incorrectly as varchar type columns during the historical data load process. (ticket #132478)
  • On the Connections page in the latest UI, the input field now automatically populates when you click Assume Role for Amazon S3 connections. (ticket #132323, 132423)
  • On the Findings page, when reviewing the Outliers tab, the Conf column now has a tooltip to clarify the purpose of the confidence score. (ticket #129768)
  • When DQ Job Security is enabled and a user does not have ROLE_OWL_CHECK assigned to them, both Pullup and Pushdown jobs now show an error message “Failed to load job to the queue: DQ Job Security is enabled. You do not have the authority to run a DQ Job, you must be an Admin or have the role Role_Owl_Check and be mapped to this dataset.” (ticket #133623)
  • When creating a DQ job on a root or nested Amazon S3 folder without any files in it, the system now returns a more elegant error message. (ticket #134187)
  • On the Profile page in the latest UI, the WHERE query is now correctly formed when adding a valid values quick rule. (ticket #133455)

Platform

  • When viewing TopN values, the previously encrypted valid values now decrypt correctly. (ticket #131951)
  • When a user with ROLE_DATA_GOVERNANCE_MANAGER edits a dataset from the Dataset Manager, the Metatags field is the only field such a user can edit and have their updates pushed to the Metastore. (ticket #132468, 135889)
  • We fixed an issue with the /v3/datasetDefs/{dataset} and /v3/datasetDefs/{dataset}/cmdLine APIs that caused the -lib, -srclib, and -addlib parameters to revert in the command line. (ticket #131281)

Pushdown

  • When casting from one data type to another on a Redshift dataset, the job now runs successfully without returning an exception message. (ticket #128718)
  • When running a job with outliers enabled, we now parse the source query for new line characters to prevent carriage returns from causing the job to fail. (ticket #132322)
  • The character limit for varchar columns is now 256. This also prevents jobs from failing when a varchar column exceeds the 256-character limit. (ticket #131355)
  • BigQuery Pushdown jobs on huge datasets of more than 10GB no longer fail with a “Response too large” error. (ticket #134643, 135504)
  • When running a Pushdown job with archive break records enabled without a link ID assigned to a column, a helpful warning message now highlights the requirements for proper break record archival. (ticket #132545)
  • The Source Name parameter on the Connections template for Pushdown connections now persists to the /v2/getcatalogandconnsrcnamebydataset API call as intended. (ticket #132334)

Limitations

  • TopN values from jobs that ran before enabling encryption on the Collibra DQ instance are not decrypted. To decrypt TopN values after enabling encryption, re-run the job once encryption is enabled on your Collibra DQ instance.

DQ Security

Release 2024.02

Release Information

  • Release date of Collibra Data Quality & Observability 2024.02: February 26, 2024
  • Publication dates:
    • Release notes: January 22, 2024
    • Documentation Center: February 4, 2024

Highlights

    Archive Break Records

    Pullup
    When rule breaks are stored in the PostgreSQL Metastore with link IDs assigned, you can now download a CSV file containing the details of the rule breaks and link ID columns via the Findings page > Rules tab > Actions > Rule Breaks modal.

    Pushdown
    In order to completely remove sensitive data from the PostgreSQL Metastore, you can now enable Data Preview from Source in the Archive Break Records section of the Explorer Settings. When you enable Data Preview from Source, data preview records are not stored in the PostgreSQL Metastore.

    Previews of break records associated with Rules, Outliers, Dupes, and Shapes breaks on the Findings page reflect the current state of the records as they appear in your data source. With this option disabled, the preview records that display in the web app are snapshots of the PostgreSQL Metastore records at runtime. This option is disabled by default.

    Additionally, with Archive Break Records enabled and a link ID column assigned, you can now download a CSV or JSON file containing the details of the breaks and link ID columns via the Findings page > Rules, Outliers, Dupes, or Shapes tab > Actions > Rule Breaks modal.

    Lastly, when Archive Break Records is enabled, you can now optionally enter an alternative dataset-level schema name to store source break records, instead of the schema provided in the connection.

Important 
Changes for Kubernetes Deployments
As of Collibra DQ version 2023.11, we've updated the Helm Chart name from owldq to dq. For Helm-based upgrades, point to the new Helm Chart while maintaining the same release name. Please update your Helm install command by referring to the renamed parameters in the values.yaml file. It is also important to note that the pull secret has changed from owldq-pull-secret to dq-pull-secret.

Further, following deployment, your existing remote agent name will change. For example, if your agent name is owldq-owl-agent-collibra-dq, the new agent name will be dq-agent-collibra-dq. If your organization uses APIs for development, ensure that you upgrade AGENT name configurations in your environments.

Lastly, when you deploy using the new Helm Charts, new service (Ingress/Load Balancer) names are created. This changes the IP address of the service and requires you to reconfigure your Load Balancer with the new IP.

Please see the expandable sections below for more details about specific changes.

Note 
If your organization has a standalone deployment of Collibra DQ with SSL enabled for DQ Web, and both DQ Web and DQ Agent are on the same VM or server, we recommend upgrading directly to Collibra DQ 2023.11.3 patch version instead of 2023.11. For more information, see the Maintenance Updates section below.

Migration Updates

Important This section only applies if you are upgrading from a version older than Collibra DQ 2023.09 on Spark Standalone. If you have already followed these steps during a previous upgrade, you do not have to do this again.

We have migrated our code to a new repository for improved internal procedures and security. Because the jar files referenced in owl-env.sh are now prefixed with dq-* instead of owl-*, if you have automation procedures in place to upgrade Collibra DQ versions, you can use the regular expression regex=r"owl-.*-202.*-SPARK.*\.jar|dq-.*-202.*-SPARK.*\.jar" to update the jars, as sketched below.
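
Here is a minimal sketch of applying that pattern in an upgrade script. The file path and replacement jar name are hypothetical.

    import re

    # Pattern from the note above: matches both old owl-* and new dq-* jar names.
    JAR_PATTERN = re.compile(r"owl-.*-202.*-SPARK.*\.jar|dq-.*-202.*-SPARK.*\.jar")

    def point_env_at_new_jar(env_file: str, new_jar: str) -> None:
        """Rewrite jar references in an owl-env.sh-style file to the new jar name."""
        with open(env_file) as f:
            contents = f.read()
        with open(env_file, "w") as f:
            f.write(JAR_PATTERN.sub(new_jar, contents))

    # Hypothetical usage:
    # point_env_at_new_jar("/opt/owl/config/owl-env.sh", "dq-core-2024.02-SPARK341.jar")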

Additionally, please note the following:

  • Standalone Upgrade Steps
  • When upgrading from a Collibra DQ version before 2023.09 to a Collibra DQ version 2023.09 or later on Spark Standalone, the upgrade steps have changed.

Enhancements

Capabilities

  • When using the Dataset Overview, you can now click the -q button to load the contents of the dataset source query into the SQL editor.
  • When using the Dataset Overview, you can now use Find and Replace to find any string in the SQL editor and replace it with another.
  • When a finding is assigned to a ServiceNow incident and the ServiceNow connection has Publish Only enabled on the ServiceNow Configuration modal in the Admin screens, the finding record is pushed to ServiceNow as in previous versions, but the statuses are no longer linked. This means you can adjust the status of the ServiceNow incident and the status of the DQ finding independently, whereas previously the ServiceNow incident had to be closed for the DQ finding to be resolved.
  • From the Settings page in Explorer, you can now select the Core Fetch Mode option to allow SQL queries with spaces to run successfully. When selected, this option adds -corefetchmode to the command line to enable the core to fetch the query from the load options table and override the -q.
  • When attempting to connect to NetApp or Amazon S3 endpoints in URI format with the HTTPS option selected, you can now add the following properties to the Properties tab on Amazon S3 connection templates to successfully create connections:
    • For Amazon S3 endpoint URI: s3-endpoint=s3
    • For NetApp: s3-endpoint=netapp
  • When using the Pulse View, you can now select a few new options from the Show Failed dropdown menu, including Failed Job Runs and Failing Scores. Previously, the Show Failed option only displayed job runs that previously failed.
  • You can now use uppercasing in secondary datasets and rule references.
  • You can now configure arbitrary users as part of the root user group for DQ pod deployment.
  • Due to security concerns, we have removed the license key from the job logs.

Platform

  • We've upgraded the following drivers to their latest versions:
    • Databricks: 2.6.36
    • Google BigQuery: 1.5.2.1005
    • Dremio: 24.3.0
    • Snowflake: 3.14.4
  • You can now enable multi-tenancy for a notebook API.
  • We now apply the same Spark CVE fixes that are applied to Cloud Native deployments of Collibra DQ to Standalone deployments.

Pushdown

  • From the Settings page on Explorer, you can now select Date or DateTime (TimeStamp) from the Date Format dropdown menu to substitute the runDate and runDateEnd at runtime.
  • To conserve memory and processing resources, the results query now rolls up outliers and shapes, and the link IDs no longer persist to the Metastore.
  • All rules from the legacy Rule Library function correctly for Snowflake and Databricks Pushdown except for Having_Count_Greater_Than_One and Two_Decimal_Places when Link ID is enabled. See the Known Limitations section below for more information.
  • You can now use cross-dataset rules that traverse across connections on the same data source.

Beta Features

Collibra AI

  • SQL assistant for data quality (beta) now allows you to select between four new options to generate prompts for:
    • Categorical: Writes a SQL query to detect categorical outliers.
    • Dupe: Writes a SQL query to detect duplicate values.
    • Record: Writes a SQL query to find values that appear on a previous day but not for the next day.
    • Pattern: Writes a SQL query to find infrequent combinations that appear less than 5 percent of the time in the columns you specify.

DQ Integration

  • The new Quality tab is now available as part of the latest UI updates for Asset pages in Collibra Data Intelligence Platform for private beta participants, giving you at-a-glance insights into the quality of your assets. These insights include:
    • Score and dimension roll-ups.
    • Column, data quality rule, data quality metric, and row overviews.
    • Details about the data elements of an asset.
  • You can now see the Overview DQ Score on an Asset when searching via Data Marketplace. This improves your ability to browse the data quality scores of Assets without opening their Asset Pages.

Pushdown

Fixes

Capabilities

  • While editing the command line of a job containing an outlier by replacing -by HOUR with -tbin HOUR, the command line no longer reverts to its original state after profiling completes. (ticket #126764)
  • When exporting the job log details to CSV, Excel, PDF, or Print from the Jobs page, the exported data now contains all rows of data. (ticket #129832)
    • Additionally, when exporting the job log details to PDF from the Jobs page, the PDF file now contains the correct column headers and data. (ticket #129832)
  • When working with the Alert Builder, you no longer see a “No Email Servers Configured” message despite having correctly configured SMTP settings. (ticket #127520)

DQ Integration

  • When integrating data from an Athena connection, you can now use the dropdown menu in rules to map an individual column to a Rule in Collibra Data Intelligence Platform. (ticket #125152, 126150)

Pushdown

  • When Archive Break Records is enabled, statements containing backticks (`) or new lines are now properly inserted into the source system. (ticket #130122)
  • For Snowflake Pushdown jobs with many outlier records either dropped or added, new limits on memory usage now prevent out-of-memory issues. (ticket #126284)

Known Limitations

  • When Link ID is enabled for a Snowflake or Databricks Pushdown job, Having_Count_Greater_Than_One and Two_Decimal_Places do not function properly.
    • The workaround for Having_Count_Greater_Than_One is to manually add the Link ID to the group by clause in the rule query.
    • The workaround for Two_Decimal_Places is to add a * to the inner query.

DQ Security

Note If your current Spark version is 3.2.2 or older, we recommend upgrading to Spark 3.4.1 to address various critical vulnerabilities present in the Spark core library, including Log4j.