Release Notes

Important 

Disclaimer - Failure to upgrade to the most recent release of the Collibra Service may adversely impact the security, reliability, availability, integrity, performance or support (including Collibra’s ability to meet its service levels) of the Service. Collibra hereby disclaims all liability, express or implied, for any reduction in the security, reliability, availability, integrity, performance or support of the Service to the extent the foregoing would have been avoided had you allowed Collibra to implement the most current release of the Service when scheduled by Collibra. Further, to the extent your failure to upgrade the Service impacts the security, reliability, availability, integrity or performance of the Service for other customers or users of the Service, Collibra may suspend your access to the Service until you have upgraded to the most recent release.

Release 2024.11

Release Information

  • Release dates of Collibra Data Quality & Observability:
    • November 25, 2024: Collibra Data Quality & Observability 2024.11
    • December 11, 2024: Collibra Data Quality & Observability 2024.11.1
  • Release notes publication date: October 31, 2024

Announcement

Important 

As a security measure, we are announcing the end of life of the Java 8 and 11 versions of Collibra Data Quality & Observability, effective in the August 2025 (2025.08) release.

In the February 2025 (2025.02) release, Collibra Data Quality & Observability will only be available on Java 17. Depending on your installation of Collibra Data Quality & Observability, you can expect the following in the 2025.02 release:
  • Kubernetes installations: Kubernetes containers will automatically contain Java 17. You may need to update your custom drivers and SAML keystore to maintain compatibility with Java 17.
  • Standalone installations: You must upgrade to Java 17 to install Collibra Data Quality & Observability 2025.02. Additional upgrade guidance will be provided upon the release date. We encourage you to migrate to a Kubernetes installation, to improve the scalability and ease of future maintenance.

The March 2025 (2025.03) release will have Java 8 and 11 versions of Collibra Data Quality & Observability and will be the final release to contain new features on those Java versions. Between 2025.04 and 2025.07, only critical and high-priority bug fixes will be included in the Java 8 and 11 versions of Collibra Data Quality & Observability.

Additional details on driver compatibility, SAML upgrade procedures, and more will be available alongside the 2025.02 release.

For more information, visit the Collibra Data Quality & Observability Java Upgrade FAQ.

Enhancements

Platform

  • When querying the rule_output table in the Metastore, the rows_breaking and total_count columns now populate the correct values for each assignment_id. When a rule filter is used, the total_count column reflects the filtered number of total rows.

Integration

  • We automated connection mapping by introducing:
    • Automapping of schemas, tables, and columns.
    • The ability to view table statistics for troubleshooting unmapped or partially mapped connections.
  • The Quality tab is now hidden when an aggregation path is not available. (idea #DCC-I-3252)

Pushdown

  • Snowflake Pushdown connections now support source to target analysis for datasets from the same Snowflake Pushdown connection.
  • You can now monitor advanced data quality layers for SAP HANA Pushdown connections, including categorical and numerical outliers and records.
  • Trino Pushdown connections now support multiple link IDs for dupes scans.

Connections

  • We now provide out of the box support for Cassandra and Denodo data source connections. You can authenticate both connection types with the basic username and password combination and password manager method.
  • Amazon S3 now supports OAuth2 authentication, whereby you can use an Okta principal as a service account to authenticate Amazon S3 connections and access files therein.
  • You can now authenticate SQL Server connections with NTLM.
  • We upgraded the Snowflake JDBC driver to 3.20.0.

Jobs

  • You can now set a new variable, -rdAdj, in the command line to dynamically calculate and substitute the run date for the -rd variable at the run time of your DQ Job.
  • The metadata bar now displays the schema and table name.

Findings

  • If you assign multiple Link IDs in a Dupes configuration, each Link ID is now present in the break record preview.
  • When there are rule findings, the Breaking Records column on the Rules tab displays the number of rows that do not pass the conditions of a rule. In the Metastore, the values from the Breaking Records column are included in the rows_breaking column of the rule_output table. However, after initially upgrading to 2024.11, values in the rows_breaking column remain [NULL] until you re-run your DQ Job.
  • Important To include data from the rows_breaking column in a dashboard or report, you first need to re-run your DQ Job to populate the column with data.
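
A report query can guard against the not-yet-populated values described above. The following is a minimal sketch, not the actual Metastore schema: it uses an in-memory SQLite table that borrows only the rule_output, assignment_id, rows_breaking, and total_count names from these notes. The dataset and rule_nm columns, the sample rows, and the helper names are hypothetical.

```python
import sqlite3


def demo_connection():
    # Hypothetical stand-in for the Metastore. Only the table name and the
    # assignment_id, rows_breaking, and total_count columns come from the notes.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE rule_output ("
        "dataset TEXT, rule_nm TEXT, assignment_id INTEGER, "
        "rows_breaking INTEGER, total_count INTEGER)"
    )
    conn.executemany(
        "INSERT INTO rule_output VALUES (?, ?, ?, ?, ?)",
        [
            ("orders", "not_null_id", 1, None, None),  # run from before the upgrade
            ("orders", "not_null_id", 2, 12, 1000),    # run after re-running the job
            ("orders", "valid_email", 3, 0, 1000),
        ],
    )
    return conn


def breaking_summary(conn):
    """Return (dataset, rule_nm, rows_breaking, total_count) for populated rows only."""
    query = """
        SELECT dataset, rule_nm, rows_breaking, total_count
        FROM rule_output
        WHERE rows_breaking IS NOT NULL
        ORDER BY dataset, rule_nm
    """
    return conn.execute(query).fetchall()


rows = breaking_summary(demo_connection())
```

Filtering on IS NOT NULL keeps pre-upgrade runs out of a dashboard instead of showing them as zero breaking rows, which would be misleading.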

Alerts

  • There are now 8 new variables that allow you to create condition alerts for the scores of findings that meet their criteria. These condition variables include:
    • behaviorscore
    • outlierscore
    • patternscore
    • sourcescore
    • recordscore
    • schemascore
    • dupescore
    • shapescore
  • Example To create an alert for shape scores above 25, you can set the condition to shapescore > 25.

  • Job failure alerts now send when a DQ Job fails in the Staged or Initiation activities.
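
To illustrate how a condition such as shapescore > 25 reads against the scores of a job run, here is a minimal sketch. It is not the product's alert engine: the condition_met function, the set of comparison operators, and the sample scores are assumptions made for illustration. Only the eight variable names come from the list above.

```python
import operator
import re

# The eight condition variables named in these release notes.
SCORE_VARIABLES = {
    "behaviorscore", "outlierscore", "patternscore", "sourcescore",
    "recordscore", "schemascore", "dupescore", "shapescore",
}

# Assumed operator set; the notes only show ">".
OPERATORS = {
    ">": operator.gt, ">=": operator.ge,
    "<": operator.lt, "<=": operator.le,
    "==": operator.eq, "!=": operator.ne,
}

_CONDITION = re.compile(r"^\s*(\w+)\s*(>=|<=|==|!=|>|<)\s*(-?\d+(?:\.\d+)?)\s*$")


def condition_met(condition, scores):
    """Evaluate a '<variable> <operator> <number>' condition against a scores dict."""
    match = _CONDITION.match(condition)
    if not match:
        raise ValueError(f"Unsupported condition: {condition!r}")
    name, op, threshold = match.groups()
    if name not in SCORE_VARIABLES:
        raise ValueError(f"Unknown score variable: {name!r}")
    # A missing score is treated as 0 in this sketch.
    return OPERATORS[op](scores.get(name, 0), float(threshold))


fires = condition_met("shapescore > 25", {"shapescore": 30})
```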

Dataset Manager

  • You can now edit and clone DQ Jobs from the Actions button in the far right column on the Dataset Manager.

Fixes

Integration

  • Data Quality Job assets now display a “No data quality score available” message when an invalid rule is selected.
  • When Collibra Data Quality & Observability cannot retrieve the columns from a table or view during the column mapping process, the column UUIDs in Collibra Data Intelligence Platform are now used by default.

Pushdown

  • You can now run Pushdown Jobs using OAuth Tokens generated by the /v3/auth/Oauth/signin endpoint.
  • Unique adaptive rules for Pushdown Jobs with columns that contain null values no longer fail when a scheduled run occurs.
  • When turning behavioral scoring off in the JSON definition of a DQ Job created on a Pushdown connection, behavior scores are no longer displayed.
  • When DQ Jobs created on Pushdown connections with Archive Break Records enabled run, references to link IDs in the rule query are now checked and added automatically if they are missing. This also allows you to add your own CONCAT() when using complex rules.
  • We improved the performance of DQ Jobs created on Snowflake Pushdown connections that use LIMIT 1 for data type queries.

Connections

  • We fixed a critical issue that prevented DQ Jobs on temp files from running because of a missing temp file bucket error.

Jobs

  • Backrun DQ Jobs are now included in the Stage 3 Job Logs.
  • Data Preview now works correctly when the source in the Mapping (source to target) activity is a remote file storage connection, such as Amazon S3.
  • DQ Jobs on Oracle datasets now run without errors when Parallel JDBC is enabled.
  • When using Dataset Overview to query an Oracle dataset, you no longer receive a generic "Error occurred. Please try again." error message when the source data contains a column with a "TIMESTAMP" data type.
  • When including any combination of the conformity options (Min, Mean, or Max) from the Adaptive Rules tab, the column of reference on the Shapes tab is no longer incorrectly marked “N/A” instead of “Auto.”
  • Shapes can now be detected after enabling additional Adaptive Rules beyond the default Adaptive Rules settings for file-based DQ Jobs.
  • After setting up a source to target mapping in the Mapping step of Explorer where both source and target are temp files, you no longer encounter a “Leave this Mapping” message when you click one of the arrows on the right side of the page to proceed to the next step.

Findings

  • After suppressing a behavior score for a dataset that you then use to create a scorecard, the scorecard and Findings page now reflect the same score.
  • After suppressing a behavior score and the total score is over 100, the new score is calculated correctly.

Rules

  • Rules with special characters in the link ID column now load successfully in the Rule Breaks preview.
  • When changing a rule type from a non-Native to Native rule, the Livy banner no longer displays and the Run Result Preview button is enabled. When changing any rule type to any other rule type that is non-Native, Livy checks run and the appropriate banner displays or the Run Result Preview button is enabled.

Alerts

  • When a single rule is passing after adding 3 distinct alerts for each Rule Status trigger (Breaking, Exception, and Passing) and one alert with all 3, unexpected alerts no longer send when the DQ Job runs.
  • Batch alerts now use the same alerts queue to process as all other alert emails.

APIs

  • The /v2/getdatapreview API is now crossed out and marked as deprecated in Swagger. While this API is now deprecated, it continues to function to allow backward compatibility and functionality in legacy workflows.
  • The Swagger UI response array now includes the 204 status code, which means that a request has been successfully completed, but no response payload body will be present.

Latest UI

  • When using Dataset Overview to query an Oracle dataset, you no longer receive a generic "Error occurred. Please try again." error message when the source data contains a column with a "TIMESTAMP" data type.
  • The Adaptive Rules modal on the Findings page now allows you to filter the results to display only Adaptive or Manual Rules or display both.
  • We re-added the ability to expand the results portion of the Findings page to full screen.
  • There is now an enhanced warning message when you create an invalid Distribution Rule from the Profile page.
  • The Select Rows step of Explorer now has a tooltip next to the Standard View option to explain why it is not always available.
  • The Actions button on the Dataset Manager now includes options to edit and clone DQ Jobs.
  • The Rule Details dialog now has a tooltip next to the "Score" buttons to explain the downscoring options.
  • We consolidated the individual login buttons on the Tenant Manager page to a single button that returns you to the main login page.
  • Table headers in exported Job Logs generated from the Jobs page now display correctly.

Beta features

Rules

  • You can now apply a rule tolerance value to indicate the threshold above which your rule breaks require the most urgent attention. Because alerts associated with rules can generate many alert notifications, this helps to declutter your inbox and allows you to focus on the rule breaks that matter most to you.
  • Rule filtering is now available for Pushdown DQ Jobs.

Maintenance Updates

  • We added a new check on the flyway library to resolve issues upon upgrade to Collibra Data Quality & Observability 2024.10.
  • Denodo connections now support OAuth2 authentication.
  • You can now configure AWS passwordless authentication using Amazon RDS PostgreSQL as the Metastore.
  • Note AWS passwordless authentication is currently only supported for EC2 Instance Profile-based authentication with an Amazon RDS Metastore for Collibra Data Quality & Observability standalone and cluster-based deployments. IAM pod role-based authentication support will be available in a future release.

Release 2024.10

Release Information

  • Release date of Collibra Data Quality & Observability 2024.10: October 29, 2024
  • Release notes publication date: September 23, 2024

Warning 

Some customers have encountered issues while upgrading to Collibra Data Quality & Observability 2024.10 due to a change in our flyway library that is not backwards compatible. The fix for this issue is included in the Collibra Data Quality & Observability 2024.11.1 patch. As always, we recommend backing up and restoring your Metastore before upgrading Collibra Data Quality & Observability versions.

Note The above issue only impacts upgrades to Collibra Data Quality & Observability 2024.10. New installations will not encounter this issue.

Enhancements

Pushdown

  • SAP HANA Pushdown is now generally available.
  • When creating and running DQ Jobs on SQL Server Pushdown connections, you can now perform schema, profile, and rules checks.
  • You can now scan for fuzzy match duplicates in DQ Jobs created on BigQuery Pushdown connections.
  • You can now scan for numerical outliers in DQ Jobs created on Trino Pushdown connections.
  • DQ Jobs created on Snowflake Pushdown connections now support union lookback for advanced outlier configurations.
  • DQ Jobs created on Snowflake Pushdown connections now support source to target validation to ensure data moves consistently through your data pipeline and identify changes when they occur.

Integration

  • You can now define custom integration data quality rules, which are also known as aggregation paths, in the Collibra Data Intelligence Platform operating model setting to allow you to view data quality scores for assets other than databases, schemas, tables, and columns.
  • Important You need a minimum of Collibra Data Intelligence Platform 2024.10 and Collibra Data Quality & Observability 2024.07.

  • To allow you to manage the scope of the DGC resources to which OAuth can grant access during the integration, a new OAuth parameter in the Web ConfigMap is now set to DQ_DGC_OAUTH_DEFAULT_REQUEST_SCOPE: "dgc.global-manage-all-resources" by default. This configuration grants Collibra Data Quality & Observability access via OAuth to all DGC resources during the integration. For more granular control over the DGC resources to which Collibra Data Quality & Observability is granted access via OAuth, we plan to introduce additional allowable values in a future release.
  • Users of the Quality tab in Collibra Data Intelligence Platform who do not have a Collibra Data Quality & Observability account can now view the history table to track the evolution of the quality score of a given asset.
  • When using Replay to run DQ Jobs over a defined historical period, for example 5 days in the past, the metrics from each backrun DQ Job are included in the DQ History table and the quality calculation.
  • After integrating a dataset between Collibra Data Quality & Observability and Collibra Data Intelligence Platform, you can now see the number of passing and failing rows for a given rule on the Data Quality Rule asset page.
  • The JSON of a dataset integration between Collibra Data Quality & Observability and Collibra Data Intelligence Platform now shows the number of passing and breaking records.

Jobs

  • You can now edit the schedule details of DQ Jobs from the Jobs Schedule page.
  • A banner now appears when a data type is not supported.

Rules

  • Data Class Rules now have a maximum character length of 64 characters for the Name option and 256 for the Description.
  • Email Data Classes where the email contains a single character domain name now pass validation. For example, [email protected]
  • The Rule Definitions page now has a Break Record Preview Available column to make it easier to see when a rule is eligible for previewing break records.
  • You can now use the search field to search for Rule Descriptions on the Rule Definitions page.

Alerts

  • You can now toggle individual alerts from the Active column on the Alert Builder page to improve control over when you want alerts to send. This can prevent unnecessary alerts from being sent during certain occasions, such as setup and debugging.

Dataset Manager

  • The Dataset Manager table now contains a searchable and sortable column called Connection Name to help identify your datasets more easily.
  • We aligned the roles and permissions requirements for the Dataset Manager API.
    • PUT /v2/updatecatalogobj requires its users to have ROLE_ADMIN, ROLE_DATA_GOVERNANCE_MANAGER, or ROLE_DATASET_ACTIONS.
    • PUT /v2/updatecatalog requires its users to be the dataset owner or have ROLE_ADMIN or ROLE_DATASET_ACTIONS.
    • DELETE /v2/deletedataset requires its users to be the dataset owner or have ROLE_ADMIN or ROLE_DATASET_MANAGER.
    • PATCH /v2/renameDataset requires its users to be the owner of the source dataset or have ROLE_ADMIN, ROLE_DATASET_MANAGER, or ROLE_DATASET_ACTIONS.
    • POST /v2/update-run-mode requires its users to have ROLE_DATASET_TRAIN, ROLE_DATASET_ACTIONS, or dataset access.
    • POST /v2/update-catalog-data-category requires its users to have ROLE_DATASET_TRAIN, ROLE_DATASET_ACTIONS, or dataset access.
    • PUT /v2/business-unit-to-dataset requires its users to have ROLE_DATASET_ACTIONS or dataset access.
    • POST /v2/business-unit-to-dataset requires its users to have ROLE_DATASET_ACTIONS or dataset access.
    • POST /dgc/integrations/trigger/integration requires its users to have ROLE_ADMIN, ROLE_DATASET_MANAGER, or ROLE_DATASET_ACTIONS.
    • POST /v2/postjobschedule requires its users to have ROLE_OWL_CHECK and either ROLE_DATASET_ACTIONS or dataset access.
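
The requirements above can be read as a small decision table. The sketch below encodes a subset of the listed endpoints to show how the "any of these roles", "dataset owner", "dataset access", and "role A and (role B or access)" cases combine. The Requirement class, the is_allowed function, and their parameters are hypothetical names for illustration, not part of the product API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Requirement:
    any_role: frozenset                 # any one of these roles is sufficient
    owner_ok: bool = False              # being the dataset owner is sufficient
    access_ok: bool = False             # having dataset access is sufficient
    must_have: frozenset = frozenset()  # roles required in addition to the above


# Subset of the endpoints listed in these notes.
REQUIREMENTS = {
    ("PUT", "/v2/updatecatalogobj"): Requirement(
        frozenset({"ROLE_ADMIN", "ROLE_DATA_GOVERNANCE_MANAGER", "ROLE_DATASET_ACTIONS"})
    ),
    ("PUT", "/v2/updatecatalog"): Requirement(
        frozenset({"ROLE_ADMIN", "ROLE_DATASET_ACTIONS"}), owner_ok=True
    ),
    ("DELETE", "/v2/deletedataset"): Requirement(
        frozenset({"ROLE_ADMIN", "ROLE_DATASET_MANAGER"}), owner_ok=True
    ),
    ("POST", "/v2/update-run-mode"): Requirement(
        frozenset({"ROLE_DATASET_TRAIN", "ROLE_DATASET_ACTIONS"}), access_ok=True
    ),
    ("POST", "/v2/postjobschedule"): Requirement(
        frozenset({"ROLE_DATASET_ACTIONS"}),
        access_ok=True,
        must_have=frozenset({"ROLE_OWL_CHECK"}),
    ),
}


def is_allowed(method, path, roles, is_owner=False, has_dataset_access=False):
    """Return True when the caller satisfies the requirement for the endpoint."""
    req = REQUIREMENTS[(method, path)]
    roles = set(roles)
    if not req.must_have <= roles:
        return False
    return bool(
        roles & req.any_role
        or (req.owner_ok and is_owner)
        or (req.access_ok and has_dataset_access)
    )


owner_can_delete = is_allowed("DELETE", "/v2/deletedataset", set(), is_owner=True)
```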

Connections

  • The authentication dropdown menu for any given connection now displays only its supported authentication types.
  • We've upgraded the following drivers to the versions listed:
    • Db2 4.27.25
    • Snowflake 3.19.0
    • Note 
      If you use additional encryption algorithms for JWT authentication, you must set one of the following parameters during your deployment of Collibra Data Quality & Observability, depending on your deployment type:

      Helm-based deployments
      Set the following parameter in the Helm Chart: --set global.web.extraJvmOptions="-Dnet.snowflake.jdbc.enableBouncyCastle=true"

      Standalone deployments
      Set the following environment variable in owl-env.sh: export EXTRA_JVM_OPTIONS="-Dnet.snowflake.jdbc.enableBouncyCastle=true"
    • SQL Server 12.6.4 (Java 11 only)
    • Note 
      While the Java 8 version is not officially supported, you can replace the SQL Server driver in the /opt/owl/drivers/mssql folder with the Java 8 version of the supported driver. You can find Java 8 versions of the supported SQL Server driver on the Maven Repository.

Admin Console

  • A new access control layer, Require DATASET_ACTIONS role for dataset management actions, is available from the Security Settings page. When enabled, a new out of the box role, ROLE_DATASET_ACTIONS, is required to allow its users to edit, rename, publish, assign data categories and business units, and enable integrations from the Dataset Manager.
  • A new out of the box role, ROLE_ADMIN_VIEWER, allows users who are assigned to it to access the following Admin Console pages, but restricts access to all others:
    • Actions
    • All Audit Trail subpages
    • Dashboard
    • Note Users with ROLE_ADMIN_VIEWER cannot access the pages to which the quick access buttons on the Dashboards page are linked.

    • Inventory
    • Schedule Restrictions
    • Note Users with ROLE_ADMIN_VIEWER cannot add or delete schedule restrictions.

    • Usage
  • You can now set both size- and time-based data purges from the Data Retention Policy page of the Admin Console. Previously, you could only set size-based data retention policies.

APIs

  • We’ve made several changes to the API documentation. First, we aligned the role checks between the Product APIs (V3 endpoints) and the Collibra Data Quality & Observability UI. We’ve also enhanced the documentation in Swagger to include more detailed descriptions of endpoints. Lastly, we reproduced the Swagger documentation of the Product API in the Collibra Developer Portal to ensure a more unified user experience with the broader Collibra platform and allow for easier scalability of API documentation in the future.

Fixes

Integration

  • The Overview - History section of Data Quality Job assets now displays the correct month of historical data when an integration job run occurs near the end of a given month.

Jobs

  • DQ Jobs on SAP HANA connections with SQL referencing table names containing semicolons ; now run successfully when you escape the table name with quotation marks " ". For example, the SQL query select * from TEST."SPECIAL_CHAR_TABLE::$/-;@#%^&*?!{}~\+=" now runs successfully.
  • You can now run DQ Jobs that use Amazon S3 as the secondary dataset with Instance Profile authentication.
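
When queries are generated programmatically, the quoting described above can be applied with a small helper. This is a minimal sketch: quote_identifier and qualified_table are hypothetical helper names, and doubling any embedded double quote follows the common SQL convention for delimited identifiers, which is an assumption here rather than something these notes confirm.

```python
def quote_identifier(name):
    """Wrap an identifier in double quotes, doubling any embedded double quote."""
    return '"' + name.replace('"', '""') + '"'


def qualified_table(schema, table):
    # The schema is left unquoted here, matching the example in the notes.
    return f"{schema}.{quote_identifier(table)}"


# Raw string so the backslash in the table name is kept literally.
table = r"SPECIAL_CHAR_TABLE::$/-;@#%^&*?!{}~\+="
query = f"select * from {qualified_table('TEST', table)}"
```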

Rules

  • Native rules on DQ Jobs created on connections authenticated by password manager now run successfully and return all related break records when their conditions are met.

Alerts

  • The Assignee column is no longer included in the alert email for Rule Status and Condition alerts with rule details enabled.

APIs

  • Using the POST /v2/controller-db-export call on a dataset with an alert condition and then using the POST /v2/controller-db-import call now returns a successful 200 response instead of an unexpected JSON parsing error.

Latest UI

  • When running DQ Jobs on NFS connections, data files with the date format ${yyyy}${MM}${dd} within their file name are now supported.
  • Native Rules now display the variable name of parameters such as @runId and @dataset with the actual value in the Condition column of the Rules tab on the Findings page.
  • The Jobs Schedule page now shows the time zone offset (+ or - a number of hours) in the Last Updated column. Additionally, the TimeZone column is now directly to the right of the Scheduled Time column to improve its visibility.
  • You can now sort columns on the Job Schedule page.
  • The Agent Configuration, Role Management - Connections, Business Units, and Inventory pages of the Admin Console now have fixed column headers and the Actions button and horizontal scrollbar are now visible at all times.
  • After adding or deleting rules, the rule count on the metadata bar now reflects any updates.
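
For the ${yyyy}${MM}${dd} file name format mentioned in the first item of this list, the sketch below shows what such a template resolves to for a given run date. It only illustrates the naming pattern: resolve_file_name and the sample file name are hypothetical, and the way the product performs the substitution internally is not described in these notes.

```python
from datetime import date


def resolve_file_name(template, run_date):
    """Replace ${yyyy}, ${MM}, and ${dd} tokens with zero-padded date parts."""
    replacements = {
        "${yyyy}": f"{run_date.year:04d}",
        "${MM}": f"{run_date.month:02d}",
        "${dd}": f"{run_date.day:02d}",
    }
    for token, value in replacements.items():
        template = template.replace(token, value)
    return template


# Hypothetical file name; the run date matches the 2024.10 release date.
resolved = resolve_file_name("sales_${yyyy}${MM}${dd}.csv", date(2024, 10, 29))
```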

Beta features

  • The Rule Workbench now contains an additional query input field called “Filter,” which allows you to narrow the scope of your rule query so that only the rows you specify are considered when calculating the rule score. A filter query not only helps to provide a better representation of your quality score but improves the relevance of your rule results, saving both time and operational costs by reducing the need to create multiple datasets for each filter.
  • Important This feature is currently available as a public beta option. For more information about beta features, see Betas at Collibra.

Known limitations

  • In this release, table names with spaces are not supported because of dataset name validation during the creation of a dataset. This will be addressed in an upcoming release.

DQ Security

Release 2024.09

Release Information

  • Release date of Collibra Data Quality & Observability 2024.09: September 30, 2024
  • Release notes publication date: September 5, 2024

Enhancements

Platform

Integration

  • Users of the Quality tab in Collibra Data Intelligence Platform who do not have a Collibra Data Quality & Observability account can now view the 7-day history of data quality scores, allowing you to monitor the health of your data over time.
  • When running a DQ Job with back run and an active integration, the results of the back run are now sent to Collibra Data Intelligence Platform where they are stored in the DQ services history table.

Pushdown

  • You can now scan for shapes in DQ Jobs created on Trino Pushdown connections.

Jobs

  • You can now create DQ Jobs on SAP HANA tables that contain the special characters ::$/-;@#%^&*?!{}~+=
  • Note The special characters .() are not supported.

Findings

  • When the dupelimit and dupelimitui limits on the Admin Limits page are both set to 30, the Findings page now limits the number of dupes findings marked on the Dupes tab to 30.

Alerts

  • If your organization uses multiple web pods on a Cloud Native deployment of Collibra Data Quality & Observability, you now receive only one alert email when an alert condition is met.

APIs

  • When dataset security is enabled on a tenant and a user whose roles meet the requirements of the Dataset Def API also has their role assigned to the dataset, the API now returns the expected results.

Fixes

Integration

  • When setting up an integration, the Connections step now has a search component and pagination to prevent table load failure when the tables per schema size exceeds browser memory.
  • The dimension type of Adaptive Rules (NULL, EMPTY, MIN, and so on) now correctly maps to the type and sub-type from the DQ dimension table.
  • The predicate of custom rules is again included in rule asset attributes.

Pushdown

  • You can again download CSV and JSON files containing rule break records of DQ Jobs created on Pushdown connections.
  • The rule name no longer appears in the data preview and downloaded rule breaks of rule findings for DQ Jobs created on Pushdown connections.
  • Note For consistency with rule break downloads in Pullup mode, we plan to separate rule breaks by rule in a future release. As of this release, Pullup mode still includes the rule name and runId in rule break downloads.

  • DQ Jobs created on Pushdown connections where columns with shape values that contain $ or now run successfully. Previously, such Jobs failed with an unexpected exception message.
  • When running DQ Jobs created on Pushdown connections that scan multiple columns for duplicate values, the data of the columns now appears under the correct column name.
  • When a DQ Job created on Pushdown connection contains 0 rows of data and one or more rules are enabled, the rules are now included in the Job run and displayed on the Findings page.

Jobs

  • The Explorer connection tree now loads successfully when a schema contains tables that contain unsupported column types.
  • Dataset Overview on the Explorer page can now process the select * part of the SQL statement if there is an unsupported column type.
  • We fixed an issue on the Jobs page where Collibra Data Quality & Observability was unable to retrieve the Yarn Job ID.
  • When re-running a DQ Job from the metadata bar on a DQ Job that previously ran with backrun (-br), the DQ Job that you re-run will no longer incorrectly initiate a backrun.
  • Note If the -br option is included in the beginning of your command line, your DQ Job will perform a backrun and -br will be removed from the command line when the DQ Job completes.

Findings

  • After retraining a behavioral finding to pass a value for a blindspot, the score now correctly reflects the retrained scoring model.

Profile

  • When adding a stat rule for distribution from the +Add Rule option on the Profile page, the computed boundaries of categorical variables in the distribution rule now display correctly.

Rules

  • When rules on Pullup datasets time out, the rule output record now displays the out-of-memory (OOM) exception message on the Findings page.
  • Run Result Preview on the Rule Workbench now works as expected for custom rules that use the simple rule template.
  • When creating a rule for an existing DQ Job created on a Pushdown connection, Run Result Preview now runs without errors.
  • You can now use rules that contain a $ (not stat rules) with profiling off for DQ Jobs created on Pushdown connections.

Admin Console

  • Admin Limits now require values to be -1, 0, or positive numbers. An inline message appears below the Value field when a limit does not meet the allowed values.
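
The allowed range can be expressed as a one-line check. The sketch below is an illustration only: limit_error is a hypothetical helper, the message texts are invented, and treating limits as whole numbers is an assumption, since the notes say only "-1, 0, or positive numbers".

```python
def limit_error(value):
    """Return None when the limit is allowed, otherwise a message for the user."""
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(value, bool) or not isinstance(value, int):
        return "Value must be a whole number."
    if value < -1:
        return "Value must be -1, 0, or a positive number."
    return None


checks = {v: limit_error(v) for v in (-2, -1, 0, 30)}
```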

Latest UI

  • We improved the performance of the Scorecards page when there are many datasets to load.
  • We reduced the number of backend calls on the Profile and Findings pages to improve the load time performance.
  • You can now create, edit, and delete alerts for datasets with 0 rows. When you run a job on a dataset with 0 rows, the alerts function as expected.
  • Dataset Manager no longer crashes due to slow network calls when you click the Filter icon as the page loads.
  • We resolved multiple scenarios where the metadata bar did not display any rows or columns.
  • When hovering over a row on the Scheduler page, the row has a gray highlight, but the days of week cells remain green or white, depending on whether a DQ Job is scheduled to run on a given day.
  • The date picker on the Job tab of the Findings page is now available for DQ Jobs created on Pushdown connections and you can successfully run DQ Jobs with the dates you select.
  • The Findings and other pages now load correctly in Safari browsers.
  • DQ Jobs created on Pushdown connections no longer generate duplicate user-defined job failure alerts.
  • The donut charts in the database and schema reports from Explorer now consistently display the correct stats.

DQ Security

Release 2024.08

Release Information

  • Release date of Collibra Data Quality & Observability 2024.08: August 26, 2024
  • Publication dates:
    • Release notes: August 5, 2024
    • Documentation Center: August 9, 2024

Enhancements

Integration

  • When a job with an active integration with Collibra Data Intelligence Platform runs, the Job Log on the Jobs page in Collibra Data Quality & Observability reflects the details of each step of the integration.

Pushdown

  • You can now scan for both fuzzy and exact match duplicate records in Trino Pushdown jobs.
  • All Pushdown-compatible data sources now support the use of temporal datasets in stat rule statements, for example, “SELECT @t1.$rowcount AS yesterday, @dataset.$rowcount AS today WHERE yesterday <> today”

Connections

  • You can now create BigQuery Pullup jobs on a cross-project connection without manually updating the command line to prepend the projectId in the source query.
  • NetApp now has a dedicated connection tile under the Remote File Connections tab of the Add New Connection modal. Previously, to connect to a NetApp data source, you had to follow the Amazon S3 path and add the NetApp connection properties to the Properties tab.
  • You can now archive breaking records to NetApp locations.

Jobs

  • You can now execute command line queries that end with double quotes around the table name, for example, select * from "<SCHEMA>"."<TABLE>"

Profile

  • When you hover your cursor over the histogram on a dataset profile page, the upper quartile, median, and lower quartile statistics now display.

Findings

  • NULL values are now excluded from the calculation of duplicate values for Pullup jobs.
  • The indicators representing the number of findings for a given layer are now in the upper right corner of the associated layer.
  • The Actions button is now always visible at the far right side of the Adaptive Rules modal. Additionally, the Adaptive Rule types are now color-coded.
  • The chips in the Observations column of the Records tab are now color-coded.
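
The effect of excluding NULL values from duplicate calculations, described in the first item of this list, can be shown with a simple exact-match count. This is a sketch of the idea only, not the Dupes layer itself, which also supports fuzzy matching. The duplicate_values function and the sample data are hypothetical.

```python
from collections import Counter


def duplicate_values(values):
    """Count exact-match duplicates, ignoring None (NULL) entries entirely."""
    counts = Counter(v for v in values if v is not None)
    return {value: n for value, n in counts.items() if n > 1}


# Two NULLs are present, but they are not reported as duplicates of each other.
dupes = duplicate_values(["a", None, "a", None, "b"])
```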

Rules

  • The Copy Results and Download Results buttons from the Dataset Overview are now available on the Rule Workbench.

Alerts

  • We cleaned up the mailTemplate.html file within the dq-webapp to improve user experience.

Dataset Overview

  • The ability to preview data on the Dataset Overview now requires access to the connection upon which your job is based and at least one of the following roles:
    • ROLE_VIEW_DATA
    • ROLE_ADMIN
    • ROLE_CONNECTION_MANAGER

Scorecards

  • We shortened the height of the scorecard blocks to reduce the amount of time it takes to scroll the Scorecards page when multiple scorecard blocks are present.

Assignments

  • You can now use the Date Range filter on the Assignments Queue to sort and define a range of dates in the Update Ts (timestamps) column.

Dataset Manager

  • Admins can now bulk update the agent and Spark settings of Pullup datasets.

SQL Assistant for Data Quality

  • You can now see details of the Vertex AI model in the ‘About’ modal in the upper right corner of your Collibra Data Quality & Observability instance.

APIs

  • We aligned the role requirements for the Jobs and Alerts V3 APIs.
    • When dataset security and DB connection security are disabled, users have full access to the Jobs and Alerts endpoints.
    • When dataset security is enabled and DB connection security is disabled, users without a role assignment to a dataset cannot use the following endpoints referencing that dataset:
      • GET v3/jobs/{dataset}/{rundate}/logs
      • GET v3/jobs/{jobId}
      • GET v3/jobs/{jobId}/waitForCompletion
      • GET v3/jobs/{jobId}/logs
      • GET v3/jobs/{jobId}/findings
      • GET v3/jobs/{jobId}/breaks/shapes
      • GET v3/jobs/{jobId}/breaks/rules
      • GET v3/jobs/{jobId}/breaks/outliers
      • GET v3/jobs/{jobId}/breaks/dupes
      • GET v3/alerts/{dataset}
      • GET v3/alerts/{dataset}/{alertname}
      • GET v3/alerts/notifications
      • DELETE v3/alerts/{dataset}/{alertname}
    • When dataset security is enabled and DB connection security is disabled, users without a role assignment to a dataset can use the following endpoints:
      • GET v3/jobs
      • GET v3/alerts
    • The Job and Alert APIs honor dataset security by preventing access to the alert or job details when the user making the request does not have role access to the related dataset.
    • By design, the Job APIs only honor connection security related to job creation or execution actions.
    • Currently, connection security is not enforced for the Alert APIs.
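
    The access rules above can be summarized in a small decision function. This is only a sketch of the behavior described in these notes for the case where DB connection security is disabled. The function and parameter names are hypothetical, and endpoints not listed above are deliberately rejected rather than guessed.

```python
# Endpoints that require a role assignment on the referenced dataset
# when dataset security is enabled (taken from the list above).
RESTRICTED = {
    ("GET", "v3/jobs/{dataset}/{rundate}/logs"),
    ("GET", "v3/jobs/{jobId}"),
    ("GET", "v3/jobs/{jobId}/waitForCompletion"),
    ("GET", "v3/jobs/{jobId}/logs"),
    ("GET", "v3/jobs/{jobId}/findings"),
    ("GET", "v3/jobs/{jobId}/breaks/shapes"),
    ("GET", "v3/jobs/{jobId}/breaks/rules"),
    ("GET", "v3/jobs/{jobId}/breaks/outliers"),
    ("GET", "v3/jobs/{jobId}/breaks/dupes"),
    ("GET", "v3/alerts/{dataset}"),
    ("GET", "v3/alerts/{dataset}/{alertname}"),
    ("GET", "v3/alerts/notifications"),
    ("DELETE", "v3/alerts/{dataset}/{alertname}"),
}

# Endpoints that remain usable without a dataset role assignment.
OPEN = {("GET", "v3/jobs"), ("GET", "v3/alerts")}


def is_allowed(method: str, template: str, *,
               dataset_security: bool, has_dataset_role: bool) -> bool:
    """Model the documented rules, assuming DB connection security is disabled."""
    endpoint = (method.upper(), template)
    if endpoint not in RESTRICTED and endpoint not in OPEN:
        raise ValueError(f"endpoint not covered by this sketch: {method} {template}")
    if not dataset_security:
        return True  # both security settings disabled: full access
    if endpoint in OPEN:
        return True
    return has_dataset_role


print(is_allowed("GET", "v3/jobs", dataset_security=True, has_dataset_role=False))
print(is_allowed("GET", "v3/jobs/{jobId}/findings",
                 dataset_security=True, has_dataset_role=False))
```

    The first call prints True and the second prints False, matching the two endpoint lists above.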

Fixes

Connections

  • You can again edit datasets created on Remote File Connections.

Jobs

  • When the Case Sensitive and Exact Match options are not selected in the Dupes layer, jobs that run in Pullup mode now scan for all case-insensitive fuzzy match duplicates.
  • When using the Partial Scan option in the latest UI, you can now use the ‘select all’ checkbox option in the column header to select all columns when they contain unsupported data types, such as CLOB.
  • After setting up a partial scan of an Oracle dataset in the latest UI, the job now runs without error.
  • You can again run jobs on Oracle datasets where source-to-target mappings to Databricks connections are configured.
  • Columns now display on the Profile page when the Profile activity fails.
  • When adding a distribution rule on a column from the Profile page, the percentages are now correctly calculated based on the total number of rows.
  • You can again edit jobs based on temp files in Standalone deployments of Collibra Data Quality & Observability.

Reports

  • The Coverage Report now returns a maximum of 1 calendar year of the statistics of database-, schema-, and table-level jobs. If each level does not have an existing structure, the report returns a helpful error message.

Agent

  • When you select an option from the Master Default dropdown menu on the Edit Agent dialog of the Agent Configuration page, the correct value now displays based on your selection.

Integration

  • Schemas and tables now correctly map to Collibra Data Intelligence Platform assets when you automap them from the connection mapping step of the Integration Admin Console page.
  • The total row count now correctly displays in the Loaded Rows field on the asset page after the integration of a Pushdown dataset.
  • The Run Job Again option is no longer visible on the View Monitoring modal of the At a glance sidebar for table assets of scheduled and non-scheduled jobs.
  • The scheduler in Collibra Data Quality & Observability that previously monitored for triggers sent from Collibra Data Intelligence Platform to run a job in Collibra Data Quality & Observability is now disabled and no longer scans for these inputs.

Pushdown

  • You can now see the histogram of BigQuery Pushdown jobs.
  • Snowflake Pushdown jobs with rules that reference secondary datasets with identical column names no longer return exceptions.
  • The Profile page now shows TopN and BottomN shape results for Snowflake Pushdown jobs.
  • The timestamp portion of the Run ID is now supported in Pushdown jobs that are configured to run on a schedule.
  • When Archive Break Records for Rules is enabled and the rules of a Pullup job include at least one DATATYPECHECK rule, the Rules page now shows the correct statuses when the rules are copied to a new Pushdown job. Additionally, the DATATYPECHECK rules from the Pullup job do not copy to the Pushdown job, as DATATYPECHECK rules are not supported in Pushdown mode.

APIs

  • When using the POST v3/rules endpoint to add an inactive rule (isActive option set to 0) to a dataset, the rule is now added to the dataset correctly.

Latest UI

  • We improved the performance of the Explorer page so that schemas with many tables no longer lock the page in an unresponsive state while they load.
  • The errors that occurred when compiling a dataset source query using a date variable in Explorer are now resolved.
  • Dataset pages now load correctly when an invalid run ID is applied to a Pushdown job.
  • When editing a DQ Job on a Remote File Connection, the Compile button is now disabled and includes a note instructing you to edit the query from the command line instead.
  • The issues that prevented some users from editing certain datasets created from Remote File Connections are now resolved.
  • You can again edit DQ Jobs created on Temp File connections.
  • The Agent Master Default option on the Agent Configuration page now displays correctly.
  • You can now deselect unsupported column types when performing a partial scan in Explorer.
  • The row count in the job estimate now reflects the source query row count.
  • When using Validate Source, JDBC source connections mapped to Databricks connections no longer return errors.
  • We added labels to the Histogram portion of the Profile page that are available when you hover your cursor over the histogram.
  • Percentages for Quick Rules (Distribution) on the Profile page now display correctly.
  • Observations on the Record tab of the Findings page are now color-coded.
  • We fixed an issue with the Rule Workbench where the rule body became uneditable after loading a data type rule.
  • Unintended changes to out-of-the-box Sensitive Labels are now prevented.
  • File names of exports from the Schedule page now include date and timestamps.
  • The Coverage Report now returns connection-level metrics from the latest UI and APIs.
  • We cleaned up typos on the Completeness Report page.

DQ Security

Release 2024.07

Release Information

  • Release date of Collibra Data Quality & Observability 2024.07: July 30, 2024
  • Publication dates:
    • Release notes: June 24, 2024
    • Documentation Center: July 4, 2024

Highlights

Important 
As of this release, the classic UI is no longer available.

Important 
To improve the security of Collibra Data Quality & Observability, we removed the default keystore password from the installation packages in this release. In general, if your organization uses a SAML or SSL keystore, we recommend that you provide a custom keystore file. However, if your organization has a Standalone installation of Collibra Data Quality & Observability and plans to continue using the default keystore, please contact Support to receive the default password to allow you to install and successfully use Collibra Data Quality & Observability versions 2024.07 and newer. If your organization has a containerized (Cloud Native) installation of Collibra Data Quality & Observability, you can continue to leverage the default keystore file, as the latest Helm Charts have the default password set within the values.yaml file.

  • Integration
    • You can now select the data quality layers that will have corresponding assets created in Collibra Data Intelligence Platform either automatically upon a successful integration or only when a layer contains breaking records. Selecting individual layers instead of including all of them by default helps prevent an overwhelming number of assets from being created automatically.

      Admins can configure this on the Integration Setup wizard of the Integrations page in the Admin Console.
    • Admins can also map file and database views from Collibra Data Quality & Observability to corresponding assets in Collibra Data Intelligence Platform Catalog. This allows for out-of-the-box relations to be created between their file- or view-based Collibra Data Quality & Observability datasets and the file table and database view assets (and their columns) in Collibra Data Intelligence Platform.

      Note: Collibra Data Intelligence Platform 2024.07 or newer is required for view support.

  • Pushdown
    • We're delighted to announce that Pushdown for SQL Server is now generally available!

      Pushdown is an alternative compute method for running DQ Jobs, where Collibra Data Quality & Observability submits all of the job's processing directly to a SQL data warehouse. When all of your data resides in the SQL data warehouse, Pushdown reduces the amount of data transfer, eliminates egress latency, and removes the Spark compute requirements of a DQ Job.

Enhancements

Explorer

  • From the Sizing step in Explorer, you can now change the agent of a Pullup job by clicking the Agent Details field and selecting from the available agents listed in the Agent Status modal.
  • You can now use custom delimiters for remote files in the latest UI.
  • When using the Mapping layer for remote file connections, you can now add -srcmultiline to the command line to remove empty rows from the source to target analysis.
  • We added a Schema tab to the Dataset Overview to display table schema details after clicking Run in the query box. (ticket #145266)
  • When the number of rows in a dataset exceeds 999, you can now hover your cursor over the abbreviated number to view the precise number of rows. (ticket #125753)

Rules

  • Break record previews for out-of-the-box Data Type rules now display under the Rules tab on the Findings page when present. (idea #DCC-I-2155)
  • When a dataset is renamed from the Dataset Manager, any explicit references to the dataset name for all primary and secondary rules are updated. (idea #DCC-I-2624)
  • You can now rename rules from the Actions dropdown menu on the Dataset Rules page. Updated rule names cannot match the name of an existing rule and must contain only alphanumeric characters without spaces.
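
    The rename constraints above can be expressed as a short validation check. This is a minimal sketch, not the product's implementation. It assumes ASCII letters and digits only and an exact, case-sensitive comparison against existing rule names; the function name is hypothetical.

```python
import re
from typing import Collection

# One or more ASCII letters or digits; no spaces or other characters (assumption).
_ALPHANUMERIC = re.compile(r"[A-Za-z0-9]+")


def is_valid_rule_name(new_name: str, existing_names: Collection[str]) -> bool:
    """Return True if new_name is alphanumeric and not already used by another rule."""
    if not _ALPHANUMERIC.fullmatch(new_name):
        return False
    # Exact-match comparison is an assumption; the product may compare differently.
    return new_name not in existing_names


existing = {"EmailNotNull", "AgeRange"}
print(is_valid_rule_name("EmailNotNull2", existing))   # True
print(is_valid_rule_name("Email Not Null", existing))  # False: contains spaces
print(is_valid_rule_name("AgeRange", existing))        # False: name already exists
```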

Alerts

  • We added a new Rule Status alert to allow you to track whether your rule condition is breaking, throwing an exception, or passing.
  • When configuring Condition-type alerts, you now have the option to Add Rule Details, which includes the following details in the email alert (idea #DCC-I-732):
    • Rule name
    • Rule condition
    • Total number of points to deduct from the quality score when breaking
    • Percentage of records that are breaking
    • Number of breaking records
    • Assignment Status
  • Admins can now set a custom alert email signature from the Alerts page in the Admin Console. (idea #DCC-I-2400)

Profile

  • Completeness percentage is now listed on the dataset Profile page in the latest UI.

Findings

  • We added a Passing Records column to the rule break table under the Rules tab on the Findings page to show the number of records that passed a rule. This enhancement simplifies the calculation of the total number of records, both passing and breaking. Additionally, we renamed the Records column to Breaking Records. (idea #DCC-I-2223)
  • You can now download CSV files and copy signed links of rule break records on secure S3 connections, including NetApp, MinIO, and Amazon S3.

Scorecards

  • To improve page navigation, we added a dedicated Add Page button to the top right corner of the Scorecards page and moved the search field from the bottom of the scorecards dropdown menu to the top.

Reports

  • When dataset security is enabled, the only datasets and their associated data that display on the Dataset Dimension and Column Dimension dashboards are the ones to which users have explicit access.

Jobs

  • The names of exported job log files begin with a timestamp. For job logs exported in bulk from the Jobs page, the timestamp represents when the file was downloaded. For job logs of individual jobs, the timestamp represents the UpdateTimestamp, and the file name also includes the first 25 characters of the dataset name.

Dataset Manager

  • Admins can now update dataset hosts in bulk by selecting Bulk Manage Host from the Bulk Actions dropdown menu and specifying a new host URL.

Integration

  • When datasets do not have an active Collibra Data Intelligence Platform integration, you can now enable dataset integrations in bulk from the Dataset Manager page by clicking the checkbox option next to the datasets you wish to integrate, then selecting Bulk Enable Integration from the Bulk Actions dropdown menu.
    • Additionally, when datasets have an active integration, you can now submit multiple jobs to run by selecting Bulk Submit Integration Jobs from the Bulk Actions dropdown menu.
  • We improved the score calculation logic of Data Quality Rule assets.
  • When viewing Rule assets and assets of data quality layers in Collibra Data Intelligence Platform, the Rule Status now displays Passing, Breaking, Learning, or Suppressed. Previously, rules and data quality layers without any breaks displayed an Active status, but that is now listed as Passing. (ticket #137526)

Pushdown

  • You can now scan for exact match duplicates in BigQuery Pushdown jobs.
    Note: An enhancement to enable scanning for fuzzy match duplicates in BigQuery Pushdown jobs is planned for Collibra Data Quality & Observability 2024.10.

  • You can now scan for shapes and exact match duplicates in SAP HANA Pushdown jobs. Additionally, we’ve added the ability to archive duplicates and rules break records to the source SAP HANA database to allow you to easily identify and take action on data that requires remediation.

SQL Assistant for Data Quality

  • We added AI_PLATFORM_PATH to the Application Configuration Settings to allow Collibra Data Quality & Observability users who do not have a Collibra Data Intelligence Platform integration to bypass the integration path when this flag is set to FALSE.
    • When set to TRUE (default), code will hit the integration or public proxy layer endpoint.
    • When set to FALSE, code will bypass the integration path.

Identity Management

  • We removed the Add Mapping button from the AD Security Settings page in the Admin Console.
  • When dataset security is enabled and a dataset does not have a previous successful job run, users without explicit access to it will not see it when they use the global search to look it up.

APIs

  • You must have ROLE_ADMIN to update the host of one or many datasets using the PATCH /v3/datasetDefs/batch/host call. Any updates to the hosts of datasets are logged in the Dataset Audit Trail.
  • By using the Dataset Definitions API, admins can now manage the following in bulk:
    • Agent
      • PATCH /v3/datasetDefs/batch/agent
    • Host
      • PATCH /v3/datasetDefs/batch/host
    • Spark settings
      • PATCH /v3/datasetDefs/batch/spark
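
    As a rough sketch of how a client might address the three bulk endpoints above, the following code builds (but does not send) a PATCH request with the Python standard library. Only the endpoint paths come from these notes. The base URL, the bearer-token header, and the payload fields are hypothetical placeholders; consult the API documentation of your instance for the actual request body and authentication scheme.

```python
import json
import urllib.request

# Endpoint paths as listed in the release notes.
BATCH_PATHS = {
    "agent": "/v3/datasetDefs/batch/agent",
    "host": "/v3/datasetDefs/batch/host",
    "spark": "/v3/datasetDefs/batch/spark",
}


def build_batch_patch(base_url: str, setting: str, payload: dict,
                      token: str) -> urllib.request.Request:
    """Build a PATCH request for one of the bulk Dataset Definitions endpoints."""
    url = base_url.rstrip("/") + BATCH_PATHS[setting]
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        method="PATCH",
        headers={
            "Content-Type": "application/json",
            # Bearer authentication is an assumption for illustration only.
            "Authorization": f"Bearer {token}",
        },
    )


# Hypothetical payload shape; the real field names are not given in these notes.
example_payload = {"datasets": ["example_dataset"], "host": "new-host.example.com"}
req = build_batch_patch("https://dq.example.com", "host", example_payload, "<token>")
print(req.get_method(), req.full_url)
```

    Passing the request to urllib.request.urlopen would send it. As noted above, the host update requires ROLE_ADMIN.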

Platform

  • To ensure security compliance for Collibra Data Quality & Observability deployments on Azure Kubernetes Service (AKS), you can now pass sensitive PostgreSQL Metastore credentials to the Helm application through the Kubernetes secret, --set global.configMap.data.metastore_secret_name. You can also pass these credentials to the Helm application through the Azure Key Vault secret provider class object, --set global.vault.enabled=true --set global.vault.provider=akv --set global.vault.metastore_secret_name.

Fixes

Connections

  • You can now create a Snowflake JDBC connection with a connection URL containing either a double quote or curly bracket, or a URL encoded version of the same, for example, {"tenant":"foo","product":"bar","application":"baz"}. (ticket #148348)

Findings

  • We fixed an issue that prevented sorting in the Records column for datasets with multiple rule outputs and different row count values. (ticket #142362)
  • We resolved a misalignment of Outlier column values in the latest UI. (ticket #146163)

Reports

  • The Column Dimension and Dataset Dimension reports now display an error message when users who do not have ROLE_ADMIN attempt to access them. (ticket #141397)

Integration

  • We improved the error handling when retrieving RunId for datasets. (tickets #145040, #146724, #148232)

Latest UI

  • When a user does not have any assigned roles, an empty chip no longer displays in the Roles column on the User Management page in the Admin Console.
  • We removed the Enabled and Locked columns from the AD Mapping page in the Admin Console, because they do not apply to AD users.
  • A warning message now appears when you attempt to load a temp file and the Temp File Upload option is disabled on the Security Settings page in the Admin Console.
  • You can now sort the Records column of the Rules tab on the Findings page.
  • Outlier findings now display in the correct column.
  • The section of the application containing Scorecards, List View, Assignments, and Pulse View is now called Views.
  • The links on the metadata bar now appear in a new order to better reflect the progression of how and when they are used.
  • You can now change the agent of a dataset during the edit and clone process.
  • If schemas or tables fail to load in Explorer, an error message now appears.

DQ Security