Release 2024.08

Release Information

  • Release date of Collibra Data Quality & Observability 2024.08: August 26, 2024
  • Publication dates:
    • Release notes: August 5, 2024
    • Documentation Center: August 9, 2024

Enhancements

Integration

  • When a job with an active integration with Collibra Platform runs, the Job Log on the Jobs page in Collibra Data Quality & Observability reflects the details of each step of the integration.

Pushdown

  • You can now scan for both fuzzy and exact match duplicate records in Trino Pushdown jobs.
  • All Pushdown-compatible data sources now support the use of temporal datasets in stat rule statements, for example, “SELECT @t1.$rowcount AS yesterday, @dataset.$rowcount AS today WHERE yesterday <> today”.
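The difference between exact and fuzzy duplicate matching can be illustrated with a small Python sketch. This is not how Collibra Data Quality & Observability scores duplicates: the use of difflib.SequenceMatcher, the case-folding, and the 0.85 threshold are assumptions chosen only for illustration.

```python
from difflib import SequenceMatcher
from itertools import combinations

# Hypothetical threshold; the product's fuzzy-match scoring is not described in these notes.
FUZZY_THRESHOLD = 0.85


def exact_duplicates(values):
    """Return index pairs whose values are identical (case-sensitive)."""
    return [(i, j) for i, j in combinations(range(len(values)), 2)
            if values[i] == values[j]]


def fuzzy_duplicates(values, threshold=FUZZY_THRESHOLD):
    """Return index pairs whose case-folded similarity ratio meets the threshold."""
    pairs = []
    for i, j in combinations(range(len(values)), 2):
        a, b = values[i].casefold(), values[j].casefold()
        if SequenceMatcher(None, a, b).ratio() >= threshold:
            pairs.append((i, j))
    return pairs


names = ["Acme Corp", "Acme Corp", "ACME Corp.", "Globex"]
print(exact_duplicates(names))  # only the two identical strings
print(fuzzy_duplicates(names))  # also pairs the near-identical "ACME Corp."
```

An exact scan finds only byte-identical values, while a fuzzy scan also groups values that differ in case or punctuation.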

Connections

  • You can now create BigQuery Pullup jobs on a cross-project connection without manually updating the command line to prepend the projectId in the source query.
  • NetApp now has a dedicated connection tile under the Remote File Connections tab of the Add New Connection modal. Previously, to connect to a NetApp data source, you had to follow the Amazon S3 path and add the NetApp connection properties to the Properties tab.
  • You can now archive breaking records to NetApp locations.

Jobs

  • You can now execute command line queries that end with double quotes around the table name, for example, select * from "<SCHEMA>"."<TABLE>"

Profile

  • When you hover your cursor over the histogram on a dataset profile page, the upper quartile, median, and lower quartile statistics now display.
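The three statistics in the histogram tooltip are the quartiles of the column's values. The sketch below computes them with Python's statistics.quantiles. The "inclusive" interpolation method is an assumption, since the notes do not say which method the Profile page uses, so results may differ slightly from the values shown in the UI.

```python
from statistics import quantiles


def quartile_summary(values):
    """Return the lower quartile, median, and upper quartile of numeric values."""
    # n=4 yields the three cut points that split the sorted data into quarters.
    # method="inclusive" is an assumed choice, not confirmed by the release notes.
    lower, median, upper = quantiles(values, n=4, method="inclusive")
    return {"lower_quartile": lower, "median": median, "upper_quartile": upper}


print(quartile_summary([1, 2, 3, 4, 5]))
```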

Findings

  • NULL values are now excluded from the calculation of duplicate values for Pullup jobs.
  • The indicator representing the number of findings for a given layer is now in the upper right corner of the associated layer.
  • The Actions button is now always visible at the far right side of the Adaptive Rules modal. Additionally, the Adaptive Rule types are now color-coded.
  • The chips in the Observations column of the Records tab are now color-coded.
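The effect of excluding NULL values from the duplicate calculation can be sketched as follows. The convention that a value seen k times contributes k - 1 duplicates is an assumption for illustration; the notes state only that NULLs are no longer counted.

```python
from collections import Counter


def duplicate_count(values):
    """Count repeated values in a column, ignoring NULLs (represented as None)."""
    counts = Counter(v for v in values if v is not None)
    # Assumed convention: a value seen k times contributes k - 1 duplicates.
    return sum(c - 1 for c in counts.values() if c > 1)


print(duplicate_count(["a", "a", None, None, "b"]))  # the two NULLs are not a duplicate pair
```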

Rules

  • The Copy Results and Download Results buttons from the Dataset Overview are now available on the Rule Workbench.

Alerts

  • We cleaned up the mailTemplate.html file within the dq-webapp to improve user experience.

Dataset Overview

  • The ability to preview data on the Dataset Overview now requires access to the connection upon which your job is based and at least one of the following roles:
    • ROLE_VIEW_DATA
    • ROLE_ADMIN
    • ROLE_CONNECTION_MANAGER
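The data preview requirement combines two conditions: access to the connection and at least one of the listed roles. A minimal sketch of that check, where the function name and its inputs are hypothetical and only the role names come from the notes:

```python
# Role names from the release notes; holding any one of them is sufficient.
PREVIEW_ROLES = frozenset({"ROLE_VIEW_DATA", "ROLE_ADMIN", "ROLE_CONNECTION_MANAGER"})


def can_preview_data(user_roles: set, has_connection_access: bool) -> bool:
    """Hypothetical check: connection access AND at least one preview role."""
    return bool(has_connection_access) and not PREVIEW_ROLES.isdisjoint(user_roles)


print(can_preview_data({"ROLE_VIEW_DATA"}, True))   # True
print(can_preview_data({"ROLE_VIEW_DATA"}, False))  # False: no connection access
```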

Scorecards

  • We shortened the height of the scorecard blocks to reduce the amount of time it takes to scroll the Scorecards page when multiple scorecard blocks are present.

Assignments

  • You can now use the Date Range filter on the Assignments Queue to sort and define a range of dates in the Update Ts (timestamps) column.

Dataset Manager

  • Admins can now bulk update the agent and Spark settings of Pullup datasets.

SQL Assistant for Data Quality

  • You can now see details of the Vertex AI model in the ‘About’ modal in the upper right corner of your Collibra Data Quality & Observability instance.

APIs

  • We aligned the role requirements for the Jobs and Alerts V3 APIs.
    • When dataset security and DB connection security are disabled, users have full access to the Jobs and Alerts endpoints.
    • When dataset security is enabled and DB connection security is disabled, users without a role assignment to a dataset cannot use the following endpoints referencing that dataset:
      • GET v3/jobs/{dataset}/{rundate}/logs
      • GET v3/jobs/{jobId}
      • GET v3/jobs/{jobId}/waitForCompletion
      • GET v3/jobs/{jobId}/logs
      • GET v3/jobs/{jobId}/findings
      • GET v3/jobs/{jobId}/breaks/shapes
      • GET v3/jobs/{jobId}/breaks/rules
      • GET v3/jobs/{jobId}/breaks/outliers
      • GET v3/jobs/{jobId}/breaks/dupes
      • GET v3/alerts/{dataset}
      • GET v3/alerts/{dataset}/{alertname}
      • GET v3/alerts/notifications
      • DELETE v3/alerts/{dataset}/{alertname}
    • When dataset security is enabled and DB connection security is disabled, users without a role assignment to a dataset can use the following endpoints:
      • GET v3/jobs
      • GET v3/alerts
    • The Job and Alert APIs honor dataset security by preventing access to the alert or job details when the user making the request does not have role access to the related dataset.
    • By design, the Job APIs only honor connection security related to job creation or execution actions.
    • Currently, connection security is not enforced for the Alert APIs.
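The role requirements above can be summarized as a small decision function. This sketch covers only the documented behavior: the function name and parameters are hypothetical, endpoints are matched as literal path templates, and any security configuration the notes do not describe returns None rather than a guess.

```python
from typing import Optional

# Endpoints that require a role assignment to the referenced dataset
# when dataset security is enabled and DB connection security is disabled.
DATASET_SCOPED = frozenset({
    "GET v3/jobs/{dataset}/{rundate}/logs",
    "GET v3/jobs/{jobId}",
    "GET v3/jobs/{jobId}/waitForCompletion",
    "GET v3/jobs/{jobId}/logs",
    "GET v3/jobs/{jobId}/findings",
    "GET v3/jobs/{jobId}/breaks/shapes",
    "GET v3/jobs/{jobId}/breaks/rules",
    "GET v3/jobs/{jobId}/breaks/outliers",
    "GET v3/jobs/{jobId}/breaks/dupes",
    "GET v3/alerts/{dataset}",
    "GET v3/alerts/{dataset}/{alertname}",
    "GET v3/alerts/notifications",
    "DELETE v3/alerts/{dataset}/{alertname}",
})

# Endpoints usable without a dataset role assignment in that same configuration.
OPEN_LISTINGS = frozenset({"GET v3/jobs", "GET v3/alerts"})


def can_call(endpoint: str, dataset_security: bool,
             connection_security: bool, has_dataset_role: bool) -> Optional[bool]:
    """Return True/False for documented cases, None for undocumented ones."""
    if not dataset_security and not connection_security:
        return True  # both disabled: full access to Jobs and Alerts endpoints
    if dataset_security and not connection_security:
        if endpoint in OPEN_LISTINGS:
            return True
        if endpoint in DATASET_SCOPED:
            return has_dataset_role
    return None  # combination not described in the release notes


print(can_call("GET v3/jobs/{jobId}", True, False, has_dataset_role=False))  # False
```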

Fixes

Connections

  • You can again edit datasets created on Remote File Connections.

Jobs

  • When the Case Sensitive and Exact Match options are not selected in the Dupes layer, jobs that run in Pullup mode now scan for all case-insensitive fuzzy match duplicates.
  • When using the Partial Scan option in the latest UI, you can now use the ‘select all’ checkbox option in the column header to select all columns when they contain unsupported data types, such as CLOB.
  • After setting up a partial scan of an Oracle dataset in the latest UI, the job now runs without error.
  • You can again run jobs on Oracle datasets where source-to-target mappings to Databricks connections are configured.
  • Columns now display on the Profile page when the Profile activity fails.
  • When adding a distribution rule on a column from the Profile page, the percentages are now correctly calculated based on the total number of rows.
  • You can again edit jobs based on temp files in Standalone deployments of Collibra Data Quality & Observability.
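The distribution rule fix concerns the denominator: each value's percentage is its count divided by the total number of rows. A minimal sketch, assuming NULLs (None) count as rows and form their own category, which the notes do not specify:

```python
from collections import Counter


def distribution_percentages(column_values):
    """Map each distinct value to its share of the total row count, in percent."""
    total_rows = len(column_values)
    if total_rows == 0:
        return {}
    counts = Counter(column_values)
    # Denominator is the total number of rows, so the shares sum to 100.
    return {value: 100.0 * count / total_rows for value, count in counts.items()}


print(distribution_percentages(["A", "A", "B", None]))
```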

Reports

  • The Coverage Report now returns a maximum of one calendar year of statistics for database-, schema-, and table-level jobs. If a level does not have an existing structure, the report returns a helpful error message.

Agent

  • When you select an option from the Master Default dropdown menu on the Edit Agent dialog of the Agent Configuration page, the correct value now displays based on your selection.

Integration

  • Schemas and tables now correctly map to Collibra Platform assets when you automap them from the connection mapping step of the Integration Admin Console page.
  • The total row count now correctly displays in the Loaded Rows field on the asset page after the integration of a Pushdown dataset.
  • The Run Job Again option is no longer visible on the View Monitoring modal of the At a glance sidebar for table assets of scheduled and non-scheduled jobs.
  • The scheduler in Collibra Data Quality & Observability that previously monitored for triggers sent from Collibra Platform to run a job in Collibra Data Quality & Observability is now disabled and no longer scans for these inputs.

Pushdown

  • You can now see the histogram of BigQuery Pushdown jobs.
  • Snowflake Pushdown jobs with rules that reference secondary datasets with identical column names no longer return exceptions.
  • The Profile page now shows TopN and BottomN shape results for Snowflake Pushdown jobs.
  • The timestamp portion of the Run ID is now supported in Pushdown jobs that are configured to run on a schedule.
  • When Archive Break Records for Rules is enabled and the rules of a Pullup job include at least one DATATYPECHECK rule, the Rules page now shows the correct statuses when the rules are copied to a new Pushdown job. Additionally, the DATATYPECHECK rules from the Pullup job do not copy to the Pushdown job, as DATATYPECHECK rules are not supported in Pushdown mode.

APIs

  • When using the POST v3/rules endpoint to add an inactive rule (isActive option set to 0) to a dataset, the rule is now added to the dataset correctly.

Latest UI

  • We improved the performance of the Explorer page so that schemas with many tables no longer lock the page in an unresponsive state while they load.
  • The errors that occurred when compiling a dataset source query using a date variable in Explorer are now resolved.
  • Dataset pages now load correctly when an invalid run ID is applied to a Pushdown job.
  • When editing a DQ Job on a Remote File Connection, the Compile button is now disabled and includes a note instructing you to edit the query from the command line instead.
  • The issues that prevented some users from editing certain datasets created from Remote File Connections are now resolved.
  • You can again edit DQ Jobs created on Temp File connections.
  • The Agent Master Default option on the Agent Configuration page now displays correctly.
  • You can now deselect unsupported column types when performing a partial scan in Explorer.
  • The row count in the job estimate now reflects the source query row count.
  • When using Validate Source, JDBC source connections mapped to Databricks connections no longer return errors.
  • We added labels to the Histogram portion of the Profile page that are available when you hover your cursor over the histogram.
  • Percentages for Quick Rules (Distribution) on the Profile page now display correctly.
  • Observations on the Records tab of the Findings page are now color-coded.
  • We fixed an issue with the Rule Workbench where the rule body became uneditable after loading a data type rule.
  • Unintended changes to out-of-the-box Sensitive Labels are now prevented.
  • File names of exports from the Schedule page now include date and timestamps.
  • The Coverage Report now returns connection-level metrics from the latest UI and APIs.
  • We cleaned up typos on the Completeness Report page.

DQ Security