About archive break records

The archive break records feature lets you store data quality rule violations in a storage location in your connection, so that you can track, analyze, and remediate data quality issues. It preserves detailed information about which rows fail your rules, which helps you improve data quality over time.

When you run data quality jobs, archived break records provide you with:

  • Detailed failure tracking: Know exactly which rows fail your rules and why.
  • Historical analysis: Track data quality trends and improvements over time.
  • Remediation support: Access the specific information needed to fix source data issues.
  • Enhanced reporting: Build BI dashboards on top of the archived break records.

For a complete list of data sources that support archive break records, go to Supported data sources for Data Quality & Observability.

How break record archival works

You configure archive break records at the connection level. When enabled, Collibra automatically stores break records in a specified location within your source database whenever a data quality job identifies rule violations.

[Diagram: break record storage architecture. The numbered items below correspond to the callouts in the diagram.]

  1. Data Quality & Observability component
    • Includes the Data Quality & Observability sections of the Collibra Platform UI and REST APIs, where you can run Data Quality Jobs, view results, and so on.
    • You can enable previews of sensitive data, or break records, from the Collibra Platform UI. These previews require live queries to the data source. Sensitive data is never stored in Collibra Cloud.
  2. Data Quality Job results and aggregated metadata are securely stored in a schema within Collibra Platform.
  3. You can optionally integrate Data Quality & Observability with Data Catalog and the Knowledge Graph.
  4. When performing certain actions that require a connection to data within your data source, all requests and responses are securely routed through the Edge management component.
  5. The Edge site can be installed on either bundled k3s or a managed Kubernetes cluster.
  6. Edge sites
    • You may have one or more Edge sites.
    • Edge manages all connections and authentication to your data sources. This is the only component that must be installed and maintained within your environment.
    • You can optionally host your Edge site within Collibra.
  7. Customer's data source
    • Break records from your job are archived to your database. You can choose the database and schema, within the data source being monitored for data quality, where break records are archived.
    • You may choose to enable user previews of break records from the application UI. This requires sensitive data to be transmitted through the Collibra environment to the user's browser, but it is never stored in the Collibra environment.

Prerequisites

To archive break records, you need the following permissions and configurations:

Requirement Details
Collibra permissions
  • To configure your connection, you have a global role with the Data Quality > Manage Data Sources global permission.
  • To view and download break records, you have the Preview Rule Break Records resource permission.
Data source privileges for break records

Because exact permission models vary significantly across different data platforms, consult your organization's database administrator to provision the exact permissions.

The service account connecting to your data source requires the following generalized privileges:

On the source data (to monitor records):

  • Read access (SELECT) on the specific tables and views you intend to monitor.

On the destination schema (to archive break records):

  • Create access (CREATE TABLE and CREATE VIEW) in the designated archive schema.
  • Note: Create access is needed only once per break records table. Alternatively, you can create the table manually with explicit update access. All monitors write to the same break records table, so create access can be safely revoked after initial setup. If the break records storage location changes, create access is required again to generate a new table.
  • Modify access (ALTER TABLE and ALTER VIEW) to update schema structures, such as adding new columns.
  • Drop access (DROP VIEW) to clean up or replace views as needed.
  • Read access (SELECT) on the newly created tables and views.
  • Write/edit access (INSERT, UPDATE, and DELETE) on those created tables to actively manage the archived records over time.
Connection configuration

You have an established connection.

Archive locations

The system provides two storage location options:

  • Same location as job: Break records are stored in the same schema where your data quality job runs.
  • Specify a location: You select a specific database and schema from existing options in your connection.

Break record previews adapt based on your configuration. When the archive break records setting is enabled, previews on the Job Details page display data from the archived records. When disabled, previews show the live results of the data quality rule.
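
The location options and the preview behavior described above can be summarized as a small decision sketch. This is only an illustration of the documented rules, not Collibra's implementation: the `ArchiveSettings` class, its field names, and both helper functions are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class ArchiveSettings:
    """Hypothetical model of the connection-level archive settings."""
    enabled: bool
    same_location_as_job: bool = True
    database: Optional[str] = None  # used only for "Specify a location"
    schema: Optional[str] = None    # used only for "Specify a location"


def archive_location(settings: ArchiveSettings, job_database: str,
                     job_schema: str) -> Optional[Tuple[str, str]]:
    """Return the (database, schema) that would hold break records,
    or None when archiving is disabled."""
    if not settings.enabled:
        return None
    if settings.same_location_as_job:
        # "Same location as job": reuse the schema where the job runs.
        return (job_database, job_schema)
    if settings.database is None or settings.schema is None:
        raise ValueError("A specified location needs a database and a schema")
    # "Specify a location": use the selected database and schema.
    return (settings.database, settings.schema)


def preview_source(settings: ArchiveSettings) -> str:
    """Describe where Job Details previews read from."""
    if settings.enabled:
        return "archived break records"
    return "live results of the data quality rule"


same = ArchiveSettings(enabled=True)
custom = ArchiveSettings(enabled=True, same_location_as_job=False,
                         database="DQ_DB", schema="DQ_ARCHIVE")
off = ArchiveSettings(enabled=False)

print(archive_location(same, "SALES_DB", "PUBLIC"))
print(archive_location(custom, "SALES_DB", "PUBLIC"))
print(preview_source(off))
```

The database and schema names in the usage lines are made up for the example.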

After you select the storage location, Collibra creates the following tables and views in the database to store the break records:

Database tables and views Description
COLLIBRA_DQ_BREAK_RECORDS

Provides a user-friendly view of break records based on the COLLIBRA_DQ_RULES table. Use this view to analyze and interpret rule violations.

COLLIBRA_DQ_BREAKS

Stores internal tracking data and the database query used for troubleshooting.

COLLIBRA_DQ_RULES

Stores raw break record data for internal processing.

A breakdown of the COLLIBRA_DQ_BREAK_RECORDS view

The COLLIBRA_DQ_BREAK_RECORDS view is the primary source for interacting with break records in your data source.

Column name Data type Description
job_name varchar The name of the data quality job.
run_uuid varchar The unique ID of the job run.
run_date timestamp The date of the job run.
updt_ts timestamp The date and time the job ran.
monitor_name varchar The name of the rule monitor.
monitor_type varchar The type of monitor, for example, "Rule".
description varchar The description of the rule monitor.
details varchar Contains the JSON output of rule results, including column names and values for all columns in the rule query.
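
As a rough illustration of how the view might be consumed, the sketch below reads break records for one job and decodes the `details` column. It runs against an in-memory SQLite table that stands in for the view, so the SQL dialect, the sample rows, and the `fetch_breaks` helper are all assumptions. The exact layout of the `details` JSON is also assumed here to be a flat object of column names and values.

```python
import json
import sqlite3

# In-memory stand-in for the archive schema. In a real data source,
# COLLIBRA_DQ_BREAK_RECORDS is a view created by Collibra; here it is a
# plain table with the documented columns so the example is runnable.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE COLLIBRA_DQ_BREAK_RECORDS (
        job_name TEXT, run_uuid TEXT, run_date TEXT, updt_ts TEXT,
        monitor_name TEXT, monitor_type TEXT, description TEXT, details TEXT
    )
    """
)

# Made-up sample rows. The shape of the details JSON is an assumption.
sample_rows = [
    ("transactions_weekly", "run-001", "2024-01-01", "2024-01-01 06:00:00",
     "valid_account_number", "Rule", "Account number must be 10 digits",
     json.dumps({"txn_id": 17, "account_number": "12AB"})),
    ("transactions_weekly", "run-001", "2024-01-01", "2024-01-01 06:00:00",
     "valid_account_number", "Rule", "Account number must be 10 digits",
     json.dumps({"txn_id": 42, "account_number": ""})),
]
conn.executemany(
    "INSERT INTO COLLIBRA_DQ_BREAK_RECORDS VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
    sample_rows,
)


def fetch_breaks(connection, job_name):
    """Return break records for one job, with details decoded from JSON."""
    cursor = connection.execute(
        """
        SELECT run_uuid, monitor_name, details
        FROM COLLIBRA_DQ_BREAK_RECORDS
        WHERE job_name = ?
        ORDER BY run_date, run_uuid
        """,
        (job_name,),
    )
    return [
        {"run_uuid": run_uuid, "monitor_name": monitor_name,
         "details": json.loads(details)}
        for run_uuid, monitor_name, details in cursor
    ]


breaks = fetch_breaks(conn, "transactions_weekly")
for record in breaks:
    print(record["run_uuid"], record["monitor_name"], record["details"])
```

Against a real data source, you would replace the SQLite connection with your platform's driver and adjust the parameter placeholder style to match it.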

Example: Data quality remediation workflow

A data steward at a financial services company runs weekly data quality checks on customer transaction data stored in Snowflake. When the job identifies 150 records with invalid account numbers, the archive break records feature stores these violations in a dedicated schema.

The data steward can then:

  • Review the specific rows that failed validation.
  • Export the break records to share with the source system team.
  • Track remediation progress by comparing break records across job runs.
  • Generate monthly reports showing data quality improvements.

Over time, the archived break records create a comprehensive audit trail that demonstrates the organization's data quality improvements to regulatory auditors.
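
Tracking remediation progress across job runs, as in the workflow above, can be approximated by counting break records per run. The sketch below is a minimal illustration that uses an in-memory SQLite table in place of the COLLIBRA_DQ_BREAK_RECORDS view, with only the columns it needs. The job name, monitor name, run IDs, and counts are made up, and `breaks_per_run` is a hypothetical helper.

```python
import sqlite3

# Stand-in for the view, reduced to the columns this example uses.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE COLLIBRA_DQ_BREAK_RECORDS (
        job_name TEXT, run_uuid TEXT, run_date TEXT, monitor_name TEXT
    )
    """
)

# Made-up data: three weekly runs with 3, 2, and 1 break records.
runs = [("run-a", "2024-01-01", 3),
        ("run-b", "2024-01-08", 2),
        ("run-c", "2024-01-15", 1)]
for run_uuid, run_date, n in runs:
    conn.executemany(
        "INSERT INTO COLLIBRA_DQ_BREAK_RECORDS VALUES (?, ?, ?, ?)",
        [("transactions_weekly", run_uuid, run_date,
          "valid_account_number")] * n,
    )


def breaks_per_run(connection, job_name, monitor_name):
    """Count break records per job run, oldest run first."""
    cursor = connection.execute(
        """
        SELECT run_date, run_uuid, COUNT(*) AS break_count
        FROM COLLIBRA_DQ_BREAK_RECORDS
        WHERE job_name = ? AND monitor_name = ?
        GROUP BY run_date, run_uuid
        ORDER BY run_date
        """,
        (job_name, monitor_name),
    )
    return cursor.fetchall()


trend = breaks_per_run(conn, "transactions_weekly", "valid_account_number")
previous = None
for run_date, run_uuid, count in trend:
    # Show the change against the previous run, when there is one.
    change = "" if previous is None else f" ({count - previous:+d} vs. previous run)"
    print(f"{run_date} {run_uuid}: {count} break records{change}")
    previous = count
```

A falling count for the same monitor across runs suggests that remediation of the source data is taking effect. The same aggregate could feed a BI dashboard or a monthly report.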

What's next

Configuring archive break records