Overview: Core concepts

Data Quality & Observability offers many out-of-the-box tools to assess the quality of your data and help you gain confidence in it. This topic provides a high-level overview of the foundational components and terminology used across the application, organized by the standard data quality implementation lifecycle.

Data quality basics

These foundational concepts drive the core of Data Quality & Observability.

Concept Description
Data Quality Job

A configuration that regularly monitors specific data to identify issues before they affect your business. While not a formal catalog asset, jobs are presented similarly in the UI and empower you to conduct immediate and scheduled checks, create data profiles, apply automatic monitoring, and trigger automated notifications. Every job begins with a scope query to define exactly which data will be evaluated.

Data profiling

Detailed behavioral analysis that evaluates the structure of your data and trends over time. Profiling includes column-level statistics such as median, minimum and maximum, and null counts.

Monitoring

Out-of-the-box or user-defined SQL queries that actively check your data for anomalies. Examples include row count monitors, schema change detection, and data type checks.

Data quality score

An aggregated percentage (0–100) that summarizes the integrity of your data. Scores are categorized into tiers: Passing (90–100%), Warning (76–89%), and Failing (0–75%).

Dimensions

Categories used to group data quality findings and communicate the specific type of issue detected. Standard dimensions include Accuracy, Completeness, Consistency, Integrity, Validity, and Duplication.
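For illustration, the data quality score tiers listed above can be expressed as a small function. This is a sketch: the function name is an assumption, and the boundaries simply follow the percentage ranges in the table.

```python
def score_tier(score: float) -> str:
    """Map a data quality score (0-100) to its tier, per the ranges above."""
    if score >= 90:
        return "Passing"   # 90-100%
    if score >= 76:
        return "Warning"   # 76-89%
    return "Failing"       # 0-75%
```

For example, a score of 80 falls in the Warning tier, signaling that the data needs attention before it reaches the Failing range.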

Phase 1: Preparation and architecture

Before running jobs, administrators must configure secure connections to Edge or Collibra Cloud sites, determine processing methods, and set up user access.

Concept Description
Edge or Collibra Cloud sites, connections, and capabilities

An Edge or Collibra Cloud site is a secure management component installed in your environment that handles authentication. Connections are configurations on Edge that link Collibra to your data sources, such as Databricks, Snowflake, and SAP HANA. To run jobs on data from your Edge connections, administrators must assign a specific Data Quality capability (Pushdown or Pullup processing) to the connection.

For a complete list of data sources available in Data Quality & Observability, go to Supported data sources for Data Quality & Observability.

Processing methods

In Pushdown processing mode, jobs execute directly inside a compatible data source. This ensures that data stays within your data source and processes efficiently. Conversely, Pullup processing mode uses an Apache Spark engine on the Edge site to process data. This method is ideal for processing massive data volumes and for sources lacking scalable native compute.

Pushdown-compatible data sources use the Data Quality Pushdown Processing capability, whereas Pullup connections use Data Quality Pullup Processing.

Roles and permissions

Roles and permissions are security controls. They ensure users have the appropriate access to your Data Quality & Observability environment. Global roles grant system-wide capabilities, while resource roles dictate the specific access levels users have to individual jobs and connections.
Limits and dimensions settings

Limits are global administrative thresholds that control scoring and data processing. They include settings for maximum concurrent jobs, adaptive rule lookback limits, custom score thresholds, and Spark compute sizing. Many of these settings apply specifically to Pullup jobs, ensuring they have the necessary Spark compute resources to run consistently and efficiently.

Dimensions settings allow you to customize how data quality findings are categorized. You can review out-of-the-box dimensions or create custom dimensions. You can then map them to specific data quality monitors to match your organization's terminology.

Phase 2: Schema-level discovery

The first step in observability is gaining a rapid, high-level impression of the health of your data across entire schemas.

Concept Description
Quick monitoring

Quick monitoring is a fast deployment method that creates basic Data Quality Jobs across an entire schema in one click, generating foundational profile data before you build complex rules. This is the recommended starting point for achieving basic, schema-level profiling and observability. It returns baseline descriptive statistics, such as minimum, maximum, median, and quartiles, to provide an immediate snapshot of your data values.
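As a rough illustration of the kind of baseline statistics quick monitoring returns for a numeric column, Python's standard library can compute them. This is a sketch only; the sample data and function name are illustrative assumptions, not the product's implementation.

```python
import statistics

def baseline_stats(values):
    """Compute baseline descriptive statistics for one numeric column."""
    # quantiles(n=4) returns the three quartile cut points (Q1, median, Q3).
    q1, median, q3 = statistics.quantiles(values, n=4)
    return {
        "min": min(values),
        "max": max(values),
        "median": median,
        "q1": q1,
        "q3": q3,
    }

print(baseline_stats([12, 7, 3, 9, 15, 21, 6, 10]))
```

Statistics like these give an immediate snapshot of a column's value distribution without requiring any user-defined rules.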

Phase 3: Deploying targeted monitoring

For in-depth monitoring, deploy targeted table-level jobs and configure custom business logic.

Concept Description
Monitoring overview

The monitoring overview provides a centralized dashboard where you can view all available data sources, schemas, and tables to understand your current monitoring coverage at a glance.

Job Details page

The Job Details page provides a comprehensive view of a specific job's execution history, displaying profile data, data quality monitors, and emerging data trends over time.

Out-of-the-box and custom monitors

Out-of-the-box monitors (adaptive rules) automatically learn expected behavioral trends over time. Custom monitors (user-defined rules) are SQL queries designed to enforce specific business logic.

Rule workbench

The rule workbench provides a dedicated workspace where you can write, format, validate, and test user-defined and AI-generated SQL queries. Within this interface, you can preview sample results to ensure the rule returns the expected outputs before a job runs, add secondary filter queries, and configure rule-specific settings like data quality dimensions, tolerances, and automated notifications.
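To make the idea of a user-defined rule concrete, the sketch below runs a simple SQL check against an in-memory SQLite table. The table, column names, and pass/break convention are illustrative assumptions, not the rule workbench's actual execution model.

```python
import sqlite3

# Illustrative table standing in for a monitored data source.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER, email TEXT)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [(1, "a@example.com"), (2, None), (3, "c@example.com")],
)

# A custom rule enforcing a business requirement: email must be present.
breaking_rows = conn.execute(
    "SELECT COUNT(*) FROM customers WHERE email IS NULL"
).fetchone()[0]

status = "breaking" if breaking_rows > 0 else "passing"
print(f"Rule 'email_not_null': {status} ({breaking_rows} breaking rows)")
```

The same pattern generalizes: a rule is a query that counts (or selects) rows violating the business logic, and a nonzero result marks the rule as breaking.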

Phase 4: Reporting, scoring, and governance

This phase covers how raw monitoring results roll up into business-level reporting, alerts, and end-to-end governance.

Concept Description
Notifications

Notifications are automated alerts sent to targeted users or groups when a custom rule passes, breaks, or throws an execution exception, allowing teams to take immediate action on anomalies. Notifications are available via the Notification Center, as well as via the Slack and Microsoft Teams integrations.

Aggregation paths

Aggregation paths are the chains of relations that define how raw data quality monitor results automatically roll up to calculate the overall data quality score for higher-level logical assets, like a Business Term.

Quality tab

The Quality tab contains the health dashboard of an asset in the Data Catalog. It shows a list of active monitors, a score history chart, and aggregated quality score ring charts. It communicates data health via color-coded statuses (Green for Passing, Orange for Warning, Red for Failing) across specific dimensions, helping you confirm if data is trustworthy for decision-making. It can even show scores brought in from external, third-party quality tools when external data quality tools are enabled.
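The rollup idea behind aggregation paths can be sketched as averaging monitor scores up a chain of related assets. This is a simplified assumption for illustration; the asset names are hypothetical, and the actual aggregation logic depends on how your relations and paths are configured.

```python
# Per-monitor scores attached to two Column assets that both roll up,
# through a chain of relations, to one Business Term (hypothetical names).
monitor_scores = {
    "orders.customer_id": [100.0, 92.0],
    "orders.order_date": [80.0, 70.0],
}

def asset_score(scores):
    """Aggregate a list of scores into one asset-level score (simple mean)."""
    return sum(scores) / len(scores)

# Roll up: monitors -> column scores -> Business Term score.
column_scores = {col: asset_score(s) for col, s in monitor_scores.items()}
business_term_score = asset_score(list(column_scores.values()))
print(f"Business Term 'Customer Order' score: {business_term_score:.1f}%")
```

Here each column's monitors average to a column score, and the column scores average again to the Business Term's score, showing how low-level findings surface at the business level.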