Monitoring data quality in assets
Business Analysts, Data Product Managers, and Governance Managers use the Quality tab to monitor the health of assets. The Quality tab shows a list of monitors and jobs for the asset, the overall data quality score and its history, as well as data quality score ring charts for each data quality dimension. This helps you confirm that the data is trustworthy, so you can confidently use it in your reports and decision-making processes.
In this topic, you will learn:
- Why the Quality tab is a necessary tool for assessing the health of assets.
- How to interpret the quality scores.
- A use case for the Quality tab.
Choose an option below to explore the documentation for the latest user interface (UI) or the classic UI.
The Quality tab: the health dashboard of your assets
The Quality tab is the primary source for data quality insights on asset pages. It shows:
- A score history chart showing the evolution of the data quality score of the asset.
- The aggregated score of the asset.
- The aggregated scores of individual dimensions linked to the asset.
The values on the Quality tab come from aggregated asset data. This aggregation relies on the chain of relations defined in quality score aggregations. Admins can configure these on the Quality score aggregations tab of the Operating model page.
The Quality tab contains two sections: Collibra Data Quality and External Data Quality. This allows Collibra to capture the source of your data quality scores and organize them appropriately.
These sections show different information depending on the source tool from which your data quality scores originate. The source tool can be Collibra Data Quality & Observability, a third-party data quality tool, or both.
- Collibra Data Quality: This section shows data quality insights from Collibra Data Quality & Observability.
- External Data Quality: This section shows data quality insights from third-party data quality tools, including Collibra Data Quality & Observability Classic.
Regardless of the tool you use, a data quality health dashboard is available when an asset has active data quality monitors. This dashboard shows a score history chart, quality overview table, and quality score ring charts for the asset and its monitors. This dashboard gives you a high-level view of the health of your data. It also provides an entry point for further investigating any data quality issues that may arise.
Quality score history chart
The quality score history chart shows the historical data quality scores of the asset. Each point on the chart represents the quality score on a given date. The information in the chart varies depending on the data quality score source: a monthly aggregated score for Collibra Data Quality, or a 7-day history for external data quality tools. This helps you track changes and trends in data accuracy, consistency, and completeness. You can also view this information as a list.
Quality score tiles
Quality score tiles display ring charts for the asset overview or specific dimensions. Each tile shows:
- The score (out of 100).
- The status color.
- The number of passing monitors.
When you open the Quality tab, the Overview tile and all monitors are shown. When you click a dimension tile, the quality overview table shows its related monitors.
Quality overview table
The table shows the following information:
- Name: The name of the schema, table, column, or job. When you click the name of a job, you are directed to the Monitors tab of the Job Details page, with the monitor actions drawer for the selected monitor open.
- Type: The database object or monitor type.
- State: The state of the monitor, including Breaking, Passing, Learning, and Suppressed.
- Score: The aggregated data quality score, calculated as an average of the quality scores of the underlying schema, table, or column assets.
- Last updated: The date and time of the last run in MM/DD/YYYY, hh:mm:ss AM/PM format.
- Dimensions: The data quality dimensions associated with a monitor.
Interpreting quality scores
The Quality tab presents scores using ring charts. These charts represent aggregated data quality scores for the overall score and out-of-the-box and custom data quality dimensions, which can indicate the health of your data.
The following colors indicate the quality status:
| Color | Range | Status | Description |
|---|---|---|---|
| Green | 85-100% | Passing | The score is above the passing threshold. |
| Orange | 50-85% | Warning | The score is below the passing threshold but above the warning threshold. |
| Red | 0-50% | Failing | The score is below the warning threshold. |
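The threshold logic in the table above can be sketched as a simple lookup. This is an illustrative sketch, not product code; the boundary handling (85 maps to Passing, 50 maps to Warning) is an assumption, since the table lists overlapping ranges.

```python
def quality_status(score: float) -> tuple[str, str]:
    """Map a data quality score (0-100) to its status and ring-chart color.

    Assumption: scores exactly at a boundary fall into the higher band,
    because the documented ranges (0-50, 50-85, 85-100) overlap.
    """
    if score >= 85:
        return "Passing", "Green"
    if score >= 50:
        return "Warning", "Orange"
    return "Failing", "Red"
```

For example, a score of 92 renders a green Passing ring, while a score of 60 renders an orange Warning ring.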
Use case: Enforcing data contracts with quality insights
Consider a scenario where a Governance Manager at a global financial institution requires that the "Customer_Transactions" Column asset adheres to a specific data contract. The Service Level Agreement (SLA) for this contract requires that the data must maintain a global quality score above 90% to be considered fit for use in regulatory reporting.
To verify compliance, the Governance Manager monitors the Quality tab. This view aggregates scores based on the chain of relations defined in the data quality rules.
The manager interprets the score history and ring charts to determine the status of the SLA:
- Passing the SLA: An overall ring chart score above 90% (within the Green range, 85-100%) indicates a passing status. These scores signal that the asset is trustworthy and meets the threshold defined in the data contract.
- Breaching the SLA: Scores in Orange (50-85%) or Red (0-50%) indicate a warning or failing status. These scores signal a potential SLA breach, prompting the Governance Manager to investigate specific failing monitors.
Asset quality
The Quality tab of an asset shows the aggregated passing fraction (quality score) for the asset in the form of ring charts.
Each ring chart shows the quality score in the form of:
- A quality score as a percentage.
- A color code indicating the quality of this passing fraction:
  - Red: 0-50%
  - Orange: 50-85%
  - Green: 85-100%
The first ring chart shows the general score of the asset. The ring charts next to it show subscores for specific dimensions, such as Accuracy, Conformity, Completeness, and Consistency. Only values that belong to that specific dimension are considered. The dimensions to use are configured in the metric group. In this example, it is the relation: Data Quality Rule is Classified By Data Quality Dimension.
Underneath the top pane, three selection boxes show additional information: Overview, Details, and History.
Overview
The Overview pane shows more information about each level in the aggregation path for the selected general score or dimension. For each level, it shows the number of involved assets of a certain type and what their results are: failing (red) or passing (gray). It also shows the total number of rows, the number of failing rows (red), and the number of passing rows (gray) that resulted in the given scores.
In the following example, the Conformity dimension covers a total of 38070 rows, 26575 of which were failing. Two data quality rules were involved, one of which was failing. These data quality rules were used by one data entity, which has an aggregated failing result.
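The numbers in this example can be checked by hand, assuming the dimension score is the simple passing fraction (this calculation is illustrative; the product may round or weight scores differently):

```python
# Worked example using the Conformity figures from the text above.
total_rows = 38070
failing_rows = 26575
passing_rows = total_rows - failing_rows  # 11495 passing rows

# Passing fraction expressed as a score out of 100, rounded to one decimal.
score = round(100 * passing_rows / total_rows, 1)  # about 30.2
```

A score of roughly 30% falls in the Red (0-50%) range, which is consistent with the failing result shown for this dimension.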
Details
The Details pane shows more information about all the involved assets in a tabular format.
For each asset, a row with the following default columns is shown:
- Data Asset: Data asset signifier.
- Rows Passed: Number of passing rows, aggregated as a sum of the passing rows of the underlying assets.
- Rows Failed: Number of failing rows, aggregated as a sum of the failing rows of the underlying assets.
- Quality Score: Score aggregated, as an average of the quality scores of the underlying assets.
- Result (failing or passing): Aggregated result, as a logical conjunction of the results of the underlying assets.
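The aggregation rules behind these columns (sums for rows, an average for the score, a logical conjunction for the result) can be sketched as follows. The class and field names are illustrative, not the product's API:

```python
from dataclasses import dataclass

@dataclass
class AssetQuality:
    rows_passed: int
    rows_failed: int
    quality_score: float  # 0-100
    passing: bool

def aggregate(children: list[AssetQuality]) -> AssetQuality:
    """Roll up child assets per the column definitions above:
    rows are summed, scores are averaged, and the result is a
    logical AND (one failing child makes the parent fail)."""
    return AssetQuality(
        rows_passed=sum(c.rows_passed for c in children),
        rows_failed=sum(c.rows_failed for c in children),
        quality_score=sum(c.quality_score for c in children) / len(children),
        passing=all(c.passing for c in children),
    )
```

Note the asymmetry: a parent can have a moderate average score yet still show a failing result, because the result is a conjunction rather than a threshold on the averaged score.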
You can show some extra columns in the table by clicking → Columns. These include:
- Data Element: Unique full name of the asset.
- Domain: Domain to which the asset belongs.
- Type: Type of the asset.
- Quality Score: Aggregated value between 0 and 100 that represents a summary of the integrity of your data.
- Dimensions: Dimension that applies to these assets, if any. Dimensions are used to calculate subscores.

If you use an external data quality tool (anything other than Collibra Data Quality & Observability) and the Quality Score calculation is incorrect because a null or empty value is treated as a zero, you can disassociate the data quality rule or metric from the asset:

- Navigate to the Summary tab for the asset.
- Scroll to the section that displays the relationship between the asset and the rule or metric. For example, if the asset is of type Data Quality Job, scroll to the "is governed by Data Quality Rule" section.
- Locate the rule or metric with a missing Passing Fraction value.
- In the Actions column, click next to the rule or metric to remove the association.
History
The History pane shows the evolution of the quality score over the last 7 days of data.
Hover your pointer over a period to show its date and score in the upper-right corner of the pane. When you click a period to select it, the upper-left corner of the pane shows the score trend compared to the previous period.
- Create a quality score aggregation to control how data quality scores are shown in the Quality tab.
- Open the Job Details page for a closer look at the quality of your data.
- Open the rule workbench to add a custom rule to enforce specific business logic that adaptive rules cannot predict.