About data quality rules

A data quality rule calculates the quality of an asset based on a predefined aggregation path and a set of metrics.

The results of a data quality rule are shown in a data quality dashboard on the Quality tab of the asset page of the asset for which the data quality is calculated. Data quality rules define for which assets the data quality dashboard is created and how the data quality values are aggregated. A data quality result aggregates attribute values that have been collected over time from different assets, following a number of predefined relations.

Example 

The example shows the data quality rule 'Default Insurance data quality rule for business term'. The following sections explain it in more detail.

Fields

Name and Description

Each data quality rule has a unique name and a description that is shown in the Data quality rules table in the DGC Settings and in the assignments of an asset type.

Path

The data quality rule aggregates values that are collected along a defined aggregation path. An aggregation path is a chain of relations that is followed from the asset to which the data quality rule is assigned to the asset that contains the actual values.

In the example above, values from 'Governance Assets' are aggregated for 'Business Assets' by looking up:

  • The 'Data Assets' that these business assets are represented by.
  • The 'Governance Assets' that these data assets are governed by.
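
The lookup above can be sketched as a walk over a chain of relation types. This is only an illustration, not the product's implementation: the in-memory `relations` store, the asset names, the relation labels and the `follow_path` helper are all hypothetical, and the choice to count an asset only once when it is reached through several routes is an assumption.

```python
from collections import defaultdict

# Hypothetical relation store: (source asset, relation type) -> target assets.
# Asset names and relation labels are made up for illustration.
relations = defaultdict(list)
relations[("Customer", "represented by")] += ["CUST_TABLE", "CUST_VIEW"]
relations[("CUST_TABLE", "governed by")] += ["Rule: non-empty name"]
relations[("CUST_VIEW", "governed by")] += ["Rule: non-empty name", "Rule: valid email"]


def follow_path(start, path):
    """Return the assets reached from `start` by following each relation type in `path`, in order."""
    current = [start]
    for relation_type in path:
        next_assets = []
        for asset in current:
            for target in relations.get((asset, relation_type), []):
                # Assumption: an asset reached through several routes is kept once.
                if target not in next_assets:
                    next_assets.append(target)
        current = next_assets
    return current


# Business asset -> data assets -> governance assets
end_assets = follow_path("Customer", ["represented by", "governed by"])
```

Here `end_assets` holds the governance assets whose values would be aggregated for the business asset 'Customer'.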

Categorization

Select a relation in the categorization field to create subscores for assets at the end of the aggregation path that have this relation. The data quality dashboard on the asset's page then shows these subscores. These subscores are attributes belonging to the asset that has a relation to the asset at the end of the aggregation path.

The data quality dashboard also shows subscores limited to certain dimensions, such as 'Accuracy'. The values of an asset at the end of the aggregation path are only taken into account for such a subscore if the asset 'belongs' to the given dimension. An asset 'belongs' to a dimension when it has a relation to that dimension of the type defined in the Categorization field. In the given example, the 'Data Quality Rule' should have a 'Classified by' relation to a Data Quality Dimension, for example 'Accuracy'.
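
As a rough sketch of how dimension subscores could be derived, the snippet below groups the assets at the end of the aggregation path by the dimensions they are related to and averages their passing fractions per dimension. The data, the `classified_by` field and the `subscores` helper are hypothetical, and using a plain average as the subscore is an assumption made for illustration.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical end-of-path assets. "classified_by" lists the dimensions each asset
# is related to through the relation chosen in the Categorization field.
rules = [
    {"name": "Rule A", "passing_fraction": 1.0, "classified_by": ["Accuracy"]},
    {"name": "Rule B", "passing_fraction": 0.5, "classified_by": ["Accuracy", "Completeness"]},
    {"name": "Rule C", "passing_fraction": 0.25, "classified_by": []},
]


def subscores(assets):
    """Average the passing fraction per dimension, using only assets that belong to it."""
    by_dimension = defaultdict(list)
    for asset in assets:
        for dimension in asset["classified_by"]:
            by_dimension[dimension].append(asset["passing_fraction"])
    return {dimension: mean(values) for dimension, values in by_dimension.items()}


scores = subscores(rules)
```

'Rule C' has no relation to any dimension, so it contributes to no subscore in this sketch.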

Metrics

The metrics of a data quality rule define which values are displayed in the data quality dashboard and which operation is used to aggregate each value. These values are attributes of the last asset of the aggregation path, in this example the 'Governance Asset'.

This section contains four metrics that are fixed for each metrics group:

Name              Operation    Description
Rows Passed       Total        The aggregated sum of passing rows.
Rows Failed       Total        The aggregated sum of failing rows.
Result            Logical AND  The aggregated logical AND of the result: failing or passing.
Passing Fraction  Average      The aggregated average of the passing fraction (quality score).

You can add extra metrics and their corresponding operation by clicking Add Metric.
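
The four fixed metrics and their operations can be sketched as a mapping from metric name to aggregation function. The metric values, the dictionary keys and the `aggregate` helper are hypothetical, and the sketch assumes that 'Total' is a plain sum and 'Average' is an unweighted mean.

```python
from statistics import mean

# Hypothetical metric values read from the assets at the end of the aggregation path.
end_asset_metrics = [
    {"rows_passed": 300, "rows_failed": 100, "result": True, "passing_fraction": 0.75},
    {"rows_passed": 50, "rows_failed": 50, "result": False, "passing_fraction": 0.5},
]

# Metric name -> aggregation operation, mirroring the four fixed metrics.
operations = {
    "rows_passed": sum,        # Total
    "rows_failed": sum,        # Total
    "result": all,             # Logical AND: passes only if every result passes
    "passing_fraction": mean,  # Average (assumed unweighted)
}


def aggregate(metrics_per_asset):
    """Apply each metric's operation to the values collected from all assets."""
    return {
        name: operation([metrics[name] for metrics in metrics_per_asset])
        for name, operation in operations.items()
    }


summary = aggregate(end_asset_metrics)
```

With these values, the aggregated passing fraction is 0.625, while the pooled ratio of passed rows is 350 / 500 = 0.7. An unweighted average treats every asset equally regardless of its row count, which is the assumption this sketch makes. An extra metric would be one more entry in `operations`.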