Exploring data quality findings

When Collibra DQ observes a change or anomaly, it records the finding for each dimension and includes a numeric indicator next to the tab where it detects the issue. Drill down into the various data quality dimension tabs to better understand the quality and reliability of your dataset.

This section shows the available dimensions and what they mean when you drill into a specific finding.

Behavior

Collibra DQ learns from column-level profiling to create AdaptiveRules, which contribute to the overall Behavior score. AdaptiveRules are rules that automatically observe and adapt to changes in numeric representations of data over time and downscore any values outside defined boundaries.

Behavioral anomalies most commonly appear when you use behavior lookback or Replay, which allows Collibra DQ to learn the behavior of a dataset over a period of time.

The following table describes the information available on the Behaviors tab of the Findings page.

Column Description
Blind Spot The column where a behavioral anomaly is detected.
Type

The type of AdaptiveRules that the behavioral model detects from the profiling activity on a given column.

The following table shows the possible AdaptiveRules:

AdaptiveRules Description
Row Count

Row count change in your table.

Loading Time

Loading time changes.

Uniqueness

Cardinality changes to a column within the range of previous DQ Jobs.

Null Values

Null values detected in a column.

Empty Values

Empty data detected in a column.

Min

Columns with min values outside the normal range.

Mean

Columns with mean values outside the normal range.

Max

Columns with max values outside the normal range.

Data Type Check (Integer, String, Date) Columns that shift from one data type to another.
Schema Change (Add, Alter, Delete) Schema evolution changes, such as columns that are added or dropped from a data set.
Baseline The mean of the preceding scans. The number of scans is determined by the Data Lookback value on the Settings modal in Explorer.
Today The value of the behavioral observation on the day that Collibra DQ detects it.
% Change

The percent change from the value of one row to another.

percent change formula

Δ % Change

The delta percent change from the value of one row to another. Delta percent change does not apply to absolute baselines, such as min, max, and mean.

delta percent change formula

Note Delta percent change is only available for volume-weighted metrics, such as null, empty, and shift.

Zscore The number of standard deviations away from the expected baseline value.
Description The description of the type of DQ check performed on a given column.
Score The value subtracted from your overall DQ score. It reflects the distance from the expected ranges set by the variance and boundaries of the baseline value. Expected ranges are also visible in the AR panel, with graphs available in the Details panel for each line item.
Action The Item Labeling you can apply to an observation, which lets you train the behavioral model on future runs. Available options are Validate, Invalidate, and Resolve.
Status The status of your data quality item, for example, Observation.
Profile N/A
Details

Click the actions button to open the Change Detection modal.

Pass All

Pass All gives you the option to pass all observations on a given day at once. When you pass all observations, your DQ score updates when you refresh the page.

View AR

View AR opens the Rule Check Details of a DQ Job.

Collibra DQ profiles the data and builds a model for each dataset it scans. This allows Collibra DQ to learn what normal means within the context of each dataset. As the data changes, the definition of normal also changes. Instead of requiring you to adjust rule settings, Collibra DQ continues to adjust its model. This approach enables Collibra DQ to provide automated, enterprise-grade data quality coverage that removes the need to write dozens or even hundreds of rules per dataset.
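
As a rough illustration of the Baseline, Today, % Change, and Zscore columns on the Behaviors tab, the following Python sketch computes them for a single metric. It assumes the conventional definitions (mean of the lookback window, relative change from that mean, and number of standard deviations from that mean). The function name behavior_metrics and the sample row counts are hypothetical, and the exact formulas that Collibra DQ applies may differ.

```python
from statistics import mean, pstdev


def behavior_metrics(history, today):
    """Compare today's value with a baseline built from previous scans.

    history: metric values from the preceding scans (the lookback window).
    today:   the value observed in the current scan.
    """
    baseline = mean(history)
    spread = pstdev(history)  # population standard deviation of the window
    # Relative change from the baseline, expressed as a percentage.
    pct_change = (today - baseline) / baseline * 100 if baseline else None
    # Number of standard deviations between today's value and the baseline.
    zscore = (today - baseline) / spread if spread else None
    return {"baseline": baseline, "pct_change": pct_change, "zscore": zscore}


# Hypothetical row counts from five earlier scans, followed by a sudden drop.
row_counts = [1000, 1010, 990, 1005, 995]
result = behavior_metrics(row_counts, today=800)
print(result)
```

In this sample, a drop from a baseline of 1000 rows to 800 rows is a 20 percent decrease and lies many standard deviations from the baseline, which is the kind of observation a Row Count AdaptiveRule is meant to surface.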

Rules

Collibra DQ takes a strong stance that data should first be profiled, auto-discovered, and learned before you apply basic rules. This methodology commonly removes the need for thousands of rules, which never have to be written and instead evolve naturally over time. However, there are still many cases where you need to add a simple rule, complex rule, or domain-specific rule.

The following table describes the information available on the Rules tab of the Findings page.

Column Description
Rule Name The name of your rule.
Condition The rule condition that is defined in the rule. An alert generates when the condition is met.
Points The number of points to deduct from the DQ Score for a given rule break.
% The percentage of your overall DQ Score that is deducted because of a Breaking rule.
Records The number of rows with Breaking records. If the status of the rule is Exception or Passing, then the value should be 0.
Status

The status of the rule. The following table shows the possible statuses:

Status Description
passing status Passing indicates that the dataset passes the rule check and no points are deducted from the data quality score.
exception status Exception indicates that there is an error with the rule check. Expand the row to view the error message.
breaking status Breaking indicates that the rule has detected an anomaly in a column and points are deducted from the data quality score. Expand the row to view the data preview of the break record.
Dimension The DQ Dimension that is identified in the rule finding, for example, Completeness.
isActive Shows if a rule is active. When the value is 1, the rule is active. When the value is 0, the rule is inactive.
Status The status of your data quality item, for example, Observation.
Profile

The user account that is assigned to this rule finding. When the Status is Assigned, a user profile displays in this column.

Note When a rule finding is unassigned, the profile column is empty.

Action

Rule Breaks lets you preview the rule break export file and download either a CSV or JSON file, depending on your processing mode, giving you more control over how you use and share break records.

  • In Pullup mode, you can download a CSV file containing details of the break records.
  • In Pushdown mode, you can download either a CSV or JSON file containing details of the break records.

Rule Discovery

Rule Discovery detects the data classes assigned to a selected data category. The Rule Discovery algorithm automatically selects the best match if a column matches two or more data classes. Data class match criteria are determined by percent match and name distance.

Click Rule Discovery and then select an option from the Data Category dropdown menu. Click Run Discovery to assign your selection as a data category and run the discovery job.

Break records in the PostgreSQL Metastore

When a rule returns breaking records, the following query inserts unique records into the PostgreSQL Metastore rule_breaks table based on dataset, run_id, rule_nm, and link_id:

INSERT INTO rule_breaks (dataset, run_id, rule_nm, link_id)
VALUES (:dataset, :runId, :ruleNm, :linkId)
ON CONFLICT (dataset, run_id, rule_nm, link_id)
    DO UPDATE SET dataset = :dataset,
                  run_id  = :runId,
                  rule_nm = :ruleNm,
                  link_id = :linkId
    WHERE rule_breaks.dataset = :dataset
      AND rule_breaks.run_id = :runId
      AND rule_breaks.rule_nm = :ruleNm
      AND rule_breaks.link_id = :linkId;

Note Because only unique records are inserted into the rule_breaks table, the number of records in the rule_breaks table might not match the number of breaking records displayed on the Findings page.
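
To see why the two counts can differ, the following Python sketch replays a similar upsert against an in-memory SQLite database, which stands in for the PostgreSQL Metastore. The table definition, the UNIQUE constraint, and the sample dataset, rule, and link ID values are assumptions made for illustration. The WHERE clause of the original query is omitted for brevity, and the sketch requires SQLite 3.24 or later for ON CONFLICT support.

```python
import sqlite3

# In-memory stand-in for the rule_breaks table. The UNIQUE constraint is an
# assumption; ON CONFLICT needs a unique index on the listed columns.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE rule_breaks (
        dataset TEXT, run_id TEXT, rule_nm TEXT, link_id TEXT,
        UNIQUE (dataset, run_id, rule_nm, link_id)
    )
    """
)

UPSERT = """
    INSERT INTO rule_breaks (dataset, run_id, rule_nm, link_id)
    VALUES (:dataset, :runId, :ruleNm, :linkId)
    ON CONFLICT (dataset, run_id, rule_nm, link_id)
    DO UPDATE SET dataset = :dataset, run_id = :runId,
                  rule_nm = :ruleNm, link_id = :linkId
"""

# Five hypothetical breaking records, but only three distinct link IDs.
link_ids = ["101", "102", "102", "103", "101"]
for link_id in link_ids:
    conn.execute(
        UPSERT,
        {"dataset": "public.orders", "runId": "2024-01-01",
         "ruleNm": "amount_not_null", "linkId": link_id},
    )

stored = conn.execute("SELECT COUNT(*) FROM rule_breaks").fetchone()[0]
print(f"breaking records: {len(link_ids)}, rows stored: {stored}")
```

Records that share the same dataset, run_id, rule_nm, and link_id collapse into a single row, so the table holds fewer rows than the number of breaking records processed.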

Pulse View Preview

The Pulse View gives you a data preview box plot for simple rules that are breaking. You can click any available box to drill into the data of that day.

Note Pulse View Preview is not available for passing rules or runs without data.

Exporting rule break records

There are three options to export the details of your rule break records as .xlsx files:

  • Export generates an Excel file with the details from the drill-in.
  • Export LinkIds generates an Excel file with the name of the dataset, Run Id, Rule Name, and Link Id, when available.
  • Export with Details generates an Excel file with the details from the drill-in and the data preview, when available.

Outliers

The Outliers activity detects values that differ significantly from the rest of the data and may indicate bad or incorrect data. Numerical outliers are detected with the IQR and box plot methods.
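
As a minimal sketch of the IQR and box plot idea, the following Python function flags values that fall outside the whiskers of a box plot. The 1.5 multiplier, the inclusive quartile method, the function name iqr_outliers, and the sample values are assumptions for illustration and are not taken from Collibra DQ.

```python
from statistics import quantiles


def iqr_outliers(values, k=1.5):
    """Return the values that fall outside the box plot whiskers."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1  # interquartile range: the height of the box
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]


# Hypothetical column values; 95 sits far above the rest.
amounts = [10, 12, 11, 13, 12, 11, 10, 95]
print(iqr_outliers(amounts))  # [95]
```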

The following table describes the information available on the Outliers tab of the Findings page.

Column Description
Key The column assigned as a Key column. Any outliers that Collibra DQ detects are grouped by this column if and when a Key is assigned.
Column The column where a potential outlier is detected.
Value The value of the column of the detected outlier.
Count The number of potential outliers in the column.
Predicted The value that Collibra DQ predicts for a given run. This value is based on the observed values of previous runs.
Conf The confidence score. A low confidence score starts at 0.
Status

The status of your data quality item, for example, Observation.

Status also lets you label and train a finding. The available dropdown menu options are Validate, Invalidate, and Resolve.

Validate instructs Collibra DQ to either assign a finding to a specific user for review, in which case it appears in the Assignment Queue, or acknowledge, without an assignee, that the finding is a valid observation.

Invalidate instructs Collibra DQ to ignore a finding and allow the value to pass. There are two invalidation options:

  • Save lets you mark a finding as invalidated.
  • Save & Retrain lets you invalidate a finding and any previously saved invalidated findings, if any.

Tip When you have many findings to invalidate, it may be best to use the Save option to invalidate them at the same time, once all findings are reviewed.

Resolve instructs Collibra DQ to mark the finding as an observation and prevents it from appearing in future runs. Resolving a finding does not immediately affect data quality scores.

Profile

The user account that is assigned to this outlier finding. When the Status is Assigned, a user profile displays in this column.

Note When an outlier finding is unassigned, the profile column is empty.

Link ID

Links back to the detected record for remediation.

Note Link ID is not available for categorical outliers.

Action

In Pushdown mode, you can download either a CSV or JSON file containing details of the break records.

Note When you assign a Date and Key column in an Outlier configuration, Collibra DQ may also discover Record findings.

Invalidate All

Invalidate All instructs Collibra DQ to ignore all outlier findings and allow the values to pass.

Exporting outlier records

There are two options above the drill-in table to export the details of your outlier records as .xlsx files:

  • Export generates an Excel file with the details from the drill-in.
  • Export with Details generates an Excel file with the details from the drill-in and the data preview, when available.

Patterns

Patterns detect similarities among string values across columns and down-score any uncommon observations.

The following table describes the information available on the Patterns tab of the Findings page.

Column Description

A data preview of the actual value and the predicted value.

Click to expand the observation. The first row in the data preview is the actual value and the second row is the predicted value.

Type The type of pattern detected, for example, Suggestive.
Observation An overview of the pattern observation. The format of this column is [actual value in columns where a pattern is detected -> predicted value in columns where a pattern is detected].
Count The number of patterns detected in the column.
Action

Lets you label and train a finding. The available dropdown menu options are Validate, Invalidate, and Resolve.

Validate instructs Collibra DQ to either assign a finding to a specific user for review, in which case it appears in the Assignment Queue, or acknowledge, without an assignee, that the finding is a valid observation.

Invalidate instructs Collibra DQ to ignore a finding and allow the value to pass. There are two invalidation options:

  • Save lets you mark a finding as invalidated.
  • Save & Retrain lets you invalidate a finding and any previously saved invalidated findings, if any.

Tip When you have many findings to invalidate, it may be best to use the Save option to invalidate them at the same time, once all findings are reviewed.

Resolve instructs Collibra DQ to mark the finding as an observation and prevents it from appearing in future runs. Resolving a finding does not immediately affect data quality scores.

Status The status of your data quality item, for example, Observation.
Profile

The user account that is assigned to this pattern finding. When the Status is Assigned, a user profile displays in this column.

Note When a pattern finding is unassigned, the profile column is empty.

Adjusting the downscore value

You may find that the default downscore value of 1 does not meet your requirements. Click the input field to the upper-left of the findings table and enter any whole number next to DownScore Value to adjust the downscore value of the pattern observations that Collibra DQ detects.

Exporting pattern records

There are two options to export the details of your pattern records as .xlsx files:

  • Export generates an Excel file with the details from the drill-in.
  • Export with Details generates an Excel file with the details from the drill-in and the data preview, when available.

Source

Mapping detects row count, schema, and cell value inconsistencies between the source file or table and the target file or table and down-scores any observations.

The Source tab provides doughnut charts to show an overview of any source issues and whether they occur at the Count-, Schema-, or Cell-level.

source doughnut charts
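
To make the Count and Cell checks more concrete, the following Python sketch compares a source and a target dataset. It assumes that the count change is expressed relative to the source and that rows are matched on a key. The function names count_change_pct and cell_mismatches and the sample rows are hypothetical, and Collibra DQ may calculate these values differently.

```python
def count_change_pct(target_count, source_count):
    """Percentage difference of the target count relative to the source."""
    if source_count == 0:
        return None  # no meaningful percentage without source rows
    return (target_count - source_count) / source_count * 100


def cell_mismatches(source_rows, target_rows):
    """Compare cell values for keys that exist in both datasets.

    Both arguments map a key to a dict of column name -> value.
    Returns (key, column, source_value, target_value) tuples.
    """
    issues = []
    for key, s_row in source_rows.items():
        t_row = target_rows.get(key)
        if t_row is None:
            continue  # a missing row is a count issue, not a cell issue
        for col, s_val in s_row.items():
            if col in t_row and t_row[col] != s_val:
                issues.append((key, col, s_val, t_row[col]))
    return issues


# Hypothetical data: the target lost 10 of 200 rows and one city differs.
source_rows = {1: {"city": "Oslo"}, 2: {"city": "Rome"}}
target_rows = {1: {"city": "Oslo"}, 2: {"city": "Roma"}}
print(count_change_pct(target_count=190, source_count=200))
print(cell_mismatches(source_rows, target_rows))
```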

Count

Column Description
Type The type of count issue, for example, Dataset Column Count and Dataset Row Count.
Target Count The number of columns or rows that Collibra DQ counts in the target dataset.
Source Count The number of columns or rows that Collibra DQ counts in the source dataset.
Change (%) The percentage count difference between the target and source datasets.
Description The description of the count issue.
Action

Lets you label and train a finding. The available dropdown menu options are Validate, Invalidate, and Resolve.

Validate instructs Collibra DQ to either assign a finding to a specific user for review, in which case it appears in the Assignment Queue, or acknowledge, without an assignee, that the finding is a valid observation.

Invalidate instructs Collibra DQ to ignore a finding and allow the value to pass. There are two invalidation options:

  • Save lets you mark a finding as invalidated.
  • Save & Retrain lets you invalidate a finding and any previously saved invalidated findings, if any.

Tip When you have many findings to invalidate, it may be best to use the Save option to invalidate them at the same time, once all findings are reviewed.

Resolve instructs Collibra DQ to mark the finding as an observation and prevents it from appearing in future runs. Resolving a finding does not immediately affect data quality scores.

Status The status of your data quality item, for example, Observation.
Profile

The user account that is assigned to this source finding. When the Status is Assigned, a user profile displays in this column.

Note When a source finding is unassigned, the profile column is empty.

Column Schema

Column Description
Target Column The column of your target table, file, or view where an anomaly is detected. This value is mapped to the Source Column.
(T) Col Order The column order of the target dataset.
(T) Type The data type of the target dataset.
Source Column The column of your source table, file, or view where an anomaly is detected. This value is mapped to the Target Column.
(S) Col Order The column order of the source dataset.
(S) Type The data type of the source dataset.
Matches

Describes whether the target column schema matches the source column schema.

If the column schemas between the target and source match, then a Y displays.

If the column schemas between the target and source do not match, then an N displays.

Passing

There are two possible options:

Option Description
source passing The checkmark indicates that the source or target passes the column schema check. This is only visible when the Show failing only option is not selected.
source failing The X indicates that the source or target does not pass the column schema check.
Description The description of the column schema issue.
Action

Lets you label and train a finding. The available dropdown menu options are Validate, Invalidate, and Resolve.

Validate instructs Collibra DQ to either assign a finding to a specific user for review, in which case it appears in the Assignment Queue, or acknowledge, without an assignee, that the finding is a valid observation.

Invalidate instructs Collibra DQ to ignore a finding and allow the value to pass. There are two invalidation options:

  • Save lets you mark a finding as invalidated.
  • Save & Retrain lets you invalidate a finding and any previously saved invalidated findings, if any.

Tip When you have many findings to invalidate, it may be best to use the Save option to invalidate them at the same time, once all findings are reviewed.

Resolve instructs Collibra DQ to mark the finding as an observation and prevents it from appearing in future runs. Resolving a finding does not immediately affect data quality scores.

Status The status of your data quality item, for example, Observation.
Profile

The user account that is assigned to this source finding. When the Status is Assigned, a user profile displays in this column.

Note When a source finding is unassigned, the profile column is empty.

Cell

# of rows compared is the total number of rows from the target and source datasets that Collibra DQ includes in its cell check.

Column shifts is the number of columns that Collibra DQ flags as potential cell issues.

Issues/count is the number of issues in a particular column that Collibra DQ flags as potential issues.

Column Description
System (Type) The database or file of the source and target data.
Key The column assigned as a Key column. Any cell issues that Collibra DQ detects are grouped by this column if and when a Key is assigned.
Column The column where an anomaly is detected.
Value

The value of the column where the anomaly is detected.

Note You can select an option from the Value column dropdown menu to downtrain any finding based on column and value.

Passing

There are two possible options:

Option Description
source passing The checkmark indicates that the source or target passes the cell check.
source failing The X indicates that the source or target does not pass the cell check.
Count The number of rows in either the source or target dataset.
Description The description of the cell issue.
Action

Lets you label and train a finding. The available dropdown menu options are Validate, Invalidate, and Resolve.

Validate instructs Collibra DQ to either assign a finding to a specific user for review, in which case it appears in the Assignment Queue, or acknowledge, without an assignee, that the finding is a valid observation.

Invalidate instructs Collibra DQ to ignore a finding and allow the value to pass. There are two invalidation options:

  • Save lets you mark a finding as invalidated.
  • Save & Retrain lets you invalidate a finding and any previously saved invalidated findings, if any.

Tip When you have many findings to invalidate, it may be best to use the Save option to invalidate them at the same time, once all findings are reviewed.

Resolve instructs Collibra DQ to mark the finding as an observation and prevents it from appearing in future runs. Resolving a finding does not immediately affect data quality scores.

Status The status of your data quality item, for example, Observation.
Profile

The user account that is assigned to this source finding. When the Status is Assigned, a user profile displays in this column.

Note When a source finding is unassigned, the profile column is empty.

Record

The Record activity detects rows of data that drop out of or are added to a dataset and down-scores any observations. To see record findings, select at least one numeric data type column in the Outliers layer, a Key column, and a Date column. If you do not select at least one numeric column, a Key column, and a Date column, Collibra DQ skips the record check activity. Alternatively, you can exclude Outlier findings and detect only Records by selecting a Key column and a Date column but excluding the numeric data type column from your Outlier configuration.
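
Conceptually, a record check compares the Key values seen on one date with those seen on another. The following Python sketch shows that idea with set differences. The function name record_changes and the sample keys are hypothetical, and this is not a description of how Collibra DQ implements the check.

```python
def record_changes(previous_keys, current_keys):
    """Report keys that dropped out of or were added to a dataset."""
    previous, current = set(previous_keys), set(current_keys)
    return {
        "dropped": sorted(previous - current),  # present before, missing now
        "added": sorted(current - previous),    # missing before, present now
    }


# Hypothetical Key column values for two consecutive dates.
yesterday = ["AAPL", "GOOG", "MSFT", "ORCL"]
today = ["AAPL", "MSFT", "NVDA", "ORCL"]
changes = record_changes(yesterday, today)
print(changes)  # {'dropped': ['GOOG'], 'added': ['NVDA']}
```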

The following table describes the information available for the Record tab on the Findings page.

Column Description
Observation The type of record anomaly detected.
Action

The available dropdown menu options are Validate and Resolve.

Validate instructs Collibra DQ to either assign a finding to a specific user for review, in which case it appears in the Assignment Queue, or acknowledge, without an assignee, that the finding is a valid observation.

Resolve instructs Collibra DQ to mark the finding as an observation and prevents it from appearing in future runs. Resolving a finding does not immediately affect data quality scores.

Status The status of your data quality item, for example, Observation.
Profile

The user account that is assigned to this record finding. When the Status is Assigned, a user profile displays in this column.

Note When a record finding is unassigned, the profile column is empty.

Schema

Schema detects changes to columns and data types and down-scores any observations.

Note Collibra DQ requires a minimum of 4 previous runs to establish a baseline calculation of a dataset's table properties before it can detect any schema changes.
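
As a simple picture of what a schema change looks like, the following Python sketch compares the column names and data types of two runs and labels each difference as Add, Alter, or Delete. The function name schema_changes and the sample schemas are hypothetical, and the sketch compares only two runs instead of the multi-run baseline that Collibra DQ builds.

```python
def schema_changes(previous, current):
    """Compare two {column: data_type} mappings.

    Returns (change, column, previous_type, current_type) tuples.
    """
    changes = []
    for col in sorted(set(previous) | set(current)):
        if col not in previous:
            changes.append(("Add", col, None, current[col]))
        elif col not in current:
            changes.append(("Delete", col, previous[col], None))
        elif previous[col] != current[col]:
            changes.append(("Alter", col, previous[col], current[col]))
    return changes


# Hypothetical schemas from two consecutive runs.
last_run = {"id": "int", "price": "double", "note": "string"}
this_run = {"id": "int", "price": "string", "sku": "string"}
for change in schema_changes(last_run, this_run):
    print(change)
```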

Column Description
Observation The type of schema anomaly detected.
Action

Lets you label and train a finding. The available dropdown menu options are Validate, Invalidate, and Resolve.

Validate instructs Collibra DQ to either assign a finding to a specific user for review, in which case it appears in the Assignment Queue, or acknowledge, without an assignee, that the finding is a valid observation.

Invalidate instructs Collibra DQ to ignore a finding and allow the value to pass. There are two invalidation options:

  • Save lets you mark a finding as invalidated.
  • Save & Retrain lets you invalidate a finding and any previously saved invalidated findings, if any.

Tip When you have many findings to invalidate, it may be best to use the Save option to invalidate them at the same time, once all findings are reviewed.

Resolve instructs Collibra DQ to mark the finding as an observation and prevents it from appearing in future runs. Resolving a finding does not immediately affect data quality scores.

Status The status of your data quality item, for example, Observation.
Profile

The user account that is assigned to this schema finding. When the Status is Assigned, a user profile displays in this column.

Note When a schema finding is unassigned, the profile column is empty.

Exporting schema records

Click Export above the drill-in table to generate an Excel file with the details from the drill-in.

Dupes

Dupes are column values that match other existing column values.

The following table shows the available columns in the Dupes tab.

Column Description
Type The type of finding, for example, DUPE.
Score The percentage that two or more duplicate values match. A score of 100 indicates that the duplicate values are exact matches of each other, whereas a score of 85 indicates that the duplicate values are fuzzy matches.
[Column]

The column where the duplicate value appears.

Note The name of this column is dynamic depending on how it appears in your table, file, or view. For example, when the name of the column in your data source that contains the duplicate values is called last name, then the column in the Findings table will be last name.

Occurs The number of duplicate values in a column.
Profile

The user account that is assigned to this dupes finding. When the Status is Assigned, a user profile displays in this column.

Note When a dupes finding is unassigned, the profile column is empty.

Status

The status of your data quality item, for example, Observation.

Status also lets you label and train a finding. The available dropdown menu options are Validate, Invalidate, and Resolve.

Validate instructs Collibra DQ to either assign a finding to a specific user for review, in which case it appears in the Assignment Queue, or acknowledge, without an assignee, that the finding is a valid observation.

Invalidate instructs Collibra DQ to ignore a finding and allow the value to pass. There are two invalidation options:

  • Save lets you mark a finding as invalidated.
  • Save & Retrain lets you invalidate a finding and any previously saved invalidated findings, if any.

Tip When you have many findings to invalidate, it may be best to use the Save option to invalidate them at the same time, once all findings are reviewed.

Resolve instructs Collibra DQ to mark the finding as an observation and prevents it from appearing in future runs. Resolving a finding does not immediately affect data quality scores.

Action

In Pushdown mode, you can download either a CSV or JSON file containing details of the break records.
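
To illustrate the difference between exact and fuzzy matches in the Score column, the following Python sketch scores pairs of values on a 0 to 100 scale with the standard library's difflib.SequenceMatcher. The similarity measure, the case-insensitive comparison, the threshold of 85, and the function names are assumptions for illustration. Collibra DQ may use a different matching algorithm.

```python
from difflib import SequenceMatcher
from itertools import combinations


def similarity_score(a, b):
    """Similarity of two strings on a 0-100 scale (100 = exact match)."""
    return round(SequenceMatcher(None, a.lower(), b.lower()).ratio() * 100)


def find_dupes(values, threshold=85):
    """Return (value_a, value_b, score) for pairs at or above the threshold."""
    pairs = []
    for a, b in combinations(values, 2):
        score = similarity_score(a, b)
        if score >= threshold:
            pairs.append((a, b, score))
    return pairs


# Hypothetical column values: one exact duplicate and one near duplicate.
names = ["Jonathan Smith", "Jonathon Smith", "Jonathan Smith", "Maria Lopez"]
for pair in find_dupes(names):
    print(pair)
```

The two identical values score 100, the misspelled variant scores lower but still above the threshold, and the unrelated value is not reported.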

Exporting dupes records

Click Export above the drill-in table to generate an Excel file with the details from the drill-in.

Shapes

Shapes detect rare or inconsistent data formats in string columns.

The following table shows the information available on the Shapes tab of the Findings page.

Column Description
Column The column where Collibra DQ detects a shape.
Schema The datatype schema of the column where Collibra DQ detects a shape.
Shape The format of the shape.
Count The number of times a particular shape format appears in a column.
Row Count The number of rows in the table, file, or view where Collibra DQ detects shape issues.
Percent The percentage a shape conforms to the format that Collibra DQ identifies as normal.
Shapes/Col The number of shape issues in a particular column.
Status

The status of your data quality item, for example, Observation.

Status also lets you label and train a finding. The available dropdown menu options are Validate, Invalidate, and Resolve.

Validate instructs Collibra DQ to either assign a finding to a specific user for review, in which case it appears in the Assignment Queue, or acknowledge, without an assignee, that the finding is a valid observation.

Invalidate instructs Collibra DQ to ignore a finding and allow the value to pass. There are two invalidation options:

  • Save lets you mark a finding as invalidated.
  • Save & Retrain lets you invalidate a finding and any previously saved invalidated findings, if any.

Tip When you have many findings to invalidate, it may be best to use the Save option to invalidate them at the same time, once all findings are reviewed.

Resolve instructs Collibra DQ to mark the finding as an observation and prevents it from appearing in future runs. Resolving a finding does not immediately affect data quality scores.

Profile

The user account that is assigned to this shapes finding. When the Status is Assigned, a user profile displays in this column.

Note When a shapes finding is unassigned, the profile column is empty.

Action

In Pushdown mode, you can download either a CSV or JSON file containing details of the break records.
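
The Shape, Count, and Percent columns can be pictured with the following Python sketch, which reduces each string to a format and counts how often each format occurs. The encoding used here (letters become A, digits become 9, other characters are kept) is a hypothetical one chosen for readability, as are the function names and sample values. It is not the shape notation that Collibra DQ uses.

```python
from collections import Counter


def to_shape(value):
    """Hypothetical encoding: letters -> 'A', digits -> '9', rest unchanged."""
    return "".join(
        "A" if ch.isalpha() else "9" if ch.isdigit() else ch for ch in value
    )


def shape_report(values):
    """Return (shape, count, percent) tuples, most common shape first."""
    counts = Counter(to_shape(v) for v in values)
    total = len(values)
    return [(s, n, n / total * 100) for s, n in counts.most_common()]


# Hypothetical phone numbers; the last one is missing its separator.
phones = ["555-0100", "555-0101", "555-0102", "5550103"]
for shape, count, percent in shape_report(phones):
    print(shape, count, percent)
```

A format that covers only a small share of the rows, such as the value without a separator in this sample, is the kind of rare shape that this tab reports.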

Exporting shape records

There are two options above the drill-in table to export the details of your shape records as .xlsx files:

  • Export generates an Excel file with the details from the drill-in.
  • Export with Details generates an Excel file with the details from the drill-in and the data preview, when available.

Configuring manual options

You can configure additional shape options on the Findings page. Click the icon above the upper-right corner of the Shapes Findings table.

The following table shows the available options.

Option Description Default Value
Occurrence % < [X] Only shows shapes below the given percentage threshold. 0.001
Format per Col < [X] Only shows columns with less than the given number of formats. For example, when there are 30 formats and the Format per Column value is set to 5, then only 5 columns display. 20
Character Length < [X] Only shows Shape issues with fewer than the given number of characters. For example, if the value is set to 9, then only values with fewer than 9 characters show. 12