Findings

The Findings page is a dashboard that displays the results and health of a DQ job run. Findings lets you explore the details of a job run and drill down into the various data quality dimensions to better understand your dataset.

Collibra DQ profiles the data and builds a model for each dataset it scans. This allows Collibra DQ to learn what normal means within the context of each dataset. As the data changes, the definition of normal also changes. Instead of requiring you to adjust rule settings, Collibra DQ continues to adjust its model. This approach enables Collibra DQ to provide automated, enterprise-grade data quality coverage that removes the need to write dozens or even hundreds of rules per dataset.

Findings overview

When Collibra DQ observes a change or anomaly, it records the finding for each dimension and includes a numeric indicator next to the tab where Collibra DQ detects the issue.

[Image: DQ Findings page]

The following table provides an overview of the key elements of the Findings page with numbers that correspond to the image above.

No. Component Description
Metadata bar A dataset anchor that lets you access the key data quality components related to your dataset, such as its Findings, Profile, and associated rules and alerts.
Data quality score

The data quality score is an aggregated value from 0 to 100 that summarizes the integrity of your data. A score of 100 means that Collibra DQ does not detect data quality issues in your dataset. Conversely, a score of 0 indicates that Collibra DQ has observed enough potential data quality issues that it has down-scored the findings to the lowest possible value.

Note The findings from each data quality dimension where Collibra DQ detects potential data quality issues contribute to the overall data quality score.
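
As a rough illustration of how findings from several dimensions could roll up into one score, the following sketch subtracts each dimension's score impact from 100 and clamps the result to the 0 to 100 range. The subtraction formula, the function name aggregate_score, and the sample impact values are assumptions made for this example. This page does not specify the exact calculation that Collibra DQ uses.

```python
def aggregate_score(impacts):
    """Return a 0-100 score from per-dimension point deductions.

    `impacts` maps a dimension name to the points it deducts.
    The simple subtraction model is an assumption for illustration only.
    """
    total_deduction = sum(impacts.values())
    # Clamp so the score never leaves the documented 0-100 range.
    return max(0, min(100, 100 - total_deduction))


# Hypothetical impacts for one job run.
impacts = {"Behavior": 4, "Schema": 1, "Shapes": 2}
print(aggregate_score(impacts))  # 93
```

In this model, a run with no findings scores 100, and a run whose deductions exceed 100 points bottoms out at 0.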

Below the score meter are five clickable options.

Option Description
P Pass (P) indicates a job run that initially failed but is determined to be passing after further investigation of the findings. This option trains the Collibra DQ learning model to pass future runs that are similar.
F Fail (F) indicates a job run that initially passed but was determined to be failing after further investigation of the findings. This option trains the Collibra DQ learning model to fail future runs that are similar.
O Off-peak (O) indicates a job run on a weekend or holiday that is valid but does not align with peak-time runs. This option trains Collibra DQ to label similar future runs as off-peak.
I Ignore (I) indicates a valid run that fits the run cycle profile but should not be used for training the model. This option ignores the findings of a given run.
R Remove (R) indicates a run that was triggered by mistake and should be removed.
Timestamp

The runId of the current DQ Job. You can click this option to set the runId to a specific timezone.

When there is only one DQ Job run on a given runId date, the timestamp is always 00:00. However, when there are multiple DQ Job runs on a given runId date, the timestamp reflects the time each DQ Job ran, for example, 01:00 or 10:45.

Note The timestamp displays in 24-hour time format. For example, 01:00 is 1 AM, 13:00 is 1 PM, and so on.
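
If you need to read these timestamps in 12-hour format, the conversion in the note above can be sketched as follows. The helper name to_12_hour is hypothetical and is not part of Collibra DQ.

```python
def to_12_hour(hhmm):
    """Convert a 24-hour 'HH:MM' string to 12-hour format with AM/PM."""
    hour, minute = map(int, hhmm.split(":"))
    suffix = "AM" if hour < 12 else "PM"
    # 0 and 12 both display as 12 on a 12-hour clock.
    hour12 = hour % 12 or 12
    return f"{hour12}:{minute:02d} {suffix}"


for stamp in ["00:00", "01:00", "10:45", "13:00"]:
    print(stamp, "->", to_12_hour(stamp))
```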

Score chart view selector

The Scores, Pulse, Row Count, and Pass/Fail tabs each provide different graphical representations of your data quality scores, run status, row count evolution, and pass or fail status over time.

Note The times along the x-axis use random values that do not reflect or relate to the run ID of a given DQ Job run.

Data quality dimension findings tabs When Collibra DQ detects potential data quality issues for a given dimension, a numeric indicator shows how many issues were discovered and the data quality score impact aggregated from the findings. Click the tabs to reveal more details about the findings.
Behavior
Automatically detected. Behavior is an assortment of behavioral observations based on Collibra DQ's AdaptiveRules, which automatically observe and adapt to changes in numeric representations of data over time, down-scoring any values outside the defined boundaries.
Rules
Opt-in for detection. Rules are user-defined SQL conditions that help you identify and understand anomalies in your data, and ensure that your data is fit for use to meet your business requirements.
Outliers
Opt-in for detection. Outliers are values that differ significantly from the rest of the data and may indicate bad or incorrect data.
Patterns
Opt-in for detection. Patterns detect similarities among string values across columns and down-score any observations.
Source
Opt-in for detection. Source detects row count, schema, and cell value inconsistencies between the source file or table and the target file or table and down-scores any observations.
Record
Opt-in for detection. Record detects rows of data that drop out of a dataset and down-scores any observations.
Schema
Automatically detected. Schema detects changes to columns and data types and down-scores any observations.
Dupes
Opt-in for detection. Duplicates (dupes) are values that match other existing values in columns and can be set to detect exact and fuzzy matches.
Shapes
Automatically detected. Shapes detect rare or inconsistent data formats in string columns and down-score any observations.
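
To make a few of the dimensions above more concrete, the following simplified sketch shows one way to picture Schema, Dupes, and Shapes checks. It is not Collibra DQ's implementation. The function names, the similarity measure (Python's difflib), the shape encoding (letters become A, digits become 9), and the thresholds are all assumptions chosen for illustration.

```python
import re
from collections import Counter
from difflib import SequenceMatcher
from itertools import combinations


def schema_changes(previous, current):
    """Compare two {column: data_type} mappings from consecutive runs."""
    shared = set(previous) & set(current)
    return {
        "added": sorted(set(current) - set(previous)),
        "removed": sorted(set(previous) - set(current)),
        "retyped": sorted(c for c in shared if previous[c] != current[c]),
    }


def find_dupes(values, min_similarity=1.0):
    """Return value pairs at or above a similarity ratio.

    A ratio of 1.0 keeps exact matches only; lower values allow fuzzy matches.
    """
    return [
        (a, b)
        for a, b in combinations(values, 2)
        if SequenceMatcher(None, a, b).ratio() >= min_similarity
    ]


def shape_of(value):
    """Encode a string's format: letters become 'A', digits become '9'."""
    return re.sub(r"[0-9]", "9", re.sub(r"[A-Za-z]", "A", value))


def rare_shapes(values, max_share=0.25):
    """Return values whose format occurs in less than max_share of the column."""
    counts = Counter(shape_of(v) for v in values)
    return [v for v in values if counts[shape_of(v)] / len(values) < max_share]


names = ["Jon Smith", "John Smith", "Jon Smith", "Ann Lee"]
phones = ["555-1234", "555-9876", "555-4321", "555-1111", "5551234x"]

print(schema_changes({"id": "int", "zip": "int"}, {"id": "int", "zip": "string", "city": "string"}))
print(find_dupes(names))        # exact matches only
print(find_dupes(names, 0.9))   # exact and fuzzy matches
print(rare_shapes(phones))      # values with an uncommon format
```

Lowering min_similarity or raising max_share makes the sketch flag more values, which is the same kind of trade-off you make when you tune opt-in detection for a dataset.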
Labels, Job, and Export tabs

These three tabs let you do the following:

  • Labels - Apply labels and annotations.
  • Job - View and edit the run command line of the current and previous job.
  • Export - Export individual or all data quality findings as .xlsx files.
Finding drill-in Click data quality dimension tabs with findings to discover more insights about potential data quality issues.
Whitespace and masking

When you click the Whitespace toggle above and to the right of the Data Preview, gray rectangles highlight empty spaces between string data in the Data Preview.

When you select the Masked checkbox option at the top of a Data Preview column and click Update Masking above and to the right of the Data Preview, the Findings page refreshes and masks column entries with "XXXXX" instead of displaying potentially sensitive data.
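
The effect of masking on previewed rows can be pictured with the following minimal sketch. The function name mask_columns and the sample rows are hypothetical. Only the "XXXXX" replacement value comes from the behavior described above.

```python
def mask_columns(rows, masked_columns, mask="XXXXX"):
    """Return copies of the rows with entries in masked columns replaced."""
    return [
        {col: (mask if col in masked_columns else val) for col, val in row.items()}
        for row in rows
    ]


preview = [
    {"name": "Ann Lee", "ssn": "123-45-6789"},
    {"name": "Jon Smith", "ssn": "987-65-4321"},
]

for row in mask_columns(preview, {"ssn"}):
    print(row)
```

The original rows are left unchanged, which mirrors the idea that masking affects what the preview displays rather than the underlying data.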

Data Preview A preview of the columns in your dataset where you can view cell values, column data types, passing or failing columns, and where Shapes are detected. You can also add column-level Quick Rules and view sensitive labels, parent key column labels, and column statistics, such as max, mean, min, and number of uniques.

What's next?

Drill down into your data quality findings to learn more about what Collibra DQ discovered and how the observations contribute to the overall data quality score of your dataset.