Exploring data quality findings
When Collibra DQ observes a change or anomaly, it records the finding for each dimension and includes a numeric indicator next to the tab where it detects the issue. Drill down into the various data quality dimension tabs to better understand the quality and reliability of your dataset.
This section shows the available dimensions and what they mean when you drill into a specific finding.
- Behaviors
- Rules
- Outliers
- Patterns
- Source
- Record
- Schema
- Dupes
- Shapes
Behavior
Collibra DQ learns from column-level profiling to create AdaptiveRules, which contribute to the overall Behavior score. AdaptiveRules are rules that automatically observe and adapt to changes in numeric representations of data over time and downscore any values outside defined boundaries.
Behavioral anomalies most commonly appear when you use behavior lookback or Replay, which allows Collibra DQ to learn the behavior of a dataset over a period of time.
The following table describes the information available on the Behaviors tab of the Findings page.
Column | Description |
---|---|
Blind Spot | The column where a behavioral anomaly is detected. |
Type | The type of AdaptiveRule that the behavioral model detects from the profiling activity on a given column. The following table shows the possible AdaptiveRules: |
Baseline | The mean of the preceding scans. The number of scans is determined by the Data Lookback value on the Settings modal in Explorer. |
Today | The value of the behavioral observation on the day that Collibra DQ detects it. |
% Change | The percent change from the value of one row to another. |
Δ % Change | The delta percent change from the value of one row to another. Delta percent change does not apply to absolute baselines, such as min, max, and mean. Note: Delta percent change is only available for volume-weighted metrics, such as null, empty, and shift. |
Zscore | The number of standard deviations away from the expected baseline value. |
Description | The description of the type of DQ check performed on a given column. |
Score | The value subtracted from your overall DQ score, based on the distance from the expected ranges set by the variance and boundaries of the baseline value. Expected ranges are also visible in the AR panel, with graphs available in the Details panel for each line item. |
Status | Allows you to validate or resolve an observation and, when applicable, assign it to a user for further analysis. |
Profile | The user account that is assigned to this behavioral finding. When the Status is Assigned, a user profile displays in this column. Note: When a behavioral finding is unassigned, the profile column is empty. |
Details | Click to open the Change Detection modal. |
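As a rough illustration of how the Baseline, % Change, and Zscore columns relate to each other, the following Python sketch computes them from a short history of values. It assumes that the baseline is the plain mean of the lookback window, that the percent change is measured from the baseline to today's value, and that the z-score uses the sample standard deviation of the window. The function name `behavior_metrics` and the sample numbers are hypothetical, and the calculation that Collibra DQ performs internally may differ.

```python
from statistics import mean, stdev

def behavior_metrics(history, today):
    """Compare today's value with a baseline built from previous scans.

    history: values from the preceding scans (the lookback window).
    today:   the value observed in the current scan.
    Assumption: baseline = mean of the window, and the z-score uses the
    sample standard deviation. Illustration only, not Collibra DQ's code.
    """
    baseline = mean(history)
    spread = stdev(history)  # requires at least two historical values
    pct_change = (today - baseline) / baseline * 100 if baseline else None
    zscore = (today - baseline) / spread if spread else None
    return {"baseline": baseline, "pct_change": pct_change, "zscore": zscore}

# Hypothetical null counts from five previous scans, then a spike today.
metrics = behavior_metrics([100, 102, 98, 101, 99], today=150)
print(metrics)
```

A value that sits many standard deviations from the baseline, as in this example, is the kind of observation that would be expected to lower the Behavior score.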
Pass All
Pass All gives you the option to pass all observations on a given day at once. When you pass all observations, your DQ score updates when you refresh the page.
View AR
View AR opens the Rule Check Details of a DQ Job.
Collibra DQ profiles the data and builds a model for each dataset it scans. This allows Collibra DQ to learn what normal means within the context of each dataset. As the data changes, the definition of normal also changes. Instead of requiring you to adjust rule settings, Collibra DQ continues to adjust its model. This approach enables Collibra DQ to provide automated, enterprise-grade data quality coverage that removes the need to write dozens or even hundreds of rules per dataset.
Rules
Collibra DQ takes a strong stance that data should first be profiled, auto-discovered, and learned before you apply basic rules. This methodology commonly eliminates thousands of rules that never need to be written, because the learned checks evolve naturally over time. However, there are still many cases where you need to add a simple rule, a complex rule, or a domain-specific rule.
The score on the Rules tab is the rule percentage (% column) multiplied by the number of points (Points column), rounded to a whole number. If a rule with a Percent Scoring Type does not meet or exceed its defined percentage, the score on the Rules tab remains 0.
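The scoring arithmetic above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: the % column is read as a number such as 12.4 for 12.4 percent, halves round up (the rounding mode is not stated here), and `rule_score` and its `threshold` parameter are hypothetical names used to model a Percent Scoring Type rule.

```python
from decimal import Decimal, ROUND_HALF_UP

def rule_score(percent, points, threshold=None):
    """Score for one rule: percentage multiplied by points, as a whole number.

    percent:   value from the % column, e.g. 12.4 for 12.4 percent (assumption).
    points:    value from the Points column.
    threshold: hypothetical stand-in for the defined percentage of a
               Percent Scoring Type rule; below it the score stays 0.
    """
    if threshold is not None and percent < threshold:
        return 0
    raw = Decimal(str(percent)) * Decimal(str(points))
    # Rounding half up is an assumption made for this sketch.
    return int(raw.quantize(Decimal("1"), rounding=ROUND_HALF_UP))

print(rule_score(12.4, 2))               # 12.4 * 2 = 24.8, rounded to 25
print(rule_score(3.0, 5, threshold=10))  # below the threshold, stays 0
```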
The following table describes the information available on the Rules tab of the Findings page.
Column | Description |
---|---|
Rule Name | The name of the rule. |
Condition | The rule condition that is defined in the rule. An alert generates when the condition is met. |
Points | The number of points to deduct from the quality score for a given rule break. |
Perc | The ratio of the total number of breaking records over the total number of rows. |
Breaking Records | The number of rows with records that did not pass the conditions of the rule. If the status of the rule is Exception or Passing, then the value should be 0. |
Passing Records | The number of rows with records that passed the conditions of the rule. |
Status | The status of the rule. The following table shows the possible statuses: |
Dimension | The DQ Dimension that is identified in the rule finding, for example, Completeness. |
State | Shows if a rule is enabled or disabled. |
Status | Allows you to validate or resolve an observation and, when applicable, assign it to a user for further analysis. |
Profile | The user account that is assigned to this rule finding. When the Status is Assigned, a user profile displays in this column. Note: When a rule finding is unassigned, the profile column is empty. |
Action | Rule Breaks lets you preview the rule break export file and download either a CSV or JSON file, depending on your processing mode, giving you more control over how you use and share break records. Note: The ability to copy rule breaks is limited to secure Cloud Native deployments of Collibra DQ. You cannot copy rule breaks in Standalone deployments. |
Rule Discovery
Rule Discovery detects the data classes assigned to a selected data category. The Rule Discovery algorithm automatically selects the best match if a column matches two or more data classes. Data class match criteria are determined by percent match and name distance.
Click Rule Discovery, then select an option from the Data Category dropdown menu. Click Run Discovery to assign your selection as a data category and run the discovery job.
Break records in the PostgreSQL Metastore
When a rule returns breaking records, the following query inserts unique records into the PostgreSQL Metastore rule_breaks table based on dataset, run_id, rule_nm, and link_id:
INSERT INTO rule_breaks (dataset, run_id, rule_nm, link_id)
VALUES (:dataset, :runId, :ruleNm, :linkId)
ON CONFLICT (dataset, run_id, rule_nm, link_id)
DO UPDATE SET
    dataset = :dataset,
    run_id = :runId,
    rule_nm = :ruleNm,
    link_id = :linkId
WHERE
    rule_breaks.dataset = :dataset
    AND rule_breaks.run_id = :runId
    AND rule_breaks.rule_nm = :ruleNm
    AND rule_breaks.link_id = :linkId;
Note Because only unique records are inserted into the rule_breaks table, the number of records in the rule_breaks table might not match the number of breaking records displayed on the Findings page.
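To see why the stored row count can be lower than the number of breaking records, the following Python sketch replays the same kind of upsert against an in-memory SQLite database. SQLite stands in for the PostgreSQL Metastore here, the four-column table with a unique constraint is a simplified assumption about the real rule_breaks schema, and the sample records are hypothetical. The upsert syntax requires SQLite 3.24 or later.

```python
import sqlite3

# In-memory SQLite database as a stand-in for the PostgreSQL Metastore.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE rule_breaks (
        dataset TEXT, run_id TEXT, rule_nm TEXT, link_id TEXT,
        UNIQUE (dataset, run_id, rule_nm, link_id)
    )
    """
)

# Simplified upsert: a conflicting row is updated in place, not duplicated.
upsert = """
    INSERT INTO rule_breaks (dataset, run_id, rule_nm, link_id)
    VALUES (:dataset, :runId, :ruleNm, :linkId)
    ON CONFLICT (dataset, run_id, rule_nm, link_id)
    DO UPDATE SET link_id = excluded.link_id
"""

# Hypothetical breaking records: two of the three share the same link_id.
breaks = [
    {"dataset": "sales", "runId": "2024-01-01", "ruleNm": "amount_positive", "linkId": "42"},
    {"dataset": "sales", "runId": "2024-01-01", "ruleNm": "amount_positive", "linkId": "42"},
    {"dataset": "sales", "runId": "2024-01-01", "ruleNm": "amount_positive", "linkId": "43"},
]
conn.executemany(upsert, breaks)

stored = conn.execute("SELECT COUNT(*) FROM rule_breaks").fetchone()[0]
print(f"breaking records: {len(breaks)}, rows stored: {stored}")
```

Three breaking records produce two stored rows because the first two share the same dataset, run_id, rule_nm, and link_id.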
Pulse View Preview
The Pulse View gives you a data preview box plot for simple rules that are breaking. You can click any available box to drill into the data of that day.
Note Pulse View Preview is not available for passing rules or runs without data.
Exporting rule break records
There are three options to export the details of your rule break records as .xlsx files:
- Export generates an Excel file with the details from the drill-in.
- Export LinkIds generates an Excel file with the name of the dataset, Run Id, Rule Name, and Link Id, when available.
- Export with Details generates an Excel file with the details from the drill-in and the data preview, when available.
Note The ability to copy rule breaks is limited to secure Cloud Native deployments of Collibra DQ. You cannot copy rule breaks in Standalone deployments.
Limitations
Total records are calculated by adding the total number of breaking and passing records. Therefore, when referencing a secondary dataset in a native rule, the extra rows from the secondary dataset are included in the query results on the Findings page, skewing the total rows calculation and the percentage of breaking records.
Outliers
The Outliers activity detects values that differ significantly from the rest of the data and may indicate bad or incorrect data. Numerical outliers are detected with the IQR and box plot methods.
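For readers unfamiliar with the IQR method, the following Python sketch shows the textbook version: values that fall more than 1.5 times the interquartile range below the first quartile or above the third quartile are flagged. The multiplier of 1.5, the quartile method, and the function name `iqr_outliers` are assumptions for illustration, and the exact parameters that Collibra DQ applies are not stated here.

```python
from statistics import quantiles

def iqr_outliers(values, k=1.5):
    """Return the values outside the fences [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = quantiles(values, n=4)  # quartiles of the sample
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]

# Hypothetical column values with one obvious outlier.
outliers = iqr_outliers([10, 12, 11, 13, 12, 11, 95])
print(outliers)
```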
The following table describes the information available on the Outliers tab of the Findings page.
Column | Description |
---|---|
Key | The column assigned as a Key column. Any outliers that Collibra DQ detects are grouped by this column if and when a Key is assigned. |
Column | The column where a potential outlier is detected. |
Value | The value of the column of the detected outlier. |
Count | The number of potential outliers in the column. |
Predicted | The type of value that Collibra DQ predicts for a given run, for example, categorical. This prediction is based on the observed values of previous runs. |
Conf | The confidence score, ranging from 0 to 100, indicates how far the current value is from the lower or upper bound. Lower scores, such as 0 or 1, indicate a higher likelihood of the value being an outlier. Conversely, higher scores, such as 97, suggest a lower likelihood of the value being an outlier. |
Status | Lets you label and train a finding. The available dropdown menu options are Validate, Invalidate, and Resolve. Validate instructs Collibra DQ either to assign the finding to a specific user for review, in which case it appears in the Assignment Queue, or to acknowledge without an assignee that the finding is a valid observation. Invalidate instructs Collibra DQ to ignore the finding and allow the value to pass. There are two invalidation options. Tip: When you have many findings to invalidate, consider using the Save option to invalidate them at the same time after you review all findings. Resolve instructs Collibra DQ to mark the finding as an observation and prevents it from appearing in future runs. Resolving a finding does not immediately affect data quality scores. |
Profile | The user account that is assigned to this outlier finding. When the Status is Assigned, a user profile displays in this column. Note: When an outlier finding is unassigned, the profile column is empty. |
Link ID | Links back to the detected record for remediation. Note: Link ID is not available for categorical outliers. |
Action | In Pushdown mode, you can download either a CSV or JSON file containing details of the break records. |
Note When you assign a Date and Key column in an Outlier configuration, Collibra DQ may also discover Record findings.
Invalidate All
Invalidate All instructs Collibra DQ to ignore all outlier findings and allow the values to pass.
Exporting outlier records
There are two options above the drill-in table to export the details of your outlier records as .xlsx files:
- Export generates an Excel file with the details from the drill-in.
- Export with Details generates an Excel file with the details from the drill-in and the data preview, when available.
Patterns
Patterns detect similarities among string values across columns and down-score any uncommon observations.
The following table describes the information available on the Patterns tab of the Findings page.
Column | Description |
---|---|
| | A data preview of the actual value and the predicted value. Click to expand the observation. The first row in the data preview is the actual value and the second row is the predicted value. |
Type | The type of pattern detected, for example, Suggestive. |
Observation | An overview of the pattern observation. The format of this column is [actual value in columns where a pattern is detected -> predicted value in columns where a pattern is detected]. |
Count | The number of patterns detected in the column. |
Status | Lets you label and train a finding. The available dropdown menu options are Validate, Invalidate, and Resolve. Validate instructs Collibra DQ either to assign the finding to a specific user for review, in which case it appears in the Assignment Queue, or to acknowledge without an assignee that the finding is a valid observation. Invalidate instructs Collibra DQ to ignore the finding and allow the value to pass. There are two invalidation options. Tip: When you have many findings to invalidate, consider using the Save option to invalidate them at the same time after you review all findings. Resolve instructs Collibra DQ to mark the finding as an observation and prevents it from appearing in future runs. Resolving a finding does not immediately affect data quality scores. |
Profile | The user account that is assigned to this pattern finding. When the Status is Assigned, a user profile displays in this column. Note: When a pattern finding is unassigned, the profile column is empty. |
Adjusting the downscore value
You may find that the default downscore value of 1 does not meet your requirements. Click the input field to the upper-left of the findings table and enter any whole number next to DownScore Value to adjust the downscore value of the pattern observations that Collibra DQ detects.
Exporting pattern records
There are two options to export the details of your pattern records as .xlsx files:
- Export generates an Excel file with the details from the drill-in.
- Export with Details generates an Excel file with the details from the drill-in and the data preview, when available.
Source
Mapping detects row count, schema, and cell value inconsistencies between the source file or table and the target file or table and down-scores any observations.
The Source tab provides doughnut charts to show an overview of any source issues and whether they occur at the Count-, Schema-, or Cell-level.
Count
Column | Description |
---|---|
Type | The type of count issue, for example, Dataset Column Count and Dataset Row Count. |
Target Count | The number of columns or rows that Collibra DQ counts in the target dataset. |
Source Count | The number of columns or rows that Collibra DQ counts in the source dataset. |
Change (%) | The percentage count difference between the target and source datasets. |
Description | The description of the count issue. |
Action | Lets you label and train a finding. The available dropdown menu options are Validate, Invalidate, and Resolve. Validate instructs Collibra DQ either to assign the finding to a specific user for review, in which case it appears in the Assignment Queue, or to acknowledge without an assignee that the finding is a valid observation. Invalidate instructs Collibra DQ to ignore the finding and allow the value to pass. There are two invalidation options. Tip: When you have many findings to invalidate, consider using the Save option to invalidate them at the same time after you review all findings. Resolve instructs Collibra DQ to mark the finding as an observation and prevents it from appearing in future runs. Resolving a finding does not immediately affect data quality scores. |
Profile | The user account that is assigned to this source finding. When the Status is Assigned, a user profile displays in this column. Note: When a source finding is unassigned, the profile column is empty. |
Column Schema
Column | Description |
---|---|
Target Column | The column of your target table, file, or view where an anomaly is detected. This value is mapped to the Source Column. |
(T) Col Order | The column order of the target dataset. |
(T) Type | The data type of the target dataset. |
Source Column | The column of your source table, file, or view where an anomaly is detected. This value is mapped to the Target Column. |
(S) Col Order | The column order of the source dataset. |
(S) Type | The data type of the source dataset. |
Matches | Describes whether the target column schema matches the source column schema. If the column schemas of the target and source match, then a Y displays. If they do not match, then an N displays. |
Passing | There are two possible options: |
Description | The description of the column schema issue. |
Action | Lets you label and train a finding. The available dropdown menu options are Validate, Invalidate, and Resolve. Validate instructs Collibra DQ either to assign the finding to a specific user for review, in which case it appears in the Assignment Queue, or to acknowledge without an assignee that the finding is a valid observation. Invalidate instructs Collibra DQ to ignore the finding and allow the value to pass. There are two invalidation options. Tip: When you have many findings to invalidate, consider using the Save option to invalidate them at the same time after you review all findings. Resolve instructs Collibra DQ to mark the finding as an observation and prevents it from appearing in future runs. Resolving a finding does not immediately affect data quality scores. |
Status | The status of your data quality item, for example, Observation. |
Profile | The user account that is assigned to this source finding. When the Status is Assigned, a user profile displays in this column. Note: When a source finding is unassigned, the profile column is empty. |
Cell
# of rows compared is the total number of rows from the target and source datasets that Collibra DQ includes in its cell check.
Column shifts is the number of columns that Collibra DQ flags as potential cell issues.
Issues/count is the number of issues in a particular column that Collibra DQ flags as potential issues.
Column | Description |
---|---|
System (Type) | The database or file of the source and target data. |
Key | The column assigned as a Key column. Any cell issues that Collibra DQ detects are grouped by this column if and when a Key is assigned. |
Column | The column where an anomaly is detected. |
Value | The value of the column where the anomaly is detected. Note: You can select an option from the Value column dropdown menu to downtrain any finding based on column and value. |
Passing | There are two possible options: |
Count | The number of rows in either the source or target dataset. |
Description | The description of the column schema issue. |
Action | Lets you label and train a finding. The available dropdown menu options are Validate, Invalidate, and Resolve. Validate instructs Collibra DQ either to assign the finding to a specific user for review, in which case it appears in the Assignment Queue, or to acknowledge without an assignee that the finding is a valid observation. Invalidate instructs Collibra DQ to ignore the finding and allow the value to pass. There are two invalidation options. Tip: When you have many findings to invalidate, consider using the Save option to invalidate them at the same time after you review all findings. Resolve instructs Collibra DQ to mark the finding as an observation and prevents it from appearing in future runs. Resolving a finding does not immediately affect data quality scores. |
Status | The status of your data quality item, for example, Observation. |
Profile | The user account that is assigned to this source finding. When the Status is Assigned, a user profile displays in this column. Note: When a source finding is unassigned, the profile column is empty. |
Record
The Record activity detects rows of data that drop out of or are added to a dataset and down-scores any observations. To see record findings, select at least one numeric data type column in the Outliers layer, a Key column, and a Date column. If you do not select at least one numeric column, a Key column, and a Date column, Collibra DQ skips the record check activity. Alternatively, you can exclude Outlier findings and detect only Records by selecting a Key column and a Date column but excluding the numeric data type column from your Outlier configuration.
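Conceptually, the record check compares the Key values seen in one run with those seen in the next. The following Python sketch illustrates that idea with plain set differences. The function name `record_changes` and the sample keys are hypothetical, and the sketch leaves out the Date column and the scoring that Collibra DQ applies.

```python
def record_changes(previous_keys, current_keys):
    """Return the keys dropped from and added to a dataset between two runs."""
    previous, current = set(previous_keys), set(current_keys)
    return {
        "dropped": sorted(previous - current),  # present before, missing now
        "added": sorted(current - previous),    # new in the current run
    }

# Hypothetical Key column values from two consecutive runs.
changes = record_changes(["A1", "A2", "A3"], ["A2", "A3", "A4"])
print(changes)
```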
The following table describes the information available for the Record tab on the Findings page.
Column | Description |
---|---|
Observation | The type of record anomaly detected. |
Status | Allows you to validate or resolve an observation and, when applicable, assign it to a user for further analysis. |
Profile | The user account that is assigned to this record finding. When the Status is Assigned, a user profile displays in this column. Note: When a record finding is unassigned, the profile column is empty. |
Schema
Schema detects changes to columns and data types and down-scores any observations.
Note Collibra DQ requires a minimum of 4 previous runs to establish a baseline calculation of a dataset's table properties before it can detect any schema changes.
Column | Description |
---|---|
Observation | The type of schema anomaly detected. |
Action | Lets you label and train a finding. The available dropdown menu options are Validate, Invalidate, and Resolve. Validate instructs Collibra DQ either to assign the finding to a specific user for review, in which case it appears in the Assignment Queue, or to acknowledge without an assignee that the finding is a valid observation. Invalidate instructs Collibra DQ to ignore the finding and allow the value to pass. There are two invalidation options. Tip: When you have many findings to invalidate, consider using the Save option to invalidate them at the same time after you review all findings. Resolve instructs Collibra DQ to mark the finding as an observation and prevents it from appearing in future runs. Resolving a finding does not immediately affect data quality scores. |
Status | The status of your data quality item, for example, Observation. |
Profile | The user account that is assigned to this schema finding. When the Status is Assigned, a user profile displays in this column. Note: When a schema finding is unassigned, the profile column is empty. |
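The kind of change that the Schema check reports can be pictured as a comparison of two column-to-type mappings. The following Python sketch is a minimal illustration under that assumption. The function name `schema_changes`, the message wording, and the sample schemas are hypothetical, and the sketch does not model the baseline of previous runs that Collibra DQ requires.

```python
def schema_changes(baseline, current):
    """Compare two {column name: data type} mappings and list the differences."""
    findings = []
    for column in sorted(baseline.keys() - current.keys()):
        findings.append(f"column removed: {column}")
    for column in sorted(current.keys() - baseline.keys()):
        findings.append(f"column added: {column}")
    for column in sorted(baseline.keys() & current.keys()):
        if baseline[column] != current[column]:
            findings.append(
                f"type changed: {column} {baseline[column]} -> {current[column]}"
            )
    return findings

# Hypothetical schemas from an earlier run and the current run.
before = {"id": "int", "name": "string", "amount": "double"}
after = {"id": "int", "name": "string", "amount": "string", "region": "string"}
findings = schema_changes(before, after)
print(findings)
```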
Exporting schema records
Click Export above the drill-in table to generate an Excel file with the details from the drill-in.
Dupes
Dupes are column values that match other existing column values.
The following table shows the available columns in the Dupes tab.
Column | Description |
---|---|
Type | The type of finding, for example, DUPE. |
Score | The percentage that two or more duplicate values match. A score of 100 indicates that the duplicate values are exact matches of each other, whereas a score of 85 indicates that the duplicate values are fuzzy matches. |
[Column] | The column where the duplicate value appears. Case-insensitive exact match duplicates for Pullup datasets display in all lower case. While the casing of exact match duplicates may vary, this is the expected behavior. Note: The name of this column is dynamic depending on how it appears in your table, file, or view. For example, when the column in your data source that contains the duplicate values is called last name, then the column in the Findings table is also called last name. |
Occurs | The number of duplicate values in a column. |
Profile | The user account that is assigned to this dupes finding. When the Status is Assigned, a user profile displays in this column. Note: When a dupes finding is unassigned, the profile column is empty. |
Status | Lets you label and train a finding. The available dropdown menu options are Validate, Invalidate, and Resolve. Validate instructs Collibra DQ either to assign the finding to a specific user for review, in which case it appears in the Assignment Queue, or to acknowledge without an assignee that the finding is a valid observation. Invalidate instructs Collibra DQ to ignore the finding and allow the value to pass. There are two invalidation options. Tip: When you have many findings to invalidate, consider using the Save option to invalidate them at the same time after you review all findings. Resolve instructs Collibra DQ to mark the finding as an observation and prevents it from appearing in future runs. Resolving a finding does not immediately affect data quality scores. |
Action | In Pushdown mode, you can download a CSV or JSON file containing details of the break records. Note: This column does not display for DQ Jobs created in Pullup mode. |
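The Score column distinguishes exact matches (100) from fuzzy matches (for example, 85). The following Python sketch shows one way such a similarity percentage could be computed, using the standard library's `difflib.SequenceMatcher` as a stand-in. Which algorithm Collibra DQ uses is not stated here, and `match_score` is a hypothetical helper.

```python
from difflib import SequenceMatcher

def match_score(a, b):
    """Similarity of two values as a whole-number percentage (0 to 100).

    The comparison is case-insensitive. SequenceMatcher is a stand-in,
    not the algorithm that Collibra DQ uses.
    """
    return round(SequenceMatcher(None, a.lower(), b.lower()).ratio() * 100)

exact = match_score("Smith", "SMITH")            # same value, different casing
fuzzy = match_score("Jon Smith", "John Smith")   # near match
print(exact, fuzzy)
```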
Exporting dupes records
Click Export above the drill-in table to generate an Excel file with the details from the drill-in.
Shapes
Shapes detect rare or inconsistent data formats in string columns.
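One way to picture a shape is as a coarse template of a string in which each letter and digit is replaced by a class symbol. The following Python sketch derives such templates and reports how often each one occurs. The notation (A for letters, N for digits), the helper names, and the sample values are assumptions for illustration, and the shape format that Collibra DQ displays may differ.

```python
from collections import Counter

def shape_of(value):
    """Map a string to a coarse format: letters become 'A', digits become 'N'."""
    return "".join(
        "A" if ch.isalpha() else "N" if ch.isdigit() else ch for ch in value
    )

def shape_percentages(values):
    """Return {shape: percentage of values that have that shape}."""
    counts = Counter(shape_of(v) for v in values)
    total = len(values)
    return {shape: count / total * 100 for shape, count in counts.items()}

# Hypothetical postal code column with one malformed entry.
zip_codes = ["90210", "10001", "60614", "9021A"]
percentages = shape_percentages(zip_codes)
print(percentages)
```

In this sample the rare shape is the one that would stand out as a potential format issue.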
The following table shows the information available on the Shapes tab of the Findings page.
Column | Description |
---|---|
Column | The column where Collibra DQ detects a shape. |
Schema | The datatype schema of the column where Collibra DQ detects a shape. |
Shape | The format of the shape. |
Count | The number of times a particular shape format appears in a column. |
Row Count | The number of rows in the table, file, or view where Collibra DQ detects shape issues. |
Percent | The percentage a shape conforms to the format that Collibra DQ identifies as normal. |
Shapes/Col | The number of shape issues in a particular column. |
Status | Lets you label and train a finding. The available dropdown menu options are Validate, Invalidate, and Resolve. Validate instructs Collibra DQ either to assign the finding to a specific user for review, in which case it appears in the Assignment Queue, or to acknowledge without an assignee that the finding is a valid observation. Invalidate instructs Collibra DQ to ignore the finding and allow the value to pass. There are two invalidation options. Tip: When you have many findings to invalidate, consider using the Save option to invalidate them at the same time after you review all findings. Resolve instructs Collibra DQ to mark the finding as an observation and prevents it from appearing in future runs. Resolving a finding does not immediately affect data quality scores. |
Profile | The user account that is assigned to this shapes finding. When the Status is Assigned, a user profile displays in this column. Note: When a shapes finding is unassigned, the profile column is empty. |
Action | In Pushdown mode, you can download either a CSV or JSON file containing details of the break records. Note: This column does not display for DQ Jobs created in Pullup mode. |
Exporting shape records
There are two options above the drill-in table to export the details of your shape records as .xlsx files:
- Export generates an Excel file with the details from the drill-in.
- Export with Details generates an Excel file with the details from the drill-in and the data preview, when available.
Configuring manual options
You can configure additional shape options on the Findings page. Click the icon above the upper-right corner of the Shapes Findings table.
The following table shows the available options.
Option | Description | Default Value |
---|---|---|
Occurrence % < [X] | Only shows shapes below the given percentage threshold. | 0.001 |
Format per Col < [X] | Only shows columns with fewer than the given number of formats. For example, when there are 30 formats and the Format per Column value is set to 5, then only 5 columns display. | 20 |
Character Length < [X] | Only shows shape issues with fewer than the given number of characters. For example, if the value is set to 9, then all values with fewer than 9 characters show. | 12 |