Source

You configure the Source monitor in the Mapping step in Explorer. It detects row count, schema, and cell value inconsistencies between the source file or table and the target file or table, and down-scores any observations.

The Source tab provides donut charts that give an overview of any source issues and show whether they occur at the Count, Schema, or Cell level. Only the items marked with a "Y" in the data tables are included in the calculations.


Donut charts

Chart Description
Row Count The target row count as a percentage of the source row count.
Column Count The target column count as a percentage of the source column count.
Matching The percentage of target column schemas that match the source column schemas. The data tables on this page provide the details on the matched schemas.
Passing The percentage of source and target column schemas that pass the column schema check. The calculation also includes columns and data types that exist in the source but not in the target, or in the target but not in the source. The data tables on this page provide the details on the passed schemas.
# of rows with mismatched cells The number of rows where the target cells don't match the source cells. This chart is the counterpart of the % rows matched chart. For example, if this value is 0, the % rows matched chart shows 100%.
% rows matched The percentage of rows where the target cells match the source cells. This chart is the counterpart of the # of rows with mismatched cells chart. For example, if this value is 100%, the # of rows with mismatched cells chart shows 0.
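The arithmetic behind the donut charts can be sketched in Python. This is a minimal illustration under stated assumptions, not Collibra DQ's implementation: the helper names `pct` and `donut_metrics` are hypothetical, a zero denominator is assumed to yield 0.0, and % rows matched is taken to be the share of compared rows that have no mismatched cells.

```python
def pct(part: int, whole: int) -> float:
    """Return part as a percentage of whole (assumed to be 0.0 when whole is 0)."""
    return 100.0 * part / whole if whole else 0.0


def donut_metrics(source_rows: int, target_rows: int,
                  source_cols: int, target_cols: int,
                  rows_compared: int, mismatched_rows: int) -> dict:
    """Hypothetical helper mirroring the Source tab donut charts."""
    return {
        # Row Count / Column Count: target expressed as a percentage of source.
        "row_count_pct": pct(target_rows, source_rows),
        "column_count_pct": pct(target_cols, source_cols),
        "rows_with_mismatched_cells": mismatched_rows,
        # % rows matched is the counterpart of the mismatched-row count.
        "pct_rows_matched": pct(rows_compared - mismatched_rows, rows_compared),
    }


metrics = donut_metrics(source_rows=1000, target_rows=950,
                        source_cols=10, target_cols=10,
                        rows_compared=950, mismatched_rows=19)
print(metrics["row_count_pct"], metrics["pct_rows_matched"])  # 95.0 98.0
```

Because both charts are derived from the same mismatched-row count, a value of 0 in one always corresponds to 100% in the other.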

Count

Column Description
Type The type of count issue, for example, Dataset Column Count or Dataset Row Count.
Target Count The number of columns or rows that Collibra DQ counts in the target dataset.
Source Count The number of columns or rows that Collibra DQ counts in the source dataset.
Change (%) The percentage count difference between the target and source datasets.
Description The description of the count issue.
Action

Lets you label and train a finding. The available dropdown menu options are Validate, Invalidate, and Resolve.

Validate instructs Collibra DQ either to assign a finding to a specific user for review, in which case it appears in the Assignment Queue, or to acknowledge, without an assignee, that the finding is a valid observation.

Invalidate instructs Collibra DQ to ignore a finding and allow the value to pass. There are two invalidation options:

  • Save lets you mark a finding as invalidated.
  • Save & Retrain lets you invalidate a finding along with any previously saved invalidated findings.

Tip When you have many findings to invalidate, consider using Save on each one and then, once you have reviewed them all, using Save & Retrain so that they are all invalidated at the same time.

Resolve instructs Collibra DQ to mark the finding as an observation and prevent it from appearing in future runs. Resolving a finding does not immediately affect data quality scores.

Profile

The user account that is assigned to this source finding. When the Status is Assigned, a user profile displays in this column.

Note When a source finding is unassigned, the profile column is empty.
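The Change (%) column can be illustrated with a short sketch. The formula is an assumption, since this page does not state it: here the change is the signed difference between the target and source counts relative to the source count, so a 950-row target against a 1,000-row source shows -5.0. It is also assumed that only differing counts produce a finding. `CountFinding`, `change_pct`, and `count_findings` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CountFinding:
    """Hypothetical record mirroring one row of the Count table."""
    type: str
    target_count: int
    source_count: int
    change_pct: float
    description: str


def change_pct(target: int, source: int) -> float:
    """Assumed formula: signed change of the target count relative to the source count."""
    if source == 0:
        # Assumption: no change if both are empty, unbounded change otherwise.
        return 0.0 if target == 0 else float("inf")
    return round(100.0 * (target - source) / source, 2)


def count_findings(source_rows: int, target_rows: int,
                   source_cols: int, target_cols: int) -> List[CountFinding]:
    findings = []
    checks = (("Dataset Row Count", target_rows, source_rows),
              ("Dataset Column Count", target_cols, source_cols))
    for kind, target, source in checks:
        if target != source:  # assumption: only differing counts become findings
            findings.append(CountFinding(
                kind, target, source, change_pct(target, source),
                f"Target count {target} differs from source count {source}"))
    return findings


for f in count_findings(source_rows=1000, target_rows=950, source_cols=10, target_cols=10):
    print(f.type, f.change_pct)  # Dataset Row Count -5.0
```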

Column Schema

Column Description
Target Column

The column of your target table, file, or view where an anomaly is detected. This value is mapped to the Source Column.

Note If a selected column exists in the target dataset but not in the source, the resulting outer join table will display "null" for that column in the source dataset.

(T) Col Order The position of the column in the target dataset.
(T) Type The data type of the column in the target dataset.
Source Column

The column of your source table, file, or view where an anomaly is detected. This value is mapped to the Target Column.

Note If a selected column exists in the source dataset but not in the target, the resulting outer join table will display "null" for that column in the target dataset.

(S) Col Order The position of the column in the source dataset.
(S) Type The data type of the column in the source dataset.
Matches

Describes whether the target column schema matches the source column schema.

If the column schemas between the target and source match, a Y displays.

If the column schemas between the target and source do not match, an N displays.

Passing

There are two possible options:

Option Description
Checkmark The source or target passes the column schema check. The checkmark is only visible when the Show failing only option is not selected.
X The source or target does not pass the column schema check.
Description The description of the column schema issue.
Action

Lets you label and train a finding. The available dropdown menu options are Validate, Invalidate, and Resolve.

Validate instructs Collibra DQ either to assign a finding to a specific user for review, in which case it appears in the Assignment Queue, or to acknowledge, without an assignee, that the finding is a valid observation.

Invalidate instructs Collibra DQ to ignore a finding and allow the value to pass. There are two invalidation options:

  • Save lets you mark a finding as invalidated.
  • Save & Retrain lets you invalidate a finding along with any previously saved invalidated findings.

Tip When you have many findings to invalidate, consider using Save on each one and then, once you have reviewed them all, using Save & Retrain so that they are all invalidated at the same time.

Resolve instructs Collibra DQ to mark the finding as an observation and prevent it from appearing in future runs. Resolving a finding does not immediately affect data quality scores.

Status The status of your data quality item, for example, Observation.
Profile

The user account that is assigned to this source finding. When the Status is Assigned, a user profile displays in this column.

Note When a source finding is unassigned, the profile column is empty.
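A Column Schema comparison with outer-join semantics can be sketched as follows. The sketch assumes each schema is a mapping from column name to (column order, data type), joins the two schemas on column name, and treats a column as matching only when both its order and its data type agree; Collibra DQ's actual matching rules may differ. Python's `None` stands in for the "null" that the table displays for a column missing on one side. `Schema` and `compare_schemas` are hypothetical names.

```python
from typing import Any, Dict, List, Tuple

# Hypothetical schema representation: column name -> (column order, data type).
Schema = Dict[str, Tuple[int, str]]


def compare_schemas(source: Schema, target: Schema) -> List[Dict[str, Any]]:
    """Full outer join of two schemas on column name (illustrative only)."""
    # Target columns first, then columns that exist only in the source.
    names = list(target) + [name for name in source if name not in target]
    rows = []
    for name in names:
        t = target.get(name)
        s = source.get(name)
        # Assumption: a match requires the column on both sides with the same order and type.
        matches = t is not None and s is not None and t == s
        rows.append({
            "Target Column": name if t is not None else None,
            "(T) Col Order": t[0] if t is not None else None,
            "(T) Type": t[1] if t is not None else None,
            "Source Column": name if s is not None else None,
            "(S) Col Order": s[0] if s is not None else None,
            "(S) Type": s[1] if s is not None else None,
            "Matches": "Y" if matches else "N",
        })
    return rows


source = {"id": (1, "int"), "name": (2, "string"), "legacy": (3, "string")}
target = {"id": (1, "int"), "name": (2, "varchar"), "email": (3, "string")}
for row in compare_schemas(source, target):
    print(row["Target Column"], row["Source Column"], row["Matches"])
# id id Y
# name name N
# email None N
# None legacy N
```

Joining on the column name rather than on position keeps a column visible even when it exists on only one side, which is how a target-only or source-only column ends up with a null on the other side.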

Cell

# of rows compared is the total number of rows from the target and source datasets that Collibra DQ includes in its cell check.

Column shifts is the number of columns that Collibra DQ flags as potential cell issues.

Issues/count is the number of potential issues that Collibra DQ flags in a particular column.

Column Description
System (Type) The database or file of the source and target data.
Key

The column assigned as the Key column. If a Key is assigned, Collibra DQ groups any cell issues it detects by this column.

Note An outer join is used to find discrepancies between source and target datasets. The extracted keys will be "null" for any source record that lacks a matching target record in the selected columns. The displayed keys are only from the target dataset.

Column The column where an anomaly is detected.
Value

The value of the column where the anomaly is detected.

Note You can select an option from the Value column dropdown menu to downtrain any finding based on column and value.

Passing

There are two possible options:

Option Description
Checkmark The source or target passes the cell check.
X The source or target does not pass the cell check.
Count The number of rows in either the source or target dataset.
Description The description of the cell issue.
Action

Lets you label and train a finding. The available dropdown menu options are Validate, Invalidate, and Resolve.

Validate instructs Collibra DQ either to assign a finding to a specific user for review, in which case it appears in the Assignment Queue, or to acknowledge, without an assignee, that the finding is a valid observation.

Invalidate instructs Collibra DQ to ignore a finding and allow the value to pass. There are two invalidation options:

  • Save lets you mark a finding as invalidated.
  • Save & Retrain lets you invalidate a finding along with any previously saved invalidated findings.

Tip When you have many findings to invalidate, consider using Save on each one and then, once you have reviewed them all, using Save & Retrain so that they are all invalidated at the same time.

Resolve instructs Collibra DQ to mark the finding as an observation and prevent it from appearing in future runs. Resolving a finding does not immediately affect data quality scores.

Status The status of your data quality item, for example, Observation.
Profile

The user account that is assigned to this source finding. When the Status is Assigned, a user profile displays in this column.

Note When a source finding is unassigned, the profile column is empty.
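The cell check described in this table can be sketched as a full outer join on the Key column. This is a hypothetical illustration, not Collibra DQ's implementation: `compare_cells` and its return fields are invented for the example, # of rows compared is assumed to be the number of distinct keys across both datasets, and, following the note on the Key column, the displayed key is taken from the target record and is `None` (shown as "null") when a source record has no matching target record.

```python
from collections import Counter
from typing import Any, Dict, List

Row = Dict[str, Any]


def compare_cells(source: List[Row], target: List[Row], key: str,
                  columns: List[str]) -> Dict[str, Any]:
    """Hypothetical keyed cell check using full outer join semantics."""
    src = {row[key]: row for row in source}
    tgt = {row[key]: row for row in target}
    issues: List[Row] = []
    issues_per_column: Counter = Counter()
    mismatched_rows = 0
    # Outer join: every key from the target, plus keys found only in the source.
    for k in list(tgt) + [kv for kv in src if kv not in tgt]:
        s, t = src.get(k), tgt.get(k)
        # Displayed key comes from the target record; None ("null") if there is none.
        shown_key = t[key] if t is not None else None
        row_has_issue = False
        for col in columns:
            sv = s.get(col) if s is not None else None
            tv = t.get(col) if t is not None else None
            if sv != tv:
                row_has_issue = True
                issues_per_column[col] += 1
                issues.append({"Key": shown_key, "Column": col,
                               "Source Value": sv, "Target Value": tv})
        if row_has_issue:
            mismatched_rows += 1
    return {
        "rows_compared": len(src.keys() | tgt.keys()),  # assumed definition
        "mismatched_rows": mismatched_rows,
        "issues_per_column": dict(issues_per_column),
        "issues": issues,
    }


source = [{"id": 1, "amount": 10, "currency": "USD"},
          {"id": 2, "amount": 20, "currency": "USD"},
          {"id": 3, "amount": 30, "currency": "EUR"}]
target = [{"id": 1, "amount": 10, "currency": "USD"},
          {"id": 2, "amount": 25, "currency": "USD"}]
result = compare_cells(source, target, key="id", columns=["amount", "currency"])
print(result["rows_compared"], result["mismatched_rows"], result["issues_per_column"])
# 3 2 {'amount': 2, 'currency': 1}
```

In this example, the source record with id 3 has no target match, so both of its cells are reported with a null key, which mirrors the behavior described in the Key note.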

Retrain

Updates the DQ models used for automated data quality checks. The Retrain process consumes current training data (and, optionally, labels and configuration), builds a new model or rule set, validates it, and publishes a new version that is used to score incoming data and compute DQ metrics. Use Retrain when data distributions change, when new labels or ground truth become available, when business rules change, or on a schedule to prevent model drift.
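The Retrain cycle (consume training data and labels, build a candidate, validate it, publish a new version) can be sketched with a deliberately simple model. Everything here is hypothetical and is not how Collibra DQ builds its models: the "model" is a range of the mean plus or minus `k` population standard deviations, invalidated findings are assumed to be values that should now pass and are therefore added to the training data, and a candidate is published only if it flags no more than `max_flag_rate` of a known-good holdout set. `RangeModel`, `ModelRegistry`, and `retrain` are invented names.

```python
from dataclasses import dataclass, field
from statistics import mean, pstdev
from typing import List, Optional, Sequence


@dataclass
class RangeModel:
    """Toy DQ 'model': values outside [low, high] are flagged."""
    version: int
    low: float
    high: float

    def flags(self, value: float) -> bool:
        return not (self.low <= value <= self.high)


@dataclass
class ModelRegistry:
    """Keeps every published version; the newest one scores incoming data."""
    published: List[RangeModel] = field(default_factory=list)

    @property
    def current(self) -> Optional[RangeModel]:
        return self.published[-1] if self.published else None


def retrain(registry: ModelRegistry, training: Sequence[float],
            invalidated: Sequence[float], holdout: Sequence[float],
            k: float = 3.0, max_flag_rate: float = 0.05) -> Optional[RangeModel]:
    """Build, validate, and publish a new model version (hypothetical workflow)."""
    if not holdout:
        raise ValueError("holdout must not be empty")
    # Assumption: invalidated findings are values that should pass, so train on them too.
    data = list(training) + list(invalidated)
    mu, sigma = mean(data), pstdev(data)
    version = registry.current.version + 1 if registry.current else 1
    candidate = RangeModel(version, mu - k * sigma, mu + k * sigma)
    # Validate: reject the candidate if it flags too much known-good data.
    flag_rate = sum(candidate.flags(v) for v in holdout) / len(holdout)
    if flag_rate > max_flag_rate:
        return None
    registry.published.append(candidate)  # publish the new version
    return candidate


registry = ModelRegistry()
v1 = retrain(registry, training=[9, 10, 11], invalidated=[], holdout=[9.5, 10, 10.5])
print(v1.version, v1.flags(15))  # 1 True
v2 = retrain(registry, training=[9, 10, 11], invalidated=[15], holdout=[9.5, 10, 10.5])
print(v2.version, v2.flags(15))  # 2 False
```

Publishing by appending to a version list keeps earlier models available, which makes it straightforward to compare a retrained model against its predecessor or roll it back.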