Source
The Source monitor is configured in the Mapping step in Explorer. It detects row count, schema, and cell value inconsistencies between the source file or table and the target file or table, and down-scores any observations that it finds.
The Source tab provides donut charts to show an overview of any source issues and whether they occur at the Count-, Schema-, or Cell-level. Only the items marked with a "Y" in the data tables are included in the calculations.
Donut charts
| Chart | Description |
|---|---|
| Row Count | The percentage of target row counts compared to the source row counts. |
| Column Count | The percentage of target column counts compared to the source column counts. |
| Matching | The percentage of target column schemas that match the source column schemas. The data tables on this page provide the details on the matched schemas. |
| Passing | The percentage of source and target column schemas that pass the column schema check. The calculation also includes columns and datatypes that exist in the source but not in the target, or in the target but not in the source. The data tables on this page provide the details on the passed schemas. |
| # of rows with mismatched cells | The number of rows where the target cells don't match the source cells. This chart provides the inverse of the % rows matched chart. For example, if this value is 0, then the % rows matched chart will be 100%. |
| % rows matched | The percentage of rows where the target cells match the source cells. This chart provides the inverse of the # of rows with mismatched cells chart. For example, if this value is 100%, then the # of rows with mismatched cells chart will be 0. |
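
The relationship between the count charts and the two cell-level charts can be sketched in a few lines of Python. This is a minimal illustration and not Collibra DQ's implementation: the function names, the exact formulas, and the handling of empty datasets are all assumptions made for the example.

```python
def percent_of_source(target_count: int, source_count: int) -> float:
    """Target count as a percentage of the source count (assumed formula)."""
    if source_count == 0:
        # Assumption: two empty datasets agree fully; otherwise report 0%.
        return 100.0 if target_count == 0 else 0.0
    return 100.0 * target_count / source_count


def percent_rows_matched(rows_compared: int, mismatched_rows: int) -> float:
    """Percentage of compared rows whose target cells match the source cells."""
    if rows_compared == 0:
        # Assumption: when nothing is compared, nothing is mismatched.
        return 100.0
    return 100.0 * (rows_compared - mismatched_rows) / rows_compared


print(percent_of_source(950, 1000))    # 95.0
print(percent_rows_matched(200, 0))    # 100.0 (no mismatched rows)
print(percent_rows_matched(200, 50))   # 75.0
```

With zero mismatched rows the matched percentage is 100%, which mirrors the inverse relationship described for the two cell charts.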
Count
| Column | Description |
|---|---|
| Type | The type of count issue, for example, Dataset Column Count and Dataset Row Count. |
| Target Count | The number of columns or rows that Collibra DQ counts in the target dataset. |
| Source Count | The number of columns or rows that Collibra DQ counts in the source dataset. |
| Change (%) | The percentage count difference between the target and source datasets. |
| Description | The description of the count issue. |
| Action | Lets you label and train a finding. The available dropdown menu options are Validate, Invalidate, and Resolve. Validate instructs Collibra DQ to either assign a finding to a specific user for review, which then appears in the Assignment Queue, or acknowledge without an assignee that the finding is a valid observation. Invalidate instructs Collibra DQ to ignore a finding and allow the value to pass. There are two invalidation options. Tip: When you have many findings to invalidate, it may be best to use the Save option to invalidate them at the same time, once all findings are reviewed. Resolve instructs Collibra DQ to mark the finding as an observation and prevent it from appearing in future runs. Resolving a finding does not immediately affect data quality scores. |
| Profile | The user account that is assigned to this source finding. When the Status is Assigned, a user profile displays in this column. Note: When a source finding is unassigned, the profile column is empty. |
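
As a rough illustration of how the Type, Target Count, Source Count, and Change (%) columns relate, the sketch below builds count findings from two pairs of counts. The `CountFinding` class and the `count_findings` helper are hypothetical, and the Change (%) formula (absolute difference relative to the source count) is an assumption; the product may calculate it differently.

```python
from dataclasses import dataclass


@dataclass
class CountFinding:
    type: str
    target_count: int
    source_count: int

    @property
    def change_percent(self) -> float:
        # Assumed formula: absolute difference relative to the source count.
        if self.source_count == 0:
            return 0.0 if self.target_count == 0 else 100.0
        diff = abs(self.target_count - self.source_count)
        return 100.0 * diff / self.source_count


def count_findings(target_rows, target_cols, source_rows, source_cols):
    """Return one finding per count type where target and source differ."""
    candidates = [
        CountFinding("Dataset Row Count", target_rows, source_rows),
        CountFinding("Dataset Column Count", target_cols, source_cols),
    ]
    return [f for f in candidates if f.target_count != f.source_count]


for finding in count_findings(980, 12, 1000, 12):
    print(finding.type, finding.target_count, finding.source_count,
          finding.change_percent)
# Dataset Row Count 980 1000 2.0
```

Only the row count differs in this example, so a single finding is reported and the column count produces no row.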
Column Schema
| Column | Description |
|---|---|
| Target Column | The column of your target table, file, or view where an anomaly is detected. This value is mapped to the Source Column. Note: If a selected column exists in the target dataset but not in the source, the resulting outer join table displays "null" for that column in the source dataset. |
| (T) Col Order | The column order of the target dataset. |
| (T) Type | The data type of the target dataset. |
| Source Column | The column of your source table, file, or view where an anomaly is detected. This value is mapped to the Target Column. Note: If a selected column exists in the source dataset but not in the target, the resulting outer join table displays "null" for that column in the target dataset. |
| (S) Col Order | The column order of the source dataset. |
| (S) Type | The data type of the source dataset. |
| Matches | Describes whether the target column schema matches the source column schema. If the column schemas match, Y displays. If they do not match, N displays. |
| Passing | There are two possible options: |
| Description | The description of the column schema issue. |
| Action | Lets you label and train a finding. The available dropdown menu options are Validate, Invalidate, and Resolve. Validate instructs Collibra DQ to either assign a finding to a specific user for review, which then appears in the Assignment Queue, or acknowledge without an assignee that the finding is a valid observation. Invalidate instructs Collibra DQ to ignore a finding and allow the value to pass. There are two invalidation options. Tip: When you have many findings to invalidate, it may be best to use the Save option to invalidate them at the same time, once all findings are reviewed. Resolve instructs Collibra DQ to mark the finding as an observation and prevent it from appearing in future runs. Resolving a finding does not immediately affect data quality scores. |
| Status | The status of your data quality item, for example, Observation. |
| Profile | The user account that is assigned to this source finding. When the Status is Assigned, a user profile displays in this column. Note: When a source finding is unassigned, the profile column is empty. |
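
The outer join behavior described for the Target Column and Source Column can be sketched as follows. This is an illustrative example only: `compare_schemas` and `percent_matching` are hypothetical helpers, columns are assumed to be mapped by name, and a match is assumed to mean that the column exists on both sides with the same data type. The actual mapping comes from the Mapping step and may differ.

```python
def compare_schemas(target, source):
    """Outer-join two schemas on column name.

    Each schema is a list of (column_name, data_type) pairs in column order.
    """
    t = {name: (pos, dtype) for pos, (name, dtype) in enumerate(target, start=1)}
    s = {name: (pos, dtype) for pos, (name, dtype) in enumerate(source, start=1)}
    names = list(t) + [n for n in s if n not in t]
    rows = []
    for name in names:
        # A column missing on one side yields None ("null") for that side.
        t_order, t_type = t.get(name, (None, None))
        s_order, s_type = s.get(name, (None, None))
        same = name in t and name in s and t_type == s_type
        rows.append({
            "target_column": name if name in t else None,
            "t_col_order": t_order,
            "t_type": t_type,
            "source_column": name if name in s else None,
            "s_col_order": s_order,
            "s_type": s_type,
            "matches": "Y" if same else "N",  # assumed match rule
        })
    return rows


def percent_matching(rows):
    """Share of joined columns marked Y, as a percentage."""
    if not rows:
        return 100.0
    return 100.0 * sum(r["matches"] == "Y" for r in rows) / len(rows)


target = [("id", "int"), ("name", "string"), ("created", "date")]
source = [("id", "int"), ("name", "varchar"), ("region", "string")]
rows = compare_schemas(target, source)
for r in rows:
    print(r["target_column"], r["source_column"], r["matches"])
# id id Y
# name name N
# created None N
# None region N
print(percent_matching(rows))  # 25.0
```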
Cell
The # of rows compared value is the total number of rows from the target and source datasets that Collibra DQ includes in its cell check.
The Column shifts value is the number of columns that Collibra DQ flags as potential cell issues.
The Issues/count value is the number of issues in a particular column that Collibra DQ flags as potential issues.
| Column | Description |
|---|---|
| System (Type) | The database or file of the source and target data. |
| Key | The column assigned as a Key column. Any cell issues that Collibra DQ detects are grouped by this column if and when a Key is assigned. Note: An outer join is used to find discrepancies between source and target datasets. The extracted keys are "null" for any source record that lacks a matching target record in the selected columns. The displayed keys are only from the target dataset. |
| Column | The column where an anomaly is detected. |
| Value | The value of the column where the anomaly is detected. Note: You can select an option from the Value column dropdown menu to downtrain any finding based on column and value. |
| Passing | There are two possible options: |
| Count | The number of rows in either the source or target dataset. |
| Description | The description of the cell issue. |
| Action | Lets you label and train a finding. The available dropdown menu options are Validate, Invalidate, and Resolve. Validate instructs Collibra DQ to either assign a finding to a specific user for review, which then appears in the Assignment Queue, or acknowledge without an assignee that the finding is a valid observation. Invalidate instructs Collibra DQ to ignore a finding and allow the value to pass. There are two invalidation options. Tip: When you have many findings to invalidate, it may be best to use the Save option to invalidate them at the same time, once all findings are reviewed. Resolve instructs Collibra DQ to mark the finding as an observation and prevent it from appearing in future runs. Resolving a finding does not immediately affect data quality scores. |
| Status | The status of your data quality item, for example, Observation. |
| Profile | The user account that is assigned to this source finding. When the Status is Assigned, a user profile displays in this column. Note: When a source finding is unassigned, the profile column is empty. |
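
The Key note above describes an outer join in which displayed keys come only from the target dataset. A minimal sketch of that idea, assuming rows are plain dictionaries and a single Key column, might look like the following. The `compare_cells` helper and the shape of its findings are hypothetical and do not reflect how Collibra DQ stores results.

```python
def compare_cells(target_rows, source_rows, key, columns):
    """Outer-join rows on `key` and report cells that differ."""
    target_by_key = {row[key]: row for row in target_rows}
    source_by_key = {row[key]: row for row in source_rows}
    all_keys = list(target_by_key) + [
        k for k in source_by_key if k not in target_by_key
    ]

    findings = []
    for k in all_keys:
        t_row = target_by_key.get(k)
        s_row = source_by_key.get(k)
        # Displayed keys come from the target only, so a source record
        # without a matching target record shows None ("null").
        shown_key = k if t_row is not None else None
        for col in columns:
            t_val = t_row.get(col) if t_row is not None else None
            s_val = s_row.get(col) if s_row is not None else None
            if t_val != s_val:
                findings.append({
                    "key": shown_key,
                    "column": col,
                    "target": t_val,
                    "source": s_val,
                })
    return findings


target = [
    {"id": 1, "city": "Paris", "amount": 10},
    {"id": 2, "city": "Rome", "amount": 20},
]
source = [
    {"id": 1, "city": "Paris", "amount": 10},
    {"id": 2, "city": "Roma", "amount": 20},
    {"id": 3, "city": "Oslo", "amount": 30},
]
for f in compare_cells(target, source, key="id", columns=["city", "amount"]):
    print(f["key"], f["column"], f["target"], f["source"])
# 2 city Rome Roma
# None city None Oslo
# None amount None 30
```

The source record with id 3 has no target counterpart, so its findings carry a null key, consistent with the note that keys are only taken from the target dataset.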
Retrain
Updates the DQ models used for automated data-quality checks. The Retrain process consumes current training data (and optionally labels/config), builds a new model or rule-set, validates it, and publishes a new version that will be used to score incoming data and compute DQ metrics. Use Retrain when data distributions change, new labels/ground truth are available, business rules change, or on a scheduled cadence to prevent model drift.