Rules

Collibra DQ takes a strong stance that data should first be profiled, auto-discovered and learned before applying basic rules. This methodology commonly removes thousands of rules that will never need to be written and evolve naturally overtime. However there are still many cases to add a simple rule, complex rule or domain specific rule.

The score on the Rules tab is the rule percentage (% column) multiplied by the number of points (Points column), rounded to a whole number. If a rule with a Percent Scoring Type does not meet or exceed its defined percentage, the score on the Rules tab remains 0.

The following table describes the information available on the Rules tab of the Findings page.

Column Description
Rule Name The name of the rule.
Condition

The rule condition that is defined in the rule. An alert generates when the condition is met.

In Pullup mode, the Condition column is derived from the rule’s SQL column.

In Pushdown mode:

  • If the rule is PASSING, the Condition column is derived from the rule’s SQL column.
  • If the rule is BREAKING:
    • If the Rule Type is SQLG, the Condition column is derived from the rule's SQL column.
    • If the Rule Type is a Check type (EMPTYCHECK, NULLCHECK, DATECHECK, INTCHECK, DOUBLECHECK, STRINGCHECK), the Condition column is derived from the rule's SQL column.
    • If the Rule Type is a Data Class (CUSTOM), the Condition column displays the Regular Expression.
    • If the Rule Type is SQLF, the Condition column displays the SQL query.
Points The number of points to deduct from the quality score for a given rule break.
Perc The ratio of the total number of breaking records over the total number of rows.
Breaking Records The number of rows with records that did not pass the conditions of the rule. If the status of the rule is Exception or Passing, then the value should be 0.
Passing Records The number of rows with records that passed the conditions of the rule.
Status

The status of the rule. The following table shows the possible statuses:

Status Description
passing status Passing indicates that the dataset passes the rule check and no points are deducted from the data quality score.
exception status Exception indicates that there is an error with the rule check. Expand the row to view the error message.
breaking status Breaking indicates that the rule has detected an anomaly in a column and points are deducted from the data quality score. Expand the row to view the data preview of the break record.
Dimension The DQ Dimension that is identified in the rule finding, for example, Completeness.
State Shows if a rule is enabled or disabled.
Status Allows you to validate or resolve an observation and, when applicable, assign it to a user for further analysis.
Profile

The user account that is assigned to this rule finding. When the Status is Assigned, a user profile displays in this column.

Note When a rule finding is unassigned, the profile column is empty.

Action

Rule Breaks lets you preview the rule break export file and download either a CSV or JSON file, depending on your processing mode, giving you more control over how you use and share break records.

  • In Pullup mode, you can download a CSV file containing details of the break records.
  • In Pushdown mode, you can download either a CSV or JSON file containing details of the break records.

Note The ability to copy rule breaks is limited to secure Cloud Native deployments of Collibra DQ. You cannot copy rule breaks in Standalone deployments.

Rule Discovery

Rule Discovery detects the data classes assigned to a selected data category. The Rule Discovery algorithm automatically selects the best match if a column matches two or more data classes. Data class match criteria are determined by percent match and name distance.

Click Rule Discovery, then select an option from the Data Category dropdown menu. Click Run Discovery to assign your selection as a data category and run the discovery job.

Break records in the PostgreSQL Metastore

When a rule returns breaking records, the following query inserts unique records into the PostgreSQL Metastore rule_breaks table based on dataset, run_id, rule_nm, and link_id:

INSERT INTO rule_breaks (dataset, run_id, rule_nm, link_id)
VALUES (:dataset, :runId, :ruleNm, :linkId)
ON CONFLICT (dataset, run_id, rule_nm, link_id)
DO UPDATE SET 
    dataset = :dataset,
    run_id  = :runId,
    rule_nm = :ruleNm,
    link_id = :linkId
WHERE 
    rule_breaks.dataset = :dataset
    AND rule_breaks.run_id = :runId
    AND rule_breaks.rule_nm = :ruleNm
    AND rule_breaks.link_id = :linkId;

Note Because only unique records are inserted into the rule_breaks table, the number of records in the rule_breaks table might not match the number of breaking records displayed on the Findings page.

Pulse View Preview

The Pulse View gives you a data preview box plot for simple rules that are breaking. You can click any available box to drill into the data of that day.

Note Pulse View Preview is not available for passing rules or runs without data.

Exporting rule break records

There are three options to export the details of your rule break records as .xlsx files:

  • Export generates an Excel file with the details from the drill-in.
  • Export LinkIds generates an Excel file with the name of the dataset, Run Id, Rule Name, and Link Id, when available.
  • Export with Details generates an Excel file with the details from the drill-in and the data preview, when available.

Note The ability to copy rule breaks is limited to secure Cloud Native deployments of Collibra DQ. You cannot copy rule breaks in Standalone deployments.

Limitations

Total records are calculated by adding the total number of breaking and passing records. Therefore, when referencing a secondary dataset in a native rule, the extra rows from the secondary dataset are included in the query results on the Findings page, skewing the total rows calculation and the percentage of breaking records.