Working with Dataset Rules
The Dataset Rules page lists all previously saved rules for a given dataset and provides an overview of their details, such as the SQL conditions, the rule types, and whether the rules pass validation checks. You can also take several quick actions from this page, such as opening the Rule Workbench or deleting a rule from a dataset. Managing rules from a single page can reduce the time you spend assessing what is needed to meet your business requirements.
Opening Dataset Rules
The following steps show you how to open the list of rules that apply to your dataset.
- Click the [icon] in the sidebar menu, then click Rule Builder.
The Dataset Rules page opens.
- On the Dataset Rules page, enter the name of your dataset in the Search for a Dataset search bar, then select it. The Dataset Rules page displays any existing rules for your dataset.
Viewing rule details
When one or more rules are available on the selected dataset, the table on the Dataset Rules page contains the following detailed information.
Column | Description |
---|---|
Rule Name | The name of the rule. Hover your cursor over the rule name and click |
Rule Query | The SQL condition of the rule. |
Type | The type of rule. |
Column | The primary column that the rule queries. |
Repo | The data class or template from which the rule is created. This applies only to custom rules, such as Data Type, Data Class, and Template. |
Valid | Shows whether the rule passes rule validation. Note: If you see |
Active | Shows whether the rule is active for future runs of the dataset. Click the icon to change the active status of the rule. |
Scoring Type | Shows whether the rule uses absolute- or percentage-based scoring. For more information about scoring types, go to Create a data quality rule and Understanding the rules score. |
Tolerance | Specifies the tolerance threshold as a percentage. For more information, go to Understanding the rules score. |
Points | The number of points that Collibra DQ deducts from the data quality score when data breaches the conditions of the rule. You can set this value on the Rule Details modal of the Rule Workbench. For more information, go to Understanding the rules score. If you do not customize this value, then Collibra DQ uses the default value of 1. |
% | The ratio of the total number of breaking records over the total number of rows. For more information, go to Understanding the rules score. If you do not customize this value on the Rule Details modal, then Collibra DQ uses the default value of 1. |
Category | The data category that you optionally define on the Workbench. |
Dimension | The DQ Dimension that you optionally assign to the rule on the Workbench. Note: Tagging rules with custom dimensions in the Metastore is not supported. |
Timeout Limit | The number of minutes that any active rule can take to process before it times out and the tool automatically cancels the job. This limit is useful for keeping problematic rules from consuming too many resources. The maximum timeout limit is the greatest Timeout Limit value of any active rule displayed in the table under the Rules tab. Example: If a dataset has 3 active rules, where Rule 1 has a timeout limit of 20 minutes, Rule 2 has a limit of 30 minutes, and Rule 3 has a limit of 60 minutes, then each rule on the dataset has up to 60 minutes to process. If Rule 3 is inactive, the maximum timeout limit for the dataset is 30 minutes, because Rule 2 then has the highest limit of any active rule. Tip: You can increase this value for an individual rule from Rule Details on the Rule Workbench. |
Actions | Click Actions to edit, delete, rename, or view the history of the rule. Warning: When you rename a rule, all associated rule output records and rule break records on the Findings page, as well as exports stored in the Metastore, are renamed. To retain the history with the original rule name, create a new rule and deactivate the original one. |
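The Timeout Limit and % descriptions in the table above can be illustrated with a short Python sketch. This is not Collibra DQ code. The `Rule` class and both function names are hypothetical, and expressing the breaking-record ratio as a percentage is an assumption based on the column heading. The sketch only mirrors the arithmetic described in the table: the maximum timeout is the greatest limit among active rules, and the ratio is breaking records over total rows.

```python
from dataclasses import dataclass


@dataclass
class Rule:
    """Hypothetical stand-in for one row of the Dataset Rules table."""
    name: str
    timeout_minutes: int
    active: bool = True


def max_timeout_minutes(rules):
    """Return the greatest Timeout Limit among active rules, or None if no rule is active."""
    limits = [rule.timeout_minutes for rule in rules if rule.active]
    return max(limits) if limits else None


def breaking_percentage(breaking_records, total_rows):
    """Return breaking records over total rows, expressed as a percentage (assumed unit)."""
    if total_rows <= 0:
        raise ValueError("total_rows must be positive")
    return 100.0 * breaking_records / total_rows


# The three rules from the Timeout Limit example.
rules = [Rule("Rule 1", 20), Rule("Rule 2", 30), Rule("Rule 3", 60)]
print(max_timeout_minutes(rules))  # 60

# Deactivating Rule 3 lowers the maximum to the limit of Rule 2.
rules[2].active = False
print(max_timeout_minutes(rules))  # 30

# 25 breaking records out of 1,000 rows.
print(breaking_percentage(25, 1000))  # 2.5
```

Inactive rules are filtered out before taking the maximum, which is why deactivating Rule 3 changes the result from 60 to 30 minutes.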
Adding new rules
Click Add Rule to create new rules to run against your dataset.