Adding Quick Rules
You can create quick rules from the Profile page based on common column-level checks, including unique count, valid values, and percentage distribution.
Steps
- From the search bar at the top of the page, enter the name of a dataset, click it when it appears in the results, and then click Profile to the right of the search bar. The Profile page of your dataset opens.
- In the first column of the Profile table, click the name of a column to expand it. The column expands.
- In between the column-level statistics and rules sections, click + Add Rule.
- Select an option and enter the required information in the modal.
- Click Save.
Unique rules checks the specified column to verify that it only contains unique values. This can help you identify columns containing duplicate values by measuring the number of unique values ($uniqueCount) against the number of rows ($rowCount). If the number of unique values is not the same as the number of rows in the column, this indicates that the column contains duplicates.
Setting | Description | Default Value |
---|---|---|
Dataset |
The name of the dataset that this unique rule queries. Note This field is not editable. |
N/A |
Rule Name | The name of your unique rule. | The naming convention uses the following format: [column_name]_unique |
Where |
The condition of the quick rule. Note This field is not editable. |
The condition uses the following format: [column_name].$uniqueCount != $rowCount |
Points | The number of points deducted from the total DQ score when your rule breaks. Use the default or select a value greater than 0. | 1 |
Percentage | Deducts points according to the point threshold you set against every percentage of records in violation of your rule. For example, If 25 of 100 records break your rule, and the values you set in both the Points and Percent input fields are 1, then a total of 25 points are deducted from your overall DQ score. | 1 |
Preview Limit | The number of available rows to preview when you drill into a rule record on the Findings page. | 1 |
Run Time Limit (minutes) | The number of minutes that a rule takes to process before it times out and proceeds to the next activity in the job. | 1 |
Valid value rules check a variety of range- or match-based conditions within your column. For example, a valid value rule based on matching (valid) values on a stock exchange dataset with a column called "trade_date" where trade_date NOT IN ('2018-01-16 00:00:00.00')
checks if any rows in the trade_date column do not match the date stamp of 2018-01-16 00:00:00.00. This can help filter columns containing many rows of data to only show the rows that do not meet the criteria of the valid value rule.
Another type of valid value rule is based on whether a value within a column falls outside a range of specified values. For example, using the same stock exchange dataset, a valid value rule that checks the range of values within a column called "high" where high not between 0.00000 and 387831
checks if any rows in the high column are not within the range of 0.00000 and 387831. Like the match-based valid value rule, this also helps to filter out values that fall outside the constraints of the query.
Setting | Description | Default Value |
---|---|---|
Dataset |
The name of the dataset that this valid value rule queries. Note This field is not editable. |
N/A |
Rule Name | The name of your valid value rule. | The naming convention uses the following format: [column_name]_valid_value |
Where |
The condition of your valid rule. |
The condition uses the following formats: [column_name] not between [value] and [value] or [column_name] NOT IN ('$rowValue') |
Points | The number of points deducted from the total DQ score when your rule breaks. Use the default or select a value greater than 0. | 1 |
Percentage | Deducts points according to the point threshold you set against every percentage of records in violation of your rule. For example, If 25 of 100 records break your rule, and the values you set in both the Points and Percent input fields are 1, then a total of 25 points are deducted from your overall DQ score. | 1 |
Preview Limit | The number of available rows to preview when you drill into a rule record on the Findings page. | 1 |
Run Time Limit (minutes) | The number of minutes that a rule takes to process before it times out and proceeds to the next activity in the job. | 1 |
Distribution rules check if the uniqueness percentage of a column falls outside of a specified range of possible values. Distribution rules can be set up as either a combined or individual rule.
In a combined distribution rule, you can select from a list of TopN and BottomN values to automatically add the value to a compound condition using the operator "OR" to include multiple checks in the condition. So, when the uniqueness percentage of any of the TopN or BottomN values you include in your condition does not meet the specified percentage range, the condition is satisfied.
In an individual distribution rule, you can select from the same list of TopN or BottomN column values as in the combined distribution rule, but instead of combining each check into one condition, each check is treated as its own rule.
Setting | Description | Default Value | ||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
TopN Values |
The most frequently occurring values in a column. The following table details the columns in the TopN Values table.
|
N/A | ||||||||||||
Dataset |
The name of the dataset that this unique rule queries. Note This field is not editable. |
N/A |
Under the Show Rule section, select Combined to combine all of the TopN or BottomN values you selected in the Add rule? column into one rule, or select Individual to create individual rules per TopN or BottomN value.
Setting | Description | Default Value |
---|---|---|
Rule Name | The name of the quick rule. |
The naming convention uses the following format: [column_name]_valid_value Note If you select Individual Rule and you include multiple TopN values, then multiple Rule Name columns will display. |
Where |
The condition of the quick rule. |
The condition uses the following formats: $columnName['$rowValue'].$uniquePercent not between [value] and [value] Note If you select Individual Rule and you include multiple TopN values, then multiple Where columns will display. |
Points | The number of points deducted from the total DQ score when your rule breaks. Use the default or select a value greater than 0. | 1 |
Percentage | Deducts points according to the point threshold you set against every percentage of records in violation of your rule. For example, If 25 of 100 records break your rule, and the values you set in both the Points and Percent input fields are 1, then a total of 25 points are deducted from your overall DQ score. | 1 |
Note Your quick rule will be checked against your dataset the next time your job runs.