Adding Quick Rules

You can create quick rules from the Profile page based on common column-level checks, including unique count, valid values, and percentage distribution.

Steps

  1. From the search bar at the top of the page, enter the name of a dataset, click it when it appears in the results, and then click Profile to the right of the search bar.
  2. The Profile page of your dataset opens.
  3. In the first column of the Profile table, click the name of a column to expand it.
  4. The column expands.
  5. In between the column-level statistics and rules sections, click + Add Rule.
  6. Select an option and enter the required information in the modal.

    Unique rules checks the specified column to verify that it only contains unique values. This can help you identify columns containing duplicate values by measuring the number of unique values ($uniqueCount) against the number of rows ($rowCount). If the number of unique values is not the same as the number of rows in the column, this indicates that the column contains duplicates.

    Setting Description Default Value
    Dataset

    The name of the dataset that this unique rule queries.

    Note This field is not editable.

    N/A

    Rule Name The name of your unique rule. The naming convention uses the following format: [column_name]_unique
    Where

    The condition of the quick rule.

    Note This field is not editable.

    The condition uses the following format: [column_name].$uniqueCount != $rowCount
    Points The number of points deducted from the total DQ score when your rule breaks. Use the default or select a value greater than 0. 1
    Percentage Deducts points according to the point threshold you set against every percentage of records in violation of your rule. For example, If 25 of 100 records break your rule, and the values you set in both the Points and Percent input fields are 1, then a total of 25 points are deducted from your overall DQ score. 1
    Preview Limit The number of available rows to preview when you drill into a rule record on the Findings page. 1
    Run Time Limit (minutes) The number of minutes that a rule takes to process before it times out and proceeds to the next activity in the job. 1

    Valid value rules check a variety of range- or match-based conditions within your column. For example, a valid value rule based on matching (valid) values on a stock exchange dataset with a column called "trade_date" where trade_date NOT IN ('2018-01-16 00:00:00.00') checks if any rows in the trade_date column do not match the date stamp of 2018-01-16 00:00:00.00. This can help filter columns containing many rows of data to only show the rows that do not meet the criteria of the valid value rule.

    Another type of valid value rule is based on whether a value within a column falls outside a range of specified values. For example, using the same stock exchange dataset, a valid value rule that checks the range of values within a column called "high" where high not between 0.00000 and 387831 checks if any rows in the high column are not within the range of 0.00000 and 387831. Like the match-based valid value rule, this also helps to filter out values that fall outside the constraints of the query.

    Setting Description Default Value
    Dataset

    The name of the dataset that this valid value rule queries.

    Note This field is not editable.

    N/A

    Rule Name The name of your valid value rule. The naming convention uses the following format: [column_name]_valid_value
    Where

    The condition of your valid rule.

    The condition uses the following formats: [column_name] not between [value] and [value] or [column_name] NOT IN ('$rowValue')
    Points The number of points deducted from the total DQ score when your rule breaks. Use the default or select a value greater than 0. 1
    Percentage Deducts points according to the point threshold you set against every percentage of records in violation of your rule. For example, If 25 of 100 records break your rule, and the values you set in both the Points and Percent input fields are 1, then a total of 25 points are deducted from your overall DQ score. 1
    Preview Limit The number of available rows to preview when you drill into a rule record on the Findings page. 1
    Run Time Limit (minutes) The number of minutes that a rule takes to process before it times out and proceeds to the next activity in the job. 1

    Distribution rules check if the uniqueness percentage of a column falls outside of a specified range of possible values. Distribution rules can be set up as either a combined or individual rule.

    In a combined distribution rule, you can select from a list of TopN and BottomN values to automatically add the value to a compound condition using the operator "OR" to include multiple checks in the condition. So, when the uniqueness percentage of any of the TopN or BottomN values you include in your condition does not meet the specified percentage range, the condition is satisfied.

    In an individual distribution rule, you can select from the same list of TopN or BottomN column values as in the combined distribution rule, but instead of combining each check into one condition, each check is treated as its own rule.

    Setting Description Default Value
    TopN Values

    The most frequently occurring values in a column. The following table details the columns in the TopN Values table.

    Column Description
    Value The value of a row in the selected column.
    Count The number of times the value occurs in the column.
    % The percentage of occurrences of the value within the selected column.
    Change %

    Set a distributed range percentage to check for values that are outside of the +/- of that value, applied to the % column.

    Example If a value in the % column is 2.273 and you set the Change % to 10, then you can expect the condition in the Where column to be .$uniquePercent not between 2.045 and 2.500 because 2.045 and 2.500 are respectively -10% and +10% of 2.273.

    Add rule?

    Select the checkbox option to include the rule in the Where condition. The value you set in the Change % field determines the not between distribution values when you add the rule to the Where condition.

    N/A
    Dataset

    The name of the dataset that this unique rule queries.

    Note This field is not editable.

    N/A

    Under the Show Rule section, select Combined to combine all of the TopN or BottomN values you selected in the Add rule? column into one rule, or select Individual to create individual rules per TopN or BottomN value.

    Setting Description Default Value
    Rule Name The name of the quick rule.

    The naming convention uses the following format: [column_name]_valid_value

    Note If you select Individual Rule and you include multiple TopN values, then multiple Rule Name columns will display.

    Where

    The condition of the quick rule.

    The condition uses the following formats: $columnName['$rowValue'].$uniquePercent not between [value] and [value]

    Note If you select Individual Rule and you include multiple TopN values, then multiple Where columns will display.

    Points The number of points deducted from the total DQ score when your rule breaks. Use the default or select a value greater than 0. 1
    Percentage Deducts points according to the point threshold you set against every percentage of records in violation of your rule. For example, If 25 of 100 records break your rule, and the values you set in both the Points and Percent input fields are 1, then a total of 25 points are deducted from your overall DQ score. 1

  7. Click Save.

Note Your quick rule will be checked against your dataset the next time your job runs.