Stat Rules

Stat rules are a powerful method for accessing profile statistics. These are typically sub-second operations that do not require scanning or iterating. There are several cases where SQL struggles to support rules, such as: isNull but not "null count" or nullRatio or nullPercent. Or having access to types without doing crazy cast() operations.

For example, here is a simple stat rule that breaks when the symbol column contains less than 800 rows of data that are not strings:

Copy

select * from @dataset where 
fname.$type != 'String' AND $rowCount < 800

Dataset-level stat rules

Provides observability into your data at the dataset level, enabling you to detect issues with the entire dataset.

Dataset-level Stat Rule	Rule Example	Description of Rule Example
$totalTimeInSeconds	$totalTimeInSeconds > 25	Breaks when the DQ job runs longer than 25 seconds.
$totalTimeInMinutes	$totalTimeInMinutes > 5	Breaks when the DQ job runs longer than 5 minutes.
$totalTimeInHours	$totalTimeInHours > 1	Breaks when the DQ job runs longer than 1 hour.
$rowCount	$rowCount < 9000	Breaks when the row count less than 9,000.
$runDate	$runDate='2025-01-24'	Applies the rule to DQ jobs run on 2025-01-24.
$daysWithoutData	$daysWithoutData > 4	Breaks when a dataset has missing or no rows for 5 days.
$runsWithoutData	$runsWithoutData > 4	Breaks when a dataset runs but has missing or no rows for 5 days.
$daysSinceLastRun	$daysSinceLastRun > 4	Breaks when a dataset has not run for 5 days.

Column-level stat rules

Provides observability into your data at the column level.

Column-level Stat Rule	Rule Example	Description of Rule Example
.$type	symbol.$type != 'String'	Breaks when a row in the symbol column is not a string.
.$min	symbol.$min > 'apple'	Breaks when a row in the symbol column is greater than a given value. $min performs a lexicographical sort that works for strings and numbers.
.$minNum	high.$minNum > 13	Breaks when a row in the age column is greater than 13. The row is type-casted to a numeric for simple number checks.
.$mean	row_id.$mean > '4.500'	Breaks when the mean is greater than a given value.
.$max	symbol.$max > 'apple'	Breaks when a row in the symbol column is greater than a given value.
.$maxNum	high.$maxNum > 13	Breaks when the numeric value falls outside an acceptable range.
.$uniqueCount	id.$uniqueCount != $rowCount	Breaks when the uniqueCount of a field doesn't match the rowCount.
.$uniqueRatio	gender.$uniqueRatio between .4 and .6	Breaks when the ratio of uniqueCounts of a given field doesn't match the rowCount.
.$nullRatio	lname.$nullRatio not between .4 and .6	Breaks when the ratio of nulls no longer falls within acceptable range.
.$nullPercent	lname.$nullPercent not between 40 and 60	Breaks when the percent of nulls no longer falls within acceptable range
.$nullCount	symbol.$nullCount >= 1	Test for a single null.
.$emptyRatio	open.$emptyRatio > 0.2	Breaks when the ratio of empties no longer falls within acceptable range.
.$emptyPercent	open.$emptyPercent > 20	Breaks when the percent of empties no longer falls within an acceptable range.
.$emptyCount	open.$emptyCount > 100	Breaks when the emptyCounts of a field no longer fall within an acceptable range.
.$mixedTypeRatio	open.$mixedTypeRatio > 0.2	Breaks when the ratio of mixed data types no longer falls within an acceptable range. For example, Strings and Ints in the same field.
.$mixedTypePercent	open.$mixedTypeRatio > 20	Breaks when the percent of mixed data types no longer falls within an acceptable range. For example, Strings and Ints in the same field.
.$mixedTypeCount	open.$mixedTypeCount >= 1	Breaks when the mixed data typeCount no longer falls within an acceptable range. For example, Strings and Ints in the same field.

Distribution Rule

There is a common case in DQ where you want to know the distribution of a column's value. Consider gender. It can be expected that a column named gender consists of roughly 40-60% males and roughly 40-60% females if the data set is large and represents the population. This can be difficult to express in plain SQL, but it is very easy with the below syntax.

Copy

gender['Male'].$uniquePercent between 40 and 60

Column Value-level	Rule
.$uniqueCount	credit_rating['FAIR'].$uniqueCount > 7
.$uniquePercent	credit_rating['GOOD'].uniquePercent between 40 and 60

Known Limitations

The $daysWithoutData stat rule has a limitation where rules that are not named "daysWithoutData" do not display their Pulse View charts when you drill into the rule on the Findings page. A possible workaround for this is to name your stat rule exactly as it appears here: daysWithoutData.
You cannot combine stat rules or distribution rules with regex rules in the same rule. For example, car_vin rlike '$[asdf][0-9]' and car_vin.$uniqueCount
If the row count of a dataset is zero, the rule may still display as BREAKING if other conditions in the rule are met. For example, if rowCount = 0 and the rule is $daysWithoutData > 0, and $daysWithoutData has a value greater than 0, then the condition (for example, 2 > 0) will be true and trigger a BREAKING status. This behavior applies to other stat rules as well such as $runsWithoutData or $daysSinceLastRun.