Supported SQL rule types

Updated: January 30, 2026

Data Quality & Observability offers different rule types to validate your data, depending on your specific needs:

Freeform SQL: Apply complex conditions across multiple tables or columns for maximum flexibility.
Stat rules: Efficiently access profile statistics without scanning the entire table.

Note Data Quality & Observability uses ANSI-compliant SQL.

Freeform SQL

Freeform SQL rules allow you to write basic or complex, custom SQL queries to validate data. This rule type offers the most flexibility, allowing you to perform joins, aggregations, and advanced logic that cannot be expressed with other rule types.

When you use freeform SQL, you write the full query. The rule passes if the query returns no rows or rows that meet your specific criteria, depending on your configuration.

Example The following query identifies customer_id values that appear more than once, which violates a uniqueness constraint:

SELECT customer_id, COUNT (*) FROM customers GROUP BY customer_id HAVING COUNT (*) > 1

Referencing secondary tables in freeform SQL

To validate data across multiple tables, you can add secondary tables to your freeform SQL rule. Secondary tables allow you to reference other tables from the same schema in your query, enabling cross-table consistency checks and referential integrity validation.

Example The following query checks for orphan records by joining the primary table (orders) with a secondary table (customers):

SELECT o.order_id FROM orders o LEFT JOIN customers c ON o.customer_id = c.id WHERE c.id IS NULL

Stat rules

Stat rules allow you to validate data profile statistics quickly and efficiently. Examples of these statistics include row counts, null ratios, and data types. Because stat rules rely on metadata rather than scanning the entire table, they typically return results in under a second. In contrast, standard SQL often requires complex casting operations to check these metrics, for example, distinguishing between isNull and nullCount. Stat rules allow you to define this logic easily.

Example select * from @dataset where fname.$type != 'String' AND $rowCount < 800 uses the column-level stat rule, $type, and the job-level stat rule $rowCount, to form a SQL query that checks if the "fname" column contains fewer than 800 rows of data that are not strings.

Using job-level stat rules

Job-level rules provide observability into your data at the job level, enabling you to detect issues with the entire job. The following table shows job-level stat rules, their descriptions, and examples of their usage.

Job-level stat rule	Description	Rule example
$totalTimeInSeconds	Breaks when the total execution time of the job in seconds exceeds the limit specified in the condition.	$totalTimeInSeconds > 25
$totalTimeInMinutes	Breaks when the total execution time of the job in minutes exceeds the limit specified in the condition.	$totalTimeInMinutes > 5
$totalTimeInHours	Breaks when the total execution time of the job in hours exceeds the limit specified in the condition.	$totalTimeInHours > 1
$rowCount	Breaks when the total number of rows in the job does not meet the criteria specified in the condition.	$rowCount < 9000
$runDate	Breaks when the date on which the job was executed matches the date specified in the condition.	$runDate='2025-01-24'
$daysWithoutData	Breaks when the number of consecutive days the job has contained no rows exceeds the limit specified in the condition.	$daysWithoutData > 4
$runsWithoutData	Breaks when the number of consecutive job runs where the job contained no rows exceeds the limit specified in the condition.	$runsWithoutData > 4
$daysSinceLastRun	Breaks when the number of days since the last job execution exceeds the limit specified in the condition.	$daysSinceLastRun > 4

Using column-level stat rules

Column-level rules provide observability into your data in a specific column. The following table shows column-level stat rules, their descriptions, and examples of their usage.

Column-level Stat Rule	Description	Rule example
.$type	Breaks when the data type of the values in the column does not match the type specified in the condition.	symbol.$type != 'String'
.$min	Breaks when the minimum value in the column (based on lexicographical sort) meets the criteria specified in the condition.	symbol.$min > 'apple'
.$minNum	Breaks when the minimum numeric value in the column (type-casted to numeric) meets the criteria specified in the condition.	high.$minNum > 13
.$mean	Breaks when the average (mean) value of the column meets the criteria specified in the condition.	row_id.$mean > '4.500'
.$max	Breaks when the maximum value in the column meets the criteria specified in the condition.	symbol.$max > 'apple'
.$maxNum	Breaks when the maximum numeric value in the column (type-casted to numeric) meets the criteria specified in the condition.	high.$maxNum > 13
.$uniqueCount	Breaks when the count of distinct values in the column does not match the count specified in the condition.	id.$uniqueCount != $rowCount
.$uniqueRatio	Breaks when the ratio of distinct values to the total row count falls outside the range specified in the condition.	gender.$uniqueRatio between .4 and .6
.$nullRatio	Breaks when the ratio of null values in the column falls outside the range specified in the condition.	lname.$nullRatio not between .4 and .6
.$nullPercent	Breaks when the percentage of null values in the column falls outside the range specified in the condition.	lname.$nullPercent not between 40 and 60
.$nullCount	Breaks when the count of null values in the column exceeds the limit specified in the condition.	symbol.$nullCount >= 1
.$emptyRatio	Breaks when the ratio of empty values in the column exceeds the limit specified in the condition.	open.$emptyRatio > 0.2
.$emptyPercent	Breaks when the percentage of empty values in the column exceeds the limit specified in the condition.	open.$emptyPercent > 20
.$emptyCount	Breaks when the count of empty values in the column exceeds the limit specified in the condition.	open.$emptyCount > 100
.$mixedTypeRatio	Breaks when the ratio of mixed data types (such as Strings and Ints) in the column exceeds the limit specified in the condition.	open.$mixedTypeRatio > 0.2
.$mixedTypePercent	Breaks when the percentage of mixed data types (such as Strings and Ints) in the column exceeds the limit specified in the condition.	open.$mixedTypeRatio > 20
.$mixedTypeCount	Breaks when the count of rows with mixed data types (such as Strings and Ints) exceeds the limit specified in the condition.	open.$mixedTypeCount >= 1

Using distribution rules

Distribution rules allow you to validate the frequency distribution of values within a specific column. This helps ensure that the data falls within expected ratios or counts for specific categories.

For example, in a large job that represents a general population, you might expect a "gender" column to contain roughly 40-60% "male" and 40-60% "female" values. While this relationship is difficult to express in plain SQL, you can use distribution rules to define this logic simply: gender['Male'].$uniquePercent between 40 and 60

The following table shows distribution rules, their descriptions, and examples of their usage.

Column value-level	Description	Rule
.$uniqueCount	Breaks when the count of unique values for the specified category exceeds the limit specified in the condition.	`credit_rating['FAIR'].$uniqueCount > 7`
.$uniquePercent	Breaks when the percentage of unique values for the specified category falls outside the range specified in the condition.	`credit_rating['GOOD'].uniquePercent between 40 and 60`