Supported SQL rule types
Data Quality & Observability offers different rule types to validate your data, depending on your specific needs:
- Freeform SQL: Apply complex conditions across multiple tables or columns for maximum flexibility.
- Stat rules: Efficiently access profile statistics without scanning the entire table.
Note Data Quality & Observability uses ANSI-compliant SQL.
Freeform SQL
Freeform SQL rules allow you to write basic or complex, custom SQL queries to validate data. This rule type offers the most flexibility, allowing you to perform joins, aggregations, and advanced logic that cannot be expressed with other rule types.
When you use freeform SQL, you write the full query. The rule passes if the query returns no rows or rows that meet your specific criteria, depending on your configuration.
Example The following query identifies customer_id values that appear more than once, which violates a uniqueness constraint:SELECT customer_id, COUNT (*)
FROM customers
GROUP BY customer_id
HAVING COUNT (*) > 1
Referencing secondary tables in freeform SQL
To validate data across multiple tables, you can add secondary tables to your freeform SQL rule. Secondary tables allow you to reference other tables from the same schema in your query, enabling cross-table consistency checks and referential integrity validation.
Example The following query checks for orphan records by joining the primary table (orders) with a secondary table (customers):SELECT o.order_id
FROM orders o
LEFT JOIN customers c ON o.customer_id = c.id
WHERE c.id IS NULL
Stat rules
isNull and nullCount. Stat rules allow you to define this logic easily.
Example select * from @dataset where fname.$type != 'String' AND $rowCount < 800 uses the column-level stat rule, $type, and the job-level stat rule $rowCount, to form a SQL query that checks if the "fname" column contains fewer than 800 rows of data that are not strings.
Using job-level stat rules
Job-level rules provide observability into your data at the job level, enabling you to detect issues with the entire job. The following table shows job-level stat rules, their descriptions, and examples of their usage.
| Job-level stat rule | Description | Rule example |
|---|---|---|
| $totalTimeInSeconds | Breaks when the total execution time of the job in seconds exceeds the limit specified in the condition. | $totalTimeInSeconds > 25 |
| $totalTimeInMinutes | Breaks when the total execution time of the job in minutes exceeds the limit specified in the condition. | $totalTimeInMinutes > 5 |
| $totalTimeInHours | Breaks when the total execution time of the job in hours exceeds the limit specified in the condition. | $totalTimeInHours > 1 |
| $rowCount | Breaks when the total number of rows in the job does not meet the criteria specified in the condition. | $rowCount < 9000 |
| $runDate | Breaks when the date on which the job was executed matches the date specified in the condition. | $runDate='2025-01-24' |
| $daysWithoutData | Breaks when the number of consecutive days the job has contained no rows exceeds the limit specified in the condition. | $daysWithoutData > 4 |
| $runsWithoutData | Breaks when the number of consecutive job runs where the job contained no rows exceeds the limit specified in the condition. | $runsWithoutData > 4 |
| $daysSinceLastRun | Breaks when the number of days since the last job execution exceeds the limit specified in the condition. | $daysSinceLastRun > 4 |
Using column-level stat rules
Column-level rules provide observability into your data in a specific column. The following table shows column-level stat rules, their descriptions, and examples of their usage.
| Column-level Stat Rule | Description | Rule example |
|---|---|---|
| .$type | Breaks when the data type of the values in the column does not match the type specified in the condition. | symbol.$type != 'String' |
| .$min | Breaks when the minimum value in the column (based on lexicographical sort) meets the criteria specified in the condition. | symbol.$min > 'apple' |
| .$minNum |
Breaks when the minimum numeric value in the column (type-casted to numeric) meets the criteria specified in the condition. |
high.$minNum > 13 |
| .$mean | Breaks when the average (mean) value of the column meets the criteria specified in the condition. | row_id.$mean > '4.500' |
| .$max | Breaks when the maximum value in the column meets the criteria specified in the condition. | symbol.$max > 'apple' |
| .$maxNum | Breaks when the maximum numeric value in the column (type-casted to numeric) meets the criteria specified in the condition. | high.$maxNum > 13 |
| .$uniqueCount | Breaks when the count of distinct values in the column does not match the count specified in the condition. | id.$uniqueCount != $rowCount |
| .$uniqueRatio | Breaks when the ratio of distinct values to the total row count falls outside the range specified in the condition. | gender.$uniqueRatio between .4 and .6 |
| .$nullRatio | Breaks when the ratio of null values in the column falls outside the range specified in the condition. | lname.$nullRatio not between .4 and .6 |
| .$nullPercent | Breaks when the percentage of null values in the column falls outside the range specified in the condition. | lname.$nullPercent not between 40 and 60 |
| .$nullCount | Breaks when the count of null values in the column exceeds the limit specified in the condition. | symbol.$nullCount >= 1 |
| .$emptyRatio | Breaks when the ratio of empty values in the column exceeds the limit specified in the condition. | open.$emptyRatio > 0.2 |
| .$emptyPercent | Breaks when the percentage of empty values in the column exceeds the limit specified in the condition. | open.$emptyPercent > 20 |
| .$emptyCount | Breaks when the count of empty values in the column exceeds the limit specified in the condition. | open.$emptyCount > 100 |
| .$mixedTypeRatio | Breaks when the ratio of mixed data types (such as Strings and Ints) in the column exceeds the limit specified in the condition. | open.$mixedTypeRatio > 0.2 |
| .$mixedTypePercent |
Breaks when the percentage of mixed data types (such as Strings and Ints) in the column exceeds the limit specified in the condition. |
open.$mixedTypeRatio > 20 |
| .$mixedTypeCount | Breaks when the count of rows with mixed data types (such as Strings and Ints) exceeds the limit specified in the condition. | open.$mixedTypeCount >= 1 |
Using distribution rules
Distribution rules allow you to validate the frequency distribution of values within a specific column. This helps ensure that the data falls within expected ratios or counts for specific categories.
For example, in a large job that represents a general population, you might expect a "gender" column to contain roughly 40-60% "male" and 40-60% "female" values. While this relationship is difficult to express in plain SQL, you can use distribution rules to define this logic simply: gender['Male'].$uniquePercent between 40 and 60
The following table shows distribution rules, their descriptions, and examples of their usage.
| Column value-level | Description | Rule |
|---|---|---|
| .$uniqueCount | Breaks when the count of unique values for the specified category exceeds the limit specified in the condition. | credit_rating['FAIR'].$uniqueCount > 7
|
| .$uniquePercent | Breaks when the percentage of unique values for the specified category falls outside the range specified in the condition. | credit_rating['GOOD'].uniquePercent between 40 and 60
|