Create a data quality rule with SQL Assistant for Data Quality
SQL Assistant for Data Quality uses Collibra AI to automatically write and troubleshoot SQL queries. When working on a Pullup dataset, the type of SQL that SQL Assistant for Data Quality generates depends on the rule type. For instance, if your rule type is Freeform SQL, the SQL generated by SQL Assistant for Data Quality is Spark SQL, whereas a Native SQL rule returns the SQL specific to the datasource. Likewise, Pushdown datasets always return SQL native to the job source.
The Prompts are organized in two sections: Basic and Advanced.
Basic options for SQL query suggestions provided by SQL Assistant for Data Quality include:
Option | Description |
---|---|
Generate Rule | Collibra AI suggests a Spark SQL query to check where in your table, column, or view your input criteria are met. |
Troubleshoot Rule |
When a SQL syntax error occurs, you can use SQL Assistant for Data Quality to automatically troubleshoot and fix the error. |
Advanced options for SQL query suggestions provided by SQL Assistant for Data Quality include:
Option | Description |
---|---|
Categorical | Collibra AI suggests a SQL query to detect categorical outliers. |
Dupe | Collibra AI suggests a SQL query to detect duplicate values. |
Record | Collibra AI suggests a SQL query to check for values that appear on a previous day but not for the next day. |
Pattern | Collibra AI suggests a SQL query to check for infrequent combinations that appear less than 5 percent of the time in the columns you specify. |
Frequency Distribution | Collibra AI suggests a SQL query to check for the frequency distribution of all values within a column. |
Requirements and permissions
To use SQL Assistant for Data Quality in Collibra DQ, you first need to:
- Ensure the AI_TENANT application configuration is set to TRUE in the Admin Console Configuration Settings.
- Have one of the following Collibra DQ roles assigned to your user account:
- ROLE_ADMIN
- ROLE_GENAI_USER
- Obtain the Vertex AI encryption key from your Collibra CSM.
- If your organization's security protocol prohibits outbound traffic, ensure you safelist network egress for oauth2.googleapis.com and us-central1-aiplatform.googleapis.com over port 443.
Note Collibra DQ administrators can add or update roles from the Admin Console Role Management Roles.
Steps
Basic options
- Generate Rule
- Troubleshoot Rule
- Open the Dataset Rules page of your dataset.
- From the Dataset Rules page, there are two ways to open the Rule Workbench.
- In the upper right corner of the Rule Workbench SQL editor, click Collibra AI.
The SQL Assistant for Data Quality tool appears. - From the Prompt dropdown menu, select Generate Rule.
- In the query input field, enter a plain language prompt.
- "high is more than 50"
- "find all of the unique values in Column1 and show how many records exist for each"
- "state is not Michigan, California, or Illinois"
- "compare the 2 dates 2018-01-13 to 2018-01-14 - write 2 separate queries for each date, using trade date - write as a common table expression for tables a and b - join using symbol column - use a join to identify which values are in 2018-01-14 that are not in 2018-01-13 - the goal is to find values that are missing from the previous day"
- Click Submit to Collibra AI.
SQL Assistant for Data Quality generates your SQL query and displays the suggested query when it completes. - Choose one of the following options:
- Click Copy to Editor to automatically apply the suggested query to the SQL editor on the Dataset Overview modal.
- Click Copy to Clipboard to copy the suggested query to your computer's clipboard, allowing you to paste the query elsewhere.
- Click Close to exit the SQL Assistant for Data Quality tool without copying the suggested query.
- Rewrite your prompt and click Submit to Collibra AI to generate a new SQL rule suggestion.
Note Generate Rule is the default option.
Tip When there are many columns in your dataset, you may need to scroll down to the bottom of the input field to enter your column names.
- Examples
When a SQL syntax error occurs, you can use SQL Assistant for Data Quality to automatically troubleshoot and fix the error.
- Open the Dataset Rules page of your dataset.
- From the Dataset Rules page, there are two ways to open the Rule Workbench.
- In the upper right corner of the Rule Workbench SQL editor, click Collibra AI.
The Collibra AI tool appears. - In the SQL editor, enter an invalid SQL query that returns a syntax error.
- In the upper right corner of the SQL editor, click Collibra AI.
The SQL Assistant for Data Quality tool appears. - From the Prompt dropdown menu, select Troubleshoot Rule.
The SQL Assistant for Data Quality tool shows the prompt to rewrite and fix your SQL query, an overview of the exception message, the SQL query that caused the error, the table and column names in your query, and helpful troubleshooting tips. - Click Submit to Collibra AI.
SQL Assistant for Data Quality troubleshoots your SQL query and displays the suggested query when it completes. - Choose one of the following options:
- Click Copy to Editor to automatically apply the suggested query to the SQL editor on the Dataset Overview modal.
- Click Copy to Clipboard to copy the suggested query to your computer's clipboard, allowing you to paste the query elsewhere.
- Click Close to exit the SQL Assistant for Data Quality tool without copying the suggested query.
- Rewrite your prompt and click Submit to Collibra AI to generate a new SQL rule suggestion.
Example SELECT * fomr public.nyse2 were onpe > 4.75
Note You can optionally click the above and to the right of the overview text to manually edit the troubleshooting prompt.
Advanced options
- Categorical
- Dupe
- Record
- Pattern
- Frequency Distribution
- Open the Dataset Rules page of your dataset.
- From the Dataset Rules page, there are two ways to open the Rule Workbench.
- In the upper right corner of the Rule Workbench SQL editor, click Collibra AI.
The SQL Assistant for Data Quality tool appears. - From the Prompt dropdown menu, select Categorical. The prompt automatically uses the correct SQL type of your rule. For instance, a Freeform rule utilizes Spark SQL, while data source-specific SQL adapts based on the connection type of your dataset.
- In the query input field, enter the names of the columns in your dataset to include in the SQL query.
- Click Submit to Collibra AI.
SQL Assistant for Data Quality generates your SQL query and displays the suggested query when it completes. - Choose one of the following options:
- Click Copy to Editor to automatically apply the suggested query to the SQL editor on the Dataset Overview modal.
- Click Copy to Clipboard to copy the suggested query to your computer's clipboard, allowing you to paste the query elsewhere.
- Click Close to exit the SQL Assistant for Data Quality tool without copying the suggested query.
- Rewrite your prompt and click Submit to Collibra AI to generate a new SQL suggestion.
Example In this example, the default placeholder column "sale_state" will be replaced with the user-defined "cm_claimant_age" in the categorical outlier SQL suggestion.
Tip When there are many columns in your dataset, you may need to scroll down to the bottom of the input field to enter your column names.
Example
- Open the Dataset Rules page of your dataset.
- From the Dataset Rules page, there are two ways to open the Rule Workbench.
- In the upper right corner of the Rule Workbench SQL editor, click Collibra AI.
The SQL Assistant for Data Quality tool appears. - From the Prompt dropdown menu, select Dupe. The prompt automatically uses the correct SQL type of your rule. For instance, a Freeform rule utilizes Spark SQL, while data source-specific SQL adapts based on the connection type of your dataset.
- In the query input field, enter the name of the column in your dataset to include in the SQL query.
- Click Submit to Collibra AI.
SQL Assistant for Data Quality generates your SQL query and displays the suggested query when it completes. - Choose one of the following options:
- Click Copy to Editor to automatically apply the suggested query to the SQL editor on the Dataset Overview modal.
- Click Copy to Clipboard to copy the suggested query to your computer's clipboard, allowing you to paste the query elsewhere.
- Click Close to exit the SQL Assistant for Data Quality tool without copying the suggested query.
- Rewrite your prompt and click Submit to Collibra AI to generate a new SQL suggestion.
Example In this example, the default placeholder column "email" will be replaced with the user-defined "cm_denied_reason1" in the duplicate value SQL suggestion.
Tip When there are many columns in your dataset, you may need to scroll down to the bottom of the input field to enter your column name.
Example
- Open the Dataset Rules page of your dataset.
- From the Dataset Rules page, there are two ways to open the Rule Workbench.
- In the upper right corner of the Rule Workbench SQL editor, click Collibra AI.
The SQL Assistant for Data Quality tool appears. - From the Prompt dropdown menu, select Record. The prompt automatically uses the correct SQL type of your rule. For instance, a Freeform rule utilizes Spark SQL, while data source-specific SQL adapts based on the connection type of your dataset.
- In the query input field, enter the names of the columns in your dataset to include in the SQL query.
- Click Submit to Collibra AI.
SQL Assistant for Data Quality generates your SQL query and displays the suggested query when it completes. - Choose one of the following options:
- Click Copy to Editor to automatically apply the suggested query to the SQL editor on the Dataset Overview modal.
- Click Copy to Clipboard to copy the suggested query to your computer's clipboard, allowing you to paste the query elsewhere.
- Click Close to exit the SQL Assistant for Data Quality tool without copying the suggested query.
- Rewrite your prompt and click Submit to Collibra AI to generate a new SQL suggestion.
Example In this example, the default placeholder columns "vendor_type" and "transaction_date" will be replaced with the user-defined "cm_claim_status" and "cm_bdos" in the SQL suggestion for record changes.
Important You must specify one column with numeric data and one date column.
Tip When there are many columns in your dataset, you may need to scroll down to the bottom of the input field to enter your column names.
Example
- Open the Dataset Rules page of your dataset.
- From the Dataset Rules page, there are two ways to open the Rule Workbench.
- In the upper right corner of the Rule Workbench SQL editor, click Collibra AI.
The SQL Assistant for Data Quality tool appears. - From the Prompt dropdown menu, select Pattern. The prompt automatically uses the correct SQL type of your rule. For instance, a Freeform rule utilizes Spark SQL, while data source-specific SQL adapts based on the connection type of your dataset.
- In the query input field, enter the names of the columns in your dataset to include in the SQL query.
- Click Submit to Collibra AI.
SQL Assistant for Data Quality generates your SQL query and displays the suggested query when it completes. - Choose one of the following options:
- Click Copy to Editor to automatically apply the suggested query to the SQL editor on the Dataset Overview modal.
- Click Copy to Clipboard to copy the suggested query to your computer's clipboard, allowing you to paste the query elsewhere.
- Click Close to exit the SQL Assistant for Data Quality tool without copying the suggested query.
- Rewrite your prompt and click Submit to Collibra AI to generate a new SQL suggestion.
Example In this example, the default placeholder columns "sale_state" and "state_tax" will be replaced with the user-defined "cm_diagnosis_code" and "cm_denied_reason1" in the SQL suggestion for pattern detection.
Tip When there are many columns in your dataset, you may need to scroll down to the bottom of the input field to enter your column names.
Example
- Open the Dataset Rules page of your dataset.
- From the Dataset Rules page, there are two ways to open the Rule Workbench.
- In the upper right corner of the Rule Workbench SQL editor, click Collibra AI.
The SQL Assistant for Data Quality tool appears. - From the Prompt dropdown menu, select Frequency Distribution. The prompt automatically uses the correct SQL type of your rule. For instance, a Freeform rule utilizes Spark SQL, while data source-specific SQL adapts based on the connection type of your dataset.
- In the query input field, enter the name of the column in your dataset to include in the SQL query.
- Click Submit to Collibra AI.
SQL Assistant for Data Quality generates your SQL query and displays the suggested query when it completes. - Choose one of the following options:
- Click Copy to Editor to automatically apply the suggested query to the SQL editor on the Dataset Overview modal.
- Click Copy to Clipboard to copy the suggested query to your computer's clipboard, allowing you to paste the query elsewhere.
- Click Close to exit the SQL Assistant for Data Quality tool without copying the suggested query.
- Rewrite your prompt and click Submit to Collibra AI to generate a new SQL suggestion.
Example In this example, a SQL query for frequency distribution will be suggested for the column "cm_denied_reason1".
Tip When there are many columns in your dataset, you may need to scroll down to the bottom of the input field to enter your column name.
Example