Create a data quality rule with SQL Assistant for Data Quality

SQL Assistant for Data Quality uses Collibra AI to automatically write and troubleshoot SQL queries. When working on a Pullup dataset, the type of SQL that SQL Assistant for Data Quality generates depends on the rule type. For instance, if your rule type is Freeform SQL, the SQL generated by SQL Assistant for Data Quality is Spark SQL, whereas a Native SQL rule returns the SQL specific to the datasource. Likewise, Pushdown datasets always return SQL native to the job source.

The Prompts are organized in two sections: Basic and Advanced.

Basic options for SQL query suggestions provided by SQL Assistant for Data Quality include:

Option Description
Generate Rule Collibra AI suggests a Spark SQL query to check where in your table, column, or view your input criteria are met.
Troubleshoot Rule

When a SQL syntax error occurs, you can use SQL Assistant for Data Quality to automatically troubleshoot and fix the error.

Advanced options for SQL query suggestions provided by SQL Assistant for Data Quality include:

Option Description
Categorical Collibra AI suggests a SQL query to detect categorical outliers.
Dupe Collibra AI suggests a SQL query to detect duplicate values.
Record Collibra AI suggests a SQL query to check for values that appear on a previous day but not for the next day.
Pattern Collibra AI suggests a SQL query to check for infrequent combinations that appear less than 5 percent of the time in the columns you specify.
Frequency Distribution Collibra AI suggests a SQL query to check for the frequency distribution of all values within a column.

Requirements and permissions

The architecture of SQL Assistant for Data Quality differs depending on whether you use the platform path with a Collibra Platform integration or the beta path with a Collibra-provided encrypted service account key.

To use SQL Assistant for Data Quality with a Collibra Platform integration, you need to:

  • Have an active integration with Collibra Platform configured on the Admin Console > Integrations screen.
  • Note Only the Credentials step of the Integration Setup is required. You do not need to map connections, tenants, dimensions, or layers to use SQL Assistant for Data Quality.

  • Set AI_PLATFORM_PATH to TRUE (default) on the Admin Console > Application Configuration Settings screen.
  • Set AI_TENANT to TRUE on the Admin Console > Application Configuration Settings screen.
  • Update the gai_proxy_endpoint in the owl-env.sh or Helm Chart, depending on your installation type.

Important On April 9, 2025, Google will deprecate the Vertex text-bison AI model, which the beta path option uses. To continue using SQL Assistant for Data Quality, you must switch to the platform path, which requires an integration with Collibra Platform.

When using SQL Assistant for Data Quality without a Collibra Platform integration, you have:

Steps

Basic options

  1. Open the Dataset Rules page of your dataset.
  2. From the Dataset Rules page, there are two ways to open the Rule Workbench.
    1. From an existing rule on the Rules tab, click Actions, then click Edit.
      The Rule Workbench opens with the preferences of your existing rule.
    2. Click Add Rule in the upper right corner to create a new rule.
      The Rule Workbench opens.
  3. In the upper right corner of the Rule Workbench SQL editor, click Collibra AI.
    The SQL Assistant for Data Quality tool appears.
  4. From the Prompt dropdown menu, select Generate Rule.
  5. Note Generate Rule is the default option.

  6. In the query input field, enter a plain language prompt.
  7. Tip When there are many columns in your dataset, you may need to scroll down to the bottom of the input field to enter your column names.

      Examples
      • "high is more than 50"
      • "find all of the unique values in Column1 and show how many records exist for each"
      • "state is not Michigan, California, or Illinois"
      • "compare the 2 dates 2018-01-13 to 2018-01-14 - write 2 separate queries for each date, using trade date - write as a common table expression for tables a and b - join using symbol column - use a join to identify which values are in 2018-01-14 that are not in 2018-01-13 - the goal is to find values that are missing from the previous day"
  8. Click Submit to Collibra AI.
    SQL Assistant for Data Quality generates your SQL query and displays the suggested query when it completes.
  9. Choose one of the following options:
    1. Click Copy to Editor to automatically apply the suggested query to the SQL editor on the Dataset Overview modal.
    2. Click Copy to Clipboard to copy the suggested query to your computer's clipboard, allowing you to paste the query elsewhere.
    3. Click Close to exit the SQL Assistant for Data Quality tool without copying the suggested query.
    4. Rewrite your prompt and click Submit to Collibra AI to generate a new SQL rule suggestion.

When a SQL syntax error occurs, you can use SQL Assistant for Data Quality to automatically troubleshoot and fix the error.

  1. Open the Dataset Rules page of your dataset.
  2. From the Dataset Rules page, there are two ways to open the Rule Workbench.
    1. From an existing rule on the Rules tab, click Actions, then click Edit.
      The Rule Workbench opens with the preferences of your existing rule.
    2. Click Add Rule in the upper right corner to create a new rule.
      The Rule Workbench opens.
  3. In the upper right corner of the Rule Workbench SQL editor, click Collibra AI.
    The Collibra AI tool appears.
  4. In the SQL editor, enter an invalid SQL query that returns a syntax error.
  5. Example SELECT * fomr public.nyse2 were onpe > 4.75

  6. In the upper right corner of the SQL editor, click Collibra AI.
    The SQL Assistant for Data Quality tool appears.
  7. From the Prompt dropdown menu, select Troubleshoot Rule.
    The SQL Assistant for Data Quality tool shows the prompt to rewrite and fix your SQL query, an overview of the exception message, the SQL query that caused the error, the table and column names in your query, and helpful troubleshooting tips.
  8. Note You can optionally click the above and to the right of the overview text to manually edit the troubleshooting prompt.

  9. Click Submit to Collibra AI.
    SQL Assistant for Data Quality troubleshoots your SQL query and displays the suggested query when it completes.
  10. Choose one of the following options:
    1. Click Copy to Editor to automatically apply the suggested query to the SQL editor on the Dataset Overview modal.
    2. Click Copy to Clipboard to copy the suggested query to your computer's clipboard, allowing you to paste the query elsewhere.
    3. Click Close to exit the SQL Assistant for Data Quality tool without copying the suggested query.
    4. Rewrite your prompt and click Submit to Collibra AI to generate a new SQL rule suggestion.

Advanced options

  1. Open the Dataset Rules page of your dataset.
  2. From the Dataset Rules page, there are two ways to open the Rule Workbench.
    1. From an existing rule on the Rules tab, click Actions, then click Edit.
      The Rule Workbench opens with the preferences of your existing rule.
    2. Click Add Rule in the upper right corner to create a new rule.
      The Rule Workbench opens.
  3. In the upper right corner of the Rule Workbench SQL editor, click Collibra AI.
    The SQL Assistant for Data Quality tool appears.
  4. From the Prompt dropdown menu, select Categorical.
  5. The prompt automatically uses the correct SQL type of your rule. For instance, a Freeform rule utilizes Spark SQL, while data source-specific SQL adapts based on the connection type of your dataset.
  6. In the query input field, enter the names of the columns in your dataset to include in the SQL query.
  7. Example In this example, the default placeholder column "sale_state" will be replaced with the user-defined "cm_claimant_age" in the categorical outlier SQL suggestion.
    An example of a prompt for categorical outliers

    Tip When there are many columns in your dataset, you may need to scroll down to the bottom of the input field to enter your column names.

  8. Click Submit to Collibra AI.
    SQL Assistant for Data Quality generates your SQL query and displays the suggested query when it completes.
  9. Example 
    Example of the SQL suggestion for categorical

  10. Choose one of the following options:
    1. Click Copy to Editor to automatically apply the suggested query to the SQL editor on the Dataset Overview modal.
    2. Click Copy to Clipboard to copy the suggested query to your computer's clipboard, allowing you to paste the query elsewhere.
    3. Click Close to exit the SQL Assistant for Data Quality tool without copying the suggested query.
    4. Rewrite your prompt and click Submit to Collibra AI to generate a new SQL suggestion.
  1. Open the Dataset Rules page of your dataset.
  2. From the Dataset Rules page, there are two ways to open the Rule Workbench.
    1. From an existing rule on the Rules tab, click Actions, then click Edit.
      The Rule Workbench opens with the preferences of your existing rule.
    2. Click Add Rule in the upper right corner to create a new rule.
      The Rule Workbench opens.
  3. In the upper right corner of the Rule Workbench SQL editor, click Collibra AI.
    The SQL Assistant for Data Quality tool appears.
  4. From the Prompt dropdown menu, select Dupe.
  5. The prompt automatically uses the correct SQL type of your rule. For instance, a Freeform rule utilizes Spark SQL, while data source-specific SQL adapts based on the connection type of your dataset.
  6. In the query input field, enter the name of the column in your dataset to include in the SQL query.
  7. Example In this example, the default placeholder column "email" will be replaced with the user-defined "cm_denied_reason1" in the duplicate value SQL suggestion.
    An example of a prompt for duplicate values

    Tip When there are many columns in your dataset, you may need to scroll down to the bottom of the input field to enter your column name.

  8. Click Submit to Collibra AI.
    SQL Assistant for Data Quality generates your SQL query and displays the suggested query when it completes.
  9. Example 
    Example of the SQL suggestion for dupes

  10. Choose one of the following options:
    1. Click Copy to Editor to automatically apply the suggested query to the SQL editor on the Dataset Overview modal.
    2. Click Copy to Clipboard to copy the suggested query to your computer's clipboard, allowing you to paste the query elsewhere.
    3. Click Close to exit the SQL Assistant for Data Quality tool without copying the suggested query.
    4. Rewrite your prompt and click Submit to Collibra AI to generate a new SQL suggestion.
  1. Open the Dataset Rules page of your dataset.
  2. From the Dataset Rules page, there are two ways to open the Rule Workbench.
    1. From an existing rule on the Rules tab, click Actions, then click Edit.
      The Rule Workbench opens with the preferences of your existing rule.
    2. Click Add Rule in the upper right corner to create a new rule.
      The Rule Workbench opens.
  3. In the upper right corner of the Rule Workbench SQL editor, click Collibra AI.
    The SQL Assistant for Data Quality tool appears.
  4. From the Prompt dropdown menu, select Record.
  5. The prompt automatically uses the correct SQL type of your rule. For instance, a Freeform rule utilizes Spark SQL, while data source-specific SQL adapts based on the connection type of your dataset.
  6. In the query input field, enter the names of the columns in your dataset to include in the SQL query.
  7. Example In this example, the default placeholder columns "vendor_type" and "transaction_date" will be replaced with the user-defined "cm_claim_status" and "cm_bdos" in the SQL suggestion for record changes.
    An example of a prompt for record changes

    Important You must specify one column with numeric data and one date column.

    Tip When there are many columns in your dataset, you may need to scroll down to the bottom of the input field to enter your column names.

  8. Click Submit to Collibra AI.
    SQL Assistant for Data Quality generates your SQL query and displays the suggested query when it completes.
  9. Example 
    Example of the SQL suggestion for records

  10. Choose one of the following options:
    1. Click Copy to Editor to automatically apply the suggested query to the SQL editor on the Dataset Overview modal.
    2. Click Copy to Clipboard to copy the suggested query to your computer's clipboard, allowing you to paste the query elsewhere.
    3. Click Close to exit the SQL Assistant for Data Quality tool without copying the suggested query.
    4. Rewrite your prompt and click Submit to Collibra AI to generate a new SQL suggestion.
  1. Open the Dataset Rules page of your dataset.
  2. From the Dataset Rules page, there are two ways to open the Rule Workbench.
    1. From an existing rule on the Rules tab, click Actions, then click Edit.
      The Rule Workbench opens with the preferences of your existing rule.
    2. Click Add Rule in the upper right corner to create a new rule.
      The Rule Workbench opens.
  3. In the upper right corner of the Rule Workbench SQL editor, click Collibra AI.
    The SQL Assistant for Data Quality tool appears.
  4. From the Prompt dropdown menu, select Pattern.
  5. The prompt automatically uses the correct SQL type of your rule. For instance, a Freeform rule utilizes Spark SQL, while data source-specific SQL adapts based on the connection type of your dataset.
  6. In the query input field, enter the names of the columns in your dataset to include in the SQL query.
  7. Example In this example, the default placeholder columns "sale_state" and "state_tax" will be replaced with the user-defined "cm_diagnosis_code" and "cm_denied_reason1" in the SQL suggestion for pattern detection.
    An example of a prompt for pattern detection

    Tip When there are many columns in your dataset, you may need to scroll down to the bottom of the input field to enter your column names.

  8. Click Submit to Collibra AI.
    SQL Assistant for Data Quality generates your SQL query and displays the suggested query when it completes.
  9. Example 
    Example of the SQL suggestion for patterns

  10. Choose one of the following options:
    1. Click Copy to Editor to automatically apply the suggested query to the SQL editor on the Dataset Overview modal.
    2. Click Copy to Clipboard to copy the suggested query to your computer's clipboard, allowing you to paste the query elsewhere.
    3. Click Close to exit the SQL Assistant for Data Quality tool without copying the suggested query.
    4. Rewrite your prompt and click Submit to Collibra AI to generate a new SQL suggestion.
  1. Open the Dataset Rules page of your dataset.
  2. From the Dataset Rules page, there are two ways to open the Rule Workbench.
    1. From an existing rule on the Rules tab, click Actions, then click Edit.
      The Rule Workbench opens with the preferences of your existing rule.
    2. Click Add Rule in the upper right corner to create a new rule.
      The Rule Workbench opens.
  3. In the upper right corner of the Rule Workbench SQL editor, click Collibra AI.
    The SQL Assistant for Data Quality tool appears.
  4. From the Prompt dropdown menu, select Frequency Distribution.
  5. The prompt automatically uses the correct SQL type of your rule. For instance, a Freeform rule utilizes Spark SQL, while data source-specific SQL adapts based on the connection type of your dataset.
  6. In the query input field, enter the name of the column in your dataset to include in the SQL query.
  7. Example In this example, a SQL query for frequency distribution will be suggested for the column "cm_denied_reason1".
    An example of a prompt for frequency distribution

    Tip When there are many columns in your dataset, you may need to scroll down to the bottom of the input field to enter your column name.

  8. Click Submit to Collibra AI.
    SQL Assistant for Data Quality generates your SQL query and displays the suggested query when it completes.
  9. Example 
    Example of the SQL suggestion for frequency distribution

  10. Choose one of the following options:
    1. Click Copy to Editor to automatically apply the suggested query to the SQL editor on the Dataset Overview modal.
    2. Click Copy to Clipboard to copy the suggested query to your computer's clipboard, allowing you to paste the query elsewhere.
    3. Click Close to exit the SQL Assistant for Data Quality tool without copying the suggested query.
    4. Rewrite your prompt and click Submit to Collibra AI to generate a new SQL suggestion.