Create a data quality rule with SQL Assistant for Data Quality

SQL Assistant for Data Quality uses Collibra AI to automatically write and troubleshoot SQL queries. When working on a Pullup dataset, the type of SQL that SQL Assistant for Data Quality generates depends on the rule type. For instance, if your rule type is Freeform SQL, the SQL generated by SQL Assistant for Data Quality is Spark SQL, whereas a Native SQL rule returns the SQL specific to the datasource. Likewise, Pushdown datasets always return SQL native to the job source.

The Prompts are organized in two sections: Basic and Advanced.

Basic options for SQL query suggestions provided by SQL Assistant for Data Quality include:

Option Description
Generate Rule Collibra AI suggests a Spark SQL query to check where in your table, column, or view your input criteria are met.
Troubleshoot Rule

When a SQL syntax error occurs, you can use SQL Assistant for Data Quality to automatically troubleshoot and fix the error.

Advanced options for SQL query suggestions provided by SQL Assistant for Data Quality include:

Option Description
Categorical Collibra AI suggests a SQL query to detect categorical outliers.
Dupe Collibra AI suggests a SQL query to detect duplicate values.
Record Collibra AI suggests a SQL query to check for values that appear on a previous day but not for the next day.
Pattern Collibra AI suggests a SQL query to check for infrequent combinations that appear less than 5 percent of the time in the columns you specify.
Frequency Distribution Collibra AI suggests a SQL query to check for the frequency distribution of all values within a column.

Requirements and permissions

To use SQL Assistant for Data Quality in Collibra DQ, you first need to:

  • Ensure the AI_TENANT application configuration is set to TRUE in the Admin Console Configuration Settings.
  • Have one of the following Collibra DQ roles assigned to your user account:
    • ROLE_ADMIN
    • ROLE_GENAI_USER
  • Note Collibra DQ administrators can add or update roles from the Admin Console Role Management Roles.

  • Obtain the Vertex AI encryption key from your Collibra CSM.
  • If your organization's security protocol prohibits outbound traffic, ensure you safelist network egress for oauth2.googleapis.com and us-central1-aiplatform.googleapis.com over port 443.

Steps

Basic options

  1. Open the Dataset Rules page of your dataset.
  2. From the Dataset Rules page, there are two ways to open the Rule Workbench.
    1. From an existing rule on the Rules tab, click Actions, then click Edit.
      The Rule Workbench opens with the preferences of your existing rule.
    2. Click Add Rule in the upper right corner to create a new rule.
      The Rule Workbench opens.
  3. In the upper right corner of the Rule Workbench SQL editor, click Collibra AI.
    The SQL Assistant for Data Quality tool appears.
  4. From the Prompt dropdown menu, select Generate Rule.
  5. Note Generate Rule is the default option.

  6. In the query input field, enter a plain language prompt.
  7. Tip When there are many columns in your dataset, you may need to scroll down to the bottom of the input field to enter your column names.

      Examples
      • "high is more than 50"
      • "find all of the unique values in Column1 and show how many records exist for each"
      • "state is not Michigan, California, or Illinois"
      • "compare the 2 dates 2018-01-13 to 2018-01-14 - write 2 separate queries for each date, using trade date - write as a common table expression for tables a and b - join using symbol column - use a join to identify which values are in 2018-01-14 that are not in 2018-01-13 - the goal is to find values that are missing from the previous day"
  8. Click Submit to Collibra AI.
    SQL Assistant for Data Quality generates your SQL query and displays the suggested query when it completes.
  9. Choose one of the following options:
    1. Click Copy to Editor to automatically apply the suggested query to the SQL editor on the Dataset Overview modal.
    2. Click Copy to Clipboard to copy the suggested query to your computer's clipboard, allowing you to paste the query elsewhere.
    3. Click Close to exit the SQL Assistant for Data Quality tool without copying the suggested query.
    4. Rewrite your prompt and click Submit to Collibra AI to generate a new SQL rule suggestion.

When a SQL syntax error occurs, you can use SQL Assistant for Data Quality to automatically troubleshoot and fix the error.

  1. Open the Dataset Rules page of your dataset.
  2. From the Dataset Rules page, there are two ways to open the Rule Workbench.
    1. From an existing rule on the Rules tab, click Actions, then click Edit.
      The Rule Workbench opens with the preferences of your existing rule.
    2. Click Add Rule in the upper right corner to create a new rule.
      The Rule Workbench opens.
  3. In the upper right corner of the Rule Workbench SQL editor, click Collibra AI.
    The Collibra AI tool appears.
  4. In the SQL editor, enter an invalid SQL query that returns a syntax error.
  5. Example SELECT * fomr public.nyse2 were onpe > 4.75

  6. In the upper right corner of the SQL editor, click Collibra AI.
    The SQL Assistant for Data Quality tool appears.
  7. From the Prompt dropdown menu, select Troubleshoot Rule.
    The SQL Assistant for Data Quality tool shows the prompt to rewrite and fix your SQL query, an overview of the exception message, the SQL query that caused the error, the table and column names in your query, and helpful troubleshooting tips.
  8. Note You can optionally click the above and to the right of the overview text to manually edit the troubleshooting prompt.

  9. Click Submit to Collibra AI.
    SQL Assistant for Data Quality troubleshoots your SQL query and displays the suggested query when it completes.
  10. Choose one of the following options:
    1. Click Copy to Editor to automatically apply the suggested query to the SQL editor on the Dataset Overview modal.
    2. Click Copy to Clipboard to copy the suggested query to your computer's clipboard, allowing you to paste the query elsewhere.
    3. Click Close to exit the SQL Assistant for Data Quality tool without copying the suggested query.
    4. Rewrite your prompt and click Submit to Collibra AI to generate a new SQL rule suggestion.

Advanced options

  1. Open the Dataset Rules page of your dataset.
  2. From the Dataset Rules page, there are two ways to open the Rule Workbench.
    1. From an existing rule on the Rules tab, click Actions, then click Edit.
      The Rule Workbench opens with the preferences of your existing rule.
    2. Click Add Rule in the upper right corner to create a new rule.
      The Rule Workbench opens.
  3. In the upper right corner of the Rule Workbench SQL editor, click Collibra AI.
    The SQL Assistant for Data Quality tool appears.
  4. From the Prompt dropdown menu, select Categorical.
  5. The prompt automatically uses the correct SQL type of your rule. For instance, a Freeform rule utilizes Spark SQL, while data source-specific SQL adapts based on the connection type of your dataset.
  6. In the query input field, enter the names of the columns in your dataset to include in the SQL query.
  7. Example In this example, the default placeholder column "sale_state" will be replaced with the user-defined "cm_claimant_age" in the categorical outlier SQL suggestion.
    An example of a prompt for categorical outliers

    Tip When there are many columns in your dataset, you may need to scroll down to the bottom of the input field to enter your column names.

  8. Click Submit to Collibra AI.
    SQL Assistant for Data Quality generates your SQL query and displays the suggested query when it completes.
  9. Example 
    Example of the SQL suggestion for categroical

  10. Choose one of the following options:
    1. Click Copy to Editor to automatically apply the suggested query to the SQL editor on the Dataset Overview modal.
    2. Click Copy to Clipboard to copy the suggested query to your computer's clipboard, allowing you to paste the query elsewhere.
    3. Click Close to exit the SQL Assistant for Data Quality tool without copying the suggested query.
    4. Rewrite your prompt and click Submit to Collibra AI to generate a new SQL suggestion.
  1. Open the Dataset Rules page of your dataset.
  2. From the Dataset Rules page, there are two ways to open the Rule Workbench.
    1. From an existing rule on the Rules tab, click Actions, then click Edit.
      The Rule Workbench opens with the preferences of your existing rule.
    2. Click Add Rule in the upper right corner to create a new rule.
      The Rule Workbench opens.
  3. In the upper right corner of the Rule Workbench SQL editor, click Collibra AI.
    The SQL Assistant for Data Quality tool appears.
  4. From the Prompt dropdown menu, select Dupe.
  5. The prompt automatically uses the correct SQL type of your rule. For instance, a Freeform rule utilizes Spark SQL, while data source-specific SQL adapts based on the connection type of your dataset.
  6. In the query input field, enter the name of the column in your dataset to include in the SQL query.
  7. Example In this example, the default placeholder column "email" will be replaced with the user-defined "cm_denied_reason1" in the duplicate value SQL suggestion.
    An example of a prompt for duplicate values

    Tip When there are many columns in your dataset, you may need to scroll down to the bottom of the input field to enter your column name.

  8. Click Submit to Collibra AI.
    SQL Assistant for Data Quality generates your SQL query and displays the suggested query when it completes.
  9. Example 
    Example of the SQL suggestion for dupes

  10. Choose one of the following options:
    1. Click Copy to Editor to automatically apply the suggested query to the SQL editor on the Dataset Overview modal.
    2. Click Copy to Clipboard to copy the suggested query to your computer's clipboard, allowing you to paste the query elsewhere.
    3. Click Close to exit the SQL Assistant for Data Quality tool without copying the suggested query.
    4. Rewrite your prompt and click Submit to Collibra AI to generate a new SQL suggestion.
  1. Open the Dataset Rules page of your dataset.
  2. From the Dataset Rules page, there are two ways to open the Rule Workbench.
    1. From an existing rule on the Rules tab, click Actions, then click Edit.
      The Rule Workbench opens with the preferences of your existing rule.
    2. Click Add Rule in the upper right corner to create a new rule.
      The Rule Workbench opens.
  3. In the upper right corner of the Rule Workbench SQL editor, click Collibra AI.
    The SQL Assistant for Data Quality tool appears.
  4. From the Prompt dropdown menu, select Record.
  5. The prompt automatically uses the correct SQL type of your rule. For instance, a Freeform rule utilizes Spark SQL, while data source-specific SQL adapts based on the connection type of your dataset.
  6. In the query input field, enter the names of the columns in your dataset to include in the SQL query.
  7. Example In this example, the default placeholder columns "vendor_type" and "transaction_date" will be replaced with the user-defined "cm_claim_status" and "cm_bdos" in the SQL suggestion for record changes.
    An example of a prompt for record changes

    Important You must specify one column with numeric data and one date column.

    Tip When there are many columns in your dataset, you may need to scroll down to the bottom of the input field to enter your column names.

  8. Click Submit to Collibra AI.
    SQL Assistant for Data Quality generates your SQL query and displays the suggested query when it completes.
  9. Example 
    Example of the SQL suggestion for records

  10. Choose one of the following options:
    1. Click Copy to Editor to automatically apply the suggested query to the SQL editor on the Dataset Overview modal.
    2. Click Copy to Clipboard to copy the suggested query to your computer's clipboard, allowing you to paste the query elsewhere.
    3. Click Close to exit the SQL Assistant for Data Quality tool without copying the suggested query.
    4. Rewrite your prompt and click Submit to Collibra AI to generate a new SQL suggestion.
  1. Open the Dataset Rules page of your dataset.
  2. From the Dataset Rules page, there are two ways to open the Rule Workbench.
    1. From an existing rule on the Rules tab, click Actions, then click Edit.
      The Rule Workbench opens with the preferences of your existing rule.
    2. Click Add Rule in the upper right corner to create a new rule.
      The Rule Workbench opens.
  3. In the upper right corner of the Rule Workbench SQL editor, click Collibra AI.
    The SQL Assistant for Data Quality tool appears.
  4. From the Prompt dropdown menu, select Pattern.
  5. The prompt automatically uses the correct SQL type of your rule. For instance, a Freeform rule utilizes Spark SQL, while data source-specific SQL adapts based on the connection type of your dataset.
  6. In the query input field, enter the names of the columns in your dataset to include in the SQL query.
  7. Example In this example, the default placeholder columns "sale_state" and "state_tax" will be replaced with the user-defined "cm_diagnosis_code" and "cm_denied_reason1" in the SQL suggestion for pattern detection.
    An example of a prompt for pattern detection

    Tip When there are many columns in your dataset, you may need to scroll down to the bottom of the input field to enter your column names.

  8. Click Submit to Collibra AI.
    SQL Assistant for Data Quality generates your SQL query and displays the suggested query when it completes.
  9. Example 
    Example of the SQL suggestion for patterns

  10. Choose one of the following options:
    1. Click Copy to Editor to automatically apply the suggested query to the SQL editor on the Dataset Overview modal.
    2. Click Copy to Clipboard to copy the suggested query to your computer's clipboard, allowing you to paste the query elsewhere.
    3. Click Close to exit the SQL Assistant for Data Quality tool without copying the suggested query.
    4. Rewrite your prompt and click Submit to Collibra AI to generate a new SQL suggestion.
  1. Open the Dataset Rules page of your dataset.
  2. From the Dataset Rules page, there are two ways to open the Rule Workbench.
    1. From an existing rule on the Rules tab, click Actions, then click Edit.
      The Rule Workbench opens with the preferences of your existing rule.
    2. Click Add Rule in the upper right corner to create a new rule.
      The Rule Workbench opens.
  3. In the upper right corner of the Rule Workbench SQL editor, click Collibra AI.
    The SQL Assistant for Data Quality tool appears.
  4. From the Prompt dropdown menu, select Frequency Distribution.
  5. The prompt automatically uses the correct SQL type of your rule. For instance, a Freeform rule utilizes Spark SQL, while data source-specific SQL adapts based on the connection type of your dataset.
  6. In the query input field, enter the name of the column in your dataset to include in the SQL query.
  7. Example In this example, a SQL query for frequency distribution will be suggested for the column "cm_denied_reason1".
    An example of a prompt for frequency distribution

    Tip When there are many columns in your dataset, you may need to scroll down to the bottom of the input field to enter your column name.

  8. Click Submit to Collibra AI.
    SQL Assistant for Data Quality generates your SQL query and displays the suggested query when it completes.
  9. Example 
    Example of the SQL suggestion for frequency distribution

  10. Choose one of the following options:
    1. Click Copy to Editor to automatically apply the suggested query to the SQL editor on the Dataset Overview modal.
    2. Click Copy to Clipboard to copy the suggested query to your computer's clipboard, allowing you to paste the query elsewhere.
    3. Click Close to exit the SQL Assistant for Data Quality tool without copying the suggested query.
    4. Rewrite your prompt and click Submit to Collibra AI to generate a new SQL suggestion.