SQL assistant for data quality (Beta)

Note This feature is in Beta testing and is powered by Collibra AI. If you would like to share your feedback with our product team, please go to the Collibra beta platform.

SQL assistant for data quality automates SQL rule writing and troubleshooting to help you accelerate the discovery, curation, and visualization of your data. With its SQL query generation capabilities, both advanced and beginner SQL users can quickly discover key data points and insights, and then convert them into rules. Its troubleshooting capabilities help you correct syntax mistakes without a tedious manual review of your SQL.

You can use SQL assistant for data quality to assist with SQL rule writing and troubleshooting from the following places:

  • Dataset Overview
  • Rule Workbench

Collibra AI screenshot

Component: Description

  • Collibra AI options: Click the dropdown menu and select an option.
    • Generate Rule creates SQL suggestions based on your plain-text input in the request prompt.
    • Troubleshoot Rule provides suggestions to fix and rewrite queries when there are issues with your SQL.
    • Categorical provides a suggested SQL check to uncover any categorical outliers in the column you specify.
    • Dupe provides a suggested SQL check to uncover any duplicate values in the column you specify.
    • Record provides a suggested SQL check to uncover any values that appear on a previous day but not the next.
    • Pattern provides a suggested SQL check to uncover infrequent combinations that appear less than 5 percent of the time in the columns you specify.
  • Edit and reset prompt: Optionally edit the default request and request prompt manually, or reset your request prompt.
  • Default request: The dataset metadata that instructs SQL assistant for data quality on what to include in the query. This is typically all of the details before the WHERE clause in a SQL query.
  • Request prompt: An input field where you can enter a prompt for SQL assistant for data quality in plain-text format.
  • Results: The results that SQL assistant for data quality returns based on the content of your prompt.
  • Action buttons:
    • Click Submit to Collibra AI to generate a SQL rule suggestion.
    • Click Copy to Editor to automatically apply the suggested query to the SQL editor on the Dataset Overview modal.
    • Click Copy to Clipboard to copy the suggested query to your computer's clipboard, allowing you to paste the query elsewhere.
    • Click Close to exit the SQL assistant for data quality tool without copying the suggested query.

Prerequisites

To use SQL assistant for data quality in Collibra DQ, you first need to:

  • Ensure that the AI_TENANT application configuration is set to TRUE in the Admin Console Configuration Settings.
  • Safelist network egress for oauth2.googleapis.com and us-central1-aiplatform.googleapis.com over port 443.
  • Have one of the following Collibra DQ roles assigned to your user account:
    • ROLE_ADMIN
    • ROLE_GENAI_USER
    Note Collibra DQ administrators can add or update roles from Admin Console > Role Management > Roles.
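To confirm the network egress prerequisite before enabling the feature, a quick connectivity probe can help. The following is a minimal sketch, not a Collibra utility: the function name `check_egress` and its parameters are hypothetical, and only the two hosts and port 443 come from the list above. It opens a plain TCP connection, so it does not prove that a proxy or TLS inspection layer will allow the actual HTTPS requests.

```python
import socket

# Hosts and port taken from the prerequisites above.
REQUIRED_ENDPOINTS = [
    ("oauth2.googleapis.com", 443),
    ("us-central1-aiplatform.googleapis.com", 443),
]


def check_egress(endpoints=REQUIRED_ENDPOINTS,
                 connect=socket.create_connection,
                 timeout=5.0):
    """Return {host: bool}, True when a TCP connection to host:port opens.

    `connect` is injectable so the function can be exercised without
    touching the network (hypothetical design choice for this sketch).
    """
    results = {}
    for host, port in endpoints:
        try:
            # Resolves the host name and opens a TCP socket, then closes it.
            with connect((host, port), timeout=timeout):
                results[host] = True
        except OSError:
            # Covers DNS failures, refused connections, and timeouts.
            results[host] = False
    return results
```

Running `check_egress()` from the machine that hosts Collibra DQ returns a dictionary with one entry per host. A `False` value suggests that the host still needs to be safelisted.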

Using SQL assistant for data quality

SQL assistant for data quality uses Collibra AI to automatically write and troubleshoot SQL queries. When working on a Pullup dataset, the type of SQL that SQL assistant for data quality generates depends on the rule type. For instance, if your rule type is Freeform SQL, the SQL generated by SQL assistant for data quality is Spark SQL, whereas a Native SQL rule returns the SQL specific to the datasource. Likewise, Pushdown datasets always return SQL native to the job source.
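The dialect rule described above can be summarized as a small lookup. This is only an illustration of the paragraph, not product code: the function name `expected_sql_dialect` and its string arguments are hypothetical, and rule types other than the two named above are deliberately left unhandled because this page does not describe them.

```python
def expected_sql_dialect(dataset_type: str, rule_type: str = "",
                         datasource: str = "the datasource") -> str:
    """Describe which SQL dialect a suggestion is expected to use."""
    dataset = dataset_type.strip().lower()
    rule = rule_type.strip().lower()

    # Pushdown datasets always return SQL native to the job source.
    if dataset == "pushdown":
        return f"native SQL for {datasource}"

    # Pullup datasets depend on the rule type.
    if dataset == "pullup":
        if rule == "freeform sql":
            return "Spark SQL"
        if rule == "native sql":
            return f"native SQL for {datasource}"
        raise ValueError(f"rule type not covered by this sketch: {rule_type!r}")

    raise ValueError(f"unknown dataset type: {dataset_type!r}")
```

For example, `expected_sql_dialect("Pullup", "Freeform SQL")` returns `"Spark SQL"`, while any Pushdown dataset returns the native dialect regardless of rule type.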

The following steps show you how to use SQL assistant for data quality and from where you can access it.

From Dataset Overview

Generate Rule

  1. From the Metadata Bar at the top of any of the following dataset-level pages, click the Dataset Overview icon to open Dataset Overview:
    1. Findings
    2. Profile
    3. Dataset Rules
    4. Alert Builder
  2. The Dataset Overview modal appears.
  3. In the upper right corner of the SQL editor, click Collibra AI.
    The SQL assistant for data quality tool appears.
  4. From the Prompt dropdown menu, select Generate Rule.
    Note Generate Rule is the default option.
  5. In the query input field, enter a plain language prompt.
    • Examples
      • "high is more than 50"
      • "create a complex rule that is 25 lines long and returns only 1 record"
      • "state is not Michigan, California, or Illinois"
      • "compare the 2 dates 2018-01-13 to 2018-01-14 - write 2 separate queries for each date, using trade date - write as a common table expression for tables a and b - join using symbol column - use a join to identify which values are in 2018-01-14 that are not in 2018-01-13 - the goal is to find values that are missing from the previous day"
  6. Click Submit to Collibra AI.
    SQL assistant for data quality generates your SQL query and displays the suggested query when it completes.
  7. Choose one of the following options:
    1. Click Copy to Editor to automatically apply the suggested query to the SQL editor on the Dataset Overview modal.
    2. Click Copy to Clipboard to copy the suggested query to your computer's clipboard, allowing you to paste the query elsewhere.
    3. Click Close to exit the SQL assistant for data quality tool without copying the suggested query.
    4. Rewrite your prompt and click Submit to Collibra AI to generate a new SQL rule suggestion.
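To give a feel for what a Generate Rule suggestion might look like, the sketch below pairs two of the example prompts with SQL that would satisfy them and runs that SQL against a tiny in-memory table. Everything here is an assumption made for illustration: the table name `nyse`, the columns `symbol`, `trade_date`, and `high`, and the sample rows are hypothetical, and SQLite stands in for Spark SQL or your datasource's native SQL. The query that SQL assistant for data quality actually returns may differ.

```python
import sqlite3

# Hypothetical sample data; real table and column names come from your dataset.
SAMPLE_ROWS = [
    ("AAA", "2018-01-13", 10.0),
    ("BBB", "2018-01-13", 55.0),
    ("AAA", "2018-01-14", 11.0),
    ("BBB", "2018-01-14", 56.0),
    ("CCC", "2018-01-14", 20.0),
]

# One possible answer to the prompt "high is more than 50".
HIGH_SQL = (
    "SELECT symbol, trade_date, high FROM nyse "
    "WHERE high > 50 ORDER BY trade_date"
)

# One possible answer to the two-date comparison prompt: a CTE per date,
# joined on symbol, keeping symbols seen on 2018-01-14 but not on 2018-01-13.
MISSING_SQL = """
WITH a AS (
    SELECT symbol FROM nyse WHERE trade_date = '2018-01-13'
),
b AS (
    SELECT symbol FROM nyse WHERE trade_date = '2018-01-14'
)
SELECT b.symbol
FROM b
LEFT JOIN a ON a.symbol = b.symbol
WHERE a.symbol IS NULL
ORDER BY b.symbol
"""


def run(sql):
    """Load the sample rows into an in-memory table and run one query."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE nyse (symbol TEXT, trade_date TEXT, high REAL)"
        )
        conn.executemany("INSERT INTO nyse VALUES (?, ?, ?)", SAMPLE_ROWS)
        return conn.execute(sql).fetchall()
    finally:
        conn.close()
```

With this sample data, `run(HIGH_SQL)` returns the two `BBB` rows and `run(MISSING_SQL)` returns only `CCC`, the symbol that has no row on the previous day.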

Troubleshoot Rule

When a SQL syntax error occurs, you can use SQL assistant for data quality to automatically troubleshoot and fix the error.

  1. From the Metadata Bar at the top of any of the following dataset-level pages, click the Dataset Overview icon to open Dataset Overview:
    1. Findings
    2. Profile
    3. Dataset Rules
    4. Alert Builder
  2. The Dataset Overview modal appears.
  3. In the SQL editor, enter an invalid SQL query that returns a syntax error.
    Example SELECT * fomr public.nyse2 were onpe > 4.75
  4. In the upper right corner of the SQL editor, click Collibra AI.
    The SQL assistant for data quality tool appears.
  5. From the Prompt dropdown menu, select Troubleshoot Rule.
    The SQL assistant for data quality tool shows the prompt to rewrite and fix your SQL query, an overview of the exception message, the SQL query that caused the error, the table and column names in your query, and helpful troubleshooting tips.
    Note You can optionally click the edit icon above and to the right of the overview text to manually edit the troubleshooting prompt.
  6. Click Submit to Collibra AI.
    SQL assistant for data quality troubleshoots your SQL query and displays the suggested query when it completes.
  7. Choose one of the following options:
    1. Click Copy to Editor to automatically apply the suggested query to the SQL editor on the Dataset Overview modal.
    2. Click Copy to Clipboard to copy the suggested query to your computer's clipboard, allowing you to paste the query elsewhere.
    3. Click Close to exit the SQL assistant for data quality tool without copying the suggested query.
    4. Rewrite your prompt and click Submit to Collibra AI to generate a new SQL rule suggestion.
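The sketch below shows the kind of before-and-after that Troubleshoot Rule works with, using the invalid example query from the steps above. It rests on several assumptions: the corrected query is a plausible fix rather than the tool's guaranteed output, `onpe` is guessed to be a misspelling of an `open` column, the `public.` schema prefix is dropped because the in-memory SQLite stand-in has no such schema, and the sample rows are hypothetical.

```python
import sqlite3

# The invalid query from the steps above, without the schema prefix.
BROKEN_SQL = "SELECT * fomr nyse2 were onpe > 4.75"

# Assumed fix: "fomr" -> FROM, "were" -> WHERE, "onpe" -> open (a guess).
FIXED_SQL = "SELECT * FROM nyse2 WHERE open > 4.75"


def try_query(conn, sql):
    """Return (rows, None) on success or (None, error_message) on failure."""
    try:
        return conn.execute(sql).fetchall(), None
    except sqlite3.Error as exc:
        # The engine's error text is what a troubleshooting prompt would
        # carry alongside the failing query.
        return None, str(exc)


def demo():
    """Run the broken and the corrected query against hypothetical rows."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE nyse2 (symbol TEXT, open REAL)")
        conn.executemany(
            "INSERT INTO nyse2 VALUES (?, ?)",
            [("AAA", 4.5), ("BBB", 5.25)],
        )
        return try_query(conn, BROKEN_SQL), try_query(conn, FIXED_SQL)
    finally:
        conn.close()
```

Calling `demo()` returns a pair: the broken query yields no rows and a syntax error message, while the corrected query returns the single row whose `open` value exceeds 4.75.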

Categorical

  1. From the Metadata Bar at the top of any of the following dataset-level pages, click the Dataset Overview icon to open Dataset Overview:
    1. Findings
    2. Profile
    3. Dataset Rules
    4. Alert Builder
  2. The Dataset Overview modal appears.
  3. In the upper right corner of the SQL editor, click Collibra AI.
    The SQL assistant for data quality tool appears.
  4. From the Prompt dropdown menu, select Categorical.
  5. In the query input field, enter the name of the column in your dataset to include in the SQL query.
  6. Click Submit to Collibra AI.
    SQL assistant for data quality generates your SQL query and displays the suggested query when it completes.
  7. Choose one of the following options:
    1. Click Copy to Editor to automatically apply the suggested query to the SQL editor on the Dataset Overview modal.
    2. Click Copy to Clipboard to copy the suggested query to your computer's clipboard, allowing you to paste the query elsewhere.
    3. Click Close to exit the SQL assistant for data quality tool without copying the suggested query.
    4. Rewrite your prompt and click Submit to Collibra AI to generate a new SQL suggestion.
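As a rough illustration of what a categorical outlier check can look like, the sketch below flags values of one column that make up only a small share of all rows. The table name `orders`, the column `state`, and the `max_share` threshold of 10 percent are all hypothetical choices for this example. This page does not state how the Categorical option defines an outlier, so treat this as one possible reading, with SQLite standing in for your actual SQL engine.

```python
import sqlite3

# Values whose share of all rows falls below :max_share are reported.
CATEGORICAL_SQL = """
SELECT state, COUNT(*) AS occurrences
FROM orders
GROUP BY state
HAVING COUNT(*) * 1.0 / (SELECT COUNT(*) FROM orders) < :max_share
ORDER BY occurrences, state
"""


def find_categorical_outliers(values, max_share=0.10):
    """Load `values` into a one-column table and return the rare ones."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE orders (state TEXT)")
        conn.executemany(
            "INSERT INTO orders VALUES (?)", [(v,) for v in values]
        )
        return conn.execute(
            CATEGORICAL_SQL, {"max_share": max_share}
        ).fetchall()
    finally:
        conn.close()
```

For instance, with twelve `MI` rows, seven `CA` rows, and a single mistyped `Mi.` row, only `Mi.` falls under the threshold and is returned.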

Dupe

  1. From the Metadata Bar at the top of any of the following dataset-level pages, click the Dataset Overview icon to open Dataset Overview:
    1. Findings
    2. Profile
    3. Dataset Rules
    4. Alert Builder
  2. The Dataset Overview modal appears.
  3. In the upper right corner of the SQL editor, click Collibra AI.
    The SQL assistant for data quality tool appears.
  4. From the Prompt dropdown menu, select Dupe.
  5. In the query input field, enter the name of the column in your dataset to include in the SQL query.
  6. Click Submit to Collibra AI.
    SQL assistant for data quality generates your SQL query and displays the suggested query when it completes.
  7. Choose one of the following options:
    1. Click Copy to Editor to automatically apply the suggested query to the SQL editor on the Dataset Overview modal.
    2. Click Copy to Clipboard to copy the suggested query to your computer's clipboard, allowing you to paste the query elsewhere.
    3. Click Close to exit the SQL assistant for data quality tool without copying the suggested query.
    4. Rewrite your prompt and click Submit to Collibra AI to generate a new SQL suggestion.
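A duplicate check on a single column is commonly written with GROUP BY and HAVING. The sketch below shows that pattern under stated assumptions: the table `customers` and the column `email` are hypothetical, SQLite stands in for your SQL engine, and the query suggested by the Dupe option may be shaped differently.

```python
import sqlite3

# Group by the chosen column and keep values that occur more than once.
DUPE_SQL = """
SELECT email, COUNT(*) AS occurrences
FROM customers
GROUP BY email
HAVING COUNT(*) > 1
ORDER BY email
"""


def find_duplicates(values):
    """Load `values` into a one-column table and return repeated values."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE customers (email TEXT)")
        conn.executemany(
            "INSERT INTO customers VALUES (?)", [(v,) for v in values]
        )
        return conn.execute(DUPE_SQL).fetchall()
    finally:
        conn.close()
```

Each returned row pairs a duplicated value with the number of times it occurs, which makes the result easy to review before converting the query into a rule.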

Record

  1. From the Metadata Bar at the top of any of the following dataset-level pages, click the Dataset Overview icon to open Dataset Overview:
    1. Findings
    2. Profile
    3. Dataset Rules
    4. Alert Builder
  2. The Dataset Overview modal appears.
  3. In the upper right corner of the SQL editor, click Collibra AI.
    The SQL assistant for data quality tool appears.
  4. From the Prompt dropdown menu, select Record.
  5. In the query input field, enter the names of the columns in your dataset to include in the SQL query.
    Important You must specify one column with numeric data and one date column.
  6. Click Submit to Collibra AI.
    SQL assistant for data quality generates your SQL query and displays the suggested query when it completes.
  7. Choose one of the following options:
    1. Click Copy to Editor to automatically apply the suggested query to the SQL editor on the Dataset Overview modal.
    2. Click Copy to Clipboard to copy the suggested query to your computer's clipboard, allowing you to paste the query elsewhere.
    3. Click Close to exit the SQL assistant for data quality tool without copying the suggested query.
    4. Rewrite your prompt and click Submit to Collibra AI to generate a new SQL suggestion.
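One way to express a "present on a previous day but not the next" check is an anti-join of each day against the following day. The sketch below is an assumption-heavy illustration: the table `accounts`, the numeric column `account_id`, and the date column `load_date` are hypothetical, and the `date(..., '+1 day')` function is SQLite syntax (other engines use a different date-arithmetic function). The query that the Record option suggests may be different.

```python
import sqlite3

# For every (value, day) pair, look for the same value on the next day.
# Rows with no match are values that dropped out. The latest day is
# skipped because it has no following day to compare against.
RECORD_SQL = """
SELECT prev.account_id,
       prev.load_date AS seen_on,
       date(prev.load_date, '+1 day') AS missing_on
FROM (SELECT DISTINCT account_id, load_date FROM accounts) AS prev
LEFT JOIN accounts AS nxt
  ON nxt.account_id = prev.account_id
 AND nxt.load_date = date(prev.load_date, '+1 day')
WHERE nxt.account_id IS NULL
  AND prev.load_date < (SELECT MAX(load_date) FROM accounts)
ORDER BY prev.load_date, prev.account_id
"""


def find_dropped_records(rows):
    """`rows` is a list of (account_id, load_date) with ISO date strings."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE accounts (account_id INTEGER, load_date TEXT)"
        )
        conn.executemany("INSERT INTO accounts VALUES (?, ?)", rows)
        return conn.execute(RECORD_SQL).fetchall()
    finally:
        conn.close()
```

Each result row names the value, the day it was last seen, and the day on which it went missing.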

Pattern

  1. From the Metadata Bar at the top of any of the following dataset-level pages, click the Dataset Overview icon to open Dataset Overview:
    1. Findings
    2. Profile
    3. Dataset Rules
    4. Alert Builder
  2. The Dataset Overview modal appears.
  3. In the upper right corner of the SQL editor, click Collibra AI.
    The SQL assistant for data quality tool appears.
  4. From the Prompt dropdown menu, select Pattern.
  5. In the query input field, enter the names of the columns in your dataset to include in the SQL query.
  6. Click Submit to Collibra AI.
    SQL assistant for data quality generates your SQL query and displays the suggested query when it completes.
  7. Choose one of the following options:
    1. Click Copy to Editor to automatically apply the suggested query to the SQL editor on the Dataset Overview modal.
    2. Click Copy to Clipboard to copy the suggested query to your computer's clipboard, allowing you to paste the query elsewhere.
    3. Click Close to exit the SQL assistant for data quality tool without copying the suggested query.
    4. Rewrite your prompt and click Submit to Collibra AI to generate a new SQL suggestion.
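The Pattern option looks for column combinations that appear less than 5 percent of the time. A minimal sketch of such a check follows, assuming a hypothetical `addresses` table with `city` and `state` columns and using SQLite as a stand-in engine. Only the 5 percent threshold comes from this page, and the suggested query you receive may be written differently.

```python
import sqlite3

# Count each (city, state) combination and keep those that account for
# less than 5 percent of all rows.
PATTERN_SQL = """
SELECT city, state, COUNT(*) AS occurrences
FROM addresses
GROUP BY city, state
HAVING COUNT(*) * 100.0 / (SELECT COUNT(*) FROM addresses) < 5
ORDER BY city, state
"""


def find_rare_combinations(rows):
    """`rows` is a list of (city, state) tuples."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE addresses (city TEXT, state TEXT)")
        conn.executemany("INSERT INTO addresses VALUES (?, ?)", rows)
        return conn.execute(PATTERN_SQL).fetchall()
    finally:
        conn.close()
```

With 25 rows for Detroit, MI, 14 rows for Chicago, IL, and one row for Detroit, IL, only the last combination (2.5 percent of rows) is returned.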

From the Rule Workbench

Generate Rule

  1. Open the Dataset Rules page of your dataset.
  2. From the Dataset Rules page, there are two ways to open the Rule Workbench.
    1. From an existing rule on the Rules tab, click Actions, then click Edit.
      The Rule Workbench opens with the preferences of your existing rule.
    2. Click Add Rule in the upper right corner to create a new rule.
      The Rule Workbench opens.
  3. In the upper right corner of the Rule Workbench SQL editor, click Collibra AI.
    The SQL assistant for data quality tool appears.
  4. From the Prompt dropdown menu, select Generate Rule.
    Note Generate Rule is the default option.
  5. In the query input field, enter a plain language prompt.
    • Examples
      • "high is more than 50"
      • "create a complex rule that is 25 lines long and returns only 1 record"
      • "state is not Michigan, California, or Illinois"
      • "compare the 2 dates 2018-01-13 to 2018-01-14 - write 2 separate queries for each date, using trade date - write as a common table expression for tables a and b - join using symbol column - use a join to identify which values are in 2018-01-14 that are not in 2018-01-13 - the goal is to find values that are missing from the previous day"
  6. Click Submit to Collibra AI.
    SQL assistant for data quality generates your SQL query and displays the suggested query when it completes.
  7. Choose one of the following options:
    1. Click Copy to Editor to automatically apply the suggested query to the SQL editor in the Rule Workbench.
    2. Click Copy to Clipboard to copy the suggested query to your computer's clipboard, allowing you to paste the query elsewhere.
    3. Click Close to exit the SQL assistant for data quality tool without copying the suggested query.
    4. Rewrite your prompt and click Submit to Collibra AI to generate a new SQL rule suggestion.

Troubleshoot Rule

When a SQL syntax error occurs, you can use SQL assistant for data quality to automatically troubleshoot and fix the error.

  1. Open the Dataset Rules page of your dataset.
  2. From the Dataset Rules page, there are two ways to open the Rule Workbench.
    1. From an existing rule on the Rules tab, click Actions, then click Edit.
      The Rule Workbench opens with the preferences of your existing rule.
    2. Click Add Rule in the upper right corner to create a new rule.
      The Rule Workbench opens.
  3. In the Rule Workbench SQL editor, enter an invalid SQL query that returns a syntax error.
    Example SELECT * fomr public.nyse2 were onpe > 4.75
  4. In the upper right corner of the Rule Workbench SQL editor, click Collibra AI.
    The SQL assistant for data quality tool appears.
  5. From the Prompt dropdown menu, select Troubleshoot Rule.
    The SQL assistant for data quality tool shows the prompt to rewrite and fix your SQL query, an overview of the exception message, the SQL query that caused the error, the table and column names in your query, and helpful troubleshooting tips.
    Note You can optionally click the edit icon above and to the right of the overview text to manually edit the troubleshooting prompt.
  6. Click Submit to Collibra AI.
    SQL assistant for data quality troubleshoots your SQL query and displays the suggested query when it completes.
  7. Choose one of the following options:
    1. Click Copy to Editor to automatically apply the suggested query to the SQL editor in the Rule Workbench.
    2. Click Copy to Clipboard to copy the suggested query to your computer's clipboard, allowing you to paste the query elsewhere.
    3. Click Close to exit the SQL assistant for data quality tool without copying the suggested query.
    4. Rewrite your prompt and click Submit to Collibra AI to generate a new SQL rule suggestion.

Categorical

  1. Open the Dataset Rules page of your dataset.
  2. From the Dataset Rules page, there are two ways to open the Rule Workbench.
    1. From an existing rule on the Rules tab, click Actions, then click Edit.
      The Rule Workbench opens with the preferences of your existing rule.
    2. Click Add Rule in the upper right corner to create a new rule.
      The Rule Workbench opens.
  3. In the upper right corner of the Rule Workbench SQL editor, click Collibra AI.
    The SQL assistant for data quality tool appears.
  4. From the Prompt dropdown menu, select Categorical.
  5. In the query input field, enter the name of the column in your dataset to include in the SQL query.
  6. Click Submit to Collibra AI.
    SQL assistant for data quality generates your SQL query and displays the suggested query when it completes.
  7. Choose one of the following options:
    1. Click Copy to Editor to automatically apply the suggested query to the SQL editor in the Rule Workbench.
    2. Click Copy to Clipboard to copy the suggested query to your computer's clipboard, allowing you to paste the query elsewhere.
    3. Click Close to exit the SQL assistant for data quality tool without copying the suggested query.
    4. Rewrite your prompt and click Submit to Collibra AI to generate a new SQL suggestion.

Dupe

  1. Open the Dataset Rules page of your dataset.
  2. From the Dataset Rules page, there are two ways to open the Rule Workbench.
    1. From an existing rule on the Rules tab, click Actions, then click Edit.
      The Rule Workbench opens with the preferences of your existing rule.
    2. Click Add Rule in the upper right corner to create a new rule.
      The Rule Workbench opens.
  3. In the upper right corner of the Rule Workbench SQL editor, click Collibra AI.
    The SQL assistant for data quality tool appears.
  4. From the Prompt dropdown menu, select Dupe.
  5. In the query input field, enter the name of the column in your dataset to include in the SQL query.
  6. Click Submit to Collibra AI.
    SQL assistant for data quality generates your SQL query and displays the suggested query when it completes.
  7. Choose one of the following options:
    1. Click Copy to Editor to automatically apply the suggested query to the SQL editor in the Rule Workbench.
    2. Click Copy to Clipboard to copy the suggested query to your computer's clipboard, allowing you to paste the query elsewhere.
    3. Click Close to exit the SQL assistant for data quality tool without copying the suggested query.
    4. Rewrite your prompt and click Submit to Collibra AI to generate a new SQL suggestion.

Record

  1. Open the Dataset Rules page of your dataset.
  2. From the Dataset Rules page, there are two ways to open the Rule Workbench.
    1. From an existing rule on the Rules tab, click Actions, then click Edit.
      The Rule Workbench opens with the preferences of your existing rule.
    2. Click Add Rule in the upper right corner to create a new rule.
      The Rule Workbench opens.
  3. In the upper right corner of the Rule Workbench SQL editor, click Collibra AI.
    The SQL assistant for data quality tool appears.
  4. From the Prompt dropdown menu, select Record.
  5. In the query input field, enter the names of the columns in your dataset to include in the SQL query.
    Important You must specify one column with numeric data and one date column.
  6. Click Submit to Collibra AI.
    SQL assistant for data quality generates your SQL query and displays the suggested query when it completes.
  7. Choose one of the following options:
    1. Click Copy to Editor to automatically apply the suggested query to the SQL editor in the Rule Workbench.
    2. Click Copy to Clipboard to copy the suggested query to your computer's clipboard, allowing you to paste the query elsewhere.
    3. Click Close to exit the SQL assistant for data quality tool without copying the suggested query.
    4. Rewrite your prompt and click Submit to Collibra AI to generate a new SQL suggestion.

Pattern

  1. Open the Dataset Rules page of your dataset.
  2. From the Dataset Rules page, there are two ways to open the Rule Workbench.
    1. From an existing rule on the Rules tab, click Actions, then click Edit.
      The Rule Workbench opens with the preferences of your existing rule.
    2. Click Add Rule in the upper right corner to create a new rule.
      The Rule Workbench opens.
  3. In the upper right corner of the Rule Workbench SQL editor, click Collibra AI.
    The SQL assistant for data quality tool appears.
  4. From the Prompt dropdown menu, select Pattern.
  5. In the query input field, enter the names of the columns in your dataset to include in the SQL query.
  6. Click Submit to Collibra AI.
    SQL assistant for data quality generates your SQL query and displays the suggested query when it completes.
  7. Choose one of the following options:
    1. Click Copy to Editor to automatically apply the suggested query to the SQL editor in the Rule Workbench.
    2. Click Copy to Clipboard to copy the suggested query to your computer's clipboard, allowing you to paste the query elsewhere.
    3. Click Close to exit the SQL assistant for data quality tool without copying the suggested query.
    4. Rewrite your prompt and click Submit to Collibra AI to generate a new SQL suggestion.

FAQ

  • What models are used?
    The generative AI features use foundational models from Google Model Garden on Vertex AI. The two models that we use are PaLM 2 for Text (text-bison) and Embeddings for text (textembedding-gecko).
  • Are the models fine-tuned or retrained in any way?
    No, we do not fine-tune or retrain them in the beta implementation.
  • Where, regionally, are the models located?
    The models are located in the US Central region.
  • Where, regionally, are the other components located?
    The US Central region.
  • How is data secured when it is sent from the proxy to the Vertex Models?
    All data is encrypted in transit using HTTPS with TLS protocols.
  • What data is sent to Vertex AI?
    When a user submits a request to Collibra AI, the following information is sent to Vertex AI:
    • Generate Rule
      • Table name
      • Column names
      • User-generated plain language query description
    • Troubleshoot Rule
      • Table name
      • Column names
      • Current query
      • Contents of error message from SQL engine
  • What data is stored in Vertex AI?
    The only data stored in Vertex AI is metadata about user queries, such as total billable characters, character count, and latency metrics. This metadata does not currently include any user query or model output. Additionally, Google has made it clear that, in our current setup, no prompts are used to improve their foundational models. We do foresee a scenario where user queries might be stored in the future, but we will engage the Legal team and obtain approval before making any such change.