About SQL Assistant for Data Quality

SQL Assistant for Data Quality automates SQL rule writing and troubleshooting to help you accelerate the discovery, curation, and visualization of your data. With its SQL query generation capabilities, both beginner and advanced SQL users can quickly discover key data points and insights, and then convert them into rules. Its troubleshooting capabilities help you correct syntax mistakes, reducing the need for tedious manual SQL syntax reviews.

You can use SQL Assistant for Data Quality to assist with SQL rule writing and troubleshooting on the following pages:

  • Dataset Overview
  • Rule Workbench

Tip This feature is powered by Collibra AI. For information on how we leverage AI in our products, go to the Collibra Trust Site.

Prerequisites

There are two different sets of prerequisites, depending on whether you use the platform path with a Collibra Data Intelligence Platform integration or the beta path with a Collibra-provided encrypted service account key.

To use SQL Assistant for Data Quality with a Collibra Data Intelligence Platform integration, you need to:

  • Have an active integration with Collibra Data Intelligence Platform configured on the Admin Console > Integrations screen.
    Note Only the Credentials step of the Integration Setup is required. You do not need to map connections, tenants, dimensions, or layers to use SQL Assistant for Data Quality.

  • Set AI_PLATFORM_PATH to TRUE (default) on the Admin Console > Application Configuration Settings screen.
  • Set AI_TENANT to TRUE on the Admin Console > Application Configuration Settings screen.
  • Update the gai_proxy_endpoint value in owl-env.sh or in your Helm chart, depending on your installation type.

Important Collibra will not provide long-term support for the beta path. Therefore, we recommend using the platform path instead.

When using SQL Assistant for Data Quality without a Collibra Data Intelligence Platform integration, you have:

How does SQL Assistant for Data Quality work?

The architecture of SQL Assistant for Data Quality differs depending on whether you use the platform path with a Collibra Data Intelligence Platform integration or the beta path with a Collibra-provided encrypted service account key.

[Figure: Collibra AI architecture using the public Kong endpoint]

  1. A user initiates a Collibra AI prompt request from Dataset Overview or Rule Workbench in the Collibra DQ UI.
  2. Collibra DQ registers the user and retrieves an OAuth token from their Collibra Data Intelligence Platform instance.
  3. The OAuth token is used to authenticate the request with the Public Kong Endpoint. Once authenticated, the request is redirected to the Vertex Proxy Endpoint in the Collibra Platform.
  4. The Collibra Platform makes a request to Google Vertex AI.
  5. Google Vertex AI returns a response to Collibra DQ, where it is logged in the Generative AI Audit Trail.
  6. The response routes through Collibra DQ, where its contents display in the UI.
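
The order of the steps above can be illustrated with a minimal Python sketch. It is not Collibra's implementation: every function and class name here is hypothetical, the token format is made up, and each network call is replaced by a local stub so that the sequence (token retrieval, authentication, proxying, audit logging, display) can be read and run on its own.

```python
from dataclasses import dataclass, field


@dataclass
class AuditTrail:
    """Hypothetical stand-in for the Generative AI Audit Trail."""
    entries: list = field(default_factory=list)

    def log(self, user: str, prompt: str, response: str) -> None:
        self.entries.append({"user": user, "prompt": prompt, "response": response})


def get_oauth_token(user: str) -> str:
    # Step 2 (stub): Collibra DQ registers the user and retrieves an OAuth token.
    return f"token-for-{user}"


def kong_authenticate(token: str) -> bool:
    # Step 3 (stub): the public Kong endpoint authenticates the request.
    return token.startswith("token-for-")


def vertex_proxy(prompt: str) -> str:
    # Steps 3 and 4 (stub): the proxy endpoint forwards the prompt to the model.
    return f"-- suggested SQL for: {prompt}"


def handle_prompt(user: str, prompt: str, audit: AuditTrail) -> str:
    token = get_oauth_token(user)
    if not kong_authenticate(token):
        raise PermissionError("request was not authenticated")
    response = vertex_proxy(prompt)
    audit.log(user, prompt, response)  # Step 5: log the response
    return response                    # Step 6: return it for display in the UI


# Step 1: a user submits a prompt.
audit = AuditTrail()
suggestion = handle_prompt("alice", "find duplicate order IDs", audit)
print(suggestion)
```

In this sketch the response is logged before it is returned, mirroring the order of steps 5 and 6.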

Important Collibra will not provide long-term support for the beta path. Therefore, we recommend using the platform path instead.

[Figure: Architecture diagram depicting how SQL Assistant for Data Quality works]

  1. A user initiates a Collibra AI prompt request from Dataset Overview or Rule Workbench in the Collibra DQ UI.
  2. The request routes through Collibra DQ, which decrypts the service account key.
     Note The encrypted service account key is specific to each customer.
  3. Collibra DQ sends the request to Google Vertex AI using the decrypted key.
  4. Google Vertex AI returns a response to Collibra DQ, where it is logged in the Generative AI Audit Trail.
  5. The response routes through Collibra DQ, where its contents display in the UI.

Overview of the SQL Assistant for Data Quality components

[Figure: Collibra AI screenshot]

Collibra AI prompts

Click the dropdown menu and select a prompt.

  • Generate Rule creates SQL suggestions based on your plain-text input in the request prompt.
  • Troubleshoot Rule fixes and rewrites queries when there are issues with your SQL.
  • Categorical provides a suggested SQL check to uncover any categorical outliers in the column you specify.
  • Dupe provides a suggested SQL check to uncover any duplicate values in the column you specify.
  • Record provides a suggested SQL check to uncover any values that appear on a previous day but not the next.
  • Pattern provides a suggested SQL check to uncover infrequent combinations that appear less than 5 percent of the time in the columns you specify.
  • Frequency Distribution provides a SQL check to uncover the frequency distribution of all values within a column.

Reset prompt

Click to reset your request prompt.

Request prompt

An input field where you can view dataset metadata and enter a prompt for SQL Assistant for Data Quality in plain-text format. There are three main sections in the request prompt:

  • The name of the table. In a SQL query, the table name typically follows FROM and precedes the WHERE clause.
  • The columns in the table that are available to query.
  • The input field where you can enter a prompt for SQL Assistant for Data Quality to generate a SQL suggestion.

Click the empty space at the bottom of the input field to place your cursor in the correct request prompt location.

Tip When many columns exist in your dataset, you may need to scroll to the bottom of the request prompt to enter your desired query criteria.

Results

The results that SQL Assistant for Data Quality returns based on the contents of your prompt.

Action buttons

  • Click Submit to Collibra AI to generate a SQL rule suggestion.
  • Click Copy to Editor to automatically apply the suggested query to the SQL editor.
  • Click Copy to Clipboard to copy the suggested query to your computer's clipboard, allowing you to paste the query elsewhere.
  • Click Close to exit the SQL assistant without copying the suggested query.
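
To give a sense of the kind of SQL that the Dupe, Frequency Distribution, and Pattern prompts describe, here is a small self-contained sketch that runs against an in-memory SQLite database from Python. The `orders` table, its columns, and the sample data are hypothetical, and the queries are hand-written illustrations; the SQL that SQL Assistant for Data Quality actually suggests depends on your dataset and prompt and may look different.

```python
import sqlite3

# Hypothetical sample data: 40 rows in an in-memory table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER, country TEXT, currency TEXT)")
rows = [(i, "US", "USD") for i in range(1, 21)]     # order_id 1 to 20
rows += [(7, "US", "USD")]                          # a duplicate order_id
rows += [(i, "DE", "EUR") for i in range(21, 39)]   # order_id 21 to 38
rows += [(39, "US", "EUR")]                         # a rare combination
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)

# Dupe-style check: values that occur more than once in a column.
dupes = conn.execute(
    """
    SELECT order_id, COUNT(*) AS n
    FROM orders
    GROUP BY order_id
    HAVING COUNT(*) > 1
    """
).fetchall()

# Frequency-distribution-style check: how often each value occurs in a column.
freq = conn.execute(
    """
    SELECT currency, COUNT(*) AS n
    FROM orders
    GROUP BY currency
    ORDER BY n DESC
    """
).fetchall()

# Pattern-style check: column combinations seen in less than 5 percent of rows.
rare = conn.execute(
    """
    SELECT country, currency, COUNT(*) AS n
    FROM orders
    GROUP BY country, currency
    HAVING COUNT(*) * 1.0 / (SELECT COUNT(*) FROM orders) < 0.05
    """
).fetchall()

print(dupes)  # [(7, 2)]
print(freq)   # [('USD', 21), ('EUR', 19)]
print(rare)   # [('US', 'EUR', 1)]
```

Each query uses only GROUP BY and HAVING, so the same shape should carry over to most SQL dialects, although the syntax your data source accepts may differ.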