About SQL Assistant for Data Quality
SQL Assistant for Data Quality automates SQL rule writing and troubleshooting to help you accelerate the discovery, curation, and visualization of your data. With its SQL query generation capabilities, both advanced and beginner SQL users can quickly discover key data points and insights, and then convert them into rules. Its troubleshooting capabilities help you correct syntax mistakes, reducing the need for tedious manual SQL syntax reviews.
You can use SQL Assistant for Data Quality to assist with SQL rule writing and troubleshooting on the following pages:
- Dataset Overview
- Rule Workbench
Tip This feature is powered by Collibra AI. For information on how we leverage AI in our products, go to the Collibra Trust Site.
Prerequisites
There are two different sets of prerequisites, depending on whether you use the platform path with a Collibra Data Intelligence Platform integration or the beta path with a Collibra-provided encrypted service account key.
- Platform path
- Beta path
Platform path
To use SQL Assistant for Data Quality with a Collibra Data Intelligence Platform integration, you need to:
- Have an active integration with Collibra Data Intelligence Platform configured on the Admin Console > Integrations screen.
- Set AI_PLATFORM_PATH to TRUE (default) on the Admin Console > Application Configuration Settings screen.
- Set AI_TENANT to TRUE on the Admin Console > Application Configuration Settings screen.
- Update the gai_proxy_endpoint in the owl-env.sh file or Helm Chart, depending on your installation type.
Note Only the Credentials step of the Integration Setup is required. You do not need to map connections, tenants, dimensions, or layers to use SQL Assistant for Data Quality.
Important
You may need to safelist the proxy endpoint if the security settings of your installation do not allow outbound internet requests.
While this is unlikely if you have an active integration with Collibra Data Intelligence Platform set up, you may also need to safelist your Collibra Data Intelligence Platform URL to allow outbound requests to it.
Steps for Spark Standalone and Hadoop installations
- Export the following environment variable in the owl-env.sh file:
  export gai_proxy_endpoint=https://kong-prod-gcp-ue1-1.collibra.com
- Save and close the owl-env.sh file.
- Restart the DQ web application and DQ agent services.
Steps for Kubernetes (cloud native) installations
- Set the following value in the Helm Chart provided by Collibra:
  --set gai_proxy_endpoint: "https://kong-prod-gcp-ue1-1.collibra.com"
- If you are updating an existing deployment of Collibra Data Quality & Observability, restart the DQ web and agent pods to complete the update.
Note The default value for the gai_proxy_endpoint is https://kong-prod-gcp-ue1-1.collibra.com. To change this value to a different cloud region, contact Collibra Support.
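For Spark Standalone and Hadoop installations, the owl-env.sh edit can be scripted so that repeated runs do not leave duplicate export lines. The following is a minimal sketch, not a Collibra-provided tool: the function names ensure_gai_proxy_endpoint and check_outbound are hypothetical, and the location of owl-env.sh depends on your installation. The optional check_outbound helper uses curl to test whether outbound requests to the proxy endpoint are possible, which may help you decide whether the endpoint needs to be safelisted.

```shell
#!/usr/bin/env bash
# Default endpoint as stated in the documentation.
DEFAULT_GAI_PROXY_ENDPOINT="https://kong-prod-gcp-ue1-1.collibra.com"

# ensure_gai_proxy_endpoint FILE [URL]
# Hypothetical helper: leaves FILE with exactly one
# "export gai_proxy_endpoint=..." line, replacing any existing one.
ensure_gai_proxy_endpoint() {
  local file="$1"
  local url="${2:-$DEFAULT_GAI_PROXY_ENDPOINT}"
  local tmp

  if [ ! -f "$file" ]; then
    echo "file not found: $file" >&2
    return 1
  fi

  tmp="$(mktemp)" || return 1
  # Keep every line except a previous gai_proxy_endpoint export.
  # grep exits with 1 when nothing is printed, so ignore that status.
  grep -v '^export gai_proxy_endpoint=' "$file" > "$tmp" || true
  printf 'export gai_proxy_endpoint=%s\n' "$url" >> "$tmp"
  # Write back through the existing file so its permissions are kept.
  cat "$tmp" > "$file"
  rm -f "$tmp"
}

# check_outbound URL
# Hypothetical helper: prints the HTTP status code returned by URL.
# A non-zero exit status from curl suggests the outbound request was blocked.
check_outbound() {
  curl --silent --show-error --output /dev/null --max-time 10 \
       --write-out '%{http_code}\n' "$1"
}

# Example usage (the path is an assumption; adjust to your installation):
#   ensure_gai_proxy_endpoint /path/to/owl-env.sh
#   check_outbound "$DEFAULT_GAI_PROXY_ENDPOINT"
```

After the file is updated, you still need to restart the DQ web application and DQ agent services for the change to take effect.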
Beta path
Important Collibra will not provide long-term support for the beta path. Therefore, we recommend using the platform path instead.
To use SQL Assistant for Data Quality without a Collibra Data Intelligence Platform integration, you need:
- AI_PLATFORM_PATH set to FALSE on the Admin Console > Application Configuration Settings screen.
- AI_TENANT set to TRUE on the Admin Console > Application Configuration Settings screen.
- An encrypted service account key provided by Collibra.
How does SQL Assistant for Data Quality work?
The architecture of SQL Assistant for Data Quality differs depending on whether you use the platform path with a Collibra Data Intelligence Platform integration or the beta path with a Collibra-provided encrypted service account key.
- Platform path
- Beta path
Platform path
- A user initiates a Collibra AI prompt request from Dataset Overview or Rule Workbench in the Collibra DQ UI.
- Collibra DQ registers the user and retrieves an OAuth token from their Collibra Data Intelligence Platform instance.
- The OAuth token is used to authenticate the request with the Public Kong Endpoint. Once authenticated, the request is redirected to the Vertex Proxy Endpoint in the Collibra Platform.
- The Collibra Platform makes a request to the Google Vertex AI API.
- Google Vertex AI returns a response to Collibra DQ, where it is logged in the Generative AI Audit Trail.
- The response routes through Collibra DQ, where its contents display in the UI.
Beta path
Important Collibra will not provide long-term support for the beta path. Therefore, we recommend using the platform path instead.
- A user initiates a Collibra AI prompt request from Dataset Overview or Rule Workbench in the Collibra DQ UI.
- The request routes through Collibra DQ, which decrypts the service account key.
- Collibra DQ sends the request to Google Vertex AI using the decrypted key.
- Google Vertex AI returns a response to Collibra DQ, where it is logged in the Generative AI Audit Trail.
- The response routes through Collibra DQ, where its contents display in the UI.
Note The encrypted service account key is specific to each customer.
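The two request flows can be summarized side by side. The sketch below is purely illustrative and does not call any Collibra or Google API: describe_ai_path is a hypothetical function that prints the hops a prompt request passes through for a given AI_PLATFORM_PATH value, based only on the steps listed in this section. Accepting lowercase values is a convenience of the sketch, not documented behavior.

```shell
#!/usr/bin/env bash
# describe_ai_path VALUE
# Hypothetical helper: prints, one per line, the hops of a prompt request
# for the given AI_PLATFORM_PATH value (TRUE = platform path,
# FALSE = beta path), as described in the documentation.
describe_ai_path() {
  local value
  # Normalize to uppercase (an assumption made for convenience).
  value="$(printf '%s' "$1" | tr '[:lower:]' '[:upper:]')"

  case "$value" in
    TRUE)
      printf '%s\n' \
        "Collibra DQ UI (Dataset Overview or Rule Workbench)" \
        "Collibra DQ (retrieves OAuth token from Collibra Data Intelligence Platform)" \
        "Public Kong Endpoint" \
        "Vertex Proxy Endpoint (Collibra Platform)" \
        "Google Vertex AI"
      ;;
    FALSE)
      printf '%s\n' \
        "Collibra DQ UI (Dataset Overview or Rule Workbench)" \
        "Collibra DQ (decrypts service account key)" \
        "Google Vertex AI"
      ;;
    *)
      echo "unknown AI_PLATFORM_PATH value: $1" >&2
      return 1
      ;;
  esac
}

# Example usage:
#   describe_ai_path TRUE
#   describe_ai_path FALSE
```

In both flows, the response returns to Collibra DQ, is logged in the Generative AI Audit Trail, and is then shown in the UI.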
Overview of the SQL Assistant for Data Quality components
Component | Description
---|---
Collibra AI prompts | Click the dropdown menu and select a prompt.
Reset prompt | Click to reset your request prompt.
Request prompt | An input field where you can view dataset metadata and enter a prompt for SQL Assistant for Data Quality in plain-text format. There are three main sections in the request prompt.
Results | The results that SQL Assistant for Data Quality returns based on the contents of your prompt.
Action buttons |