Add a data quality capability
Note If you're using a Collibra Cloud site, go to the Collibra Cloud site documentation to check if your data source is supported.
In Pushdown mode, Data Quality Jobs are submitted directly to a Pushdown-compatible data source, such as Databricks, SAP HANA, or Snowflake. Processing occurs entirely within the SQL data warehouse. The resulting profile data is then shown on the Job Details page. No personally identifiable information (PII) is stored in Collibra.
Pushdown processes data in its source location, which helps reduce data transfer costs and can alleviate egress latency when running jobs on large sets of data. Because it reduces reliance on external compute engines, such as Spark, for running Data Quality Jobs, this method also contributes to improved processing speeds. Compared to Pullup, Pushdown is also simpler to manage: it avoids the need for detailed Spark configuration tuning and supports auto-scaling with data warehouses such as Snowflake and Databricks.
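The difference between the two modes can be illustrated with a minimal sketch. It is not Collibra's implementation: the standard-library `sqlite3` module stands in for a SQL data warehouse, and the `orders` table, its columns, and the function names are hypothetical. In the Pushdown-style function the aggregation runs inside the source and only one summary row is returned; in the Pullup-style function every row leaves the source and is profiled externally.

```python
import sqlite3


def build_demo_source() -> sqlite3.Connection:
    """Create an in-memory table that stands in for a warehouse table (hypothetical data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER, amount REAL)")
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?)",
        [(1, 10.0), (2, None), (3, 30.0), (4, 20.0)],
    )
    return conn


def profile_pushdown(conn: sqlite3.Connection) -> dict:
    """Pushdown style: the source computes the profile; one summary row comes back."""
    row = conn.execute(
        """
        SELECT COUNT(*),
               SUM(CASE WHEN amount IS NULL THEN 1 ELSE 0 END),
               MIN(amount),
               MAX(amount)
        FROM orders
        """
    ).fetchone()
    return {"row_count": row[0], "null_count": row[1], "min": row[2], "max": row[3]}


def profile_pullup(conn: sqlite3.Connection) -> dict:
    """Pullup style: every row is transferred out of the source, then profiled."""
    amounts = [r[0] for r in conn.execute("SELECT amount FROM orders")]
    present = [a for a in amounts if a is not None]
    return {
        "row_count": len(amounts),
        "null_count": amounts.count(None),
        "min": min(present),
        "max": max(present),
    }


if __name__ == "__main__":
    source = build_demo_source()
    print(profile_pushdown(source))
    print(profile_pullup(source))
```

Both functions produce the same profile. What differs is how much data crosses the connection: a single summary row in the first case, the full column in the second.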
Before you can run Pushdown jobs, you need to add the Data Quality Pushdown Processing capability to a Pushdown-compatible data source configured as a connection on your Edge or Collibra Cloud site.
Prerequisites
- You have created and installed an Edge site.
- You have created a connection to a data source that is certified for data quality in your Edge or Collibra Cloud site.
- You have a global role that has the Manage connections and capabilities global permission, for example, Edge integration engineer.
Steps
- Open an Edge or Collibra Cloud site.
- On the main toolbar, click → Settings.
The Settings page opens.
- In the tab pane, click Edge.
The Sites tab opens and shows a table with an overview of your sites.
- In the table, click the name of an Edge or Collibra Cloud site with the status Healthy.
The Edge or Collibra Cloud site page opens.
- Verify that you are connected to a supported data source for data quality.
- If your Pushdown-compatible data source is not yet configured as a connection on your Edge or Collibra Cloud site, follow the steps on Create a JDBC connection for your data source.
- Ensure that your data source has the correct permissions to allow data quality queries to run effectively. Go to Data source-specific permissions to identify the required permissions for your data source.
- Click the Capabilities tab.
The Capabilities tab appears.
- Click Add Capability.
The Add Capability dialog box appears.
- Select Data Quality Pushdown Processing.
- Enter the required information.
- Click Create.
The capability is added to the Edge or Collibra Cloud site. The fields become read-only.
- Repeat the steps in this section for any connection where you want to run data quality with Pushdown processing.
Field | Description | Required |
---|---|---|
Name | The name of the capability. | |
Description | The description of the capability. | |
JDBC Connection | The data quality connection to be used by the capability. Select a Pushdown-compatible option from the drop-down list. If your connection does not appear, you can create a new connection to a data source that is certified for data quality. | |
Debug | An option to automatically send Edge infrastructure log files to Collibra Platform. By default, this option is set to false. Note: We highly recommend that you send Edge infrastructure log files to Collibra Platform only when you have issues with Edge. If you set this option to true, it automatically reverts to false after 24 hours. | |
Log level | An option to determine the verbosity level of Catalog connector log files. By default, this option is set to No logging. | |
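The documented defaults and the 24-hour Debug behavior can be summarized in a small model. This is only a sketch under stated assumptions: it does not call any Collibra API, the class, field, and method names are hypothetical, and treating the revert as happening at exactly 24 hours is an assumption about the boundary.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

# The Debug field description says the option reverts to false after 24 hours.
DEBUG_WINDOW = timedelta(hours=24)


@dataclass
class PushdownCapabilitySettings:
    """Hypothetical in-memory model of the fields in the Add Capability dialog box."""

    name: str
    jdbc_connection: str
    description: str = ""
    log_level: str = "No logging"  # documented default
    debug_enabled_at: Optional[datetime] = None  # None means Debug is false (the default)

    def enable_debug(self, now: datetime) -> None:
        """Record when Debug was switched to true."""
        self.debug_enabled_at = now

    def debug_is_on(self, now: datetime) -> bool:
        """Debug counts as true only within 24 hours of being enabled."""
        if self.debug_enabled_at is None:
            return False
        return now - self.debug_enabled_at < DEBUG_WINDOW
```

For example, after creating settings with a placeholder name and connection and calling `enable_debug` at noon, `debug_is_on` returns `True` at 11:00 the next day and `False` from noon onward.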
What's next
- Set global limits for Pushdown jobs that run in Collibra.
- Review the Monitoring Overview options to begin working with your data.
- Add quick monitoring to schemas in your data source.
- Create a Data Quality Job.