Pushdown processing

Pushdown is an alternative compute method for running a DQ Job, where all of the job's processing is submitted to a SQL data warehouse. DQ Jobs with Pushdown processing generate SQL queries that offload the compute directly to the data source, reducing the amount of data transfer and Spark compute.
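As a rough illustration of the idea, the sketch below sends a single aggregate query to the database so that only a one-row summary comes back, rather than the underlying records. It is not the SQL that Collibra DQ generates: the `build_profile_query` and `profile_column` helpers, the `orders` table, and the use of an in-memory SQLite database as a stand-in for the data warehouse are all assumptions made for the example.

```python
import sqlite3


def build_profile_query(table, column):
    """Build one aggregate query so only a summary row leaves the database."""
    # Hypothetical helper: identifiers are interpolated for illustration only
    # and must come from a trusted source, never from user input.
    return (
        f'SELECT COUNT(*) AS row_count, '
        f'SUM(CASE WHEN "{column}" IS NULL THEN 1 ELSE 0 END) AS null_count, '
        f'COUNT(DISTINCT "{column}") AS distinct_count '
        f'FROM "{table}"'
    )


def profile_column(conn, table, column):
    """Run the aggregate query in the database and return the summary."""
    row = conn.execute(build_profile_query(table, column)).fetchone()
    return {"row_count": row[0], "null_count": row[1], "distinct_count": row[2]}


# In-memory SQLite stands in for the data warehouse (assumption for the demo).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, status TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(1, "open"), (2, None), (3, "open"), (4, "closed")],
)

result = profile_column(conn, "orders", "status")
print(result)
```

The database scans the table and computes the counts itself; the client receives three numbers regardless of how many rows the table holds, which is the data-transfer saving that Pushdown relies on.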

With Pushdown, you can also scale compute to the specific requirements of your DQ Job. Snowflake and Databricks both feature auto-scaling, which lets you automatically scale up, or burst, to 64 or 128 nodes when a job needs more processing power. Pushdown also automatically scales down when your DQ Job needs less processing. Auto-scaling improves runtime performance, and because the data is processed in the warehouse, you avoid the egress costs of reading large amounts of data.

Throughout the app, you can identify Pushdown Jobs by the Pushdown icon.

Pushdown is currently supported for the following data warehouses:

  • Snowflake
  • Databricks (beta)

Benefits of Pushdown

By running a Pushdown job, you can:

  • Reduce latency.
  • Eliminate dependencies on Spark compute to run Collibra DQ, and increase processing speeds.
  • Eliminate the egress latency when running DQ Jobs against large data sets.
  • Take advantage of the auto-scaling built into data warehouses, such as Snowflake and Databricks.

Prerequisites for using Pushdown

Before running Snowflake Pushdown jobs, a user with Admin permissions must:

  • Enable Pushdown from the Snowflake Connections template in Admin Console Connections.

Tip: For first-time configurations, we recommend that you also run the Pushdown setup script and confirm that it completes successfully. If you do not run the setup script, ensure that your environment meets all of the requirements that the script would otherwise set up.

Known limitations

  • Pushdown is currently only available for on-premises deployments of Collibra DQ. Support for Pushdown in DQ Cloud is planned for a future release.
  • Pattern detection is not yet supported. Support for Patterns is planned for the May 2023 release.
  • Break records cannot yet be stored directly in the data warehouse. This functionality is planned for the May 2023 release.
  • You cannot currently run a Pushdown Job from the command-line interface (CLI).
  • Mixed data type rules are not supported in Pushdown.
  • We do not currently support an Okta integration. Support for this integration is planned for a future release.