Pushdown processing
Pushdown is an alternative processing method for running a DQ job, in which all of the job's processing is submitted to a SQL data warehouse. DQ jobs with Pushdown processing generate SQL queries that offload the compute directly to the data source, reducing data transfer and the need for Spark compute.
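To make the idea concrete, the following is an illustrative sketch of the kind of profiling aggregate a Pushdown job might submit to the warehouse instead of pulling rows into Spark. The table and column names (nyse.trades, fname, price) are hypothetical, and the SQL that Collibra DQ actually generates may differ.

```sql
-- Illustrative sketch only: a profiling aggregate computed in-warehouse, so that
-- only a single summary row, not the raw data, leaves the data source.
-- Table and column names are hypothetical.
SELECT
    COUNT(*)              AS row_count,
    COUNT(fname)          AS fname_non_null,
    COUNT(DISTINCT fname) AS fname_distinct,
    MIN(price)            AS price_min,
    MAX(price)            AS price_max
FROM nyse.trades;
```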
With Pushdown, you can also scale your compute to the specific requirements of your DQ job. The architecture of Pushdown data sources features auto-scaling, which allows the warehouse to automatically scale up, or burst, to 64 or 128 nodes when a job requires more processing power, and to scale back down when it does not. With auto-scaling, runtime performance improves and you avoid the egress costs of reading large amounts of data.
Throughout the app, you can identify Pushdown jobs by the Pushdown icon.
Benefits of Pushdown
By running a Pushdown job, you can:
- Reduce latency.
- Eliminate dependencies on Spark compute to run Collibra DQ and increase processing speeds.
- Eliminate egress latency when running DQ jobs against large datasets.
- Take advantage of the auto-scaling of data warehouses, such as Snowflake and Databricks.
Prerequisites for using Pushdown
Before running Pushdown jobs, a Collibra DQ user with Admin permissions must enable Pushdown for your data source. Pushdown can be enabled from the data source's connection template in Admin Console > Connections.
Tip For first-time Snowflake Pushdown configurations, we recommend that you also run the Pushdown setup script successfully. If you do not run the setup script, ensure that you meet all of the criteria that the script would otherwise accomplish.
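For reference, the following is a hypothetical illustration of the kind of access such a setup typically establishes in Snowflake. The role, warehouse, database, and schema names are placeholders, and the actual setup script's contents may differ; consult the script itself for the authoritative statements.

```sql
-- Hypothetical illustration only: typical Snowflake grants for a Pushdown
-- service role. DQ_ROLE, DQ_WH, and ANALYTICS_DB are placeholder names; the
-- actual Pushdown setup script may grant different or additional privileges.
GRANT USAGE ON WAREHOUSE DQ_WH TO ROLE DQ_ROLE;
GRANT USAGE ON DATABASE ANALYTICS_DB TO ROLE DQ_ROLE;
GRANT USAGE ON SCHEMA ANALYTICS_DB.PUBLIC TO ROLE DQ_ROLE;
GRANT SELECT ON ALL TABLES IN SCHEMA ANALYTICS_DB.PUBLIC TO ROLE DQ_ROLE;
```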
Available data sources and layers
Pushdown is currently available for the following data sources and layers:
In the following table, the basic layers are Rules, Behavior (Adaptive Rules), Schema, and Archive Break Records; the advanced layers are Outliers, Dupes, Shapes, Patterns, Records, and Mapping (Source to Target).

| Data Source | Lifecycle | Rules | Behavior (Adaptive Rules) | Schema | Archive Break Records | Outliers | Dupes | Shapes | Patterns | Records | Mapping (Source to Target) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Snowflake | Generally available | Yes | Yes | Yes | Yes | Yes | Yes | Yes | No | Yes | Yes |
| Databricks | Generally available | Yes | Yes | Yes | Yes | Yes | Yes | Yes | No | No | No |
| BigQuery | Generally available | Yes | Yes | Yes | Yes | No | Yes | Yes | No | No | No |
| Athena | Generally available | Yes | Yes | Yes | Yes | No | No | No | No | No | No |
| Redshift | Generally available | Yes | Yes | Yes | Yes | No | No | No | No | No | No |
| Trino | Generally available | Yes | Yes | Yes | Yes | Yes | Yes | Yes | No | No | No |
| SQL Server | Generally available | Yes | Yes | Yes | Yes | No | No | No | No | No | No |
| SAP HANA | Generally available | Yes | Yes | Yes | Yes¹ | No | Yes² | Yes | No | No | No |

¹ Note Currently, only Dupes and Rules break records are eligible for archival to SAP HANA.
² Note Currently, SAP HANA Pushdown only scans for exact match dupes. Support for fuzzy match dupes is planned for an upcoming release.
Running rules on Pushdown jobs
All rules that execute on Pushdown jobs use the SQL dialect native to the data warehouse to which they are submitted for processing. Because Pushdown submits queries directly to the data source, the processing time of SQL rules is reduced considerably.
The following table provides examples of how rules run on Pushdown jobs.
Note The Translated Query Example column in the following table shows how the rule in the Collibra DQ Query Example column is translated when it is submitted to the data warehouse for processing.
| Feature | Query Type | Definition | From | Collibra DQ Query Example | Translated Query Example |
|---|---|---|---|---|---|
| Native SQL | Physical Query (PQ) | Runs the entire query in the native database dialect. | nyse.dataset | `select * from nyse.dataset where fname like 'John'` | `select * from nyse.dataset where fname like 'John'` |
| Freeform SQL | Logical Query (LQ) | Runs the query, including the original scope. | @nyse-dq-dataset | `select * from @nyse-dq-dataset where fname = 'John'` | `select * from (select * from nyse-dq-dataset where d_date='2023-01-01') where fname = 'John'` |
| Freeform SQL | Template Logical Query (TLQ) | Runs the query, including the original scope, without specifying the dataset name. | @dataset | `select * from @dataset where fname = 'John'` | `select * from (select * where d_date='2023-01-01') where fname = 'John'` |
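As a further illustration of what executes in-warehouse, the following is a hedged sketch of the kind of pass/fail aggregation a pushed-down rule evaluation might run against the data source. It reuses the hypothetical nyse.dataset example from the table above; the SQL that Collibra DQ actually generates may differ.

```sql
-- Illustrative sketch only: count breaking vs. total rows for the example rule
-- condition entirely in-warehouse, so only the two counts are returned.
-- Table name and condition are the hypothetical examples above.
SELECT
    SUM(CASE WHEN fname LIKE 'John' THEN 1 ELSE 0 END) AS breaking_records,
    COUNT(*)                                           AS total_records
FROM nyse.dataset;
```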
Known limitations
- We strongly recommend avoiding special characters in table and column names for Pushdown datasets.
- Currently, you cannot run a Pushdown job from the command line.
- Mixed data type rules are not supported in Pushdown.
- Binary data types are not supported in Pushdown, because there is no way to quantify the quality of unstructured binary data.
- Cross-connection rules are not supported in Pushdown. However, support for this capability when connections belong to the same Snowflake account is planned for an upcoming version.
- We do not currently support an Okta integration. Support for this integration is planned for a future version.
- Snowflake Pushdown jobs with queries containing REGEXP_REPLACE fail because Snowflake does not currently support arguments with collation specifications. There are two known workarounds:
  - Make a clone of the table that is synced in Snowflake with no collation set, as in the sketch after this list.
  - Use Pullup mode instead of Pushdown to avoid this Snowflake limitation.
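For the first workaround, a minimal Snowflake sketch might look like the following, assuming a hypothetical table MY_SCHEMA.CUSTOMERS whose text columns have a collation set. Note that this sketch produces a copy via CREATE TABLE ... AS SELECT with the collation stripped, rather than a zero-copy CLONE, which would preserve the collation.

```sql
-- Minimal sketch, assuming a hypothetical table MY_SCHEMA.CUSTOMERS with
-- collated text columns. COLLATE '' removes the collation specification so
-- that REGEXP_REPLACE in the Pushdown-generated SQL no longer fails.
CREATE OR REPLACE TABLE MY_SCHEMA.CUSTOMERS_NO_COLLATION AS
SELECT
    fname COLLATE '' AS fname,  -- strip collation from the text column
    lname COLLATE '' AS lname,
    signup_date
FROM MY_SCHEMA.CUSTOMERS;
```

You would then point the Pushdown dataset at the copy instead of the original table.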