Configuring Apache Hadoop to Execute DQ Jobs

For large-scale processing and high concurrency, a single vertically scaled Spark server may not be sufficient. In these cases, Collibra Data Quality & Observability can push compute jobs to an external Hadoop cluster instead. This section describes how to configure the DQ Agent to push DQ jobs to Hadoop.
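
Because a DQ job ultimately runs as a Spark application, pushing it to Hadoop amounts to submitting it to YARN rather than to a standalone Spark master. The following spark-submit invocation is a minimal sketch of the kind of command the agent issues on your behalf; the jar path, entry class, and trailing job arguments (owl-core.jar, com.owl.core.Main, -ds, -rd) are hypothetical placeholders, not the exact values used by your installation:

    # Submit a DQ job to the Hadoop cluster via YARN ("--master yarn")
    # so the work runs on the cluster rather than a single Spark server.
    # Deploy mode "cluster" places the Spark driver on the cluster as well,
    # keeping the DQ Agent host free of heavy compute.
    # The jar path, entry class, and job arguments below are hypothetical
    # placeholders for illustration only.
    spark-submit \
      --master yarn \
      --deploy-mode cluster \
      --num-executors 4 \
      --executor-memory 4g \
      --class com.owl.core.Main \
      /opt/owl/bin/owl-core.jar \
      -ds my_dataset -rd 2024-01-01

The --master, --deploy-mode, --num-executors, --executor-memory, and --class options are standard Spark flags; the executor counts and memory shown here are arbitrary and should be sized to your cluster.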

The following diagram shows the DQ architecture with a Hadoop cluster: