Agent

When an agent picks up a DQ Job, it launches the job either locally on the agent node itself or on a cluster as a Spark job (if the agent is set up as an edge node of the cluster). Wherever the job launches, its results are written back to the Metastore. The results then display in the DQ Web App, are exposed through the REST API, and become available for direct SQL queries against the Metastore.

The diagram below provides a high-level overview of how agents work within Collibra DQ. Job execution is driven by DQ Jobs that are written to the agent_q table inside the Metastore (DQ-Postgres) via the Web App or a REST API endpoint. Each active and available agent polls this table every 5 seconds and executes the DQ Jobs for which it is responsible. For example, the EMR agent DQ-Agent3 only executes DQ Jobs scheduled to run on EMR.

dq agent diagram
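
The polling behavior described above can be illustrated with a minimal Python sketch. It is not Collibra DQ's implementation: an in-memory SQLite database stands in for DQ-Postgres, and the agent_q columns (job_id, agent_name, dataset, status), the status values, and the function names are all hypothetical, chosen only to show how each agent picks up just the jobs assigned to it.

```python
import sqlite3
import time


def make_queue():
    """Create an in-memory stand-in for the agent_q table (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE agent_q ("
        "job_id INTEGER PRIMARY KEY, agent_name TEXT, dataset TEXT, status TEXT)"
    )
    return conn


def submit_job(conn, agent_name, dataset):
    """Queue a job for one agent, as the Web App or REST API would."""
    cur = conn.execute(
        "INSERT INTO agent_q (agent_name, dataset, status) VALUES (?, ?, 'PENDING')",
        (agent_name, dataset),
    )
    conn.commit()
    return cur.lastrowid


def poll_once(conn, agent_name):
    """Return this agent's pending jobs and mark them as running."""
    rows = conn.execute(
        "SELECT job_id, dataset FROM agent_q "
        "WHERE agent_name = ? AND status = 'PENDING' ORDER BY job_id",
        (agent_name,),
    ).fetchall()
    for job_id, _dataset in rows:
        conn.execute(
            "UPDATE agent_q SET status = 'RUNNING' WHERE job_id = ?", (job_id,)
        )
    conn.commit()
    return rows


def run_agent(conn, agent_name, launch, interval_seconds=5.0, max_polls=None):
    """Poll the queue at a fixed interval and hand each claimed job to `launch`.

    `launch` is a caller-supplied callback standing in for the local or
    Spark launch step; `max_polls=None` means poll indefinitely.
    """
    polls = 0
    while max_polls is None or polls < max_polls:
        for job_id, dataset in poll_once(conn, agent_name):
            launch(job_id, dataset)
        polls += 1
        time.sleep(interval_seconds)
```

In this sketch, a job submitted for DQ-Agent3 is never returned when another agent polls, which mirrors the rule that the EMR agent only executes jobs scheduled to run on EMR. The 5-second default for interval_seconds matches the polling interval stated above; everything else is illustrative.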

What's next?