Agent
Diagram
The diagram above provides a high-level overview of how agents work within Collibra DQ. Job execution is driven by DQ Jobs that are written to an agent_q table inside the DQ Metastore (DQ-Postgres) via the Web App or REST API endpoint. Each active and available agent queries the DQ-Postgres table every 5 seconds to execute DQ Jobs for which the agent is responsible. For example, the EMR agent DQ-Agent3 only executes DQ Jobs scheduled to run on EMR.
When an agent picks up a DQ Job, it launches the job either locally on the agent node itself or on a cluster as a Spark job (if the agent is set up as an edge node of the cluster). Wherever the job runs, its results are written back to the DQ Metastore. The results then display in the DQ Web App, are exposed through the REST API, and become available for direct SQL queries against the DQ Metastore.
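The pickup behavior described above can be sketched conceptually in shell. This is not how the agent is implemented: the real queue is the agent_q table in the DQ Metastore, and its schema is not shown in this document. The file QUEUE_FILE, its "job_id agent_name" line layout, and the function names jobs_for_agent and poll_once are all hypothetical stand-ins used only to illustrate that each agent picks up just the jobs assigned to it.

```shell
#!/usr/bin/env bash
# Conceptual sketch only. QUEUE_FILE and its "job_id agent_name" line layout
# are hypothetical stand-ins for the agent_q table, whose schema is not shown.
QUEUE_FILE="${QUEUE_FILE:-$(mktemp)}"

# Print the IDs of queued jobs assigned to the given agent.
jobs_for_agent() {
  awk -v agent="$1" '$2 == agent { print $1 }' "$QUEUE_FILE"
}

# One polling pass: report each job this agent is responsible for, then
# remove those rows so that a later pass does not pick them up again.
poll_once() {
  local agent_name="$1" job_id
  for job_id in $(jobs_for_agent "$agent_name"); do
    echo "agent ${agent_name} would launch job ${job_id}"
  done
  awk -v agent="$agent_name" '$2 != agent' "$QUEUE_FILE" > "${QUEUE_FILE}.tmp"
  mv "${QUEUE_FILE}.tmp" "$QUEUE_FILE"
}

# Seed the stand-in queue and run a single pass for the EMR agent.
printf '%s\n' '101 DQ-Agent1' '102 DQ-Agent3' '103 DQ-Agent3' > "$QUEUE_FILE"
poll_once DQ-Agent3
```

To mimic the 5-second interval, the pass could be wrapped in a loop that calls poll_once and then sleep 5. The job assigned to DQ-Agent1 stays in the queue because only that agent is responsible for it.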
Setting up a DQ Agent with setup.sh as part of the DQ package
Use the setup.sh script located in /opt/owl/ (or the other Base Path that your installation used). The example in the code block below sets up a DQ Agent with a Postgres server running on localhost on port 5432, with the database postgres and the Postgres username/password combination postgres/password.
# PATH TO DIR THAT CONTAINS THE INSTALL DIR
export BASE_PATH=/opt
# PATH TO AGENT INSTALL DIR
export INSTALL_PATH=/opt/owl
# DQ Metadata Postgres Storage settings
export METASTORE_HOST=localhost
export METASTORE_PORT=5432
export METASTORE_DB=postgres
export METASTORE_USER=postgres
export METASTORE_PASSWORD=password
cd $INSTALL_PATH
# Install DQ Agent only
./setup.sh \
-owlbase=$BASE_PATH \
-options=owlagent \
-pguser=$METASTORE_USER \
-pgpassword=$METASTORE_PASSWORD \
-pgserver=${METASTORE_HOST}:${METASTORE_PORT}/${METASTORE_DB}
The setup script automatically generates the /opt/owl/config/owl.properties file and encrypts the provided password.
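Because setup.sh takes its connection details from environment variables, a small pre-flight check can catch a missing value before the script runs. The sketch below is an illustration, not part of the DQ package: the helper names require_vars and pgserver_arg are hypothetical, and only the variable names and the host:port/database form come from the example above.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight helpers; they are not shipped with Collibra DQ.

# Return non-zero and name the first listed variable that is unset or empty.
require_vars() {
  local name value
  for name in "$@"; do
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "missing required variable: $name" >&2
      return 1
    fi
  done
}

# Build the -pgserver argument in the host:port/database form used above.
pgserver_arg() {
  printf '%s:%s/%s\n' "$METASTORE_HOST" "$METASTORE_PORT" "$METASTORE_DB"
}

# Example usage (run after the export lines from the setup example):
#   require_vars BASE_PATH METASTORE_HOST METASTORE_PORT METASTORE_DB \
#                METASTORE_USER METASTORE_PASSWORD \
#     && echo "-pgserver=$(pgserver_arg)"
```

With the values from the example, pgserver_arg prints localhost:5432/postgres, which matches what the -pgserver option receives.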
Setting up a DQ Agent manually
Steps
- Open a terminal session and go to the directory with the installer.
- Run the following command to encrypt your DQ Metastore password before it is stored in the /opt/owl/config/owl.properties file:
# PATH TO AGENT INSTALL DIR
export INSTALL_PATH=/opt/owl
cd $INSTALL_PATH
# Encrypt DQ Metadata Postgres Storage password
./owlmanage.sh encrypt=password
Note: owlmanage.sh generates an encrypted string for the plain text password input. You can use the encrypted string in the /opt/owl/config/owl.properties configuration file to avoid exposing the DQ Metadata Postgres Storage password.
- Run the following command to open the /opt/owl/config/owl.properties configuration file:
vi $INSTALL_PATH/config/owl.properties
- Add the following properties to the configuration file:
spring.datasource.url=jdbc:postgresql://{DB_HOST}:{DB_PORT}/{METASTORE_DB}
spring.datasource.username={METASTORE_USER}
spring.datasource.password={METASTORE_PASSWORD}
spring.datasource.driver-class-name=com.owl.org.postgresql.Driver
spring.agent.datasource.url=jdbc:postgresql://{DB_HOST}:{DB_PORT}/{METASTORE_DB}
spring.agent.datasource.username={METASTORE_USER}
spring.agent.datasource.password={METASTORE_PASSWORD}
spring.agent.datasource.driver-class-name=org.postgresql.Driver
- Restart the DQ Web App.
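As an illustration of the manual steps above, the eight properties can be rendered from your own values instead of being typed by hand. This is a minimal sketch under stated assumptions: render_agent_properties is a hypothetical helper, not a Collibra DQ tool, and the fifth argument is assumed to be the encrypted string that owlmanage.sh produced. The property names and driver class names are copied from the list above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the properties with the placeholders filled in.
# Arguments: host port database user encrypted_password
render_agent_properties() {
  local host="$1" port="$2" db="$3" user="$4" enc_password="$5"
  local url="jdbc:postgresql://${host}:${port}/${db}"
  cat <<EOF
spring.datasource.url=${url}
spring.datasource.username=${user}
spring.datasource.password=${enc_password}
spring.datasource.driver-class-name=com.owl.org.postgresql.Driver
spring.agent.datasource.url=${url}
spring.agent.datasource.username=${user}
spring.agent.datasource.password=${enc_password}
spring.agent.datasource.driver-class-name=org.postgresql.Driver
EOF
}

# Example usage: review the output before adding it to owl.properties.
#   render_agent_properties localhost 5432 postgres postgres "<encrypted string>"
```

Printing to standard output rather than writing to the file directly keeps the sketch non-destructive, so an existing owl.properties is never overwritten by accident.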
Setting up the DQ Agent from the Admin Console
Steps
- On the Collibra DQ home page, hover your cursor over Settings and select Admin Console.
The Admin Console opens.
- Click Remote Agent.
The Agent Management page opens.
- In the last column of the Agents table, to the right, click the pencil icon to edit your agent.
The Edit Agent modal appears.
- Enter the required information.
| Field | Description |
| --- | --- |
| Agent Id | The numeric identifier of your agent. For example, 6. This field is pre-filled and cannot be edited. |
| Agent Name | The unique name of your agent. This field is pre-filled and cannot be edited. |
| Agent Display Name | The descriptive name of your agent that displays anywhere agent information is present in the DQ Web App. You can customize the Agent Display Name to make it easier to identify your agent. Tip: There are no character restrictions for the Agent Display Name field, but it is best practice to use only alphanumeric characters, hyphens, and underscores. |
| Is Local | Select for Hadoop deployments only. |
| Is Livy | Deprecated. Not used. |
| Livy Host | The location where your Livy agent is hosted. This field is only applicable when Livy is in use. |
| Base Path | The installation folder path for DQ. All other paths in the DQ Agent are relative to this installation path. This is the location that is set as OWL_BASE in Full Standalone Setup and other installation setups, followed by the owl/ folder. For example, if the setup command is export OWL_BASE=/home/centos, then the Base Path in the Agent configuration should be set to /home/centos/owl/. Default: /opt/owl/. |
| Collibra DQ Core JAR | The file path to the DQ Core JAR file. Default: <Base Path>/owl/bin/. |
| Collibra DQ Core Logs | The folder path where DQ Core logs are stored. Logs from DQ Jobs are stored in this folder. Default: <Base Path>/owl/log. |
| Collibra DQ Script | The file path to the DQ execution script owlcheck.sh. This script is used to run a DQ Job from the command line without using an agent. Using owlcheck.sh to run DQ Jobs is superseded by the DQ Agent execution model. Default: <Base Path>/owl/bin/owlcheck. |
| Collibra DQ Web Logs | The folder path where DQ Web logs are stored. Logs from the DQ Web App are stored in this folder. Default: <Base Path>/owl/log. |
| Default Queue | The default resource queue for YARN. |
| Deploy Deployment Mode | The Spark deployment mode, either Client or Cluster. |
| Default Master | The Spark Master URL copied from the Spark cluster verification screen. For example, spark://... |
| Dynamic Spark Allocation | Deprecated. Not used. |
| Spark Conf Key | Deprecated. Not used. |
| Spark Conf Value | Deprecated. Not used. |
| Number of Executor(s) | The default number of executors allocated per DQ Job when using this Agent to run DQ Scans. The default is 1. |
| Executor Memory (GB) | The default RAM per executor allocated per DQ Job when using this Agent to run DQ Scans. The default is 1 gigabyte. |
| Number of Core(s) | The default number of cores per executor allocated per DQ Job when using this Agent to run DQ Scans. The default is 1. |
| Driver Memory (GB) | The default driver RAM allocated per DQ Job when using this Agent to run DQ Scans. The default is 1 gigabyte. |
| Free Form (Appended) | Other spark-submit parameters to append to each DQ Job when using this Agent to run DQ Scans. |
- Click Save.
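Two of the field descriptions above lend themselves to a short worked example: how the Base Path follows from OWL_BASE, and what the per-job Spark defaults look like when expressed as spark-submit options. The sketch below is illustrative only. The helper names agent_base_path and spark_submit_defaults are hypothetical, and the mapping of Edit Agent fields to the standard spark-submit options --num-executors, --executor-memory, --executor-cores, and --driver-memory is an assumption; this document does not show how the agent actually assembles its command.

```shell
#!/usr/bin/env bash
# Derive the Agent Base Path by appending owl/ to OWL_BASE, as described
# for the Base Path field. Hypothetical helper.
agent_base_path() {
  local owl_base="${1%/}"   # drop one trailing slash, if present
  printf '%s/owl/\n' "$owl_base"
}

# ASSUMPTION: this field-to-option mapping is illustrative; the source does
# not show how the agent builds its spark-submit command.
# Arguments (all optional): executors executor_memory_gb cores driver_memory_gb
# Defaults mirror the field descriptions: 1 executor, 1 GB, 1 core, 1 GB.
spark_submit_defaults() {
  local executors="${1:-1}" executor_mem_gb="${2:-1}"
  local cores="${3:-1}" driver_mem_gb="${4:-1}"
  printf '%s\n' "--num-executors ${executors} --executor-memory ${executor_mem_gb}g --executor-cores ${cores} --driver-memory ${driver_mem_gb}g"
}

# Example usage:
#   agent_base_path /home/centos
#   spark_submit_defaults 4 8 2 4
```

Called with /home/centos, agent_base_path prints /home/centos/owl/, which matches the example given for the Base Path field.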
Linking data sources to the DQ Agent from the Admin Console
When you add new Data Sources, the DQ Agent requires permission to run DQ Jobs with them.
Steps
- On the Collibra DQ home page, hover your cursor over Settings and select Admin Console.
The Admin Console opens.
- Click Remote Agent.
The Agent Management page opens.
- In the last column of the Agents table, to the right, click the chain link icon to link your agent to data source connections.
The Agent to Connection Management wizard appears.
Note: The left panel contains a list of available connections that are not yet linked to the DQ Agent and do not yet have permission to run DQ Jobs. The right panel contains a list of connections that are linked to the DQ Agent and have permission to run DQ Jobs.
- Click a connection in the left panel to link connections one at a time or click the double arrow icon to link all available connections at the same time.
- Click Update.
Tip: You can unlink connections with the same methods listed above, but click the connections listed in the right panel instead of the left. Successfully unlinked connections appear in the left panel.