Agent

Diagram

[Figure: DQ agent diagram]

The diagram above provides a high-level overview of how agents work within Collibra DQ. Job execution is driven by DQ Jobs that are written to an agent_q table inside the DQ Metastore (DQ-Postgres) via the Web App or REST API endpoint. Each active and available agent queries the DQ-Postgres table every 5 seconds to execute DQ Jobs for which the agent is responsible. For example, the EMR agent DQ-Agent3 only executes DQ Jobs scheduled to run on EMR.

When an agent picks up a DQ Job, it launches the job either locally on the agent node or on a cluster as a Spark job (if the agent is set up as an edge node of the cluster). Wherever the job launches, its results are written back to the DQ Metastore. The results then display in the DQ Web App, are exposed through the REST API, and become available for direct SQL queries against the DQ Metastore.
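To see what the agents are polling, you can look at the agent_q table directly. The following is a minimal sketch, not a documented DQ tool: the helper name peek_agent_queue and the DRY_RUN switch are hypothetical, the connection values are placeholders, and it assumes psql is installed and that agent_q is reachable on the default schema search path. The columns of agent_q are not described here, so the query selects all of them.

```shell
# Hypothetical helper: previews (or runs) a read-only query against agent_q.
# Connection values are placeholders; replace them with your metastore settings.
peek_agent_queue() {
    host="${METASTORE_HOST:-localhost}"
    port="${METASTORE_PORT:-5432}"
    db="${METASTORE_DB:-postgres}"
    user="${METASTORE_USER:-postgres}"
    sql='SELECT * FROM agent_q LIMIT 10;'

    if [ "${DRY_RUN:-1}" = "1" ]; then
        # Print the command instead of running it.
        echo "psql -h $host -p $port -U $user -d $db -c \"$sql\""
    else
        psql -h "$host" -p "$port" -U "$user" -d "$db" -c "$sql"
    fi
}

# Dry run by default; set DRY_RUN=0 to execute against a real metastore.
peek_agent_queue
```

The dry-run default is a deliberate choice so that the sketch can be read and tried without touching a live metastore.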

Setting up a DQ Agent with setup.sh as part of the DQ package

Use the setup.sh script located in /opt/owl/ (or whichever Base Path your installation uses). The example below sets up a DQ Agent against a Postgres server running on localhost, port 5432, with the database postgres and the username/password combination postgres/password.

# PATH TO DIR THAT CONTAINS THE INSTALL DIR
export BASE_PATH=/opt

# PATH TO AGENT INSTALL DIR
export INSTALL_PATH=/opt/owl

# DQ Metadata Postgres Storage settings 
export METASTORE_HOST=localhost
export METASTORE_PORT=5432
export METASTORE_DB=postgres
export METASTORE_USER=postgres
export METASTORE_PASSWORD=password 

cd $INSTALL_PATH

# Install DQ Agent only
./setup.sh \
    -owlbase=$BASE_PATH \
    -options=owlagent \
    -pguser=$METASTORE_USER \
    -pgpassword=$METASTORE_PASSWORD \
    -pgserver=${METASTORE_HOST}:${METASTORE_PORT}/${METASTORE_DB}

The setup script automatically generates the /opt/owl/config/owl.properties file and encrypts the provided password.
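Before running the installer, it can help to confirm that every required variable is set and to preview the value that will be passed to -pgserver. The following is a minimal sketch and is not part of the DQ package; the helper names build_pgserver and check_required_vars are hypothetical, and the values are the example values used above.

```shell
# Hypothetical pre-flight helpers; not shipped with Collibra DQ.

# Print host:port/database in the form passed to -pgserver in the example above.
build_pgserver() {
    printf '%s:%s/%s\n' "$1" "$2" "$3"
}

# Return non-zero and name the first variable that is unset or empty.
check_required_vars() {
    for name in "$@"; do
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "Missing required variable: $name" >&2
            return 1
        fi
    done
}

# Example values taken from the setup example above.
export METASTORE_HOST=localhost METASTORE_PORT=5432 METASTORE_DB=postgres
export METASTORE_USER=postgres METASTORE_PASSWORD=password

if check_required_vars METASTORE_HOST METASTORE_PORT METASTORE_DB \
                       METASTORE_USER METASTORE_PASSWORD; then
    echo "-pgserver=$(build_pgserver "$METASTORE_HOST" "$METASTORE_PORT" "$METASTORE_DB")"
fi
```

With the example values, the sketch prints -pgserver=localhost:5432/postgres, which matches the argument the setup.sh call above expands to.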

Setting up a DQ Agent manually

Steps

  1. Open a terminal session and go to the directory with the installer.
  2. Run the following command to encrypt your DQ Metastore password before it is stored in the /opt/owl/config/owl.properties file:
    # PATH TO AGENT INSTALL DIR
    export INSTALL_PATH=/opt/owl

    cd $INSTALL_PATH

    # Encrypt the DQ Metadata Postgres Storage password
    ./owlmanage.sh encrypt=password

    Note: owlmanage.sh generates an encrypted string from the plain-text password you supply. Use the encrypted string in the /opt/owl/config/owl.properties configuration file to avoid exposing the DQ Metadata Postgres Storage password.

  3. Run the following command to open the /opt/owl/config/owl.properties configuration file:
    vi $INSTALL_PATH/config/owl.properties
  4. Add the following properties to the configuration file:
    spring.datasource.url=jdbc:postgresql://{DB_HOST}:{DB_PORT}/{METASTORE_DB}
    spring.datasource.username={METASTORE_USER}
    spring.datasource.password={METASTORE_PASSWORD}
    spring.datasource.driver-class-name=com.owl.org.postgresql.Driver
     
    spring.agent.datasource.url=jdbc:postgresql://{DB_HOST}:{DB_PORT}/{METASTORE_DB}
    spring.agent.datasource.username={METASTORE_USER}
    spring.agent.datasource.password={METASTORE_PASSWORD}
    spring.agent.datasource.driver-class-name=org.postgresql.Driver
  5. Restart the DQ Web App.
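The properties from step 4 can also be generated rather than typed by hand, which reduces the chance of the two datasource blocks drifting apart. The following is a minimal sketch under stated assumptions: render_owl_properties is a hypothetical helper, not part of the DQ package, and the password argument is expected to be the encrypted string produced by owlmanage.sh in step 2 (shown here as a placeholder). The sketch prints to standard output so that it never overwrites an existing owl.properties file.

```shell
# Hypothetical helper: prints the datasource properties listed in step 4.
# Arguments: DB host, DB port, metastore database, metastore user, encrypted password.
render_owl_properties() {
    url="jdbc:postgresql://$1:$2/$3"
    user="$4"
    password="$5"
    cat <<EOF
spring.datasource.url=${url}
spring.datasource.username=${user}
spring.datasource.password=${password}
spring.datasource.driver-class-name=com.owl.org.postgresql.Driver

spring.agent.datasource.url=${url}
spring.agent.datasource.username=${user}
spring.agent.datasource.password=${password}
spring.agent.datasource.driver-class-name=org.postgresql.Driver
EOF
}

# Placeholder values; substitute your own host, database, user, and encrypted string.
render_owl_properties localhost 5432 postgres postgres '<ENCRYPTED_STRING>'
```

Review the printed lines before copying them into $INSTALL_PATH/config/owl.properties.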

Setting up the DQ Agent from the Admin Console

Steps

  1. On the Collibra DQ home page, hover your cursor over Settings and select Admin Console.
    The Admin Console opens.
  2. Click Remote Agent.
    The Agent Management page opens.
  3. In the last column of the Agents table, to the right, click the pencil icon to edit your agent.
    The Edit Agent modal appears.
  4. Enter the required information.

    Field | Description
    Agent Id

    The numeric identifier of your agent. For example, 6.

    This field is pre-filled and cannot be edited.

    Agent Name

    The unique name of your agent.

    This field is pre-filled and cannot be edited.

    Agent Display Name

    The descriptive name of your agent that displays anywhere agent information is present in the DQ Web App. You can customize the Agent Display Name to make it easier to identify your agent.

    Tip: There are no character restrictions for the Agent Display Name field, but it is best practice to use only alphanumeric characters, hyphens, and underscores.

    Is Local

    Select for Hadoop deployments only.

    Is Livy

    Deprecated. Not used.

    Livy Host

    The location where your Livy agent is hosted. This field applies only when Livy is in use.

    Base Path

    The installation folder path for DQ. All other paths in the DQ Agent are relative to this installation path.

    This is the location set as OWL_BASE in Full Standalone Setup and other installation setups, followed by the owl/ folder. For example, if the setup command is export OWL_BASE=/home/centos, set the Base Path in the Agent configuration to /home/centos/owl/.

    Default: /opt/owl/.

    Collibra DQ Core JAR

    The file path to the DQ Core JAR file.

    Default: <Base Path>/owl/bin/.

    Collibra DQ Core Logs

    The folder path where DQ Core logs are stored. Logs from DQ Jobs are stored in this folder.

    Default: <Base Path>/owl/log.

    Collibra DQ Script

    The file path to the DQ execution script owlcheck.sh. This script runs a DQ Job from the command line without using an agent. Using owlcheck.sh to run DQ Jobs is superseded by the DQ Agent execution model.

    Default: <Base Path>/owl/bin/owlcheck.

    Collibra DQ Web Logs

    The folder path where DQ Web logs are stored. Logs from the DQ Web App are stored in this folder.

    Default: <Base Path>/owl/log.

    Default Queue

    The default resource queue for YARN.

    Deploy Deployment Mode

    The Spark deployment mode, either Client or Cluster.

    Default Master

    The Spark Master URL copied from the Spark cluster verification screen. For example, spark://....

    Dynamic Spark Allocation

    Deprecated. Not used.

    Spark Conf Key

    Deprecated. Not used.

    Spark Conf Value

    Deprecated. Not used.

    Number of Executor(s)

    The default number of executors allocated per DQ Job when using this Agent to run DQ Scans. The default is 1.

    Executor Memory (GB)

    The default RAM per executor allocated per DQ Job when using this Agent to run DQ Scans. The default is 1 gigabyte.

    Number of Core(s)

    The default number of cores per executor allocated per DQ Job when using this Agent to run DQ Scans. The default is 1.

    Driver Memory (GB)

    The default driver RAM allocated per DQ Job when using this Agent to run DQ Scans. The default is 1 gigabyte.

    Free Form (Appended)

    Other spark-submit parameters to append to each DQ Job when using this Agent to run DQ Scans.

  5. Click Save.
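The Spark-related defaults in the Edit Agent form correspond to standard spark-submit options. The following sketch only illustrates that correspondence; agent_spark_args is a hypothetical helper, and the command line that the DQ Agent actually builds is not documented here and may differ. The spark-submit flags themselves (--master, --deploy-mode, --queue, --num-executors, --executor-memory, --executor-cores, --driver-memory) are standard Spark options.

```shell
# Hypothetical illustration only: prints spark-submit options that correspond
# to the Edit Agent defaults. The command the DQ Agent really builds may differ.
agent_spark_args() {
    master="$1"                   # Default Master
    deploy_mode="$2"              # Deploy Deployment Mode: client or cluster
    queue="$3"                    # Default Queue (YARN only; may be empty)
    num_executors="${4:-1}"       # Number of Executor(s), default 1
    executor_memory_gb="${5:-1}"  # Executor Memory (GB), default 1
    executor_cores="${6:-1}"      # Number of Core(s), default 1
    driver_memory_gb="${7:-1}"    # Driver Memory (GB), default 1
    free_form="${8:-}"            # Free Form (Appended)

    args="--master $master --deploy-mode $deploy_mode"
    if [ -n "$queue" ]; then
        args="$args --queue $queue"
    fi
    args="$args --num-executors $num_executors"
    args="$args --executor-memory ${executor_memory_gb}g"
    args="$args --executor-cores $executor_cores"
    args="$args --driver-memory ${driver_memory_gb}g"
    if [ -n "$free_form" ]; then
        args="$args $free_form"
    fi
    echo "$args"
}

# Example: YARN cluster mode with the documented defaults and one appended option.
agent_spark_args yarn cluster default 1 1 1 1 "--conf spark.sql.shuffle.partitions=8"
```

Because these values are defaults per DQ Job, raising them on the agent affects every job that the agent runs unless a job overrides them.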

Linking data sources to the DQ Agent from the Admin Console

When you add new Data Sources, the DQ Agent requires permission to run DQ Jobs with them.

Steps

  1. On the Collibra DQ home page, hover your cursor over Settings and select Admin Console.
    The Admin Console opens.
  2. Click Remote Agent.
    The Agent Management page opens.
  3. In the last column of the Agents table, to the right, click the chain link icon to link your agent to data source connections.
    The Agent to Connection Management wizard appears.

    Note: The left panel contains a list of available connections that are not yet linked to the DQ Agent and do not yet have permission to run DQ Jobs. The right panel contains a list of connections that are linked to the DQ Agent and have permission to run DQ Jobs.

  4. Click a connection in the left panel to link connections one at a time or click the double arrow icon to link all available connections at the same time.
  5. Click Update.

Tip: To unlink connections, use the same methods listed above, but click the connections in the right panel instead of the left. Successfully unlinked connections appear in the left panel.

Adding a connection to a DQ Agent