Set up a DQ agent

This topic shows you how to configure a DQ agent.

Setting up a DQ Agent with setup.sh as part of the DQ package

Use the setup.sh script located in /opt/owl/ (or the equivalent directory if your installation used a different Base Path). The example in the code block below sets up a DQ Agent against a Postgres server running on localhost, port 5432, with the database postgres and the Postgres username and password postgres and password.

# PATH TO DIR THAT CONTAINS THE INSTALL DIR
export BASE_PATH=/opt

# PATH TO AGENT INSTALL DIR
export INSTALL_PATH=/opt/owl

# DQ Metadata Postgres Storage settings
export METASTORE_HOST=localhost
export METASTORE_PORT=5432
export METASTORE_DB=postgres
export METASTORE_USER=postgres
export METASTORE_PASSWORD=password

cd "$INSTALL_PATH"

# Install DQ Agent only
./setup.sh \
  -owlbase="$BASE_PATH" \
  -options=owlagent \
  -pguser="$METASTORE_USER" \
  -pgpassword="$METASTORE_PASSWORD" \
  -pgserver="${METASTORE_HOST}:${METASTORE_PORT}/${METASTORE_DB}"

The setup script automatically generates the /opt/owl/config/owl.properties file and encrypts the provided password.
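
After the script finishes, a quick check can confirm that the properties file exists and contains a datasource setting. The following is a minimal sketch and is not part of the DQ package. The function name check_owl_properties is hypothetical, and the sketch assumes that the generated file uses simple key=value lines and defines spring.datasource.url, the same key that the manual setup uses.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not shipped with DQ): verifies that a properties file
# exists and defines a given key. Assumes plain key=value lines.
check_owl_properties() {
  local file="$1"
  local key="${2:-spring.datasource.url}"  # assumed key, see lead-in

  if [ ! -f "$file" ]; then
    echo "missing file: $file" >&2
    return 1
  fi

  # Compare the text before the first "=" with the key as an exact string.
  if awk -F= -v k="$key" '$1 == k { found = 1 } END { exit !found }' "$file"; then
    echo "ok: $key is set in $file"
  else
    echo "missing key: $key in $file" >&2
    return 1
  fi
}

# Example call (path taken from this topic):
#   check_owl_properties /opt/owl/config/owl.properties
```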

Setting up a DQ agent manually

  1. Open a terminal session and go to the directory with the installer.
  2. Run the following command to encrypt your DQ Metastore password before it is stored in the /opt/owl/config/owl.properties file:
    # PATH TO AGENT INSTALL DIR
    export INSTALL_PATH=/opt/owl

    cd "$INSTALL_PATH"

    # Encrypt DQ Metadata Postgres Storage password
    ./owlmanage.sh encrypt=password

    Note: owlmanage.sh generates an encrypted string from the plain-text password that you provide. Use the encrypted string in the /opt/owl/config/owl.properties configuration file to avoid exposing the DQ Metadata Postgres Storage password.

  3. Run the following command to open the /opt/owl/config/owl.properties configuration file:
    vi $INSTALL_PATH/config/owl.properties
  4. Add the following properties to the configuration file:
    spring.datasource.url=jdbc:postgresql://{DB_HOST}:{DB_PORT}/{METASTORE_DB}
    spring.datasource.username={METASTORE_USER}
    spring.datasource.password={METASTORE_PASSWORD}
    spring.datasource.driver-class-name=com.owl.org.postgresql.Driver

    spring.agent.datasource.url=jdbc:postgresql://{DB_HOST}:{DB_PORT}/{METASTORE_DB}
    spring.agent.datasource.username={METASTORE_USER}
    spring.agent.datasource.password={METASTORE_PASSWORD}
    spring.agent.datasource.driver-class-name=org.postgresql.Driver
  5. Restart the DQ Web App.
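
To avoid typos when filling in the placeholders, the property lines can be generated from the connection values instead of typed by hand. The following is a minimal sketch under stated assumptions: the function name render_metastore_properties and its argument order are hypothetical, and the password argument is expected to be the encrypted string produced by owlmanage.sh, not the plain-text password.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints the metastore properties listed in this topic.
# Arguments: host, port, database, user, encrypted password.
render_metastore_properties() {
  local host="$1" port="$2" db="$3" user="$4" password="$5"
  local url="jdbc:postgresql://${host}:${port}/${db}"

  # Driver class names are copied unchanged from the property list above.
  cat <<EOF
spring.datasource.url=${url}
spring.datasource.username=${user}
spring.datasource.password=${password}
spring.datasource.driver-class-name=com.owl.org.postgresql.Driver

spring.agent.datasource.url=${url}
spring.agent.datasource.username=${user}
spring.agent.datasource.password=${password}
spring.agent.datasource.driver-class-name=org.postgresql.Driver
EOF
}
```

For example, calling render_metastore_properties localhost 5432 postgres postgres with the encrypted string as the fifth argument prints the eight property lines, which you can review before adding them to owl.properties.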

Setting up the DQ Agent from the Admin Console

  1. On the Collibra DQ home page, hover your cursor over Settings and select Admin Console.
    The Admin Console opens.
  2. Click Remote Agent.
    The Agent Management page opens.
  3. In the last column of the Agents table, to the right, click the pencil icon to edit your agent.
    The Edit Agent modal appears.
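  4. Enter the required information. The field descriptions below are given three times, once for each deployment type: Spark Standalone, Hadoop (Yarn), and Kubernetes.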

    Field | Description
    Agent Id

    The numerical identifier of your agent. For example, 6.

    This field auto-generates and cannot be edited.

    Agent Name

    The unique name of your agent.

    This field auto-generates and cannot be edited.

    Agent Display Name

    The descriptive name of your agent that displays anywhere agent information is present in the DQ Web App. You can customize the Agent Display Name to make it easier to identify your agent.

    Tip: There are no character restrictions for the Agent Display Name field, but it is best practice to use only alphanumeric characters, hyphens, and underscores.

    Is Local

    Select this option for Hadoop deployments only.

    Use Livy

    Not applicable.

    Livy Host

    Not applicable.

    Base Path

    The installation folder path for DQ. All other paths in the DQ Agent are relative to this installation path.

    This is the location that is set as OWL_BASE in Full Standalone Setup and other installation setups, followed by the owl/ folder. For example, if the setup command is export OWL_BASE=/home/centos, then set the Base Path in the Agent configuration to /home/centos/owl/.

    Default: /opt/owl/

    Collibra DQ Core JAR

    The file path to the DQ Core jar file.

    Default: <Base Path>/owl/bin/

    Collibra DQ Core Logs

    The folder path where DQ Core logs are stored. Logs from DQ Jobs are stored in this folder.

    Default: <Base Path>/owl/log

    Collibra DQ Script

    The file path to the DQ execution script owlcheck.sh. This script is used to run DQ Jobs via the command line without using an agent. Using owlcheck.sh for running DQ Jobs is superseded by the DQ Agent execution model.

    Default: <Base Path>/owl/bin/owlcheck

    Collibra DQ Web Logs

    The folder path where DQ Web logs are stored. Logs from the DQ Web App are stored in this folder.

    Default: <Base Path>/owl/log

    Default Queue

    Only used for Yarn.

    Default Deployment Mode

    The Spark deployment mode, either Client or Cluster. Cluster is generally recommended, but follow these best practices:

    • If you have only one Spark Worker node, select Client.
    • If you have more than one Spark Worker node, select Cluster.

    Default Master

    The Spark Master URL copied from the Spark cluster verification screen. For example, spark://...

    Dynamic Spark Allocation

    Not applicable.

    Spark Configuration Key

    Not applicable.

    Spark Configuration Value

    Not applicable.

    Number of Executor(s)

    The default number of executors allocated per DQ Job when using this Agent to run DQ Scans.

    The default is 1.

    Executor Memory

    The default RAM per executor allocated per DQ Job when using this Agent to run DQ Scans. Go to Hardware Sizing for more information.

    The default is 1 gigabyte.

    Number of Core(s)

    The default number of cores per executor allocated per DQ Job when using this Agent to run DQ Scans. Go to Hardware Sizing for more information.

    The default is 1.

    Driver Memory

    The default driver RAM allocated per DQ Job when using this Agent to run DQ Scans. Go to Hardware Sizing for more information.

    The default is 1 gigabyte.

    Free Form (Appended)

    Other spark-submit parameters to append to each DQ Job when using this Agent to run DQ Scans.

    Field | Description
    Agent Id

    The numerical identifier of your agent. For example, 6.

    This field auto-generates and cannot be edited.

    Agent Name

    The unique name of your agent.

    This field auto-generates and cannot be edited.

    Agent Display Name

    The descriptive name of your agent that displays anywhere agent information is present in the DQ Web App. You can customize the Agent Display Name to make it easier to identify your agent.

    Tip: There are no character restrictions for the Agent Display Name field, but it is best practice to use only alphanumeric characters, hyphens, and underscores.

    Is Local

    Select this option to form the driver location path. This normally applies only when you run your agent on the master or edge node.

    Use Livy

    Not applicable.

    Livy Host

    Not applicable.

    Base Path

    The installation folder path for DQ. All other paths in the DQ Agent are relative to this installation path.

    This is the location that is set as OWL_BASE in Full Standalone Setup and other installation setups, followed by the owl/ folder. For example, if the setup command is export OWL_BASE=/home/centos, then set the Base Path in the Agent configuration to /home/centos/owl/.

    Default: /opt/owl/

    Collibra DQ Core JAR

    The file path to the DQ Core jar file.

    Default: <Base Path>/owl/bin/

    Collibra DQ Core Logs

    The folder path where DQ Core logs are stored. Logs from DQ Jobs are stored in this folder.

    Default: <Base Path>/owl/log

    Collibra DQ Script

    The file path to the DQ execution script owlcheck.sh. This script is used to run DQ Jobs via the command line without using an agent. Using owlcheck.sh for running DQ Jobs is superseded by the DQ Agent execution model.

    Default: <Base Path>/owl/bin/owlcheck

    Collibra DQ Web Logs

    The folder path where DQ Web logs are stored. Logs from the DQ Web App are stored in this folder.

    Default: <Base Path>/owl/log

    Default Queue

    The default resource queue to which jobs are submitted.

    Default Deployment Mode

    The Spark deployment mode for Yarn is Cluster.

    Default Master

    Set to Yarn.

    Click Edit Yarn Config to ensure that you have the necessary Hadoop XML files. Edit the file templates as necessary:

    XML File | Description
    core-site.xml

    Contains information about the authentication protocol, HDFS_RPC_PROTECTION, and where the NAME_NODE runs in the Hadoop cluster.

    hdfs-site.xml

    Contains the configuration settings for authentication protocol, the NAME_NODE, and DATA_NODE.

    yarn-site.xml

    Contains the Yarn resource manager settings.

    Dynamic Spark Allocation

    Not applicable.

    Spark Configuration Key

    Not applicable.

    Spark Configuration Value

    Not applicable.

    Number of Executor(s)

    The default number of executors allocated per DQ Job when using this Agent to run DQ Scans.

    The default is 1.

    Executor Memory

    The default RAM per executor allocated per DQ Job when using this Agent to run DQ Scans. Go to Hardware Sizing for more information.

    The default is 1 gigabyte.

    Number of Core(s)

    The default number of cores per executor allocated per DQ Job when using this Agent to run DQ Scans. Go to Hardware Sizing for more information.

    The default is 1.

    Driver Memory

    The default driver RAM allocated per DQ Job when using this Agent to run DQ Scans. Go to Hardware Sizing for more information.

    The default is 1 gigabyte.

    Free Form (Appended)

    Other spark-submit parameters to append to each DQ Job when using this Agent to run DQ Scans.

    Note: Ensure that your service account has permission to launch Spark executor pods. Refer to the executor launch template and permissions.

    Field | Description
    Agent Id

    The numerical identifier of your agent. For example, 6.

    This field auto-generates and cannot be edited.

    Agent Name

    The unique name of your agent.

    This field auto-generates and cannot be edited.

    Agent Display Name

    The descriptive name of your agent that displays anywhere agent information is present in the DQ Web App. You can customize the Agent Display Name to make it easier to identify your agent.

    Tip: There are no character restrictions for the Agent Display Name field, but it is best practice to use only alphanumeric characters, hyphens, and underscores.

    Is Local

    Select this option for Hadoop deployments only.

    Use Livy

    Not applicable.

    Livy Host

    Not applicable.

    Base Path

    The installation folder path for DQ. All other paths in the DQ Agent are relative to this installation path.

    This is the location that is set as OWL_BASE in Full Standalone Setup and other installation setups, followed by the owl/ folder. For example, if the setup command is export OWL_BASE=/home/centos, then set the Base Path in the Agent configuration to /home/centos/owl/.

    Default: /opt/owl/

    Collibra DQ Core JAR

    The file path to the DQ Core jar file.

    Default: <Base Path>/owl/bin/

    Collibra DQ Core Logs

    The folder path where DQ Core logs are stored. Logs from DQ Jobs are stored in this folder.

    Default: <Base Path>/owl/log

    Collibra DQ Script

    The file path to the DQ execution script owlcheck.sh. This script is used to run DQ Jobs via the command line without using an agent. Using owlcheck.sh for running DQ Jobs is superseded by the DQ Agent execution model.

    Default: <Base Path>/owl/bin/owlcheck

    Collibra DQ Web Logs

    The folder path where DQ Web logs are stored. Logs from the DQ Web App are stored in this folder.

    Default: <Base Path>/owl/log

    Default Queue

    Only used for Yarn.

    Default Deployment Mode

    The Spark deployment mode for Kubernetes is Cluster.

    Default Master

    The Kubernetes Master URL copied from the Kubernetes cluster verification screen.

    Set this value to k8s:// instead of a specific URL. Leaving the value as k8s:// lets Collibra DQ auto-discover the High Availability endpoint of the Kubernetes control plane at runtime.

    Warning: Set this to a specific URL, such as k8s://{hostname}:443, only if you are an advanced user or your use case requires it.

    Dynamic Spark Allocation

    Not applicable.

    Spark Configuration Key

    Not applicable.

    Spark Configuration Value

    Not applicable.

    Number of Executor(s)

    The default number of executors allocated per DQ Job when using this Agent to run DQ Scans.

    The default is 1.

    Executor Memory

    The default RAM per executor allocated per DQ Job when using this Agent to run DQ Scans. Go to Hardware Sizing for more information.

    The default is 1 gigabyte.

    Number of Core(s)

    The default number of cores per executor allocated per DQ Job when using this Agent to run DQ Scans. Go to Hardware Sizing for more information.

    The default is 1.

    Driver Memory

    The default driver RAM allocated per DQ Job when using this Agent to run DQ Scans. Go to Hardware Sizing for more information.

    The default is 1 gigabyte.

    Free Form (Appended)

    Other spark-submit parameters to append to each DQ Job when using this Agent to run DQ Scans.

    Note: If you bring in your own Spark executor pod launch template, ensure that the service account used to launch Spark executor pods has permission to do so. Refer to the executor launch template and permissions for more information.

  5. Click Save.
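
The resource fields in the Edit Agent modal resemble standard spark-submit options, which can help when you decide what to put in Free Form (Appended). The following is a minimal sketch of that correspondence. The function name spark_submit_defaults is hypothetical, and the mapping from agent fields to flags is an assumption: this topic does not show the command that the agent actually builds.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints standard spark-submit options that correspond
# to the agent defaults described in this topic. The mapping is an assumption,
# not the documented behavior of the DQ Agent.
spark_submit_defaults() {
  local master="$1"                 # Default Master, for example yarn
  local deploy_mode="$2"            # Default Deployment Mode: client or cluster
  local num_executors="${3:-1}"     # Number of Executor(s), default 1
  local executor_memory="${4:-1g}"  # Executor Memory, default 1 gigabyte
  local executor_cores="${5:-1}"    # Number of Core(s), default 1
  local driver_memory="${6:-1g}"    # Driver Memory, default 1 gigabyte

  printf '%s\n' "--master ${master} --deploy-mode ${deploy_mode} --num-executors ${num_executors} --executor-memory ${executor_memory} --executor-cores ${executor_cores} --driver-memory ${driver_memory}"
}

# Example: defaults for a Yarn agent in Cluster mode
#   spark_submit_defaults yarn cluster
```

Because these settings already have dedicated fields, Free Form (Appended) is better reserved for parameters that have no field of their own.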

What's next?