View Job logs in the DQ user interface

You can view the logs for individual DQ Jobs from the Jobs page. This section shows you how to access the logs from the DQ UI.

In Stage 2, Collibra DQ submits the DQ Job to Spark via Spark Submit for processing. Jobs that fail in this stage are generally due to issues with the Agent.

You can access the Stage 2 logs from the DQ UI with the following steps:

  1. Go to the Jobs page.
  2. Click Job Actions on the same line as your Job ID. The Job Actions drop-down list expands.
  3. Select Stage 2 Logs from the options. The Stage 2 Logs open in a new window.

Note Stage 2 logs are only applicable when Jobs are submitted to a cluster.

In Stage 3, unlike Stage 2, the DQ Core code is active and the Job is no longer under Spark's control, but back inside Collibra DQ. Because the Job runs in DQ Core at this point, the Stage 3 logs are necessary to troubleshoot problems that happen within DQ Core.

You can access the Stage 3 logs from the DQ UI with the following steps:

  1. Go to the Jobs page.
  2. Click Job Actions on the same line as your Job ID. The Job Actions drop-down list expands.
  3. Select Stage 3 Logs from the options. The Stage 3 Logs open in a new window.

Note Stage 3 logs are only applicable when Jobs are submitted to a cluster.

Tip 
Stage 1 logs are not accessible from the DQ UI. If a DQ Job is stuck in Staging status, check the Agent logs in the agent log file (<INSTALL_HOME>/log/agent.log) or, on Kubernetes, with kubectl logs <agent-pod-name> -n <namespace>.

Retrieve Stage 2 and 3 Logs Using a Managed Hadoop Spark Cluster

When you use a managed solution for Collibra DQ, such as AWS EMR or GCP Dataproc, you need to make certain configuration files available to the Agent.

On the machine where your Agent is installed, open the owl-env.sh file, find the value of the HADOOP_CONF_DIR variable, and navigate to that path.

If that path does not exist, or if the directory does not contain the core-site.xml, hdfs-site.xml, and yarn-site.xml files, you may need to do one of the following:

  • Update the HADOOP_CONF_DIR variable in owl-env.sh so that it points to the folder where these files exist.

  • Move these XML files into the directory that HADOOP_CONF_DIR already points to.

For a typical Hadoop installation, this is under /etc/: https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/ClusterSetup.html#Configuring_Environment_of_Hadoop_Daemons

For the yarn-site.xml, the following configurations must be present:

yarn.resourcemanager.scheduler.address
yarn.resourcemanager.address
yarn.resourcemanager.webapp.address
yarn.resourcemanager.webapp.https.address
yarn.resourcemanager.principal
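
For reference, a minimal yarn-site.xml containing these properties might look like the sketch below. The host names, ports, and Kerberos principal are placeholders for illustration only; use the values that match your cluster's Resource Manager.

<?xml version="1.0"?>
<!-- Sketch only: replace the placeholder hosts, ports, and principal with your cluster's values. -->
<configuration>
  <property>
    <name>yarn.resourcemanager.scheduler.address</name>
    <value>resourcemanager.example.com:8030</value>
  </property>
  <property>
    <name>yarn.resourcemanager.address</name>
    <value>resourcemanager.example.com:8032</value>
  </property>
  <property>
    <name>yarn.resourcemanager.webapp.address</name>
    <value>resourcemanager.example.com:8088</value>
  </property>
  <property>
    <name>yarn.resourcemanager.webapp.https.address</name>
    <value>resourcemanager.example.com:8090</value>
  </property>
  <property>
    <name>yarn.resourcemanager.principal</name>
    <value>yarn/_HOST@EXAMPLE.COM</value>
  </property>
</configuration>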

For the core-site.xml, the following minimum configurations must be present:

hadoop.security.authentication
hadoop.rpc.protection
fs.defaultFS
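
For reference, a minimal core-site.xml containing these properties might look like the sketch below. The authentication mode, RPC protection level, and file system URI shown are placeholders and depend on your cluster's security setup.

<?xml version="1.0"?>
<!-- Sketch only: the values shown are placeholders; match them to your cluster. -->
<configuration>
  <property>
    <name>hadoop.security.authentication</name>
    <value>kerberos</value>
  </property>
  <property>
    <name>hadoop.rpc.protection</name>
    <value>privacy</value>
  </property>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://namenode.example.com:8020</value>
  </property>
</configuration>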

For the hdfs-site.xml, the following minimum configurations must be present:

dfs.client.use.datanode.hostname
dfs.namenode.kerberos.principal
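
For reference, a minimal hdfs-site.xml containing these properties might look like the sketch below. The principal is a placeholder; use the NameNode principal defined for your cluster.

<?xml version="1.0"?>
<!-- Sketch only: placeholder values; adjust to your environment. -->
<configuration>
  <property>
    <name>dfs.client.use.datanode.hostname</name>
    <value>true</value>
  </property>
  <property>
    <name>dfs.namenode.kerberos.principal</name>
    <value>hdfs/_HOST@EXAMPLE.COM</value>
  </property>
</configuration>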

Note The hdfs-site.xml file is typically only needed for self-hosted or standalone Hadoop environments that are not hosted in the cloud. For more information, go to Hadoop Install.