Troubleshooting Standalone Install

This section provides a variety of tips for troubleshooting the Collibra DQ standalone installation process.

Note To see the directory structure created by setup.sh, go to Collibra DQ Directory Structure.
Note For additional configuration options in setup.sh, owl-env.sh, and owl.properties, go to Additional Standalone Configuration Options.

Start and stop components

The owlmanage.sh script lets you start and stop Collibra DQ services or individual components of services. Run it from the /owl/bin directory.

To start different components:

./owlmanage.sh start=postgres
./owlmanage.sh start=owlagent
./owlmanage.sh start=owlweb

To stop different components:

./owlmanage.sh stop=postgres
./owlmanage.sh stop=owlagent
./owlmanage.sh stop=owlweb

Increase memory usage

To increase the memory available to the DQ Web Java process, add the following environment variable to owl-env.sh and restart the DQ Web service:

export EXTRA_JVM_OPTIONS="-Xms2g -Xmx2g"

To increase the memory available to the DQ Agent Java process, update owlmanage.sh. In the start_owlagent() function (at line number 47), update the java command to include the options -Xms2g -Xmx2g. Then restart the DQ Agent service.
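
The exact launch command differs between releases; as a sketch, the edited line inside start_owlagent() might look like the following, where the jar name and paths are illustrative rather than the actual script contents:

### Hypothetical example of the edited agent launch line in owlmanage.sh
nohup java -Xms2g -Xmx2g -jar $OWL_BASE/bin/owl-agent.jar > $OWL_BASE/log/agent.out 2>&1 &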

Verify that the working directory has permissions

For example, if you SSH into the machine as user owldq and use the default home directory location /home/owldq:

### Ensure appropriate permissions 
### drwxr-xr-x

chmod -R 755 /home/owldq
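
### Verify the directory mode (expect drwxr-xr-x)
ls -ld /home/owldq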

Reinstall PostgreSQL

### Postgres data directory initialization failed
### Postgres permission denied errors
### sed: can't read /home/owldq/owl/postgres/data/postgresql.conf: Permission denied

sudo rm -rf /home/owldq/owl/postgres
chmod -R 755 /home/owldq

### Reinstall just postgres
./setup.sh -owlbase=$OWL_BASE -user=$OWL_METASTORE_USER -pgpassword=$OWL_METASTORE_PASS -options=postgres
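
### After the reinstall completes, start the Metastore and confirm it is running
./owlmanage.sh start=postgres
ps -aef|grep postgres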

Change PostgreSQL password from SSH

### If you need to update your postgres password, you can do so over SSH on the VM
### Connect to your hosted instance of Postgres

sudo -i -u postgres
psql -U postgres
\password
#Enter new password: ### Enter Strong Password
#Enter it again: ### Re-enter Strong Password
\q
exit

Warning The $ symbol is not a supported special character in your PostgreSQL Metastore password.

Add permissions for ssh keys when starting Spark

### Spark standalone permission denied after using ./start-all.sh 

ssh-keygen -t rsa -N "" -f ~/.ssh/id_rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys

Tip If the recommendation above is unsuccessful, use the following commands instead of ./start-all.sh:
./start-master.sh
./start-worker.sh spark://$(hostname -f):7077
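
If SSH still reports permission denied after this, the key files may be too permissive; OpenSSH rejects keys that other users can read. Tightening the permissions is safe:

chmod 700 ~/.ssh
chmod 600 ~/.ssh/authorized_keys ~/.ssh/id_rsa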

Change permissions if log files are not writable

### Changing permissions on individual log files

sudo chmod 777 /home/owldq/owl/pids/owl-agent.pid
sudo chmod 777 /home/owldq/owl/pids/owl-web.pid

Get the hostname of the instance

### Getting the hostname of the instance

hostname -f

Increase thread pool

If the connection thread pool is exhausted, you may need to increase its size.

Update the owl-env.sh script

# vi owl-env.sh
# modify these lines

export SPRING_DATASOURCE_POOL_MAX_WAIT=500
export SPRING_DATASOURCE_POOL_MAX_SIZE=30
export SPRING_DATASOURCE_POOL_INITIAL_SIZE=5

# restart web and agent

Update the Spark agent configurations

If you see the following message, update the agent configurations within load-spark-env.sh:

Failed to obtain JDBC Connection; nested exception is org.apache.tomcat.jdbc.pool.PoolExhaustedException: [pool-29-thread-2] Timeout: Pool empty. Unable to fetch a connection in 0 seconds, none available[size:2; busy:1; idle:0; lastwait:200].

Adjust the following configurations to modify the available connection pool:

export SPRING_DATASOURCE_POOL_MAX_WAIT=1000
export SPRING_DATASOURCE_POOL_MAX_SIZE=30
export SPRING_DATASOURCE_POOL_INITIAL_SIZE=5

Note The load-spark-env.sh file is located in the $SPARK_HOME/bin folder.

Update the owl.properties file

Depending on client vs. cluster mode and cluster type, you may also need to add the following configurations in the owl.properties file. Because owl.properties is a properties file rather than a shell script, use the property key format:

spring.datasource.tomcat.max-idle=10
spring.datasource.tomcat.max-active=20
spring.datasource.tomcat.max-wait=10000
spring.datasource.tomcat.initial-size=4

Jobs stuck in the Staged activity

If DQ Jobs are stuck in the Staged activity on the Jobs page, update the following properties in the owl-env.sh file to adjust the DQ Web component:

export SPRING_DATASOURCE_POOL_MAX_WAIT=2500
export SPRING_DATASOURCE_POOL_MAX_SIZE=1000
export SPRING_DATASOURCE_POOL_INITIAL_SIZE=150
export SPRING_DATASOURCE_TOMCAT_MAXIDLE=100
export SPRING_DATASOURCE_TOMCAT_MAXACTIVE=2000
export SPRING_DATASOURCE_TOMCAT_MAXWAIT=10000

Depending on whether your agent is set to Client or Cluster default deployment mode, you may also need to update the following configurations in the owl.properties file:

spring.datasource.tomcat.initial-size=5
spring.datasource.tomcat.max-active=30
spring.datasource.tomcat.max-wait=1000

Restart the DQ Web and Agent components.
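
For example, using owlmanage.sh from the /owl/bin directory:

./owlmanage.sh stop=owlweb
./owlmanage.sh stop=owlagent
./owlmanage.sh start=owlagent
./owlmanage.sh start=owlweb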

Active database queries

To list active queries in the PostgreSQL Metastore:

select * from pg_stat_activity where state = 'active';
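
To see which active queries have been running longest, you can run a similar query through psql (connected as the postgres user, as in the password section above):

psql -U postgres -c "select pid, now() - query_start as runtime, query from pg_stat_activity where state = 'active' order by runtime desc;"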

Too many open files error

### "Too many open files" error message
### check and modify that limits.conf file 
### Do this on the machine where the agent is running for Spark standalone version

ulimit -Ha 
cat /etc/security/limits.conf 

### Edit the limits.conf file
sudo vi /etc/security/limits.conf

### Increase the limits, for example, by adding these 2 lines
### (enter them exactly as shown; do not comment them out)
*               soft    nofile           58192
*               hard    nofile           100000

### The system-wide maximum is a kernel parameter, not a limits.conf entry;
### set it in /etc/sysctl.conf instead
sudo vi /etc/sysctl.conf
fs.file-max=500000
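
The new limits apply only to new login sessions. Log out, log back in, and verify:

ulimit -n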

Redirect Spark scratch

### Redirect Spark scratch to another location
SPARK_LOCAL_DIRS=/mnt/disks/sdb/tmp

Alternatively, you can add the following to the Free form (Appended) field on the Agent Configuration page to change Spark storage: -conf spark.local.dir=/home/owldq/owl/owltmp
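
Before redirecting, it is worth confirming that the target disk has free space, for example with the path from the snippet above:

df -h /mnt/disks/sdb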

Automate cleanup of Spark work folders

Add the following line to owl/spark/conf/spark-env.sh (create the file by copying spark-env.sh.template if it does not exist) or to the bottom of owl/spark/bin/load-spark-env.sh.

### Set Spark to delete older files
export SPARK_WORKER_OPTS="${SPARK_WORKER_OPTS} -Dspark.worker.cleanup.enabled=true -Dspark.worker.cleanup.interval=1800 -Dspark.worker.cleanup.appDataTtl=3600"
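
### The worker only reads SPARK_WORKER_OPTS at startup; restart it for the
### cleanup settings to take effect (recent Spark releases ship stop-worker.sh;
### older releases name it stop-slave.sh)
./stop-worker.sh
./start-worker.sh spark://$(hostname -f):7077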

Check disk space in the Spark work folder

### Check worker nodes' disk space (run from the Spark work directory)
sudo du -ah | sort -hr | head -5

Delete files in the Spark work folder

### Delete any files in the Spark work directory
sudo find /home/owldq/owl/spark/work/* -mtime +1 -type f -delete

Troubleshooting Kerberos

For debug logging, add the following to the owl-env.sh file:

# For Kerberos debug logging
export EXTRA_JVM_OPTIONS="-Dsun.security.krb5.debug=true"

Restart the Collibra DQ Web service with the following:

./owlmanage.sh stop=owlweb
./owlmanage.sh start=owlweb

Note You can use the EXTRA_JVM_OPTIONS variable for several purposes, such as SSL debugging, setting HTTP/HTTPS proxy settings, setting additional keystore properties, and so on.

Add Spark home environment variables to profile

### Adding ENV variables to bash profile

### Update the user 'owldq' in the paths below to match your install user, e.g. centos

vi ~/.bash_profile
export SPARK_HOME=/home/owldq/owl/spark
export PATH=$SPARK_HOME/bin:$PATH
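
### Apply the profile changes to the current shell session
source ~/.bash_profile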

### Add to owl-env.sh for standalone install 

vi /home/owldq/owl/config/owl-env.sh 
export SPARK_HOME=/home/owldq/owl/spark
export PATH=$SPARK_HOME/bin:$PATH

Spark launch scripts

For information on working with Spark launch scripts, go to Spark Launch Scripts.

Check that processes are running

### Checking PIDS for different components

ps -aef|grep postgres
ps -aef|grep owl-web
ps -aef|grep owl-agent
ps -aef|grep spark