Troubleshooting Standalone Install
This section provides a variety of tips for troubleshooting the Collibra DQ standalone installation process.
For more information about setup.sh, owl-env.sh, and owl.properties, go to Additional Standalone Configuration Options.
Start and stop components
The owlmanage.sh script enables you to stop and start services or individual components of services. It is executed from the /owl/bin directory.
To start different components:
./owlmanage.sh start=postgres
./owlmanage.sh start=owlagent
./owlmanage.sh start=owlweb
To stop different components:
./owlmanage.sh stop=postgres
./owlmanage.sh stop=owlagent
./owlmanage.sh stop=owlweb
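For example, to restart the web component after a configuration change, stop and then start it:
./owlmanage.sh stop=owlweb
./owlmanage.sh start=owlweb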
Increase Memory Usage
To increase memory usage of Java processes in DQ Web, add the following environment variable to owl-env.sh and restart the DQ Web service:
export EXTRA_JVM_OPTIONS="-Xms2g -Xmx2g"
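A minimal sketch of applying this change, assuming the default config path /home/owldq/owl/config/owl-env.sh used throughout this guide:
### Append the JVM heap options and restart the web service
echo 'export EXTRA_JVM_OPTIONS="-Xms2g -Xmx2g"' >> /home/owldq/owl/config/owl-env.sh
./owlmanage.sh stop=owlweb
./owlmanage.sh start=owlweb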
To increase memory usage of Java processes in DQ Agent, update owlmanage.sh. In the start_owlagent() function (at line number 47), update the java command to include the options -Xms2g -Xmx2g. Then, restart the DQ Agent service.
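As a purely illustrative sketch (the exact java invocation in your owlmanage.sh will differ; the jar name below is hypothetical), the edit looks like:
### Before (hypothetical invocation):
###   java -jar owl-agent.jar
### After, with the heap options added:
###   java -Xms2g -Xmx2g -jar owl-agent.jar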
Verify that the working directory has permissions
For example, if you SSH into the machine as user owldq and use the default home directory location /home/owldq/:
### Ensure appropriate permissions
### drwxr-xr-x
chmod -R 755 /home/owldq
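To confirm the permissions took effect:
### Expect drwxr-xr-x on the home directory
ls -ld /home/owldq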
Reinstall PostgreSQL
### Postgres data directory initialization failed
### Postgres permission denied errors
### sed: can't read /home/owldq/owl/postgres/data/postgresql.conf: Permission denied
sudo rm -rf /home/owldq/owl/postgres
chmod -R 755 /home/owldq
### Reinstall just postgres
./setup.sh -owlbase=$OWL_BASE -user=$OWL_METASTORE_USER -pgpassword=$OWL_METASTORE_PASS -options=postgres
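After the reinstall completes, you can confirm the Metastore process is back up:
### Confirm postgres is running again
ps -aef | grep postgres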
Change PostgreSQL password from SSH
### If you need to update your postgres password, SSH into the VM
### Connect to your hosted instance of Postgres
sudo -i -u postgres
psql -U postgres
\password
#Enter new password: ### Enter Strong Password
#Enter it again: ### Re-enter Strong Password
\q
exit
Warning The $ symbol is not a supported special character in your PostgreSQL Metastore password.
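To verify the new password, you can reconnect with a forced password prompt (a sketch assuming PostgreSQL listens on localhost):
psql -U postgres -h localhost -W -c 'select 1;'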
Add permissions for ssh keys when starting Spark
### Spark standalone permission denied after using ./start-all.sh
ssh-keygen -t rsa -N "" -f ~/.ssh/id_rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
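If ssh still reports permission denied, make sure the key files have the restrictive permissions sshd requires:
chmod 700 ~/.ssh
chmod 600 ~/.ssh/authorized_keys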
Tip If the recommendation above is unsuccessful, use the following commands instead of ./start-all.sh:
./start-master.sh
./start-worker.sh spark://$(hostname -f):7077
Change permissions if log files are not writable
### Changing permissions on individual log files
sudo chmod 777 /home/owldq/owl/pids/owl-agent.pid
sudo chmod 777 /home/owldq/owl/pids/owl-web.pid
Get the hostname of the instance
### Getting the hostname of the instance
hostname -f
Increase thread pool
If the thread pool is exhausted, you may need to increase its size.
Update the owl-env.sh script
# vi owl-env.sh
# modify these lines
export SPRING_DATASOURCE_POOL_MAX_WAIT=500
export SPRING_DATASOURCE_POOL_MAX_SIZE=30
export SPRING_DATASOURCE_POOL_INITIAL_SIZE=5
# restart web and agent
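To apply the change, restart the web and agent components with the owlmanage.sh commands shown earlier:
./owlmanage.sh stop=owlweb
./owlmanage.sh stop=owlagent
./owlmanage.sh start=owlweb
./owlmanage.sh start=owlagent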
Update the Spark agent configurations
If you see the following message, update the agent configurations within load-spark-env.sh:
Failed to obtain JDBC Connection; nested exception is org.apache.tomcat.jdbc.pool.PoolExhaustedException: [pool-29-thread-2] Timeout: Pool empty. Unable to fetch a connection in 0 seconds, none available[size:2; busy:1; idle:0; lastwait:200].
Adjust the following configurations to modify the connection pool available:
export SPRING_DATASOURCE_POOL_MAX_WAIT=1000
export SPRING_DATASOURCE_POOL_MAX_SIZE=30
export SPRING_DATASOURCE_POOL_INITIAL_SIZE=5
Note The load-spark-env.sh file is located in the $SPARK_HOME/bin folder.
Update the owl.properties file
Depending on client vs. cluster mode and cluster type, you may also need to add the following configurations to the owl.properties file:
spring.datasource.tomcat.max-idle=10
spring.datasource.tomcat.max-active=20
spring.datasource.tomcat.max-wait=10000
spring.datasource.tomcat.initial-size=4
Jobs stuck in the Staged activity
If DQ Jobs are stuck in the Staged activity on the Jobs page, update the following properties in the owl-env.sh file to adjust the DQ Web component:
export SPRING_DATASOURCE_POOL_MAX_WAIT=2500
export SPRING_DATASOURCE_POOL_MAX_SIZE=1000
export SPRING_DATASOURCE_POOL_INITIAL_SIZE=150
export SPRING_DATASOURCE_TOMCAT_MAXIDLE=100
export SPRING_DATASOURCE_TOMCAT_MAXACTIVE=2000
export SPRING_DATASOURCE_TOMCAT_MAXWAIT=10000
Depending on whether your agent is set to Client or Cluster default deployment mode, you may also need to update the following configurations in the owl.properties file:
spring.datasource.tomcat.initial-size=5
spring.datasource.tomcat.max-active=30
spring.datasource.tomcat.max-wait=1000
Restart the DQ Web and Agent components.
Active database queries
To view active queries against the PostgreSQL Metastore, run the following SQL:
select * from pg_stat_activity where state='active';
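For example, you can run the query from the shell via psql (assuming you can connect as the postgres user):
psql -U postgres -c "select pid, state, query from pg_stat_activity where state='active';"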
Too many open files error
### "Too many open files error message"
### check and modify that limits.conf file
### Do this on the machine where the agent is running for Spark standalone version
ulimit -Ha
cat /etc/security/limits.conf
### Edit the limits.conf file
sudo vi /etc/security/limits.conf
### Increase the limits, for example
### Add these 2 lines (do not comment them out)
* soft nofile 58192
* hard nofile 100000
### Note: fs.file-max is a kernel sysctl, not a limits.conf entry
### Set it separately: sudo sysctl -w fs.file-max=500000 (persist it in /etc/sysctl.conf)
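After editing limits.conf, log out and back in, then verify the new limit is in effect:
### Should reflect the new nofile values
ulimit -n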
Redirect Spark scratch
### Redirect Spark scratch to another location
SPARK_LOCAL_DIRS=/mnt/disks/sdb/tmp
Alternatively, you can add the following to the Free form (Appended) field on the Agent Configuration page to change Spark storage: -conf spark.local.dir=/home/owldq/owl/owltmp
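To make the redirect persistent across restarts, one option is to set the variable in spark-env.sh (a sketch assuming the default install path used throughout this guide):
echo 'export SPARK_LOCAL_DIRS=/mnt/disks/sdb/tmp' >> /home/owldq/owl/spark/conf/spark-env.sh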
Automate cleanup of Spark work folders
You can add the following line to owl/spark/conf/spark-env.sh (create it by copying the spark-env.sh.template file if it does not exist) or to the bottom of owl/spark/bin/load-spark-env.sh.
### Set Spark to delete older files
export SPARK_WORKER_OPTS="${SPARK_WORKER_OPTS} -Dspark.worker.cleanup.enabled=true -Dspark.worker.cleanup.interval=1800 -Dspark.worker.cleanup.appDataTtl=3600"
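With these settings, the worker checks for stale application directories every 1800 seconds (30 minutes) and deletes application data older than 3600 seconds (1 hour).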
Check disk space in the Spark work folder
### Check worker nodes disk space
sudo du -ah | sort -hr | head -5
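The same check can be scoped to the Spark work directory (default install path shown in the delete step below):
sudo du -ah /home/owldq/owl/spark/work | sort -hr | head -5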
Delete files in the Spark work folder
### Delete any files in the Spark work directory
sudo find /home/owldq/owl/spark/work/* -mtime +1 -type f -delete
Troubleshooting Kerberos
For debug logging, add the following to the owl-env.sh file:
# For Kerberos debug logging
export EXTRA_JVM_OPTIONS="-Dsun.security.krb5.debug=true"
Reboot the Collibra DQ web service with the following:
./owlmanage.sh stop=owlweb
./owlmanage.sh start=owlweb
Note You can use the EXTRA_JVM_OPTIONS option for several purposes such as SSL debugging, setting HTTP/HTTPS proxy settings, setting additional keystore properties, and so on.
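For example, to combine Kerberos and SSL debugging in one setting (both are standard JVM system properties):
export EXTRA_JVM_OPTIONS="-Dsun.security.krb5.debug=true -Djavax.net.debug=ssl"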
Add Spark home environment variables to profile
### Adding ENV variables to bash profile
### Update the user 'owldq' in the paths below to match the install user, e.g. centos
vi ~/.bash_profile
export SPARK_HOME=/home/owldq/owl/spark
export PATH=$SPARK_HOME/bin:$PATH
### Add to owl-env.sh for standalone install
vi /home/owldq/owl/config/owl-env.sh
export SPARK_HOME=/home/owldq/owl/spark
export PATH=$SPARK_HOME/bin:$PATH
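Reload the profile and confirm Spark resolves from the new path:
source ~/.bash_profile
echo $SPARK_HOME
spark-submit --version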
Spark launch scripts
For information on working with Spark launch scripts, go to Spark Launch Scripts.
Check that processes are running
### Checking PIDS for different components
ps -aef|grep postgres
ps -aef|grep owl-web
ps -aef|grep owl-agent
ps -aef|grep spark
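If a process is missing, you can also check which ports are being listened on (ss is available on most modern Linux distributions):
sudo ss -tlnp | grep -E 'java|postgres'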