Upgrade Spark
This section details how to upgrade on-premises Apache Spark versions.
Important: We recommend upgrading to Spark 3.5.3 to address various critical vulnerabilities in the Spark core library, including Log4j.
Steps
Note: The following steps reflect examples of what works in a simple Standalone environment. You may need to modify them to accommodate your specific deployment.
- Run the following commands to stop Spark and Collibra DQ services.
  - Set the OWL_HOME variable without a trailing slash after the value.
    ```shell
    export OWL_HOME=<the owl folder where CDQ is installed>
    ```
    Example:
    ```shell
    export OWL_HOME=/home/ec2-user/owl
    ```
  - Stop Spark Master.
    ```shell
    cd $OWL_HOME/spark/sbin
    ./stop-master.sh
    ```
  - Stop Spark Worker.
    ```shell
    cd $OWL_HOME/spark/sbin
    ./stop-worker.sh
    ```
  - Stop DQ Web.
    ```shell
    cd $OWL_HOME/bin
    ./owlmanage.sh stop=owlweb
    ```
  - Stop DQ Agent.
    ```shell
    cd $OWL_HOME/bin
    ./owlmanage.sh stop=owlagent
    ```
  - Verify that all processes are stopped.
    ```shell
    ps -ef | grep owl
    ## No DQ processes should return as running. ##
    ```
    Tip: If any DQ processes are still running, you can stop them manually with `kill -9 <pid from the ps command above>`.
- Create a package installation folder one level up from OWL_HOME and set the environment variables accordingly.
- Create a backup folder.
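Concretely, the two folder-creation steps above might look like the following sketch. The `INSTALLER_DIRECTORY` variable name is an assumption chosen to match the commands later in this section, and the `/tmp` paths are placeholders for your real locations.

```shell
# Placeholder locations; point OWL_HOME at your real install instead.
export OWL_HOME=/tmp/owl-demo
mkdir -p "$OWL_HOME"

# Package installation folder one level up from OWL_HOME,
# plus a backup folder for the files moved in the later steps.
export INSTALLER_DIRECTORY="$(dirname "$OWL_HOME")/dq-installer"
mkdir -p "$INSTALLER_DIRECTORY/backup"
```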
- Download the Collibra DQ installer packaged with the necessary Spark version from the Collibra Data Quality & Observability Downloads page in the Product Resource Center.
- Run the following command to extract the installer.
  ```shell
  tar -xvf <downloaded .tar.gz file>
  ```
- Create backups of the following directories and files by moving or copying them to the backup folder.
- spark
- spark-extras (if it exists)
- drivers
- dq-core-*-dependencies.jar
- dq-webapp*.jar
- dq-agent*.jar
- owlcheck
- owlmanage.sh
- Remove the folders and files using the following or similar steps.
  - Open the OWL_HOME/bin folder.
    ```shell
    cd $OWL_HOME/bin
    ```
  - Move the dq-core-*-dependencies.jar, dq-webapp*.jar, dq-agent*.jar, owlcheck, and owlmanage.sh files to the INSTALLER_DIRECTORY/backup/ folder.
    ```shell
    mv dq*.jar owl* $INSTALLER_DIRECTORY/backup/
    ```
  - Move up one folder level.
    ```shell
    cd ..
    ```
  - Move drivers, spark, and spark-extras to the INSTALLER_DIRECTORY/backup/ folder.
    ```shell
    mv drivers spark spark-extras $INSTALLER_DIRECTORY/backup/
    ```
  - Confirm that the files and folders are in the backup folder.
    ```shell
    ls -ltr $INSTALLER_DIRECTORY/backup/
    ```
- Run the following command to go to the packages/install-packages folder.
  ```shell
  cd $INSTALLER_DIRECTORY/packages/install-packages
  ```
- Extract the Spark installation files.
  ```shell
  tar -xvf spark-3.5.3-bin-hadoop3.tgz
  ```
- Extract the spark-extras.tar.gz file.
  ```shell
  tar -xvf spark-extras.tar.gz
  ```
- Rename and move the spark-3.5.3-bin-hadoop3 folder from the installer extract folder to the OWL_HOME/spark folder.
- Run the following command to copy the contents of the spark-extras folder into spark/jars.
  ```shell
  cp spark-extras/* $OWL_HOME/spark/jars
  ```
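The rename-and-move step above can be sketched as a single `mv`. This is illustrative only: a dummy extracted folder stands in for the real spark-3.5.3-bin-hadoop3 directory produced by the tar command, and the `/tmp` path is a placeholder.

```shell
# Illustrative only: stage a dummy extracted folder so the move can run.
cd /tmp
export OWL_HOME=/tmp/owl-demo2
rm -rf "$OWL_HOME" spark-3.5.3-bin-hadoop3
mkdir -p "$OWL_HOME" spark-3.5.3-bin-hadoop3/jars

# Renaming the extracted folder to "spark" as it lands in OWL_HOME.
mv spark-3.5.3-bin-hadoop3 "$OWL_HOME/spark"
```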
- Add the following line of code to owl/spark/conf/spark-env.sh (which can be copied from the spark-env.sh.template file provided), to automate the cleanup of Spark work folders and avoid filling up disk space.
  ```shell
  export SPARK_WORKER_OPTS="${SPARK_WORKER_OPTS} -Dspark.worker.cleanup.enabled=true -Dspark.worker.cleanup.interval=1800 -Dspark.worker.cleanup.appDataTtl=3600"
  ```
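If spark-env.sh does not exist yet, creating it from the template and appending the line might look like this sketch. A placeholder template file is staged here so the commands run end to end; in a real install, use the template shipped with your Spark distribution under OWL_HOME.

```shell
# Placeholder tree; in a real install these paths live under your OWL_HOME.
export OWL_HOME=/tmp/owl-demo4
mkdir -p "$OWL_HOME/spark/conf"
touch "$OWL_HOME/spark/conf/spark-env.sh.template"   # stand-in for Spark's template

# Start from the template, then append the cleanup settings.
cp "$OWL_HOME/spark/conf/spark-env.sh.template" "$OWL_HOME/spark/conf/spark-env.sh"
cat >> "$OWL_HOME/spark/conf/spark-env.sh" <<'EOF'
export SPARK_WORKER_OPTS="${SPARK_WORKER_OPTS} -Dspark.worker.cleanup.enabled=true -Dspark.worker.cleanup.interval=1800 -Dspark.worker.cleanup.appDataTtl=3600"
EOF
```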
- Copy in the Collibra Data Quality & Observability application files.
  - Open the INSTALLER_DIRECTORY.
    ```shell
    cd $INSTALLER_DIRECTORY
    ```
  - Copy in the dq-core-*-dependencies.jar, dq-webapp*.jar, and dq-agent*.jar files.
    ```shell
    cp dq-*.jar $OWL_HOME/bin
    ```
  - Copy in the owlcheck and owlmanage.sh files.
    ```shell
    cp owl* $OWL_HOME/bin
    ```
  - Make the owlcheck and owlmanage.sh files executable.
    ```shell
    chmod +x $OWL_HOME/bin/owl*
    ```
- Extract and copy in the updated drivers.
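The drivers step varies by deployment, so the following is only an illustrative sketch: the archive name drivers.tar.gz and its contents are placeholders staged by the script itself, standing in for the drivers package from your installer.

```shell
# Illustrative only: stage a placeholder drivers archive, then extract
# and copy it the same way you would the real one.
cd /tmp
export OWL_HOME=/tmp/owl-demo3
rm -rf "$OWL_HOME" drivers drivers.tar.gz
mkdir -p "$OWL_HOME" drivers/postgres
touch drivers/postgres/placeholder.jar
tar -czf drivers.tar.gz drivers
rm -rf drivers

# Extract the updated drivers and copy them into OWL_HOME.
tar -xzf drivers.tar.gz
cp -r drivers "$OWL_HOME/"
```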
- Run the following commands to start Spark and Collibra DQ services.
  - Start Spark Master.
    ```shell
    cd $OWL_HOME/spark/sbin
    ./start-master.sh
    ```
  - Start Spark Worker.
    ```shell
    cd $OWL_HOME/spark/sbin
    ./start-worker.sh spark://$(hostname -f):7077
    ```
  - Start DQ Web.
    ```shell
    cd $OWL_HOME/bin
    ./owlmanage.sh start=owlweb
    ```
  - Tail the log file until you see a "Bootstrap process complete" message.
    ```shell
    tail -f $OWL_HOME/log/owl-web.log
    ```
  - Start DQ Agent.
    ```shell
    cd $OWL_HOME/bin
    ./owlmanage.sh start=owlagent
    ```
- Visit the Spark page on port 8080 to confirm that the Spark version is the one you installed and that the worker is alive.
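As a command-line alternative to the web UI, the Spark standalone master also serves its status as JSON on the same port (for example, `curl -s http://localhost:8080/json`), which should list your workers and their state. A small sketch of such a check, demonstrated here against a trimmed sample of that JSON rather than a live master:

```shell
# check_spark_alive: succeed if the master's JSON status reports an ALIVE worker.
# Assumes the standalone master JSON endpoint, e.g.:
#   curl -s http://localhost:8080/json
check_spark_alive() {
  echo "$1" | grep -q '"state" *: *"ALIVE"'
}

# Demonstrated against a trimmed sample of the master's JSON output.
sample='{"url":"spark://host:7077","workers":[{"state":"ALIVE"}]}'
check_spark_alive "$sample" && echo "worker alive"
```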
- Create a test DQ job without any optional DQ layers or rules to verify that the Spark driver, executors, and containers are able to launch successfully.