Upgrade Spark

Important You must upgrade to Java 17 and Spark 3.5.3 to install and use Collibra Data Quality & Observability 2025.02. For more information about version compatibility, see the Java and Spark compatibility matrix below.
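
Tip To confirm which Java runtime is on your path before and after the upgrade, you can run the standard version check below. Once the upgrade is complete, the output should report a 17.x runtime.

  java -version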

Java and Spark compatibility matrix
2025.01 and earlier
  Java 8: Yes | Java 11: Yes | Java 17: No
  Spark versions:
  • 2.3.0 (Java 8 only)
  • 2.4.5 (Java 8 only)
  • 3.0.1 (Java 8 and 11)
  • 3.1.2 (Java 8 and 11)
  • 3.2.2 (Java 8 and 11)
  • 3.4.1 (Java 11 only)

2025.02
  Java 8: No | Java 11: No | Java 17: Yes
  Spark versions: 3.5.3 only

2025.03
  Java 8: No | Java 11: No | Java 17: Yes
  Spark versions: 3.5.3 only

2025.04
  Java 8: Yes | Java 11: Yes | Java 17: Yes
  Spark versions:
  • 2.3.0 (Java 8 only)
  • 2.4.5 (Java 8 only)
  • 3.0.1 (Java 8 and 11)
  • 3.1.2 (Java 8 and 11)
  • 3.2.2 (Java 8 and 11)
  • 3.4.1 (Java 11 only)
  • 3.5.3 (Java 17 only)

  Important The Java 8 and 11 build profiles only contain the 2025.02 release and critical bug fixes addressed in 2025.03 and 2025.04; they do not contain any feature enhancements from the 2025.03 or 2025.04 releases. Only the Java 17 build profile contains the feature enhancements and bug fixes listed in the 2025.04 release notes.

2025.05
  Java 8: No | Java 11: No | Java 17: Yes
  Spark versions: 3.5.3 only
  Note Fixes for Java 8 and 11 build profiles will be available only for critical and high-priority defects.

2025.06
  Java 8: No | Java 11: No | Java 17: Yes
  Spark versions: 3.5.3 only
  Note Fixes for Java 8 and 11 build profiles will be available only for critical and high-priority defects.

2025.07
  Java 8: No | Java 11: No | Java 17: Yes
  Spark versions: 3.5.3 only
  Note Fixes for Java 8 and 11 build profiles will be available only for critical and high-priority defects.

2025.08
  Java 8: No | Java 11: No | Java 17: Yes
  Spark versions: 3.5.3 only
  Note Fixes for Java 8 and 11 build profiles will be available only for critical and high-priority defects.

This section details how to upgrade on-premises Apache Spark versions.

Important We recommend upgrading to Spark 3.5.3 to address critical vulnerabilities in the Spark core library, including Log4j.

Steps

Note The following steps are examples of what works in a simple Standalone environment. You may need to modify them to accommodate your specific deployment.

  1. Run the following commands to stop Spark and Collibra DQ services.
    a. Set the OWL_HOME variable without a trailing slash after the value.
      export OWL_HOME=<the owl folder where CDQ is installed>

      Example export OWL_HOME=/home/ec2-user/owl

    b. Stop Spark Master.
      cd $OWL_HOME/spark/sbin
      ./stop-master.sh
    c. Stop Spark Worker.
      cd $OWL_HOME/spark/sbin
      ./stop-worker.sh
    d. Stop DQ Web.
      cd $OWL_HOME/bin
      ./owlmanage.sh stop=owlweb
    e. Stop DQ Agent.
      cd $OWL_HOME/bin
      ./owlmanage.sh stop=owlagent
    f. Verify that all processes are stopped.
      ps -ef | grep owl

      ## No DQ processes should return as running. ##

      Tip If any DQ processes are still running, you can use the command kill -9 <pid from the ps command in Step 1f> to stop them manually.
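
      If several DQ processes linger, you can stop them with a single pkill instead of individual kill commands. This is a sketch that assumes nothing else on the host matches the pattern "owl", so review the ps output before running it.

      pkill -9 -f owl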

  2. Create a package installation folder and set the environment variables accordingly.
    a. Move up one level from OWL_HOME.
      cd $OWL_HOME/..
    b. Export the INSTALLER_DIRECTORY variable.
      export INSTALLER_DIRECTORY=$OWL_HOME/dq_install
    c. Create the INSTALLER_DIRECTORY folder.
      mkdir $INSTALLER_DIRECTORY
  3. Create a backup folder.
    a. Open the INSTALLER_DIRECTORY folder.
      cd $INSTALLER_DIRECTORY
    b. Create a backup folder.
      mkdir backup
  4. Download the Collibra DQ installer packaged with the necessary Spark version from the Collibra Data Quality & Observability Downloads page in the Product Resource Center.
  5. Run the following command to extract the installer.
    tar -xvf <downloaded .tar.gz file>
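    To sanity-check the extraction, you can list the installer directory. You should see the items used in the later steps, such as the packages folder, the dq-*.jar application files, owlcheck, owlmanage.sh, and drivers.tar.gz; exact names can vary by release.

    ls $INSTALLER_DIRECTORY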
  6. Create backups of the following directories and files by moving or copying them to the backup folder.
    • spark
    • spark-extras (if it exists)
    • drivers
    • dq-core-*-dependencies.jar
    • dq-webapp*.jar
    • dq-agent*.jar
    • owlcheck
    • owlmanage.sh
  7. Remove the folders and files using the following or similar steps.
    a. Open the OWL_HOME/bin folder.
      cd $OWL_HOME/bin
    b. Move the dq-core-*-dependencies.jar, dq-webapp*.jar, dq-agent*.jar, owlcheck, and owlmanage.sh files to the INSTALLER_DIRECTORY/backup/ folder.
      mv dq*.jar owl* $INSTALLER_DIRECTORY/backup/
    c. Move up one folder level.
      cd ..
    d. Move drivers, spark, and spark-extras to the INSTALLER_DIRECTORY/backup/ folder.
      mv drivers spark spark-extras $INSTALLER_DIRECTORY/backup/
  8. Confirm that the files and folders are in the backup folder.
    ls -ltr $INSTALLER_DIRECTORY/backup/
  9. Run the following command to go to the packages/install-packages folder.
    cd $INSTALLER_DIRECTORY/packages/install-packages

  10. Extract the Spark installation files.
    tar -xvf spark-3.5.3-bin-hadoop3.tgz
  11. Extract the spark-extras.tar.gz file.
    tar -xvf spark-extras.tar.gz
  12. Rename and move the spark-3.5.3-bin-hadoop3 folder from the installer extract folder to the OWL_HOME/spark folder.
    a. Rename the spark-3.5.3-bin-hadoop3 folder to spark.
      mv spark-3.5.3-bin-hadoop3 spark
    b. Move spark to the application home directory.
      mv spark $OWL_HOME
  13. Run the following command to copy the contents of the spark-extras folder into spark/jars.
    cp spark-extras/* $OWL_HOME/spark/jars
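    As a quick sanity check, you can confirm that the jars folder now contains the extra libraries; the exact jar list varies by release, so this only verifies that the copy ran.

    ls $OWL_HOME/spark/jars | wc -l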
  14. Add the following line to owl/spark/conf/spark-env.sh (you can create this file from the provided spark-env.sh.template) to automate the cleanup of Spark work folders and avoid filling up disk space.
    export SPARK_WORKER_OPTS="${SPARK_WORKER_OPTS} -Dspark.worker.cleanup.enabled=true -Dspark.worker.cleanup.interval=1800 -Dspark.worker.cleanup.appDataTtl=3600"
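    If spark-env.sh does not exist yet, one way to create it from the provided template and append the cleanup settings is shown below; this is a sketch, so adjust the paths if your layout differs.

    cd $OWL_HOME/spark/conf
    cp spark-env.sh.template spark-env.sh
    echo 'export SPARK_WORKER_OPTS="${SPARK_WORKER_OPTS} -Dspark.worker.cleanup.enabled=true -Dspark.worker.cleanup.interval=1800 -Dspark.worker.cleanup.appDataTtl=3600"' >> spark-env.sh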
  15. Copy in the Collibra Data Quality & Observability application files.
    a. Open the INSTALLER_DIRECTORY.
      cd $INSTALLER_DIRECTORY
    b. Copy in the dq-core-*-dependencies.jar, dq-webapp*.jar, and dq-agent*.jar files.
      cp dq-*.jar $OWL_HOME/bin
    c. Copy in the owlcheck and owlmanage.sh files.
      cp owl* $OWL_HOME/bin
    d. Make the owlcheck and owlmanage.sh files executable.
      chmod +x $OWL_HOME/bin/owl*
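    To verify the copy, you can list the new files and confirm that owlcheck and owlmanage.sh have the executable bit set.

    ls -l $OWL_HOME/bin/dq-*.jar $OWL_HOME/bin/owlcheck $OWL_HOME/bin/owlmanage.sh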
  16. Extract and copy in the updated drivers.
    a. Create a drivers folder.
      mkdir drivers
    b. Open the drivers folder.
      cd drivers
    c. Extract drivers.tar.gz into the drivers folder.
      tar -xvf ../drivers.tar.gz
    d. Move up one folder level.
      cd ..
    e. Move the drivers/ folder to the application home directory.
      mv drivers/ $OWL_HOME
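    To confirm that the drivers are in place, you can list the new folder.

    ls $OWL_HOME/drivers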
  17. Run the following commands to start Spark and Collibra DQ services.
    a. Start Spark Master.
      cd $OWL_HOME/spark/sbin
      ./start-master.sh
    b. Start Spark Worker.
      cd $OWL_HOME/spark/sbin
      ./start-worker.sh spark://$(hostname -f):7077
    c. Start DQ Web.
      cd $OWL_HOME/bin
      ./owlmanage.sh start=owlweb
    d. Tail the log file until you see a "Bootstrap process complete" message.
      tail -f $OWL_HOME/log/owl-web.log
    e. Start DQ Agent.
      cd $OWL_HOME/bin
      ./owlmanage.sh start=owlagent
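    To confirm that the web application and agent processes came back up, you can reuse the check from Step 1f; both should now appear in the output.

    ps -ef | grep owl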
  18. Visit the Spark page on port 8080 to confirm that the Spark version is the one you installed and that the worker is alive.
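    If the host has no browser, you can query the master UI from the shell instead. This sketch assumes the default port 8080 on the local host and greps the page for the installed version string and the worker state it displays.

    curl -s http://localhost:8080/ | grep -oE '3\.5\.3|ALIVE' | sort -u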
  19. Create a test DQ job without any optional DQ layers or rules to verify that the Spark driver, executors, and containers are able to launch successfully.
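    One way to run such a test from the command line is with the owlcheck script copied in earlier. The sketch below assumes a small comma-delimited sample file at /tmp/test.csv (a hypothetical path) and uses the commonly documented flags -ds (dataset name), -rd (run date), -f (file path), and -d (delimiter); confirm them against your version's owlcheck help output before relying on them.

    cd $OWL_HOME/bin
    ./owlcheck -ds spark_upgrade_test -rd 2025-01-01 -f /tmp/test.csv -d ,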