Additional Spark Standalone configuration options
This section describes additional configuration options in setup.sh, owl-env.sh, and owl.properties.
For more information about setup.sh, go to Data Quality & Observability Classic Directory Structure.
Setting additional options in setup.sh
The following table describes the arguments supported by setup.sh:
| Argument | Description |
|---|---|
| -non-interactive | Skips the prompt to accept the Java license agreement. |
| -skipSpark | Skips the extraction of Spark components. |
| -stop | Does not automatically start all components (Owl-Web, Zeppelin, Postgres). |
| -port= | Sets the DQ Web application to use the specified port. |
| -user= | Optional parameter to set the user to run Collibra DQ. The default is the current user. |
| -owlbase= | Sets the base path where you want Collibra DQ installed. |
| -owlpackage= | Optional parameter to set the Collibra DQ package directory. The default is the current working directory. |
| -help | Displays the help and exits. |
| -options= | The Collibra DQ components to install, as a comma-separated list. For example, -options=owlagent,owlweb,postgres,spark |
| -pgpassword= | The password to set for the PostgreSQL metastore. For unattended installs. |
| -pgserver= | The name of the PostgreSQL server. For example, -pgserver=owl-postgres-host.example.com:5432/owldb. For unattended installs. |

Example setup.sh commands
A common example of the setup.sh command is:
./setup.sh -port=9000 -user=ec2-user -owlbase=/home/ec2-user -owlpackage=/home/ec2-user/packages
- The tarball is extracted to this folder on the EC2 instance: /home/ec2-user/packages/
- Collibra DQ runs as the ec2-user user.
- The DQ Web application runs on port 9000.
- The base location where the setup.sh script creates the Collibra DQ folder is /home/ec2-user/
This example installs only the DQ Agent:
./setup.sh -user=ec2-user -owlbase=/home/ec2-user -owlpackage=/home/ec2-user/packages -options=owlagent
- The package is extracted to this folder on the EC2 instance: /home/ec2-user/packages/
- Owl-agent runs as the ec2-user user.
- The base location where the setup.sh script creates the Collibra DQ folder and places all packages under it is /home/ec2-user/
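
The arguments marked "For unattended installs" in the table can be combined with -non-interactive. The following is a minimal sketch of how such a command might be assembled and reviewed before it is run. The host name, paths, and password placeholder are illustrative values, and the choice to leave postgres out of -options when -pgserver points at an existing server is an assumption, not documented behavior.

```shell
#!/bin/sh
# Sketch: assemble an unattended setup.sh command from the documented arguments.
# Every value below is a placeholder for illustration, not a product default.
OWL_BASE="/home/ec2-user"
OWL_PACKAGE="/home/ec2-user/packages"
PG_SERVER="owl-postgres-host.example.com:5432/owldb"
# Read the password from the environment so it is not stored in the script.
PG_PASSWORD="${PG_PASSWORD:-change-me}"

# Assumption: postgres is omitted from -options because -pgserver names an existing server.
SETUP_ARGS="-non-interactive -owlbase=$OWL_BASE -owlpackage=$OWL_PACKAGE -options=owlagent,owlweb,spark -pgserver=$PG_SERVER"

# Print the command with the password masked so it can be reviewed first.
SETUP_CMD_DISPLAY="./setup.sh $SETUP_ARGS -pgpassword=<hidden>"
echo "$SETUP_CMD_DISPLAY"

# To run the installation, uncomment the next line:
# ./setup.sh $SETUP_ARGS "-pgpassword=$PG_PASSWORD"
```

Printing the command first keeps the sketch safe to run on a machine where setup.sh is not present.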
Configuring environment settings with owl-env.sh
The owl-env.sh script holds the main variables that are reused across components. It is executed from the /owl/config directory.
| owl-env.sh setting | Description |
|---|---|
| export SPARK_CONF_DIR="/home/collibra/owl/cdh-spark-conf" | The location of the Spark conf directory on your machine. |
| export INSTALL_PATH="/home/collibra/owl" | The installation directory of Collibra DQ. |
| export JAVA_HOME="/home/collibra/jdk1.8.0_131" | The Java home directory for Collibra DQ to use. |
| export LOG_PATH="/home/collibra/owl/log" | The log path. |
| export BASE_PATH="/home/collibra" | The base location under which the Collibra DQ directory resides. |
| export SPARK_MAJOR_VERSION=2 | The Spark major version. Collibra DQ supports only Spark 2 and later. |
| export OWL_LIBS="/home/collibra/owl/libs" | The lib directory to inject into spark-submit jobs. |
| export USE_LIBS=0 | Whether to use the lib directory. The default value, 0, means the lib directory is not used. A value of 1 means the lib directory is used. |
| export SPARKSUBMITEXE="spark-submit" | The Spark submit executable command. This example uses spark-submit. |
| export ext_pass_manage=0 | Whether passwords are pulled from an external password management system. A value of 0 disables an external password management system. A value of 1 enables it. |
| export ext_pass_script="/opt/owl/bin/getpassword.sh" | The script that retrieves the password from the vault. |
| TIMEOUT=900 #15 minutes in seconds | The timeout limit for Owl-Web users, in seconds. |
| export LOCAL_REGISTRATION_ENABLED=false | Disables the Registration link on the login page, thereby preventing the creation of new user accounts. |
| PORT=9003 #owl-web port NUMBER | The default port to use for owl-web. |
| export SPRING_LOG_LEVEL=ERROR | The logging level to be displayed in owl-web.log. |
| export SPRING_DATASOURCE_DRIVER_CLASS_NAME=org.postgresql.Driver | The driver class name for the PostgreSQL metastore (used by web). |
| export SPRING_DATASOURCE_URL=jdbc:postgresql://localhost:5432/postgres | The JDBC connection string to the Collibra DQ PostgreSQL metastore. |
| export SPRING_DATASOURCE_USERNAME=collibra | The Collibra DQ PostgreSQL username. |
| export SPRING_DATASOURCE_PASSWORD=3+017wfY1l1vmsvGYAyUcw5zGL | The Collibra DQ PostgreSQL password. |
| export AUTOCLEAN=TRUE/FALSE | Enables (TRUE) or disables (FALSE) the automatic deletion of old datasets. |
| export DATASETS_PER_ROW=200000 | Deletes datasets after this threshold is reached. To change the value, it must be greater than the default. |
| export ROW_COUNT_THRESHOLD=300000 | Deletes rows after this threshold is reached. To change the value, it must be greater than the default. |
| export SERVER_HTTP_ENABLED=true | Enables HTTP access to owl-web. |
| export OWL_ENC=OFF #JAVA for java encryption | Enables encryption. Note that the setting must also be added to owl.properties, as owl.enc=OFF to disable encryption or owl.enc=JAVA to enable it. The owl.properties file is located in the /config folder of the Collibra DQ install path (/opt/owl/config). |
| PGDBPATH=/home/collibra/owl/owl-postgres/bin/data | The path to the PostgreSQL database. |
| export RUNS_THRESHOLD=5000 | Deletes runs after this threshold is reached. To change the value, it must be greater than the default. |
| export HTTP_SECONDARY_PORT=9001 | The secondary HTTP port, which is useful when SSL is enabled. |
| export SERVER_PORT=9000 | Same as PORT. |
| export SERVER_HTTPS_ENABLED=true | Enables SSL. |
| export SERVER_SSL_KEY_TYPE=PKCS12 | The certificate trust store type. |
| export SERVER_SSL_KEY_PASS=t2lMFWEHsQha3QaWnNaR8ALaFPH15Mh9 | The certificate key password. |
| export SERVER_SSL_KEY_ALIAS=owl | The certificate key alias. |
| export SERVER_REQUIRE_SSL=true | Overrides the HTTP setting and forces HTTPS regardless of the HTTP settings. |
| export MULTITENANTMODE=FALSE | Set to TRUE to enable multi-tenant support. |
| export multiTenantSchemaHub=owlhub | The schema name used for multi-tenancy. |
| export OWL_SPARKLOG_ENABLE=false | Set to true to enable more detailed Spark logs. |
| export LDAP_GROUP_RESULT_DN_ATTRIBUTE | The attribute that holds the full path of the group object, for example, CN=OwlAppAdmin,OU=OwlGroups,OU=Groups,DC=owl,DC=com. Default is distinguishedname. |
| export LDAP_GROUP_RESULT_NAME_ATTRIBUTE | The attribute that holds the simple name of the group, for example, OwlAppAdmin. Default is CN. |
| export LDAP_GROUP_RESULT_CONTAINER_BASE | Used when LDAP_GROUP_RESULT_DN_ATTRIBUTE does not return a value. In this case, the value of LDAP_GROUP_RESULT_NAME_ATTRIBUTE is prepended to this value to create a fully qualified LDAP path. For example, OU=OwlGroups,OU=Groups,DC=owl,DC=com. Default is <null>. |
| export ALLOWED_LOCAL_PATHS='*' | Enables Collibra DQ to use the specified local paths. Wildcard ('*') is allowed. Separate multiple paths using a comma. |
| export SERVER_FORWARD_HEADERS_STRATEGY=FRAMEWORK | Resolves an issue related to how Spring Boot handles forwarded headers that can cause an error to occur during login. |

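
Several of the settings above work together when owl-web is served over HTTPS. The following is a minimal sketch of such an owl-env.sh fragment, using the variable names from the table with example values. The check_ssl_settings function is a hypothetical helper written for this illustration and is not part of Collibra DQ. The two conditions it tests (forcing SSL implies HTTPS is enabled, and the two ports differ) are assumptions about what a consistent configuration looks like.

```shell
#!/bin/sh
# Sketch of an owl-env.sh fragment that serves owl-web over HTTPS.
# Variable names come from the table above; the values are examples only.
export SERVER_PORT=9000
export SERVER_HTTPS_ENABLED=true
export SERVER_REQUIRE_SSL=true
export HTTP_SECONDARY_PORT=9001
export SERVER_SSL_KEY_TYPE=PKCS12
export SERVER_SSL_KEY_ALIAS=owl
# Placeholder; supply the real key password through the environment.
export SERVER_SSL_KEY_PASS="${SERVER_SSL_KEY_PASS:-change-me}"

# Hypothetical helper (not part of Collibra DQ): flags combinations that look inconsistent.
check_ssl_settings() {
  if [ "$SERVER_REQUIRE_SSL" = "true" ] && [ "$SERVER_HTTPS_ENABLED" != "true" ]; then
    echo "SERVER_REQUIRE_SSL=true but SERVER_HTTPS_ENABLED is not true" >&2
    return 1
  fi
  if [ "$SERVER_PORT" = "$HTTP_SECONDARY_PORT" ]; then
    echo "SERVER_PORT and HTTP_SECONDARY_PORT should differ" >&2
    return 1
  fi
  return 0
}

check_ssl_settings && echo "SSL settings look consistent"
```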
Configuring properties with owl.properties
The owl.properties file, located in the /owl/config directory, enables you to set the following properties:
| Example | Description |
|---|---|
| key=XXXXXX | The license key. |
| spring.datasource.url=jdbc:postgresql://localhost:5432/postgres | The connection string to the Collibra DQ metastore (used by owl-core). |
| spring.datasource.password=xxxxxx | The password to the Collibra DQ metastore (used by owl-core). |
| spring.datasource.username=xxxxxx | The username to the Collibra DQ metastore (used by owl-core). |
| spring.datasource.driver-class-name=com.owl.org.postgresql.Driver | Shaded PostgreSQL driver class name. |
| spring.agent.datasource.url | jdbc:postgresql://$host:$port/owltrunk |
| spring.agent.datasource.username | {user} |
| spring.agent.datasource.passwords | {password} |
| spring.agent.datasource.driver-class-name | org.postgresql.Driver |

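
Because the encryption setting must appear both in owl-env.sh (OWL_ENC) and in owl.properties (owl.enc), it can help to compare the two after an edit. The following is a minimal sketch of such a comparison. The read_prop function is a hypothetical helper, and the demo runs against a temporary file rather than the real /opt/owl/config/owl.properties so that it can be tried safely.

```shell
#!/bin/sh
# Hypothetical helper: print the value of one key from a key=value properties file.
read_prop() {
  # $1 = file, $2 = key. Matches the key exactly and keeps any '=' inside the value.
  awk -F= -v k="$2" '$1 == k { sub(/^[^=]*=/, ""); v = $0 } END { if (v != "") print v }' "$1"
}

# Demo against a throwaway file; a real check would read /opt/owl/config/owl.properties.
DEMO_PROPS="$(mktemp)"
printf '%s\n' \
  'key=XXXXXX' \
  'owl.enc=JAVA' \
  'spring.datasource.url=jdbc:postgresql://localhost:5432/postgres' > "$DEMO_PROPS"

OWL_ENC=JAVA   # example of the value that owl-env.sh would export
PROPS_ENC="$(read_prop "$DEMO_PROPS" owl.enc)"

if [ "$OWL_ENC" = "$PROPS_ENC" ]; then
  ENC_STATUS="match"
else
  ENC_STATUS="mismatch"
fi
echo "owl.enc check: $ENC_STATUS"
rm -f "$DEMO_PROPS"
```

The same helper could be pointed at spring.datasource.url to compare it with SPRING_DATASOURCE_URL in owl-env.sh.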
Validation Patterns
Collibra DQ supports pattern validation for some commonly used IDs and names. Go to Pattern validation for commonly used names.
IPv6 Compliance
This section provides additional steps for deploying Collibra DQ Standalone in an IPv6-compliant environment.
Steps
- Add the following properties to the spark-env.sh file, then restart the Spark driver and worker services:

      export SPARK_PUBLIC_DNS=:::7077
      export SPARK_DAEMON_JAVA_OPTS="-Djava.net.preferIPv6Addresses=true"
      export SPARK_LAUNCHER_OPTS="-Djava.net.preferIPv6Addresses=true"

- Add the following extra JVM options to the owl-env.sh file, then restart the DQ Web and Agent services:

      # Append -Djava.net.preferIPv6Addresses=true to the EXTRA_JVM_OPTIONS of the owl-env.sh
      export EXTRA_JVM_OPTIONS="-Djava.net.preferIPv6Addresses=true --add-opens java.base/java.util=ALL-UNNAMED --add-opens java.base/java.net=ALL-UNNAMED --add-opens java.base/sun.nio.ch=ALL-UNNAMED --add-opens java.base/java.nio=ALL-UNNAMED --add-opens java.base/sun.util.calendar=ALL-UNNAMED"
      # Append -Djava.net.preferIPv6Addresses=true to the AGENT_EXTRA_JVM_OPTIONS of the owl-env.sh
      export AGENT_EXTRA_JVM_OPTIONS="-Djava.net.preferIPv6Addresses=true"
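
When owl-env.sh already defines EXTRA_JVM_OPTIONS or AGENT_EXTRA_JVM_OPTIONS, the IPv6 flag needs to be appended without discarding the existing options or adding it twice. The following is a minimal sketch of one way to do that. The append_jvm_opt function is a hypothetical helper, and the starting value of EXTRA_JVM_OPTIONS is only an example.

```shell
#!/bin/sh
# Hypothetical helper: append a JVM flag to an options string only if it is not already there.
append_jvm_opt() {
  # $1 = current options (may be empty), $2 = flag to add
  case " $1 " in
    *" $2 "*) printf '%s\n' "$1" ;;
    *)
      if [ -n "$1" ]; then
        printf '%s\n' "$1 $2"
      else
        printf '%s\n' "$2"
      fi
      ;;
  esac
}

IPV6_FLAG="-Djava.net.preferIPv6Addresses=true"

# Example of a value that might already be set in owl-env.sh.
EXTRA_JVM_OPTIONS="--add-opens java.base/java.util=ALL-UNNAMED"

EXTRA_JVM_OPTIONS="$(append_jvm_opt "$EXTRA_JVM_OPTIONS" "$IPV6_FLAG")"
AGENT_EXTRA_JVM_OPTIONS="$(append_jvm_opt "${AGENT_EXTRA_JVM_OPTIONS:-}" "$IPV6_FLAG")"
export EXTRA_JVM_OPTIONS AGENT_EXTRA_JVM_OPTIONS

echo "$EXTRA_JVM_OPTIONS"
```

Calling the helper a second time with the same flag leaves the value unchanged, so the fragment can be sourced repeatedly.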