Best Practices

Multi Tenant Names

Note: Tenant names should be lowercase only.
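
As a minimal sketch, a name can be checked or normalized before it is used for a tenant. Both helper functions below are hypothetical and are not part of Collibra DQ.

```shell
#!/bin/sh
# Hypothetical helpers (not part of Collibra DQ).

# Succeeds only if the name is non-empty and has no uppercase letters.
is_lowercase_tenant() {
  case "$1" in
    "" | *[[:upper:]]*) return 1 ;;
    *) return 0 ;;
  esac
}

# Prints the name converted to lowercase.
to_lowercase_tenant() {
  printf '%s\n' "$1" | tr '[:upper:]' '[:lower:]'
}
```

For example, `to_lowercase_tenant "AcmeCorp"` prints `acmecorp`, which then passes `is_lowercase_tenant`.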

Understanding Collibra DQ activities and what the key/date columns mean for each

  • Starting with profile and expanding to rules and then other advanced capabilities.
  • Training with DQ-team Zoom/Onsite support.
  • Running with sample data.
  • Introducing anomalies on sample data and running an owlcheck to see the anomalies.
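
One way to introduce anomalies into sample data is to blank out values in a copy of a file before scanning it. The sketch below assumes a simple comma-separated file with a header row and no quoted commas. The function name `inject_nulls` is hypothetical.

```shell
#!/bin/sh
# inject_nulls FILE COLUMN EVERY
# Hypothetical helper: blanks out column number COLUMN in every EVERY-th
# data row of a simple CSV file and writes the result to stdout.
inject_nulls() {
  awk -F, -v OFS=, -v col="$2" -v every="$3" '
    NR == 1 { print; next }              # keep the header row unchanged
    (NR - 1) % every == 0 { $col = "" }  # blank the chosen column
    { print }
  ' "$1"
}
```

For example, `inject_nulls sample.csv 2 10 > sample_with_nulls.csv` empties the second column in every tenth data row. Scanning the modified copy should then surface the missing values.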

Using the tool with practical scenarios

  • Having well-defined use cases
    • Determine a single table (dataset) that you would like to scan.
    • Know what you expect DQ to find in this dataset.
    • Understand which activities would capture the expected findings.
  • Target internal datasets with known data issues.
  • Historical Comparisons:
    • If pre-cleaned data is available along with findings produced by legacy methods such as internal rules, run these datasets and compare the DQ results with the internal findings.
  • Work with data owners to understand findings or review expected findings.

Explorer

  • The date selected with the calendar widget on the Scope (home) tab should match the date selected with the calendar widget on the final (Save/Run) tab.
  • If you unlock the command line to override the final parameters, do not re-lock it, or your changes will be overwritten. In general, only advanced users should override the guided settings.
  • Pushdown and parallel JDBC cannot be used together. If you are using pushdown, do not select the parallel JDBC option.

Files

  • File paths should not contain spaces or special characters.
  • Backrun (replay) and advanced features are best suited for JDBC connections. Some features are unavailable if file and storage naming conventions do not consistently contain a date signature.
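
A path can be checked before a file is registered. The sketch below is a minimal example with a hypothetical function name. The set of allowed characters (letters, digits, `.`, `_`, `/`, and `-`) is an assumption, so adjust it to your storage conventions.

```shell
#!/bin/sh
# Hypothetical helper: succeeds only if the path is non-empty and contains
# nothing but letters, digits, '.', '_', '/' and '-'.
# The allowed character set is an assumption, not a Collibra DQ rule.
is_safe_path() {
  case "$1" in
    "" | *[!A-Za-z0-9._/-]*) return 1 ;;
    *) return 0 ;;
  esac
}
```

For example, `is_safe_path "/data/sales/2024-01-31/orders.csv"` succeeds, while `is_safe_path "/data/my files/orders.csv"` fails because of the space.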

Connection Pool

If you see this message, update the agent configs in owl-env.sh, or in the agent config map for Kubernetes deployments.

Failed to obtain JDBC Connection; nested exception is org.apache.tomcat.jdbc.pool.PoolExhaustedException: [pool-29-thread-2] Timeout: Pool empty. Unable to fetch a connection in 0 seconds, none available[size:2; busy:1; idle:0; lastwait:200].

Adjust these settings to change the wait time and the size of the connection pool.

export SPRING_DATASOURCE_POOL_MAX_WAIT=500
export SPRING_DATASOURCE_POOL_MAX_SIZE=10
export SPRING_DATASOURCE_POOL_INITIAL_SIZE=5
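
When deciding on new values, it can help to pull the pool counters out of the exception text. The sketch below uses a hypothetical function name and assumes the message ends with the bracketed `size`, `busy`, `idle`, and `lastwait` fields shown in the message above.

```shell
#!/bin/sh
# Hypothetical helper: extracts the pool counters from a
# PoolExhaustedException message passed as the first argument.
# Prints nothing if the bracketed counters are not found.
pool_stats() {
  printf '%s\n' "$1" |
    sed -n 's/.*\[size:\([0-9]*\); busy:\([0-9]*\); idle:\([0-9]*\); lastwait:\([0-9]*\)].*/size=\1 busy=\2 idle=\3 lastwait=\4/p'
}
```

Called with the message above, `pool_stats` prints `size=2 busy=1 idle=0 lastwait=200`.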

Freeform Agent Configs

When configuring the DQ Agent with the Free Form Parameters field at the bottom of the dialog, separate multiple -conf key/value pairs with commas. Use this format: "-conf some.key=x, some.other.key=y".
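
If the pairs come from a script, the string can be assembled rather than typed by hand. The sketch below uses a hypothetical function name and simply joins its arguments into the format quoted above.

```shell
#!/bin/sh
# Hypothetical helper: joins key=value arguments into one
# "-conf k1=v1, k2=v2" string. Fails if no pairs are given.
join_conf() {
  [ "$#" -gt 0 ] || return 1
  out=""
  for pair in "$@"; do
    if [ -z "$out" ]; then
      out="$pair"
    else
      out="$out, $pair"
    fi
  done
  printf '%s\n' "-conf $out"
}
```

For example, `join_conf some.key=x some.other.key=y` prints `-conf some.key=x, some.other.key=y`.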

K8 Secrets

The following environment variables are now managed as a Secret instead of as a ConfigMap:

  • LICENSE_KEY
  • LIVY_SSL_KEY_PASS
  • SERVER_SSL_KEY_PASS
  • SPRING_AGENT_DATASOURCE_PASSWORD
  • SPRING_AGENT_DATASOURCE_USERNAME
  • SPRING_DATASOURCE_PASSWORD
  • SPRING_DATASOURCE_USERNAME
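
When migrating an older deployment, it can be useful to confirm that none of these variables remain in a ConfigMap manifest. The sketch below uses a hypothetical function name and assumes the manifest is a local YAML file in which each entry appears as `KEY: value` on its own line.

```shell
#!/bin/sh
# Variable names taken from the list above.
SECRET_KEYS="LICENSE_KEY LIVY_SSL_KEY_PASS SERVER_SSL_KEY_PASS SPRING_AGENT_DATASOURCE_PASSWORD SPRING_AGENT_DATASOURCE_USERNAME SPRING_DATASOURCE_PASSWORD SPRING_DATASOURCE_USERNAME"

# Hypothetical helper: prints each secret-managed variable that still
# appears as a "KEY:" entry in the given YAML file, one per line.
leftover_secret_keys() {
  file=$1
  for key in $SECRET_KEYS; do
    if grep -Eq "^[[:space:]]*${key}[[:space:]]*:" "$file"; then
      printf '%s\n' "$key"
    fi
  done
}
```

For example, `leftover_secret_keys configmap.yaml` prints nothing when the ConfigMap no longer carries any of these variables.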