DQ job settings

You can use the Settings modal to fine-tune your DQ job.

Using the Settings modal

Click Settings below the Run button to open the Settings modal.

Option Description
Profile
Profile String Length

Ensures string-type data fits within the predefined schemas of its target data sources.
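As an illustration of what this check guards against, the sketch below (a hypothetical helper, not part of the product) flags string values that would overflow a target column such as a VARCHAR(10):

```python
# Illustrative sketch only: detect values too long for a target schema
# column. The 10-character limit mimics a hypothetical VARCHAR(10).
def oversized_values(rows, column, max_length):
    """Return values in `column` longer than the target column allows."""
    return [r[column] for r in rows if len(r[column]) > max_length]

rows = [
    {"city": "Austin"},
    {"city": "San Francisco"},  # 13 characters, exceeds VARCHAR(10)
]
print(oversized_values(rows, "city", 10))  # -> ['San Francisco']
```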

Data Analysis
Relationship Analysis

Lets you discover relationships in your data and measures the strength of those relationships with a correlation matrix.

Relationship Analysis is set to Auto by default.

Histogram Analysis

Segments data from your DQ job with histograms.

Histogram Analysis is set to Auto by default.

AdaptiveRules
Data Lookback

The number of past DQ job runs for your learning model to analyze.

The default value is 10.

Learning Phase

The minimum number of DQ job runs required before behavioral scoring begins to calculate.

The default value is 4.
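The interplay of the two Adaptive Rules settings above can be sketched as follows (assumed behavior for illustration, not product internals):

```python
# Assumed logic: behavioral scoring waits until the learning phase is
# complete, then learns from at most the last `DATA_LOOKBACK` runs.
DATA_LOOKBACK = 10   # default Data Lookback
LEARNING_PHASE = 4   # default Learning Phase

def runs_to_analyze(run_history):
    """Return the runs the learning model uses, or [] while learning."""
    if len(run_history) < LEARNING_PHASE:
        return []                        # still in the learning phase
    return run_history[-DATA_LOOKBACK:]  # most recent runs only

print(len(runs_to_analyze(list(range(3)))))   # -> 0 (learning phase)
print(len(runs_to_analyze(list(range(25)))))  # -> 10 (capped by lookback)
```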

Parallel JDBC
Override Partition Column

Splits your selected column evenly for parallel JDBC Spark load execution.

Select the checkbox option, then select a column from the dropdown menu.

When Override Partition Column is selected, the default column is OWLAUTOJDBC.

No. of Partitions

A partition is an even split of the total number of rows in your record. For large DQ jobs, increasing the number of partitions can improve performance and increase processing efficiency.

Drag the slider or enter a value between 2 and 20 in the input field.

Example If the row count of your table is 10 million, set the number of partitions to 10 to divide the record evenly into 10 partitioned blocks of 1 million rows. The job then executes the 10 blocks concurrently in parallel.
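The even split in the example above can be sketched as follows (a hypothetical helper in the spirit of Spark's JDBC lowerBound/upperBound/numPartitions options, not the product's own code):

```python
# Divide a numeric partition column's range into equal blocks that can
# be loaded concurrently.
def partition_bounds(lower, upper, num_partitions):
    """Yield (start, end) row ranges covering [lower, upper) evenly."""
    stride = (upper - lower) // num_partitions
    for i in range(num_partitions):
        start = lower + i * stride
        end = upper if i == num_partitions - 1 else start + stride
        yield start, end

# 10 million rows split into 10 blocks of 1 million rows each:
blocks = list(partition_bounds(0, 10_000_000, 10))
print(len(blocks))   # -> 10
print(blocks[0])     # -> (0, 1000000)
```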

Data Quality Job
Metastore Host
The PostgreSQL Metastore connection URL that determines which Metastore to use to register and record the results of your job.
Logging

Indicates the log level.

Select an option from the dropdown menu.

The default is Info.

Additional Lib
A directory path to include any additional drivers or jars in the classpath.
Union LookBack Min. Row

Sets the minimum number of rows a preceding scan must contain for it to be included in the historical context of a scan with union lookback configured.

If you are using union lookback, enter a value based on the number of recorded rows from previous scans.

Example To exclude scans that recorded less than 10 rows from the historical load context, enter a value of 10.
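The cutoff in the example above can be sketched as follows (assumed logic, illustrative only):

```python
# Keep only previous scans whose recorded row count meets the minimum;
# smaller scans are excluded from the historical load context.
def historical_context(previous_scans, min_rows):
    """Return the scans included in the historical context."""
    return [s for s in previous_scans if s["row_count"] >= min_rows]

scans = [
    {"run_date": "2024-01-01", "row_count": 5},    # excluded (< 10 rows)
    {"run_date": "2024-01-02", "row_count": 250},
    {"run_date": "2024-01-03", "row_count": 1000},
]
kept = historical_context(scans, min_rows=10)
print([s["run_date"] for s in kept])  # -> ['2024-01-02', '2024-01-03']
```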

Archive Breaking Records

The external storage container to which rule break records are exported in CSV format.

Select the checkbox option, then select an archive location from the dropdown menu.

Important When Archive Breaking Records is turned on, rule break records no longer write to the PostgreSQL Metastore.

Note For more information, see the Archive Breaking Records section.

Check header
Excludes schema findings from the results of a DQ job. Use this option when your schema contains special characters in its column names.
Core Fetch Mode
Adds -corefetchmode to the command line, overriding -q and allowing the core to fetch the query from the load options table.
Option Description
Profile
Profiling

Creates a baseline sketch of your table or file over time.

Profiling is on by default.

Advanced Profile
Determines whether a string field contains various string numerics, calculates TopN, BottomN, and TopN Shapes, and detects the scale and precision of double fields.
Data Analysis
Relationship Analysis

Lets you discover relationships in your data and measures the strength of those relationships with a correlation matrix.

Relationship Analysis is set to Auto by default.

Histogram Analysis

Segments data from your DQ job with histograms.

Histogram Analysis is set to Auto by default.

AdaptiveRules
Data Lookback

The number of past DQ job runs for your learning model to analyze.

The default value is 10.

Learning Phase

The minimum number of DQ job runs required before behavioral scoring begins to calculate.

The default value is 4.

Archive Break Records
Data Preview from Source

Prevents data preview records from being stored in the PostgreSQL Metastore. When you select this option:

  • All data preview records are removed from the PostgreSQL Metastore and remain only in your data source.
  • When you view data preview records in the web application, the state of the records reflects how they currently appear in your data source.

This option strengthens security by removing sensitive data entirely from the PostgreSQL Metastore.

Archive Dupes Break Records

Allows the storage of dupe break records in the source system instead of the PostgreSQL Metastore.

Archive Outliers Break Records
Allows the storage of outlier break records in the source system instead of the PostgreSQL Metastore.
Archive Rules Break Records
Allows the storage of rule break records in the source system instead of the PostgreSQL Metastore.
Archive Shapes Break Records
Allows the storage of shapes break records in the source system instead of the PostgreSQL Metastore.
Source Output Schema

An alternative destination schema in which to create tables for break record storage, instead of the schema provided in the connection. This can be either database.schema or schema, and requires write access to the source output schema location.

Logging
SQL Logging

Switches logging for all SQL queries on and off.

SQL logging for all jobs is off by default.

Pushdown
No. of Connections

The maximum number of connections to the data source to run your DQ job. Using multiple connections lets your DQ job execute queries in parallel.

The default value is 10.

No. of Threads

The maximum number of threads a DQ layer can run in parallel. Use this parameter to divide the number of open connections between DQ layers.

The default value is 2.
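As a back-of-the-envelope sketch of how these two settings interact (assumed arithmetic, not product code):

```python
# Dividing the connection pool between DQ layers that each run up to
# `threads_per_layer` queries in parallel.
def layers_in_parallel(num_connections, threads_per_layer):
    """How many DQ layers can run at once without exhausting the pool."""
    return num_connections // threads_per_layer

# With the defaults of 10 connections and 2 threads per layer:
print(layers_in_parallel(10, 2))  # -> 5
```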

Run Date
Date Format

The format of the run date variable (${rd} or ${rdEnd}) substituted on the command line at runtime. The option you select should match the date or datetime format of the timeslice column that you specified in the Time Slice option of the Select rows step.
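Illustrative only: if the timeslice column holds dates such as 2024-03-15, the selected format must render the run date the same way. A hypothetical yyyy-MM-dd case in Python:

```python
# Render a run date the way a ${rd} substitution might appear when the
# timeslice column uses a yyyy-MM-dd format (hypothetical example).
from datetime import date

run_date = date(2024, 3, 15)
print(run_date.strftime("%Y-%m-%d"))  # -> 2024-03-15
```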