Job Command Line Parameters
This topic describes how to use command line parameters to run jobs. It covers managing Spark resources from the command line and provides a reference of the parameters available in the job run command.
Managing Spark resources from the command line
Scale linearly with your data
Scale linearly with your data by adding executors and/or memory. For example:
-f "file:///Users/home/salary_data.csv" \
-d "," \
-rd "2018-01-08" \
-ds "salary_data"
-numexecutors 2 \
-executormemory 2g
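These flags are appended to the normal job run command. As a minimal sketch, assuming the standard owlcheck launcher under /opt/owl/bin (adjust the path, file location, and sizing to your environment):
/opt/owl/bin/owlcheck \
-f "file:///Users/home/salary_data.csv" \
-d "," \
-rd "2018-01-08" \
-ds "salary_data" \
-numexecutors 2 \
-executormemory 2g \
-drivermemory 2g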
Yarn Master
Spark Master
Collibra DQ can also run against a Spark master by passing the -master option with a spark://<host>:<port> URL.
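A minimal sketch (the master host and port are placeholders for your environment):
-f "file:///Users/home/salary_data.csv" \
-d "," \
-rd "2018-01-08" \
-ds "salary_data" \
-master spark://<spark-master-host>:7077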
Spark Standalone
Collibra DQ typically runs in Spark standalone mode, but by default it does not distribute the processing beyond the hardware it was activated on. The options below control how the job is sized and where it runs; a combined example follows the table.
Options | Description |
---|---|
deploymode | The Spark deploymode option. For example, cluster. |
drivermemory | The driver memory of your local Spark instance in gigabytes. |
executorcores | Spark executor cores. |
executormemory | The total Spark executor memory in gigabytes, for example, 3G. |
master | Overrides local[*] |
sparkprinc | Kerberos principal name, example: [email protected]. |
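As a sketch only (the values are illustrative, not recommendations), these options combine on a single job run to size the driver and executors against a standalone master:
-ds "salary_data" \
-rd "2018-01-08" \
-f "file:///Users/home/salary_data.csv" \
-d "," \
-master spark://<spark-master-host>:7077 \
-deploymode cluster \
-drivermemory 3g \
-executorcores 2 \
-numexecutors 2 \
-executormemory 3g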
Use spark-submit directly, bypassing DQCheck
spark-submit \
--driver-class-path /opt/owl/drivers/postgres42/postgresql-42.2.4.jar \
--driver-library-path /opt/owl/drivers/postgres42/postgresql-42.2.4.jar \
--driver-memory 3g --num-executors 2 --executor-memory 1g \
--master spark://Kirks-MBP.home:7077 \
--class com.owl.core.cli.OwlCheck /opt/owl/bin/owl-core-trunk-jar-with-dependencies.jar \
-u user -p pass -c jdbc:postgresql://xyz.chzid9w0hpyi.us-east-1.rds.amazonaws.com/postgres \
-ds accounts -rd 2019-05-05 -dssafeoff -q "select * from accounts" \
-driver org.postgresql.Driver -lib /opt/owl/drivers/postgres42/
Parallel JDBC Spark-Submit
spark-submit \
--driver-class-path /opt/owl/drivers/postgres42/postgresql-42.2.4.jar \
--driver-library-path /opt/owl/drivers/postgres42/postgresql-42.2.4.jar \
--conf spark.driver.extraJavaOptions=-Dlog4j.configuration=file:///opt/owl/config/log4j-TRACE.properties \
--conf spark.executor.extraJavaOptions=-Dlog4j.configuration=file:///opt/owl/config/log4j-TRACE.properties \
--files /opt/owl/config/log4j-TRACE.properties \
--driver-memory 2g --num-executors 2 --executor-memory 1g --master spark://Kirks-MBP.home:7077 \
--class com.owl.core.cli.OwlCheck /opt/owl/bin/owl-core-trunk-jar-with-dependencies.jar \
-u us -p pass -c jdbc:postgresql://xyz.chzid9w0hpyi.us-east-1.rds.amazonaws.com/postgres \
-ds aumdt -rd 2019-05-05 -dssafeoff -q "select * from aum_dt" \
-driver org.postgresql.Driver -lib /opt/owl/drivers/postgres42/ \
-connectionprops fetchsize=6000 -master spark://Kirks-MBP.home:7077 \
-corroff -histoff -statsoff \
-columnname updt_ts -numpartitions 4 -lowerbound 1557597987353 -upperbound 1557597999947
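The -columnname, -numpartitions, -lowerbound, and -upperbound options map to Spark's JDBC partitioning: the query is split into numpartitions range predicates on the chosen column so each executor fetches one slice in parallel. A rough sketch of the predicates Spark generates for the example above (the exact form depends on the Spark version; stride = (upperbound - lowerbound) / numpartitions):
-- partition 1: updt_ts < lowerbound + stride (rows with a NULL updt_ts also land here)
-- partition 2: updt_ts >= lowerbound + stride AND updt_ts < lowerbound + 2*stride
-- partition 3: updt_ts >= lowerbound + 2*stride AND updt_ts < lowerbound + 3*stride
-- partition 4: updt_ts >= lowerbound + 3*stride
Tune -connectionprops fetchsize to what the source database can serve comfortably; a larger fetch size reduces round trips but uses more memory per partition.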
Command Line Reference
The following table describes the parameters available for a job run command.
Parameter | Description |
---|---|
adddc | Add Date Column as Run Id that is used |
addlib | Additional library directory to be added to the classpath for the DQ Job (spark-submit) |
agentjobid | Internal use only |
agg | Grouping function for flexibility |
aggq | select * from dataset where |
alertemail | Automatically add an alert with a score greater than 75 to the email value supplied |
alias | Dataset name alias, example: userTable or users or user_file |
archivecxn | Connection name to archive break records |
avro | avro file data flag |
avroschema | avro schema file |
bd | Column count if you want to group by a particular set of values for behavioral statistics |
bdcol | Behavioral function if you want to aggregate for behavioral statistics |
bdfunc | Behavioral function if you want to aggregate for behavioral statistics |
bdgrp | Behavioral group to dynamically collect stats if you want to group by a particular set of values for behavioral statistics |
behaviorscoreoff | Turn off behavior scoring |
bhemptyoff | Behavior empty check detection off |
bhlb | The behavior lookback period where the entered value represents the number of days. For example, a value of 12 looks back 12 days of data. |
bhmaxoff | Behavior max value detection off |
bhmaxon | Behavior max value detection on |
bhmeanoff | Behavior mean value detection off |
bhmeanon | Behavior mean value detection on |
bhminoff | Behavior min value detection off |
bhminon | Behavior min value detection on |
bhminsupport | Behavior min support, set to 4 by default, min number of days to learn from, learning phase |
bhnulloff | Behavior null check detection off |
bhrowoff | Behavior row count detection off |
bhsensitivity | Behavior sensitivity: NARROW , NULL , WIDE |
bhtimeoff | Behavior load time detection off |
bhtimeon | Behavior load time detection on |
bhuniqueoff | Behavior unique detection off |
bhuniqueon | Behavior unique detection on |
br | Number of back runs to fill training history, should be an integer value |
brbin | Time bin for back runs, example: -brbin DAY |
bt | Back-tick character (`) to escape SQL queries when returning to database |
by | Compare by DAY , HOUR , or MIN . |
c | jdbc://hive:3306 (connection URL) |
cacheoff | Turn caching off. Caching is on by default. It can be turned off if the dataset is too large or cache optimization is not desired. |
cardoff | Turn off profiling section of owlcheck |
categoricallimit | Limit for categorical outliers stored |
categoricallimitui | Limit for categorical outliers displayed |
categoricalscore | Score for each categorical outlier |
catoff | Disables categorical outlier detection. |
caton | Turn on categorical outliers |
catOutAlgo | Specify ML algorithm for categorical outliers. Default: "" (no ML) |
catOutAlgoParams | Optional params for catOutAlgo to override Owl-suggested params. Default: "". E.g. "k=5,initSteps=5" |
catOutBottomN | Max number of categorical outliers in a column |
catOutConfType | Method to use to calculate likelihood of category level |
catOutMaxCategoryN | Maximum number of categories within key that will trigger homogenous past categorical outlier case |
catOutMaxConf | Confidence upper bound to qualify as an outlier |
catOutMaxFreqPercentile | Frequency percentile upper bound to qualify as an outlier |
catOutMinFreq | Minimum frequency needed to be considered an outlier. Raise to make less sensitive |
catOutMinVariance | Minimum frequency count variance (within key) required to be considered an outlier. Set to a negative value to be more sensitive |
catOutParallelOff | Turn off parallel column-wise processing of categorical outliers |
catOutTopN | Number of top frequently appearing levels in a column to include in preview |
columnname | Column name to split on for spark JDBC |
concat | No arguments. Concatenate option for categorical outliers columns |
conf | The Spark configuration option. For example, spark.kubernetes.memoryOverheadFactor=0.4,spark.kubernetes.executor.podTemplateFile=local:///opt/owl/config/k8s-executor-template.yml,spark.kubernetes.driver.podTemplateFile=local:///opt/owl/config/k8s-driver-template.yml |
connectionprops | key=value,hive.resultset.use.unique.column.names=false |
connectionpropssrc | key=value,hive.resultset.use.unique.column.names=false |
corefetchmode | Let core fetch the query from the metastore instead of using the one passed in on the command line |
corroff | Dataset correlation flag force off |
corron | Dataset correlation flag force on |
cxn | The name of the saved database or file connection from which your dataset originates. |
d | Delimiter, ',' |
dataconceptid | Identifier of the group of semantic rules by datatype |
datashapeexc | Exclude a column from data shapes discovery |
datashapegranular | Check length for alphanumeric fields, and independent check for numbers and letters |
datashapeinc | Include a column that has been excluded from data shapes discovery |
datashapelimit | Limit for datashapes stored |
datashapelimitui | Limit for datashapes displayed |
datashapemaxcolsize | Maximum length of a string column before it is disqualified from shapes detection |
datashapemaxpercol | Maximum number of shapes per column before column is ignored during shapes processing |
datashapeoff | Turn DataShape Activity Off |
datashapescore | Score for each datashape |
datashapesense | Maximum occurrence rate (%) to be considered a shape |
dateoff | Turn date detection off. In some cases date detection is a costly operation with little value |
dblb | DB lookback to check owl check history for previous histories |
dc | The date column for outlier detection. |
delta | Delta file data flag |
deploymode | The Spark deploymode option. For example, cluster. |
depth | The depth of duplicate discovery between 1-3, increasing runtime non-linearly. The default value is 1. |
df | Date Format, example: yyyy-MM-dd |
diff | Percentage difference between two days to do a reference for keys missing |
divisor | Divisor for unix timestamp. s for seconds or ms for milliseconds. Default is ms . |
dl | Deep learning. This enables the outliers activity. |
dlcombine | When numerical outlier appears more than once, combine them as single outlier |
dlcombineoff | When numerical outlier appears more than once, do not combine them as single outlier |
dlexc | Deep learning col exclusion, example: open,close,high,volume |
dlinc | The column limit for deep learning. This can be a comma delimited list of columns to include in your job. For example, if you want to include columns called account_id, date, and frequency, the correct syntax would be account_id,date,frequency . |
dlkey | The natural key for deep learning. This is the column in your dataset that you set as the key column. |
dllb | The deep learning lookback period where the entered value represents the number of days included in the outlier activity lookback. |
dlminhist | Minimum records for outlier history, default dllb - 2 |
dlmulti | Pass multiple dlkey=dlinc key value pairs. Split by pipe for multiple |
dn | Driver name, example: org.apache.phoenix.jdbc.PhoenixDriver |
dpoff | Do not store data preview records |
dprev | Data preview turned off, same as onReadOnly |
dq | Double-quote character (") to escape SQL queries when returning to database |
driver | The driver class name of a custom driver. |
drivermemory | The driver memory of your local Spark instance in gigabytes. |
ds | The name of the dataset. |
dssafeoff | Best practice naming convention flag, provides a globally unique and meaningful natural key to all datasets |
dupe | Enables the dupe activity. |
dupeapprox | Approximate groupings default value =1 [0-3] |
dupecutoff | The duplicate score lower boundary for non-exact matching percentage. For example, if you set the dupecutoff value to 40, then the lowest percentage of a potential duplicate match would be 40%. This can be used in conjunction with -dupepermatchupperlimit to specify a range of matches. Note: If Exact Match is enabled, this value cannot be set. |
dupeexc | Duplicate record detection, column exclusion list |
dupeinc | The column limit for duplicate record detection. This can be a comma delimited list of columns to include in your job. For example, if you want to include columns called account_id, date, and frequency, the correct syntax would be account_id,date,frequency . |
dupelb | Duplicate lower bounds on percent match, default [85] |
dupelimit | Limit for dupe rows stored |
dupelimitui | Limit for dupe rows displayed |
dupenocase | Duplicate record case sensitivity off |
dupeonly | Only run duplicate section |
dupepermatchupperlimit | The duplicate score upper boundary for non-exact matching percentage, set to 100 by default. |
dupescore | Score for each duplicate record |
dupesperdupe | Max dupes to calculate per duplicate match |
dupetruecase | Enables case sensitivity. |
dupeub | Duplicate upper bounds on percent match, default [100] |
ec | Add custom escape character to escape SQL queries when returning to database |
encoding | Load file charset encoding other than UTF-8 |
erlq | Explicit k,v string of rule_name and rule sql for secondary datasets |
executorcores | Spark executor cores |
executormemory | The total Spark executor memory in gigabytes, for example, 3G. |
f | File path for load, /dir/filename.csv |
files | Pass additional spark files for distribution on cluster |
filter | Only use rows containing this value |
filtergram | filtergram |
filternot | Only use rows not containing this value |
flatten | Option to flatten json and explode arrays |
fllb | File Lookback to check owl check history for previous files |
fllbminrow | Minimum number of rows (inclusive) that owl check history needs to be considered for File Lookback. Default 0 (which includes all owlchecks) |
fpgbucketlimit | Limit bucket size for Pattern algorithm, example: -fpgbucketlimit 20000 |
fpgconfidence | Minimum occurrence rate at which an association rule has to be found to be true |
fpgdc | The column in your dataset that you set as the date column. |
fpgdupeoff | Pattern mining: do not remove duplicate columns. Helps performance but impacts quality |
fpgexc | Pattern mining is expensive; use this input to limit the observed columns |
fpginc | The column limit for pattern mining. This can be a comma delimited list of columns to include in your job. For example, if you want to include columns called account_id, date, and frequency, the correct syntax would be account_id,date,frequency. Because pattern mining is expensive, limiting the number of columns in your query can be an effective way to control costs. |
fpgkey | The natural key for pattern mining. This is the column in your dataset that you set as the key column. |
fpglb | The lookback period where the entered value represents the number of days included in the pattern activity lookback. |
fpglimit | Limit for frequent pattern mining results stored |
fpglimitui | Limit for frequent pattern mining results displayed |
fpgmatchoff | Turn off match for only patterns that appear in today dataset scope |
fpgmulti | Pass multiple fpgkey=fpginc key value pairs. Split by pipe for multiple |
fpgon | Enables the pattern (mining) activity. |
fpgq | Select * from file (sql) |
fpgscore | Score for pattern mining records |
fpgsupport | Minimum occurrence rate for an itemset to be identified as frequent |
fpgtbin | Time bin for pattern mining, example: -fpgtbin DAY |
fq | Select * from file (sql) |
fullfile | Use entire file for lookbacks instead of just filequery |
h | The hostname where CDQ is installed. This option is for running DQ jobs remotely. |
header | Comma delimited list of headers: fname,lname,price |
headercheckoff | Turn off the check of headers for invalid characters |
help | Print this message |
histlimit | Limit for histograms stored |
histlimitui | Limit for histograms displayed |
histoff | Dataset histogram flag force off |
histon | Dataset histogram flag force on |
hive | Turn on native hive for Hive non JDBC recommended |
hivehwc | Use hive warehouse connector to access data in HDP3.x Hive Warehouse |
hootonly | Only display hoot at stdout |
hootprettyoff | Hoot json pretty print flag off |
host | Owl metadata store host |
hudi | hudi file data flag |
in | Validate distinct column values against another dataset |
inferschemaoff | Turn off inferschema when loading files |
iot | Automatically store a numeric column without specifying a tsk, tsv |
jars | Spark - Comma-separated list of jars to include on the driver and executor classpaths. |
jdbckeytab | Path and location to jdbc principal keytab file |
jdbcprinc | Kerberos principal name specifically to connect to Kerberized JDBC, example: [email protected] |
jdbctgt | Path and location to jdbc principal tgt file |
jobschemaname | Mainly needed for Big Query, but can be used for any database to set the schema name explicitly versus parsing it out of the sql query later |
json | Json data flag |
kafka | Indicates that the target data source is Kafka |
kafkabroker | Kafka port, example: 9092 |
kafkagroup | Kafka consumer group, example: machine-group |
kafkakeyserde | Optional --kafka_key_deserializer org_apache_kafka_common_serialization_StringSerializer |
kafkaport | Kafka host, example: localhost |
kafkasasl | Enable kafka SASL (Kerberos) authentication. If this option is set, also set kafkasaslservice flag |
kafkasaslservice | The name of the SASL service for authentication |
kafkassl | Enable kafka SSL 1-way and/or 2-way ssl. If this option is set, also set ssltruststore/sslkeystore flags |
kafkatopic | Topic, topic name, example: test_stream |
kafkavalserde | Optional --kafka_value_deserializer org_apache_kafka_common_serialization_StringSerializer |
kerbpwd | Kerberos password to acquire TGT |
key | Primary key or unique key, typically business key, example : sym,exch (compound use comma) |
keyDelim | Delimiter for primary key or unique key when concatenating the values to single string, example: sym,exch -> sym~~exch |
lib | The library class directory, for example, /opt/owl/drivers/postgres/. |
libsrc | Library class directory for val src cxn |
lickey | Passes lickey from owlcheck to owl-core |
linkid | linkid is for client datasets to pass their primary_key or record_id field so that when Owl serves back the results they are linked to the original dataset |
logfile | Allow user to add their own custom logging |
loglevel | The logging level. This can be either INFO or DEBUG . |
lookbacklimit | Limit for lookback intervals |
lower | Median Q2 multiplier to impact lower boundary |
lowerbound | Number or Timestamp |
maps | Contains maps in json that requires extra handling |
master | Overrides local[*] |
matches | (Deprecated) Show matches in validate source instead of mismatches |
maxcolumnlimit | Limit for max columns |
minintervals | Minimum streaming intervals for profiling |
mixedjson | Contains non-json and json flag |
mu | Measurement unit for outliers |
multiline | Multiline json flag |
notin | Validate distinct column values against another dataset |
nulls | -nulls 'null' treats 'null' as NULL |
numericlimit | Limit for numeric outliers stored |
numericlimitui | Limit for numeric outliers displayed |
numericscore | Score for numeric outliers |
numexecutors | The number of Spark executors. |
numpartitions | Number of partitions or splits for parallel queries, default 3 |
obslimit | Limit for observations stored |
obslimitui | Limit for observations displayed |
obsscore | Score for observation record |
opt | key=value, [escape='', quote='', timestampFormat='yyyy-MM-dd' ] |
optpath | /file/path/to/dsOption.properties [escape=value] |
orc | orc file data flag |
otlrq | Select * from file (sql) |
outlierlimit | Limit for outliers stored |
outlierlimitui | Limit for outliers displayed |
outlieronly | Only run outlier section |
outlierscore | Score for mismatching source to target records |
owluser | The username of the CDQ user running the job. |
p | Password |
packages | Spark - Comma-separated list of maven coordinates of jars to include on the driver and executor classpaths. Will search the local maven repo, then maven central and any additional remote repositories given by --repositories. The format for the coordinates should be groupId:artifactId:version |
parallel | Turn on parallel task execution vs sequential (default). Performance advantage but uses more hardware resources |
parquet | Parquet file data flag |
partitionnum | Number of partitions calculated by estimator/override by user |
passfail | Set the failing score, example: 75 |
passfaillimit | Limit for passing or failing runs |
patternonly | Only run pattern mining section |
pgpassword | Password for Owl's postgres metastore |
pguser | Username for Owl's postgres metastore |
pipeline | List of activities to analyze |
plan | Turn on the execution plan. Describes the execution plan |
port | Owl metadata store port |
postclearcache | Delay clear cache process to the end of owlcheck |
precisionoff | Turn Profile Precision Off, do not calculate the length of doubles |
profile2 | Run inline version of column stats |
profileonly | Only run profile and shape section |
profilepushdown | Compute profile in the target database |
profileStringLength | Profile min/max length for String type columns on |
profoff | Turn off profiling section of owlcheck |
pwdmgr | Look up a password manager password via script and obtain the password for the JDBC connection |
q | The SQL query of your job. For example, select * from [table] . |
q1 | The lower quartile boundary impact (IQR) value between 0-0.45. If this is not specified, the lower quartile is 0.15 by default. |
q3 | The upper quartile boundary impact (IQR) value between 0.55-1. If this is not specified, the upper quartile is 0.85 by default. |
qhist | Select * from table (sql) |
queue | YARN queue name |
rc | Record detection |
rcBy | Record compare by function |
rcDateCol | Record detection date column |
rcKeys | Record detection keys |
rcTbin | Record detection time bin |
rd | The run date of your job in either yyyy-MM-dd or yyyy-MM-dd HH:mm format. |
rdAdj | Adjusts the run date (rd) value for replacement date variables yyyy:MM:dd HH:mm:ss, formatting XX:NNN (example dd:-2 overrides the run date by subtracting 2 days) |
rdEnd | End date for query ranges t_date >= ${rd} and t_date < ${rdEnd} , must be in format 'yyyy-MM-dd' or for incremental use Hours or Minutes yyyy-MM-dd HH:mm |
readonly | Do not connect to the meta store; good for testing or trials |
record | Validate distinct column values against runs |
recordoff | Check for records that were added or dropped from dataset |
repartitionoff | Do not repartition |
rlc | Rule secondary src jdbc://hive:3306 (connection URL) |
rld | Rule secondary src driver path |
rlds | srcDataset (silo.account) |
rlp | Rule secondary src password |
rlq | Rule secondary src SQL |
rlu | Rule secondary src username |
rootds | Context based predictions: you can assign a root dataset, example: user -> userLoan, user -> userCredit. rootds = user |
rulename | Only for rules validation testing to run single rule |
ruleserial | Run rules in serial mode |
rulesoff | Rules section flag off |
rulesonly | Only run the rules section |
schemaregistrypass | Password to login to schema registry where stream schema can be found |
schemaregistryurl | url of schema registry where stream schema can be found |
schemaregistryuser | Username to login to schema registry where stream schema can be found |
schemascore | Score for schema changes |
scorecardsize | Limit for size of scorecard displayed |
sdriver | Classname for custom secondary driver entered by user for complex rule |
selectall | Select * override cols |
semanticoff | Semantic forced off |
semanticon | Semantic forced on |
skipfirstrow | Indicates that the first row contains header values |
skiplines | Skip first N lines of a csv file before loading |
sourceonly | Only run validate source section |
sp | Sample percentage [0.0 - 1.0], default value 1.0 = 100% |
sparkkeytab | Path and location to keytab file |
sparkprinc | Kerberos principal name, example: [email protected] |
sq | Single quote (') character to escape SQL queries when returning to database |
srcauto | Auto generates validate source params from owl check history. Only needs -srcds and -valsrcfq or -q |
srcavro | avro file data flag for source |
srcavroschema | Validate source avro schema file |
srcc | jdbc://hive:3306 (connection URL) |
srccxn | Instead of providing the user, password, and connection URL for a connection, provide the saved connection name for validate source |
srcd | src driver oracle.driver.JDBC |
srcdel | Source delimiter , |
srcdelta | Delta file data flag for source |
srcds | srcDataset (silo.account) |
srcencoding | Load source file charset encoding other than UTF-8 |
srcfile | Validate source file |
srcflatten | Option to flatten json and explode arrays for source |
srcfullfile | Use entire file for lookbacks instead of just filequery for source |
srcheader | Validate source header for a file |
srchive | -srchive for validate source on Hive using HCat non JDBC |
srcinferschemaoff | Turn off inferschema when loading files or source |
srcjson | json data flag for source |
srcjsonmaps | Contains maps in json that requires extra handling for source |
srcmixedjson | Validate source contains non-json and json flag |
srcmultiline | Multiline json flag for source |
srcorc | orc file data flag for source |
srcp | src password |
srcparquet | Parquet file data flag for source |
srcpwdmgr | Lookup a password manager password via script and obtain the password for the JDBC connection |
srcq | src SQL |
srcskiplines | Skip first N lines of a source csv file before loading |
srcu | src username |
srcxml | Xml data flag for source |
srcxmlrowtag | Xml Row Tag for source |
sslciphers | Comma separated list of valid ciphers for the target secure socket connection |
ssldisablehostverify | Disable SSL hostname verification when deciding whether to trust the host's certificate |
sslkeypass | ssl key password (Only required when ssl key stored in keystore has a password) |
sslkeystore | Location of the ssl keystore |
sslkeystorepass | ssl keystore password |
sslkeystoretype | Type of the ssl keystore (Default: JKS) |
ssltruststore | Location of the ssl truststore |
ssltruststorepass | ssl truststore password |
ssltruststoretype | Type of the ssl truststore (Default: JKS) |
statsoff | Column stats flag off, on by default |
stock | Optimized for stock data, price history |
stream | Indicates that the target data source is a stream of data |
streamformat | Format, example: csv,avro,json,xml |
streaminterval | Interval, in second format, example: 10 |
streammaxlull | The maximum time in seconds that a stream should not be empty |
streamprops | key=value,hive.resultset.use.unique.column.names=false |
streamschema | col:integer,col1:double,col2:string,col3:long |
streamtype | Type of stream. Possible values: Kafka |
stringmode | All data types forced to strings for type safe processing |
t1 | Select * from @dataset.column (sql) |
t1q | Select * from @dataset.column (sql) |
tbin | MIN -> minute [14:27], HOUR -> hour military [13], DAY -> [05], SEC -> Second [14:27:35] |
tbq | Select * from dataset where time_bin = '2018-12-10 10'. Ex: for time bin outliers override. |
tc | Time Column for cases when date time are separate |
timestamp | Converts timestamp column to date format. Uses -dc date column flag as column to convert. Must be accompanied by the -transform flag to transform string to DateType/TimestampType |
todq | Select * from dataset where time_bin = '2018-12-10 10', example: for today override. |
transform | Transform expressions. Can be a single expression or multiple delimited by | . Example: colname=cast(colname as string),colname2=colname2(cast as date) |
ts | Flag this dataset as a Time-Series dataset |
u | Username |
ubon | Use boundaries flag on |
upper | Median Q2 multiplier to impact higher boundary |
upperbound | Number or Timestamp |
usespark | usespark flag, forces spark, intended for datasets > 30 mil rows |
usesql | usesql implies to use the -q select * from table where etc as a subselect of the partitioning |
usetemplate | Does not require cmd line params uses saved properties, can override by adding them |
validateschemaorderon | Validate source column name order |
validatevalues | Validate source matches on cell values and show mismatches |
validatevaluesfilter | Spark sql where clause to limit rows to validate values, example: "id = 123" |
validatevaluesignoreempty | Validate value ignores empty string as an issue |
validatevaluesignorenull | Validate values ignores null as an issue |
validatevaluesignoreprecision | Validate value ignore precision for decimal values |
validatevaluesshowall | Validate values shows findings for all columns instead of one per row |
validatevaluesshowmissingkeys | Provide options to show missing keys on both target and source for validating source with key case |
validatevaluesthreshold | Validate value threshold ratio. Default .9 (=90%) |
validatevaluesthresholdstrictdownscore | Validate values turn on strict downscore for threshold category |
validatevaluestrimon | Provide options to trim extra space for source target join and cell to cell comparison |
valsrccaseon | Validate source column name case sensitivity off |
valsrcexc | Validate source column exclusion list for target dataset |
valsrcexcsrc | Validate source column exclusion list for source dataset |
valsrcfq | Validate source file query |
valsrcinc | Validate source column inclusion list for target dataset |
valsrcincsrc | Validate source column inclusion list for source dataset |
valsrcjoinonly | Skip validate source row count, schema comparison and validate values, pair use with -postclearcache |
valsrckey | Validate source column key list for target dataset |
valsrclimit | Limit for validate source stored |
valsrclimitui | Limit for validate source displayed |
valsrcmap | Validate source file column mapping (sourceCol=targetCol,sourceCol2=targetCol2) |
valsrcpdc | Push row count to source database |
valsrctypeoff | Validate source don't check for schema type |
version | 0.1 |
vs | Turn on validate source |
where | Allows you to place a common where clause and still accept partitioning |
xml | Xml data flag |
xmlRowTag | Xml Row Tag |
zfn | Zero fill null, NULL values will be 0.0 |
zkhost | Zookeeper host |
zkpath | Zookeeper path |
zkport | Zookeeper port |
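Putting it together, a sketch of a job run command that combines several of the parameters above (the connection details, dataset name, column names, and thresholds are placeholders for illustration, not defaults):
-ds "accounts" \
-rd "2019-05-05" \
-q "select * from accounts where updt_ts >= '${rd}'" \
-c "jdbc:postgresql://<db-host>:5432/postgres" \
-u user -p pass \
-driver org.postgresql.Driver \
-lib /opt/owl/drivers/postgres42/ \
-dc updt_ts \
-dl -dlkey account_id -dllb 30 \
-dupe -dupeinc account_id,account_name -dupecutoff 85 \
-fpgon -fpgkey account_id -fpginc account_id,account_type \
-loglevel INFO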