Job Command Line Parameters

This topic provides information on using command line parameters to run jobs. It covers managing Spark resources from the command line and includes a reference guide describing the parameters available in the job run command.

Note You can display the command used to run the current job on the Job tab of the Findings page.

Managing Spark resources from the command line

Scale linearly with your data

Scale linearly with your data by adding executors and/or memory. For example:

-f "file:///Users/home/salary_data.csv" \
-d "," \
-rd "2018-01-08" \
-ds "salary_data"
-numexecutors 2 \
-executormemory 2g

Yarn Master

If Collibra DQ runs on an edge node of a popular Hadoop distribution such as HDP, CDH, or EMR, it automatically registers jobs with the YARN Resource Manager.
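For example, the same flags used in the earlier example can be combined with -master and -deploymode to target YARN explicitly. The values below are illustrative only; adjust paths, executor counts, and queue name for your cluster:

-f "file:///Users/home/salary_data.csv" \
-d "," \
-rd "2018-01-08" \
-ds "salary_data" \
-master yarn \
-deploymode cluster \
-numexecutors 3 \
-executormemory 4g \
-queue default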

Spark Master

Collibra DQ can also run against a Spark master by using the -master option and passing in a spark:// URL.
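For example, where the master URL is a placeholder for your own Spark master host and port:

-f "file:///Users/home/salary_data.csv" \
-d "," \
-rd "2018-01-08" \
-ds "salary_data" \
-master spark://<spark-master-host>:7077 \
-numexecutors 2 \
-executormemory 2g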

Spark Standalone

Collibra DQ typically runs in standalone mode, but by default it does not distribute processing beyond the hardware it was activated on. Use the options below to control Spark resources; a combined example follows the options table.

Options Description
deploymode The Spark deploymode option. For example, cluster.
drivermemory The driver memory of your local Spark instance in gigabytes.
executorcores Spark executor cores.
executormemory The total Spark executor memory in gigabytes, for example, 3G.
master Overrides local[*]
sparkprinc Kerberos principal name, example: [email protected].
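As a sketch, several of these options can be combined in one run command, for example on a kerberized cluster. The principal and keytab path below are placeholders:

-ds "salary_data" \
-rd "2018-01-08" \
-f "file:///Users/home/salary_data.csv" \
-d "," \
-master yarn \
-deploymode cluster \
-drivermemory 3g \
-executorcores 2 \
-executormemory 3g \
-sparkprinc <principal>@<REALM> \
-sparkkeytab /path/to/user.keytab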

Use Spark-Submit directly, bypassing DQCheck

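# Submit the owl-core jar directly with spark-submit instead of going through DQCheck.
# Adjust the jar version, driver paths, master URL, and connection details for your environment.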
spark-submit \
--driver-class-path /opt/owl/drivers/postgres42/postgresql-42.2.4.jar \
--driver-library-path /opt/owl/drivers/postgres42/postgresql-42.2.4.jar \
--driver-memory 3g --num-executors 2 --executor-memory 1g \
--master spark://Kirks-MBP.home:7077 \
--class com.owl.core.cli.OwlCheck /opt/owl/bin/owl-core-trunk-jar-with-dependencies.jar \
-u user -p pass -c jdbc:postgresql://xyz.chzid9w0hpyi.us-east-1.rds.amazonaws.com/postgres \
-ds accounts -rd 2019-05-05 -dssafeoff -q "select * from accounts" \
-driver org.postgresql.Driver -lib /opt/owl/drivers/postgres42/  

Parallel JDBC Spark-Submit

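# Same direct spark-submit approach, but reading over JDBC in parallel:
# -columnname, -numpartitions, -lowerbound, and -upperbound split the query into partitioned reads,
# and -connectionprops fetchsize tunes the JDBC fetch size. Adjust values for your environment.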
spark-submit \
--driver-class-path /opt/owl/drivers/postgres42/postgresql-42.2.4.jar \
--driver-library-path /opt/owl/drivers/postgres42/postgresql-42.2.4.jar \
--conf spark.driver.extraJavaOptions=-Dlog4j.configuration=file:///opt/owl/config/log4j-TRACE.properties \
--conf spark.executor.extraJavaOptions=-Dlog4j.configuration=file:///opt/owl/config/log4j-TRACE.properties \
--files /opt/owl/config/log4j-TRACE.properties \
--driver-memory 2g --num-executors 2 --executor-memory 1g --master spark://Kirks-MBP.home:7077  \
--class com.owl.core.cli.OwlCheck /opt/owl/bin/owl-core-trunk-jar-with-dependencies.jar \
-u us -p pass -c jdbc:postgresql://xyz.chzid9w0hpyi.us-east-1.rds.amazonaws.com/postgres \
-ds aumdt -rd 2019-05-05 -dssafeoff -q "select * from aum_dt" \
-driver org.postgresql.Driver -lib /opt/owl/drivers/postgres42/  \
-connectionprops fetchsize=6000 -master spark://Kirks-MBP.home:7077 \
-corroff -histoff -statsoff \
-columnname updt_ts -numpartitions 4 -lowerbound 1557597987353 -upperbound 1557597999947

Command Line Reference

The following table describes the parameters available for a job run command.

Parameter Description
adddc Add a date column to be used as the run ID
addlib Additional library directory to be added to the classpath for the DQ Job (spark-submit)
agentjobid Internal use only
agg Grouping function for flexibility
aggq Aggregate override, example: select * from dataset where time_bin = '2018-12-10 10'
alertemail Automatically add an alert with a score greater than 75 to the email value supplied
alias Dataset name alias, example: userTable or users or user_file
archivecxn Connection name to archive break records
avro avro file data flag
avroschema avro schema file
bd Column count if you want to group by a particular set of values for behavioral statistics
bdcol Behavioral function if you want to aggregate for behavioral statistics
bdfunc Behavioral function if you want to aggregate for behavioral statistics
bdgrp Behavioral group to dynamically collect stats if you want to group by a particular set of values for behavioral statistics
behaviorscoreoff Turn off behavior scoring
bhemptyoff Behavior empty check detection off
bhlb The behavior lookback period where the entered value represents the number of days. For example, a value of 12 looks back 12 days of data.
bhmaxoff Behavior max value detection off
bhmaxon Behavior max value detection on
bhmeanoff Behavior mean value detection off
bhmeanon Behavior mean value detection on
bhminoff Behavior min value detection off
bhminon Behavior min value detection on
bhminsupport Behavior min support; the minimum number of days to learn from during the learning phase. Set to 4 by default.
bhnulloff Behavior null check detection off
bhrowoff Behavior row count detection off
bhsensitivity Behavior sensitivity: NARROW, NULL, WIDE
bhtimeoff Behavior load time detection off
bhtimeon Behavior load time detection on
bhuniqueoff Behavior unique detection off
bhuniqueon Behavior unique detection on
br Number of back runs to fill training history, should be an integer value
brbin Time bin for back runs, example: -brbin DAY
bt Back-tick character (`) to escape SQL queries when returning to database
by Compare by DAY, HOUR, or MIN.
c jdbc://hive:3306 (connection URL)
cacheoff Turn caching off. Caching is on by default. It can be turned off if the dataset is too large or cache optimization is not desired.
cardoff Turn off profiling section of owlcheck
categoricallimit Limit for categorical outliers stored
categoricallimitui Limit for categorical outliers displayed
categoricalscore Score for each categorical outlier
catoff Disables categorical outlier detection.
caton Turn on categorical outliers
catOutAlgo Specify ML algorithm for categorical outliers. Default: "" (no ML)
catOutAlgoParams Optional params for catOutAlgo to override Owl-suggested params. Default: "". E.g. "k=5,initSteps=5"
catOutBottomN Max number of categorical outliers in a column
catOutConfType Method to use to calculate likelihood of category level
catOutMaxCategoryN Maximum number of categories within key that will trigger the homogeneous past categorical outlier case
catOutMaxConf Confidence upper bound to qualify as an outlier
catOutMaxFreqPercentile Frequency percentile upper bound to qualify as an outlier
catOutMinFreq Minimum frequency needed to be considered an outlier. Raise to make less sensitive
catOutMinVariance Minimum frequency count variance (within key) required to be considered an outlier. Set to negative to be more sensitive
catOutParallelOff Turn off parallel column-wise processing of categorical outliers
catOutTopN Number of top frequently appearing levels in a column to include in preview
columnname Column name to split on for spark JDBC
concat No arguments. Concatenation option for categorical outlier columns
conf The Spark configuration option. For example, spark.kubernetes.memoryOverheadFactor=0.4,spark.kubernetes.executor.podTemplateFile=local:///opt/owl/config/k8s-executor-template.yml,spark.kubernetes.driver.podTemplateFile=local:///opt/owl/config/k8s-driver-template.yml
connectionprops key=value,hive.resultset.use.unique.column.names=false
connectionpropssrc key=value,hive.resultset.use.unique.column.names=false
corefetchmode Let core fetch the query from the meta store instead of using the one passed on the command line
corroff Dataset correlation flag force off
corron Dataset correlation flag force on
cxn The name of the saved database or file connection from which your dataset originates.
d Delimiter, ','
dataconceptid Identifier of the group of semantic rules by datatype
datashapeexc Exclude a column from data shapes discovery
datashapegranular Check length for alphanumeric fields, and independent check for numbers and letters
datashapeinc Include a column that has been excluded from data shapes discovery
datashapelimit Limit for datashapes stored
datashapelimitui Limit for datashapes displayed
datashapemaxcolsize Maximum length of a string column before it is disqualified from shapes detection
datashapemaxpercol Maximum number of shapes per column before column is ignored during shapes processing
datashapeoff Turn DataShape Activity Off
datashapescore Score for each datashape
datashapesense Maximum occurrence rate (%) to be considered a shape
dateoff Turn date detection off. In some cases date detection is a costly operation with little value
dblb DB lookback to check owl check history for previous histories
dc The date column for outlier detection.
delta Delta file data flag
deploymode The Spark deploymode option. For example, cluster.
depth The depth of duplicate discovery between 1-3, increasing runtime non-linearly. The default value is 1.
df Date Format, example: yyyy-MM-dd
diff Percentage difference between two days, used as a reference for missing keys
divisor Divisor for unix timestamp. s for seconds or ms for milliseconds. Default is ms.
dl Deep learning. This enables the outliers activity.
dlcombine When a numerical outlier appears more than once, combine the occurrences into a single outlier
dlcombineoff When a numerical outlier appears more than once, do not combine the occurrences into a single outlier
dlexc Deep learning col exclusion, example: open,close,high,volume
dlinc The column limit for deep learning. This can be a comma delimited list of columns to include in your job. For example, if you want to include columns called account_id, date, and frequency, the correct syntax would be account_id,date,frequency.
dlkey The natural key for deep learning. This is the column in your dataset that you set as the key column.
dllb The deep learning lookback period where the entered value represents the number of days included in the outlier activity lookback.
dlminhist Minimum records for outlier history, default dllb - 2
dlmulti Pass multiple dlkey=dlinc key value pairs. Split by pipe for multiple
dn Driver name org.apache.jdbc.PhoenixDriver
dpoff Do not store data preview records
dprev Data preview turned off, same as onReadOnly
dq Double-quote character (") to escape SQL queries when returning to database
driver The driver class name of a custom driver.
drivermemory The driver memory of your local Spark instance in gigabytes.
ds The name of the dataset.
dssafeoff Best practice naming convention flag, provides a globally unique and meaningful natural key to all datasets
dupe Enables the dupe activity.
dupeapprox Approximate groupings, default value 1 [0-3]
dupecutoff

The duplicate score lower boundary for non-exact matching percentage. For example, if you set the dupecutoff value to 40, then the lowest percentage of a potential duplicate match would be 40%. This can be used in conjunction with -dupepermatchupperlimit to specify a range of matches.

Note If Exact Match is enabled, this value cannot be set.
dupeexc Duplicate record detection, column exclusion list
dupeinc The column limit for duplicate record detection. This can be a comma delimited list of columns to include in your job. For example, if you want to include columns called account_id, date, and frequency, the correct syntax would be account_id,date,frequency.
dupelb Duplicate lower bounds on percent match, default [85]
dupelimit Limit for dupe rows stored
dupelimitui Limit for dupe rows displayed
dupenocase Duplicate record case sensitivity off
dupeonly Only run duplicate section
dupepermatchupperlimit The duplicate score upper boundary for non-exact matching percentage, set to 100 by default.
dupescore Score for each duplicate record
dupesperdupe Max dupes to calculate per duplicate match
dupetruecase Enables case sensitivity.
dupeub Duplicate upper bounds on percent match, default [100]
ec Add custom escape character to escape SQL queries when returning to database
encoding Load file charset encoding other than UTF-8
erlq Explicit k,v string of rule_name and rule sql for secondary datasets
executorcores Spark executor cores
executormemory The total Spark executor memory in gigabytes, for example, 3G.
f File path for load, /dir/filename.csv
files Pass additional spark files for distribution on cluster
filter Only use rows containing this value
filtergram filtergram
filternot Only use rows not containing this value
flatten Option to flatten json and explode arrays
fllb File Lookback to check owl check history for previous files
fllbminrow Minimum number of rows (inclusive) that owl check history needs to be considered for File Lookback. Default 0 (which includes all owlchecks)
fpgbucketlimit Limit bucket size for Pattern algorithm, example: -fpgbucketlimit 20000
fpgconfidence Minimum occurrence rate at which an association rule has to be found to be true
fpgdc The column in your dataset that you set as the date column.
fpgdupeoff Pattern mining does not remove duplicate columns; helps performance but impacts quality
fpgexc Pattern mining is expensive; use this input to limit the observed columns
fpginc

The column limit for pattern mining. This can be a comma delimited list of columns to include in your job. For example, if you want to include columns called account_id, date, and frequency, the correct syntax would be account_id,date,frequency

Because pattern mining is expensive, limiting the number of columns in your query can be an effective way to control costs.

fpgkey The natural key for pattern mining. This is the column in your dataset that you set as the key column.
fpglb The lookback period where the entered value represents the number of days included in the pattern activity lookback.
fpglimit Limit for frequent pattern mining results stored
fpglimitui Limit for frequent pattern mining results displayed
fpgmatchoff Turn off matching for only patterns that appear in today's dataset scope
fpgmulti Pass multiple fpgkey=fpginc key value pairs. Split by pipe for multiple
fpgon Enables the pattern (mining) activity.
fpgq Select * from file (sql)
fpgscore Score for pattern mining records
fpgsupport Minimum occurrence rate for an itemset to be identified as frequent
fpgtbin Time bin for pattern mining, example: -fpgtbin DAY
fq Select * from file (sql)
fullfile Use entire file for lookbacks instead of just filequery
h The hostname where CDQ is installed. This option is for running DQ jobs remotely.
header Comma delimited list of headers: fname,lname,price
headercheckoff Turn off the check of headers for invalid characters
help Print this message
histlimit Limit for histograms stored
histlimitui Limit for histograms displayed
histoff Dataset histogram flag force off
histon Dataset histogram flag force on
hive Turn on native Hive; recommended for non-JDBC Hive
hivehwc Use hive warehouse connector to access data in HDP3.x Hive Warehouse
hootonly Only display hoot at stdout
hootprettyoff Hoot json pretty print flag off
host Owl metadata store host
hudi hudi file data flag
in Validate distinct column values against another dataset
inferschemaoff Turn off inferschema when loading files
iot Automatically store a numeric column without specifying a tsk, tsv
jars Spark - Comma-separated list of jars to include on the driver and executor classpaths.
jdbckeytab Path and location to jdbc principal keytab file
jdbcprinc Kerberos principal name specifically to connect to Kerberized JDBC, example: [email protected]
jdbctgt Path and location to jdbc principal tgt file
jobschemaname Mainly needed for Big Query, but can be used for any database to set the schema name explicitly versus parsing it out of the sql query later
json Json data flag
kafka Indicates that the target data source is Kafka
kafkabroker Kafka broker host, example: localhost
kafkagroup Kafka consumer group, example: machine-group
kafkakeyserde Optional --kafka_key_deserializer org_apache_kafka_common_serialization_StringSerializer
kafkaport Kafka port, example: 9092
kafkasasl Enable kafka SASL (Kerberos) authentication. If this option is set, also set kafkasaslservice flag
kafkasaslservice The name of the SASL service for authentication
kafkassl Enable kafka SSL 1-way and/or 2-way ssl. If this option is set, also set ssltruststore/sslkeystore flags
kafkatopic Topic, topic name, example: test_stream
kafkavalserde Optional --kafka_value_deserializer org_apache_kafka_common_serialization_StringSerializer
kerbpwd Kerberos password to acquire TGT
key Primary key or unique key, typically a business key, example: sym,exch (for compound keys, use a comma)
keyDelim Delimiter for primary key or unique key when concatenating the values to single string, example: sym,exch -> sym~~exch
lib The library class directory, for example, "/opt/owl/drivers/postgres/".
libsrc Library class directory for val src cxn
lickey Passes lickey from owlcheck to owl-core
linkid Used by client datasets to pass their primary_key or record_id field so that when Owl serves back the results, they are linked to the original dataset
logfile Allow user to add their own custom logging
loglevel The logging level. This can be either INFO or DEBUG.
lookbacklimit Limit for lookback intervals
lower Median Q2 multiplier to impact lower boundary
lowerbound Number or Timestamp
maps Contains maps in json that requires extra handling
master Overrides local[*]
matches (Deprecated) Show matches in validate source instead of mismatches
maxcolumnlimit Limit for max columns
minintervals Minimum streaming intervals for profiling
mixedjson Contains non-json and json flag
mu Measurement unit for outliers
multiline Multiline json flag
notin Validate distinct column values against another dataset
nulls -nulls 'null' treats 'null' as NULL
numericlimit Limit for numeric outliers stored
numericlimitui Limit for numeric outliers displayed
numericscore Score for numeric outliers
numexecutors The number of Spark executors.
numpartitions Number of partitions or splits for parallel queries, default 3
obslimit Limit for observations stored
obslimitui Limit for observations displayed
obsscore Score for observation record
opt key=value, [escape='', quote='', timestampFormat='yyyy-MM-dd' ]
optpath /file/path/to/dsOption.properties [escape=value]
orc orc file data flag
otlrq Select * from file (sql)
outlierlimit Limit for outliers stored
outlierlimitui Limit for outliers displayed
outlieronly Only run outlier section
outlierscore Score for mismatching source to target records
owluser The username of the CDQ user running the job.
p Password
packages Spark - Comma-separated list of maven coordinates of jars to include on the driver and executor classpaths. Will search the local maven repo, then maven central and any additional remote repositories given by --repositories. The format for the coordinates should be groupId:artifactId:version
parallel Turn on parallel task execution vs sequential (default). Performance advantage but uses more hardware resources
parquet Parquet file data flag
partitionnum Number of partitions calculated by estimator/override by user
passfail Set the failing score, example: 75
passfaillimit Limit for passing or failing runs
patternonly Only run pattern mining section
pgpassword Password for Owl's postgres metastore
pguser Username for Owl's postgres metastore
pipeline List of activities to analyze
plan Turn on the execution plan, which describes how the job executes
port Owl metadata store port
postclearcache Delay the clear cache process to the end of the owlcheck
precisionoff Turn Profile Precision Off, do not calculate the length of doubles
profile2 Run inline version of column stats
profileonly Only run profile and shape section
profilepushdown Compute profile in the target database
profileStringLength Profile min/max length for String type columns on
profoff Turn off profiling section of owlcheck
pwdmgr Look up a password manager password via script and obtain the password for the JDBC connection
q The SQL query of your job. For example, select * from [table].
q1 The lower quartile boundary impact (IQR) value between 0-0.45. If this is not specified, the lower quartile is 0.15 by default.
q3 The upper quartile boundary impact (IQR) value between 0.55-1. If this is not specified, the upper quartile is 0.85 by default.
qhist Select * from table (sql)
queue YARN queue name
rc Record detection
rcBy Record compare by function
rcDateCol Record detection date column
rcKeys Record detection keys
rcTbin Record detection time bin
rd The run date of your job in either yyyy-MM-dd or yyyy-MM-dd HH:mm format.
rdAdj Adjusts the run date (rd) value for replacement date variables yyyy:MM:dd HH:mm:ss, formatting XX:NNN (example dd:-2 overrides the run date by subtracting 2 days)
rdEnd End date for query ranges t_date >= ${rd} and t_date < ${rdEnd}, must be in format 'yyyy-MM-dd' or for incremental use Hours or Minutes yyyy-MM-dd HH:mm
readonly Do not connect to the meta store; good for testing or trials
record Validate distinct column values against runs
recordoff Turn off the check for records that were added or dropped from the dataset
repartitionoff Do not repartition
rlc Rule secondary src jdbc://hive:3306 (connection URL)
rld Rule secondary src driver path
rlds srcDataset (silo.account)
rlp Rule secondary src password
rlq Rule secondary src SQL
rlu Rule secondary src username
rootds For context-based predictions you can assign a root dataset, example: user -> userLoan, user -> userCredit. rootds = user
rulename Only for rules validation testing to run single rule
ruleserial Run rules in serial mode
rulesoff Rules section flag off
rulesonly Only run the rules section
schemaregistrypass Password to login to schema registry where stream schema can be found
schemaregistryurl url of schema registry where stream schema can be found
schemaregistryuser Username to login to schema registry where stream schema can be found
schemascore Score for schema changes
scorecardsize Limit for size of scorecard displayed
sdriver Classname for custom secondary driver entered by user for complex rule
selectall Select * override cols
semanticoff Semantic forced off
semanticon Semantic forced on
skipfirstrow Indicates that the first row contains header values
skiplines Skip first N lines of a csv file before loading
sourceonly Only run validate source section
sp Sample percentage [0.0 - 1.0], default value 1.0 = 100%
sparkkeytab Path and location to keytab file
sparkprinc Kerberos principal name, example: [email protected]
sq Single quote (') character to escape SQL queries when returning to database
srcauto Auto generates validate source params from owl check history. Only needs -srcds and -valsrcfq or -q
srcavro avro file data flag for source
srcavroschema Validate source avro schema file
srcc jdbc://hive:3306 (connection URL)
srccxn Instead of providing the user, password, and connection URL for a connection, provide the saved connection name for validate source
srcd src driver oracle.driver.JDBC
srcdel Source delimiter ,
srcdelta Delta file data flag for source
srcds srcDataset (silo.account)
srcencoding Load source file charset encoding other than UTF-8
srcfile Validate source file
srcflatten Option to flatten json and explode arrays for source
srcfullfile Use entire file for lookbacks instead of just filequery for source
srcheader Validate source header for a file
srchive Use -srchive for validate source on Hive using HCat (non-JDBC)
srcinferschemaoff Turn off inferschema when loading files or source
srcjson json data flag for source
srcjsonmaps Contains maps in json that requires extra handling for source
srcmixedjson Validate source contains non-json and json flag
srcmultiline Multiline json flag for source
srcorc orc file data flag for source
srcp src password
srcparquet Parquet file data flag for source
srcpwdmgr Lookup a password manager password via script and obtain the password for the JDBC connection
srcq src SQL
srcskiplines Skip first N lines of a source csv file before loading
srcu src username
srcxml Xml data flag for source
srcxmlrowtag Xml Row Tag for source
sslciphers Comma separated list of valid ciphers for the target secure socket connection
ssldisablehostverify Disable SSL hostname verification when deciding whether to trust the host's certificate
sslkeypass ssl key password (Only required when ssl key stored in keystore has a password)
sslkeystore Location of the ssl keystore
sslkeystorepass ssl keystore password
sslkeystoretype Type of the ssl keystore (Default: JKS)
ssltruststore Location of the ssl truststore
ssltruststorepass ssl truststore password
ssltruststoretype Type of the ssl truststore (Default: JKS)
statsoff Column stats flag off, on by default
stock Optimized for stock data, price history
stream Indicates that the target data source is a stream of data
streamformat Format, example: csv,avro,json,xml
streaminterval Interval, in second format, example: 10
streammaxlull The maximum time in seconds that a stream should not be empty
streamprops key=value,hive.resultset.use.unique.column.names=false
streamschema col:integer,col1:double,col2:string,col3:long
streamtype Type of stream. Possible values: Kafka
stringmode All data types forced to strings for type safe processing
t1 Select * from @dataset.column (sql)
t1q Select * from @dataset.column (sql)
tbin MIN -> minute [14:27], HOUR -> hour military [13], DAY -> [05], SEC -> Second [14:27:35]
tbq Time bin outliers override, example: select * from dataset where time_bin = '2018-12-10 10'
tc Time Column for cases when date time are separate
timestamp Converts timestamp column to date format. Uses -dc date column flag as column to convert. Must be accompanied with -transform flag to transform string to DateType/TimestampType
todq Today override, example: select * from dataset where time_bin = '2018-12-10 10'
transform Transform expressions; can be one or delimited by |. Example: colname=cast(colname as string),colname2=cast(colname2 as date)
ts Flag this dataset as a Time-Series dataset
u Username
ubon Use boundaries flag on
upper Median Q2 multiplier to impact higher boundary
upperbound Number or Timestamp
usespark usespark flag, forces spark, intended for datasets > 30 mil rows
usesql Implies using the -q query (select * from table where ...) as a subselect of the partitioning
usetemplate Does not require command line params; uses saved properties, which can be overridden by adding them
validateschemaorderon Validate source column name order
validatevalues Validate source matches on cell values and show mismatches
validatevaluesfilter Spark sql where clause to limit rows to validate values, example: "id = 123"
validatevaluesignoreempty Validate value ignores empty string as an issue
validatevaluesignorenull Validate values ignores null as an issue
validatevaluesignoreprecision Validate value ignore precision for decimal values
validatevaluesshowall Validate values shows findings for all columns instead of one per row
validatevaluesshowmissingkeys Provide options to show missing keys on both target and source for validating source with key case
validatevaluesthreshold Validate value threshold ratio. Default .9 (=90%)
validatevaluesthresholdstrictdownscore Validate values turn on strict downscore for threshold category
validatevaluestrimon Provide options to trim extra space for source target join and cell to cell comparison
valsrccaseon Validate source column name case sensitivity off
valsrcexc Validate source column exclusion list for target dataset
valsrcexcsrc Validate source column exclusion list for source dataset
valsrcfq Validate source file query
valsrcinc Validate source column inclusion list for target dataset
valsrcincsrc Validate source column inclusion list for source dataset
valsrcjoinonly Skip validate source row count, schema comparison and validate values; pair use with postclearcache
valsrckey Validate source column key list for target dataset
valsrclimit Limit for validate source stored
valsrclimitui Limit for validate source displayed
valsrcmap Validate source file column mapping (sourceCol=targetCol,sourceCol2=targetCol2)
valsrcpdc Push row count to source database
valsrctypeoff Validate source does not check for schema type
version 0.1
vs Turn on validate source
where Allows you to place a common where clause and still accept partitioning
xml Xml data flag
xmlRowTag Xml Row Tag
zfn Zero fill null, NULL values will be 0.0
zkhost Zookeeper host
zkpath Zookeeper path
zkport Zookeeper port
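
To illustrate how these parameters fit together, the following sketch combines a JDBC source with duplicate detection, outlier detection, pattern mining, and Spark resource settings. The connection URL, credentials, paths, and column names are placeholders rather than working values:

-ds "accounts" \
-rd "2019-05-05" \
-c "jdbc:postgresql://<host>:5432/postgres" \
-u <user> -p <password> \
-driver "org.postgresql.Driver" \
-lib "/opt/owl/drivers/postgres42/" \
-q "select * from accounts" \
-key account_id \
-dupe -dupeinc account_id,date -dupecutoff 60 \
-dl -dlkey account_id -dlinc date,frequency -dllb 30 \
-fpgon -fpgkey account_id \
-numexecutors 2 -executormemory 2g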