Spark-shell Sample
./bin/spark-shell --jars /opt/owl/bin/owl-core-trunk-jar-with-dependencies.jar,/opt/owl/drivers/postgres/postgresql-42.2.5.jar --deploy-mode client --master local[*]
Import the libraries. If you get a dependency error, please import a second time.
import com.owl.core.util.{OwlUtils, Util}
import com.owl.common.domain2.OwlCheckQ
import com.owl.common.options._
Set up connection parameters to the database we want to scan, if you don’t already have a dataframe.
val url = "jdbc:postgresql://xxx.xxx.xxx.xxx:xxxx/db?currentSchema=schema"
val connProps = Map(
  "driver" -> "org.postgresql.Driver",
  "user" -> "user",
  "password" -> "pwd",
  "url" -> url,
  "dbtable" -> "db.table"
)
Create a new OwlOptions object so we can assign properties.
val opt = new OwlOptions()
Set up variables for ease of re-use.
val dataset = "nyse_notebook_test_final"
val runId = "2017-12-18"
var date = runId
var query = s"""select * from <table> where <date_col> = '$date' """
val pgDatabase = "dev"
val pgSchema = "public"
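With the values above, the interpolated query resolves to:
select * from <table> where <date_col> = '2017-12-18'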
Set the OwlOptions values that point to the Owl metastore.
opt.dataset = dataset
opt.runId = runId
opt.host = "xxx.xxx.xxx.xxx"
opt.pgUser = "xxxxx"
opt.pgPassword = "xxxxx"
opt.port = s"5432/$pgDatabase?currentSchema=$pgSchema"
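With the values above, opt.port resolves to "5432/dev?currentSchema=public", i.e. the metastore port followed by the database and schema.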
Create a connection, build the dataframe, register, and run. With inline processing you will already have a dataframe, so you can skip down to setting the OwlContext.
val conn = connProps + ("dbtable" -> s"($query) $dataset")
val df = spark.read.format("jdbc").options(conn).load
val owl = OwlUtils.OwlContext(df, opt)
owl.register(opt)
owl.owlCheck
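For inline processing, where a dataframe already exists, only the OwlContext step onward is needed. A minimal sketch, assuming the dataframe comes from a Parquet file (the path is illustrative):
// assumption: an existing dataframe from any Spark source, here an illustrative Parquet file
val existingDf = spark.read.parquet("/path/to/data.parquet")
val owl = OwlUtils.OwlContext(existingDf, opt) // same opt values as above
owl.register(opt)
owl.owlCheck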