Spark-shell Sample
./bin/spark-shell --jars /opt/owl/bin/owl-core-trunk-jar-with-dependencies.jar,/opt/owl/drivers/postgres/postgresql-42.2.5.jar --deploy-mode client --master local[*]
Import the libraries. If you get a dependency error, please import a second time.
import com.owl.core.util.{OwlUtils, Util}
import com.owl.common.domain2.OwlCheckQ
import com.owl.common.options._
Set up connection parameters to the database we want to scan, if you don’t already have a dataframe.
val url = "jdbc:postgresql://xxx.xxx.xxx.xxx:xxxx/db?currentSchema=schema"
val connProps = Map(
  "driver" -> "org.postgresql.Driver",
  "user" -> "user",
  "password" -> "pwd",
  "url" -> url,
  "dbtable" -> "db.table"
)
Create a new OwlOptions object so we can assign properties.
val opt = new OwlOptions()
Set up variables for ease of re-use.
val dataset = "nyse_notebook_test_final"
val runId = "2017-12-18"
var date = runId
var query = s"""select * from <table> where <date_col> = '$date' """
val pgDatabase = "dev"
val pgSchema = "public"
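With the values above, the interpolated query resolves to:
select * from <table> where <date_col> = '2017-12-18'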
Set the OwlOptions values that point to the Owl metastore.
opt.dataset = dataset
opt.runId = runId
opt.host = "xxx.xxx.xxx.xxx"
opt.pgUser = "xxxxx"
opt.pgPassword = "xxxxx"
opt.port = s"5432/$pgDatabase?currentSchema=$pgSchema"
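With the values above, opt.port resolves to "5432/dev?currentSchema=public", i.e. the metastore port followed by the database and schema.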
Create a connection, build the dataframe, register, and run. With inline processing you will already have a dataframe, so you can skip down to setting the OwlContext.
val conn = connProps + ("dbtable" -> s"($query) $dataset")
val df = spark.read.format("jdbc").options(conn).load
val owl = OwlUtils.OwlContext(df, opt)
owl.register(opt)
owl.owlCheck
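For inline processing, where a dataframe already exists, only the OwlContext step onward is needed. A minimal sketch, assuming the dataframe comes from a Parquet file (the path is illustrative):
// assumption: an existing dataframe from any Spark source, here an illustrative Parquet file
val existingDf = spark.read.parquet("/path/to/data.parquet")
val owl = OwlUtils.OwlContext(existingDf, opt) // same opt values as above
owl.register(opt)
owl.owlCheck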