Connecting to Oracle

This section contains details for Oracle connections.

General information

Field	Description
Data source	Oracle
Supported versions	12.2.0.1.0
Connection string	`jdbc:oracle:thin`
Packaged?	Yes
Certified?	Yes
Supported features
Estimate job	Yes
Analyze data	Yes
Schedule	Yes
Processing capabilities
Pushdown	Yes
Spark agent	Yes
Yarn agent	Yes
Parallel JDBC	Yes
Java Platform version compatibility
JDK 8	Yes
JDK 11	Yes

Minimum user permissions

In order to bring your Oracle data into Data Quality & Observability Classic, you need the following permissions.

Read access on Oracle database tables.
The Kerberos user has read permissions on Oracle tables (if using Kerberos authentication)
ROLE_ADMIN assigned to your user in Collibra DQ.

Recommended and required connection properties

Required	Connection Property	Type	Value
Yes	Name	Text	The unique name of your connection. Ensure that there are no spaces in your connection name.
Yes	Connection URL	String	The connection string path of your Oracle connection. When referring to the example below, replace the `${value}` sections of the connection URL with your actual value. Example Using SID `jdbc:oracle:thin:@${host}:SID` Example Using database service name `jdbc:oracle:thin:@//${host}:${port}/service_name`
Yes	Driver Name	String	The driver class name of your connection. `oracle.jdbc.OracleDriver`
Yes	Port	Integer	The port number to establish a connection to the datasource. The default port is `1521`
No	Limit Schemas	Option	Allows you to manage usage and restrict visibility to only the necessary schemas in the Explorer tree. See Limiting schemas to learn how to limit schemas from the Connection Management page. Note When you include a restricted schema in the query of a DQ Job, the query scope may be overwritten when the job runs. While only the schemas you selected when you set up the connection are shown in the Explorer menu, users are not restricted from running SQL queries on any schema from the data source.
No	Source Name	String	N/A
No	Target Agent	Option	The Agent that submits your Spark job for processing.
Yes	Auth Type	Option	The method to authenticate your connection. Note The configuration requirements are different depending on the Auth Type you select. See Authentication for more details on available authentication types.
No	Properties	String	The configurable driver properties for your connection. Multiple properties must be semicolon delimited. For example, abc=123;test=true

Connection properties recognized by Oracle JDBC drivers

Name	Short Name	Type	Description
user	n/a	String	The user name for logging into the database
password	n/a	String	The password for logging into the database
database	server	String	The connect string for the database
internal_logon	n/a	String	A role, such as `sysdba` or `sysoper`, that allows you to log on as `sys`
defaultRowPrefetch	prefetch	String (containing integer value)	The default number of rows to prefetch from the server (default value is "10")
remarksReporting	remarks	String (containing boolean value)	"true" if getTables() and getColumns() should report TABLE_REMARKS; equivalent to using setRemarksReporting() (default value is "false")
defaultBatchValue	batchvalue	String (containing integer value)	The default batch value that triggers an execution request (default value is "10")
processEscapes	n/a	String (containing boolean value)	"false" to disable escape processing for statements (Statement or PreparedStatement) created from this connection. Set this to "false" if you want to avoid many calls to `Statement.setEscapeProcessing(false);`. This is espcially usefull for PreparedStatement where a call to `setEscapeProcessing(false)` would have no effect. The default is "true".
defaultNChar	n/a	String (containing boolean value)	"false" is the default. If set to "true", the default behavior for handling character datatypes is changed so that NCHAR/NVARCHAR2 become the default. This means that setFormOfUse() won't be needed anymore when using NCHAR/NVARCHAR2. This can also be set as a java property :`java -Doracle.jdbc.defaultNChar=true myApplication`
useFetchSizeWithLongColumn	n/a	String (containing boolean value)	"false" is the default. Important This is a thin-only property and should not be used with any other drivers. If set to "true", the performance when retrieving data in a 'SELECT' will be improved but the default behavior for handling LONG columns will be changed to fetch multiple rows (prefetch size). It means that enough memory will be allocated to read this data. So if you want to use this property, make sure that the LONG columns you are retrieving are not too big or you may run out of memory. This property can also be set as a java property : `java -Doracle.jdbc.useFetchSizeWithLongColumn=true myApplication`
SetFloatAndDoubleUseBinary	n/a	String (containing boolean value)	"false" is the default. If set to "true", causes the java.sql.PreparedStatment setFloat and setDouble API's to use internal binary format as for BINARY_FLOAT and BINARY_DOUBLE parameters. See oracle.jdbc.OraclePreparedStatement setBinaryFloat and setBinaryDouble

https://docs.oracle.com/cd/E11882_01/appdev.112/e13995/oracle/jdbc/OracleDriver.html

Authentication

Select an authentication type from the dropdown menu. The options available in the dropdown menu are the currently supported authentication types for this data source.

Required	Field	Description
Yes	Username	The username of your Oracle account.
Yes	Password	The password of your Oracle account.
Yes	Script	The file path that contains the script file that the password manager uses to interact with and authenticate a user account. Example /tmp/keytab/oracle_pwd_mgr.sh
No	Param $1	Optional. An additional parameters to authenticate your Oracle connection.
No	Param $2	Optional. An additional parameter to authenticate your Oracle connection.
No	Param $3	Optional. An additional parameter to authenticate your Oracle connection.
Yes	Principal	The Kerberos entity to authenticate and grant access to your connection. Example [email protected]
Yes	Keytab	The file path of the keytab file that contains the encrypted key for a Kerberos principal. Example /tmp/keytab/oracle_user.keytab Important You must update your keytab secrets file with the following values. Key: cdq_user.keytab Label: base64 encoded string
Yes	Password	The secret credential associated with your Kerberos principal.
Yes	Script	The file path that contains the script file used to interact with and authenticate a Kerberos user. Example /tmp/keytab/oracle_pwd_mgr.sh
No	Param $1	Optional. Additional Kerberos parameter.
No	Param $2	Optional. Additional Kerberos parameter.
No	Param $3	Optional. Additional Kerberos parameter.
Yes	TGT Cache	The ticket-granting ticket cache that stores the TGT to authenticate your connection.

Limitations

There is an Apache Spark limitation where Pullup jobs on Oracle datasets that use Timestamp with Time Zone column types fail in the LOAD activity. For more information about the limitation and a workaround, see the support article.
Because Oracle normally converts all characters in table and schema names to uppercase, Oracle also treats lower case characters as special characters. A possible workaround is to do the following:
1. Go to Explorer and click Create DQ Job on an Oracle table.
2. In the Query box, wrap the schema and table name with double quotes ". For example, select * from "schema"."tablename".
3. Go to the Run page and click the Run CMD tab.
4. Click the lock icon to unlock the command line.
5. Wrap backslashes \ around the double quoted schema and table names, and add a whitespace before the ending double quote of the -q section. For example, -q "select * from \"schema\".\"table\" "
The SDO_GEOMETRY column data type is not supported on Oracle connections.
When you add a timeslice to a column in an Oracle dataset, you must use the "TO_DATE" function to ensure the successful execution of your DQ Job. When a column includes a timeslice without the "TO_DATE" function, the DQ Job will fail with the error, "ORA-00907: missing right parenthesis."
- When you include to_date within the command line query, your DQ Job will run successfully with a non-zero time. In the following example of a command line query with a to_date, TRADE_DATE is the timestamp column:
  - -owluser user -numexecutors 1 -lib "/opt/owl/drivers/oracle/" -executormemory 1g -bhminon -h dq:5432/dq?currentSchema=validation -executorcores 2 -ds ORACLE_DQUSER.NYSE_2 -master k8s:// -drivermemory 1g -deploymode cluster -q "select * from DQUSER.NYSE where TRADE_DATE = to_date('${rd}', 'yyyy-mm-dd hh24:mi:ss')" -rd "2018-01-16 05:00:00" -bhlb 10 -driver "oracle.jdbc.OracleDriver" -bhmaxon -loglevel INFO -cxn ORACLE