Connecting to Amazon Redshift
This section contains details for Amazon Redshift connections.
General information
Field | Description |
---|---|
Data source | Amazon Redshift |
Supported version | 2.1.0.9 |
Connection string | jdbc:redshift:// |
Packaged? | |
Certified? | |
Supported features | Estimate job, Analyze data, Schedule |
Processing capabilities | Pushdown, Spark agent, Yarn agent, Parallel JDBC |
Java Platform version compatibility | JDK 8, JDK 11 |
Minimum user permissions
To bring your Redshift data into Data Quality & Observability Classic, you need the following permissions:
- Read access on your Redshift database tables.
- ROLE_ADMIN assigned to your user in Collibra DQ.
Recommended and required connection properties
Required | Connection Property | Type | Value |
---|---|---|---|
 | Name | Text | The unique name of your connection. Ensure that there are no spaces in your connection name. |
 | Connection URL | String | The connection string path of your Redshift connection. When referring to the example below, replace the placeholder values with your own connection details. Example: jdbc:redshift://endpoint:port/database |
 | Driver Name | String | The driver class name of your Redshift connection: com.amazon.redshift.jdbc42.Driver |
 | Port | Integer | The port number to establish a connection to the data source. The default port is 5439. |
 | Limit Schemas | Option | Allows you to manage usage and restrict visibility to only the necessary schemas in the Explorer tree. See Limiting schemas to learn how to limit schemas from the Connection Management page. Note When you include a restricted schema in the query of a DQ Job, the query scope may be overwritten when the job runs. While only the schemas you selected when you set up the connection are shown in the Explorer menu, users are not restricted from running SQL queries on any schema from the data source. |
 | Source Name | String | N/A |
 | Target Agent | Option | The Agent that submits your Spark job for processing. |
 | Auth Type | Option | The method to authenticate your connection. Note The configuration requirements differ depending on the Auth Type you select. See Authentication for more details on available authentication types. |
 | Properties | String | The configurable driver properties for your connection. Multiple properties must be semicolon delimited, for example abc=123;test=true |
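To make these fields concrete, the following is a minimal sketch of testing such a connection directly over JDBC, assuming the Redshift JDBC driver (redshift-jdbc42) is on the classpath. The URL reuses the example cluster from this page; the credentials, property values, and class name RedshiftConnectionTest are placeholders.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.util.Properties;

public class RedshiftConnectionTest {
    public static void main(String[] args) throws Exception {
        // Example connection URL from this page; replace with your cluster's details.
        String url = "jdbc:redshift://examplecluster.abc123xyz789.us-west-2.redshift.amazonaws.com:5439/dev";

        // The Properties field expects semicolon-delimited key=value pairs,
        // e.g. "abc=123;test=true". Parse them into java.util.Properties.
        String rawProps = "ssl=true;tcpKeepAlive=true"; // placeholder values
        Properties props = new Properties();
        for (String pair : rawProps.split(";")) {
            String[] kv = pair.split("=", 2);
            props.setProperty(kv[0].trim(), kv[1].trim());
        }
        props.setProperty("user", "dq_user");          // placeholder
        props.setProperty("password", "dq_password");  // placeholder

        // Explicitly load the class named in the Driver Name field
        // (optional with JDBC 4+ drivers, shown here for clarity).
        Class.forName("com.amazon.redshift.jdbc42.Driver");

        try (Connection conn = DriverManager.getConnection(url, props)) {
            System.out.println("Connected: " + !conn.isClosed());
        }
    }
}
```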
Authentication
Select an authentication type from the dropdown menu. The options available in the dropdown menu are the currently supported authentication types for this data source.
Required | Field | Description |
---|---|---|
 | Username | The username of your Redshift account. |
 | Password | The password of your Redshift account. |
 | Script | The file path of the script file that the password manager uses to interact with and authenticate a user account. Example: /tmp/keytab/redshift_pwd_mgr.sh |
 | Param $1 | Optional. An additional parameter to authenticate your Redshift connection. |
 | Param $2 | Optional. An additional parameter to authenticate your Redshift connection. |
 | Param $3 | Optional. An additional parameter to authenticate your Redshift connection. |
Locating the JDBC URL
Before you can connect to your Redshift cluster from a SQL client tool, you need to know the JDBC URL of your cluster. The JDBC URL has the following format: jdbc:redshift://endpoint:port/database
Note A JDBC URL specified with the former format of jdbc:postgresql://endpoint:port/database still works.
The fields of the format shown preceding have the following values.
Field | Value |
---|---|
jdbc | The protocol for the connection. |
redshift | The sub-protocol that specifies to use the Redshift driver to connect to the database. |
endpoint | The endpoint of the Redshift cluster. |
port | The port number that you specified when you launched the cluster. If you have a firewall, make sure that this port is open for you to use. |
database | The database that you created for your cluster. |
The following is an example JDBC URL: jdbc:redshift://examplecluster.abc123xyz789.us-west-2.redshift.amazonaws.com:5439/dev
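As a quick illustration of the format, the example URL above can be assembled from its endpoint, port, and database parts. This is only a sketch; the class name JdbcUrlExample is a placeholder.

```java
public class JdbcUrlExample {
    public static void main(String[] args) {
        String endpoint = "examplecluster.abc123xyz789.us-west-2.redshift.amazonaws.com";
        int port = 5439;        // the port specified when the cluster was launched
        String database = "dev";

        // Assemble the URL per the jdbc:redshift://endpoint:port/database format.
        String url = String.format("jdbc:redshift://%s:%d/%s", endpoint, port, database);
        System.out.println(url);
        // Prints: jdbc:redshift://examplecluster.abc123xyz789.us-west-2.redshift.amazonaws.com:5439/dev
    }
}
```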
For information about how to get your JDBC connection, see Finding your cluster connection string.
If the client computer fails to connect to the database, you can troubleshoot possible issues. For more information, see Troubleshooting connection issues in Amazon Redshift.
Source: https://docs.aws.amazon.com/redshift/latest/mgmt/configure-jdbc-connection.html
Configuring Amazon S3 for source-to-target validation
When setting up source-to-target validation jobs between Amazon S3 and Redshift, the authentication type used for the AWS S3 connection does not impact how DQ processes the data. For example, if Access Key-based authentication is used for the DQ S3 connection and the DQ Agent is configured to use an EMR Spark cluster, DQ jobs are executed on EMR. The DQ Agent generates the session credentials and shares them with DQ jobs at run time for dataset processing.
If you use Instance Profile authentication, DQ uses the IAM role attached to the EC2 instance to retrieve the session credentials required to authenticate with the AWS S3 service. The same session credentials are shared at run time with DQ Agents for dataset processing, which involves fetching the data from the AWS S3 service into Spark data frames.
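For reference, the sketch below shows how instance-profile session credentials are typically resolved on an EC2 host using the AWS SDK for Java v2. It is purely illustrative of the mechanism, not how Collibra DQ implements it internally, and assumes the software.amazon.awssdk:auth module is on the classpath.

```java
import software.amazon.awssdk.auth.credentials.AwsCredentials;
import software.amazon.awssdk.auth.credentials.InstanceProfileCredentialsProvider;

public class InstanceProfileCredentialsExample {
    public static void main(String[] args) {
        // Resolve temporary session credentials from the IAM role attached
        // to the EC2 instance (served by the instance metadata service).
        try (InstanceProfileCredentialsProvider provider =
                 InstanceProfileCredentialsProvider.create()) {
            AwsCredentials creds = provider.resolveCredentials();
            System.out.println("Resolved access key id: " + creds.accessKeyId());
        }
    }
}
```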
Limitations
When using Pushdown for Redshift, the following column data types are not supported:
- BINARY VARYING
- GEOGRAPHY
- GEOMETRY
- HLLSKETCH
- SQL_VARCHAR
- SUPER
- VARBINARY
- VARBYTE
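Before enabling Pushdown, it can be useful to check a schema for columns that use these types. The sketch below queries Redshift's SVV_COLUMNS system view over JDBC; the connection details and schema name are placeholders, and the exact type names reported by SVV_COLUMNS may differ from the aliases above (for example, VARBINARY and BINARY VARYING are aliases of VARBYTE).

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public class PushdownTypeCheck {
    // Flag columns whose declared type appears in the unsupported list above.
    private static final String QUERY =
        "SELECT table_name, column_name, data_type " +
        "FROM svv_columns " +
        "WHERE table_schema = ? " +
        "AND upper(data_type) IN ('BINARY VARYING', 'GEOGRAPHY', 'GEOMETRY', " +
        "'HLLSKETCH', 'SQL_VARCHAR', 'SUPER', 'VARBINARY', 'VARBYTE')";

    public static void main(String[] args) throws Exception {
        String url = "jdbc:redshift://examplecluster.abc123xyz789.us-west-2.redshift.amazonaws.com:5439/dev";
        try (Connection conn = DriverManager.getConnection(url, "dq_user", "dq_password"); // placeholders
             PreparedStatement ps = conn.prepareStatement(QUERY)) {
            ps.setString(1, "public"); // placeholder schema
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    System.out.printf("%s.%s uses unsupported type %s%n",
                        rs.getString("table_name"),
                        rs.getString("column_name"),
                        rs.getString("data_type"));
                }
            }
        }
    }
}
```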