Connecting to Amazon Athena

This section contains details for Amazon Athena connections.

Select an option from the dropdown menu to display information for a particular driver class.

General information

Field	Description
Data source	Amazon Athena
Supported versions	2.0.35.1000 (Simba driver), 21.0.8214.0 (CData driver)
Connection string	`jdbc:awsathena://` (Simba driver), `jdbc:cdata:amazonathena` (CData driver)
Packaged?	No (Simba driver), Yes (CData driver)
Certified?	Yes
Supported features
Estimate job	Yes
Analyze data	Yes
Schedule	Yes
Processing capabilities
Pushdown	Yes
Spark agent	Yes
Yarn agent	Yes (Simba driver), No (CData driver)
Parallel JDBC	No
Java Platform version compatibility
JDK 8	Yes
JDK 11	Yes

Minimum user permissions

To bring your Athena data into Data Quality & Observability Classic, you need the following permissions:

Read access on your Glue catalog and S3 buckets.
Write access on your S3 output location.
ROLE_ADMIN assigned to your user in Collibra DQ.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "VisualEditor0",
            "Effect": "Allow",
            "Action": [
                "athena:StartQueryExecution",
                "s3:ListBucketMultipartUploads",
                "athena:GetQueryResultsStream",
                "glue:GetTables",
                "glue:GetPartitions",
                "athena:GetQueryResults",
                "glue:BatchGetPartition",
                "s3:ListBucket",
                "glue:GetDatabases",
                "athena:ListQueryExecutions",
                "s3:ListMultipartUploadParts",
                "glue:GetTable",
                "glue:GetDatabase",
                "athena:GetWorkGroup",
                "s3:PutObject",
                "s3:GetObject",
                "glue:GetPartition",
                "glue:GetCatalogImportStatus",
                "athena:StopQueryExecution",
                "athena:GetQueryExecution",
                "s3:GetBucketLocation",
                "athena:BatchGetQueryExecution",
                "athena:DeletePreparedStatement",
                "athena:CreatePreparedStatement"
            ],
            "Resource": [
                "arn:aws:athena:*:<AWSAccountID>:workgroup/primary",
                "arn:aws:s3:::<S3 bucket name>/*",
                "arn:aws:s3:::<S3 bucket name>",
                "arn:aws:glue:*:<AWSAccountID>:catalog",
                "arn:aws:glue:*:<AWSAccountID>:database/<database name>",
                "arn:aws:glue:*:<AWSAccountID>:table/<database name>/*"
            ]
        }
    ]
}
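
The `Resource` placeholders in the policy above can be filled in programmatically. The following is a minimal Python sketch, not part of Collibra DQ or any AWS tooling. The function name `athena_policy_resources` and the sample account ID, bucket, and database values are hypothetical and used only for illustration.

```python
import json
from typing import List


def athena_policy_resources(account_id: str, bucket: str, database: str) -> List[str]:
    """Return the Resource ARNs of the policy above with the placeholders filled in."""
    return [
        f"arn:aws:athena:*:{account_id}:workgroup/primary",
        f"arn:aws:s3:::{bucket}/*",
        f"arn:aws:s3:::{bucket}",
        f"arn:aws:glue:*:{account_id}:catalog",
        f"arn:aws:glue:*:{account_id}:database/{database}",
        f"arn:aws:glue:*:{account_id}:table/{database}/*",
    ]


# Hypothetical sample values, for illustration only.
resources = athena_policy_resources("123456789012", "example-athena-results", "default")
print(json.dumps(resources, indent=4))
```

The printed list can be pasted into the `Resource` array of the policy. If your workgroup is not `primary`, adjust the first ARN accordingly.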

Recommended and required connection properties

Required	Connection Property	Type	Value
Yes	Name	Text	The unique name of your connection. Ensure that there are no spaces in your connection name.
Yes	Connection URL	String	The connection string path of your Athena connection. Use the JDBC connection URL that matches the private DNS configuration of your endpoint.
If private DNS is turned off, use: `jdbc:awsathena://vpce-.athena.us-east-1.vpce.amazonaws.com:443`
If private DNS is turned on, use: `jdbc:awsathena://athena.us-east-1.amazonaws.com:443`
Note Athena's streaming API uses port 444 to stream query results to the JDBC/ODBC driver installed on the client host, so ensure that port 444 isn't blocked. If it is blocked, the results aren't streamed back to your client host: your business intelligence tool might time out, stop responding, or fail to show query results, and you might receive an error message similar to "[Simba][AthenaJDBC](100123) An error has occurred. Exception during column initialization". If you use an AWS PrivateLink endpoint to connect to Athena, ensure that the security group attached to the AWS PrivateLink endpoint is open to inbound traffic on port 444. Also ensure that the security group attached to your VPC endpoint allows traffic from the host where you installed the JDBC/ODBC driver.
When referring to the examples below, replace the `${value}` sections of the connection URL with your actual values.
Example (Simba driver) `jdbc:awsathena://AwsRegion=${region};S3OutputLocation=s3://${output_location};MetadataRetrievalMethod=Query`
Example (CData driver) `jdbc:cdata:amazonathena:AwsRegion=northernvirginia;S3StagingDirectory=s3://<bucket name>;database=default;`
Important `database=default;` must be included in the connection URL, as shown in the CData example above.
Yes	Driver Name	String	The driver class name of your Athena connection: `com.simba.athena.jdbc.Driver` (Simba driver) or `cdata.jdbc.amazonathena.AmazonAthenaDriver` (CData driver).
Yes	Port	Integer	The port number used to establish a connection to the data source. The default port is `0`.
No	Limit Schemas	Option	Allows you to manage usage and restrict visibility to only the necessary schemas in the Explorer tree. See Limiting schemas to learn how to limit schemas from the Connection Management page. Note When you include a restricted schema in the query of a DQ Job, the query scope may be overwritten when the job runs. While only the schemas you selected when you set up the connection are shown in the Explorer menu, users are not restricted from running SQL queries on any schema from the data source.
No	Source Name	String	N/A
No	Target Agent	Option	The Agent that submits your Spark job for processing.
Yes	Auth Type	Option	The method to authenticate your connection. Note The configuration requirements are different depending on the Auth Type you select. See Authentication for more details on available authentication types.
No	Properties	String	The configurable driver properties for your connection. Multiple properties must be semicolon delimited. For example, abc=123;test=true
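
Both the Connection URL and the Properties field are semicolon-delimited `key=value` strings, so they can be assembled from a dictionary instead of being typed by hand. The following is a minimal Python sketch under that assumption. The helper names `join_properties` and `build_connection_url` are hypothetical, the property names come from the Simba-style example in the table above, and the region and bucket values are made up.

```python
from typing import Dict


def join_properties(props: Dict[str, str]) -> str:
    """Join driver properties as key=value pairs separated by semicolons."""
    return ";".join(f"{key}={value}" for key, value in props.items())


def build_connection_url(prefix: str, props: Dict[str, str]) -> str:
    """Append semicolon-delimited properties to a JDBC URL prefix."""
    return prefix + join_properties(props)


url = build_connection_url(
    "jdbc:awsathena://",
    {
        "AwsRegion": "us-east-1",  # hypothetical region
        "S3OutputLocation": "s3://example-bucket/results",  # hypothetical bucket
        "MetadataRetrievalMethod": "Query",
    },
)
print(url)

# The Properties field uses the same delimiter convention.
print(join_properties({"abc": "123", "test": "true"}))
```

Building the string from a dictionary keeps the property order stable and avoids stray or missing semicolons, which are easy to introduce when editing a long URL manually.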

Your host can connect to Athena with either an Athena public service endpoint or an Athena private endpoint. For more information on setting the endpoint, see Command line options and Boto3 documentation.
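
Because a blocked port 444 prevents query results from streaming back to the client (see the Connection URL note in the table above), it can help to confirm basic TCP reachability from the client host before troubleshooting the driver itself. The following is a minimal sketch that uses only Python's standard `socket` module. The helpers `port_is_open` and `check_athena_ports` are hypothetical names, not part of any Athena or Collibra tooling. A successful TCP connection shows only that the port is not blocked at the network level; it does not prove that authentication or queries will succeed.

```python
import socket
from typing import Dict


def port_is_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS resolution failures.
        return False


def check_athena_ports(host: str) -> Dict[int, bool]:
    """Check the HTTPS port (443) and the result-streaming port (444)."""
    return {port: port_is_open(host, port) for port in (443, 444)}
```

For example, calling `check_athena_ports("athena.us-east-1.amazonaws.com")` from the client host returns a dictionary that maps each port to `True` or `False`. Substitute your own endpoint host name, for instance the VPC endpoint DNS name if private DNS is turned off.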

Authentication

Select an authentication type from the dropdown menu. The options available in the dropdown menu are the currently supported authentication types for this data source.

Setting up a single role-based Athena connection using AWS EC2 roles for Instance Profile authentication

Important
This authentication option is only available for Athena CData driver versions 24.0.8994 and newer.

Set the Connection URL to reference the required parameters.

jdbc:cdata:amazonathena:AuthScheme=AwsEC2Roles;MetadataDiscoveryMethod=Athena;AWSRoleARN=arn:aws:iam::${AccountID}:role/${RoleName};Database=${ConnectionName};DataSource=${DBName};AWSRegion=${Region};S3StagingDirectory=${S3DirectoryPath};Workgroup=${WorkgroupName}

Tip
To see any errors that occur with the CData driver, you can optionally add the log parameters `Logfile=/tmp/example-log.log;Verbosity=5;` to the Connection URL.

Click Submit.
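
The Connection URL above can also be generated from its individual parameters, which makes it harder to drop a required parameter or mistype the role ARN. The following is a minimal Python sketch. The function name `ec2_roles_url` and all sample values are hypothetical, while the parameter names and their order are taken from the URL template above.

```python
from typing import Optional


def ec2_roles_url(account_id: str, role_name: str, connection_name: str,
                  db_name: str, region: str, s3_staging_dir: str,
                  workgroup: str, log_file: Optional[str] = None) -> str:
    """Build a CData Athena URL for the AwsEC2Roles authentication scheme."""
    params = [
        ("AuthScheme", "AwsEC2Roles"),
        ("MetadataDiscoveryMethod", "Athena"),
        ("AWSRoleARN", f"arn:aws:iam::{account_id}:role/{role_name}"),
        ("Database", connection_name),
        ("DataSource", db_name),
        ("AWSRegion", region),
        ("S3StagingDirectory", s3_staging_dir),
        ("Workgroup", workgroup),
    ]
    if log_file is not None:
        # Optional log parameters, as described in the tip that follows the URL.
        params += [("Logfile", log_file), ("Verbosity", "5")]
    return "jdbc:cdata:amazonathena:" + ";".join(f"{k}={v}" for k, v in params)


# Hypothetical sample values, for illustration only.
print(ec2_roles_url("123456789012", "ExampleRole", "connA", "exampledb",
                    "us-east-1", "s3://example-bucket/staging", "primary"))
```

Passing `log_file="/tmp/example-log.log"` appends the `Logfile` and `Verbosity` parameters to the end of the URL.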

Setting up multiple role-based Athena connections using AWS EC2 roles for Instance Profile authentication

Important
This authentication option is only available for Athena CData driver versions 24.0.8994 and newer.

For each role-based Athena connection you want to use, create a cache location folder in the /tmp directory of the host where Collibra DQ is running. The CData driver adds the Instance Profile credential file to this folder. Each role-based Athena connection needs its own folder; connections cannot share the same folder in the /tmp directory.

Example
If you have two role-based Athena connections, the folder for the first connection could be called /tmp/connA/ and the folder for the second could be called /tmp/connB/.

Set the Connection URL to reference the required parameters.

jdbc:cdata:amazonathena:AuthScheme=AwsEC2Roles;MetadataDiscoveryMethod=Athena;AWSRoleARN=arn:aws:iam::${AccountID}:role/${RoleName};Database=${ConnectionName};DataSource=${DBName};AWSRegion=${Region};S3StagingDirectory=${S3DirectoryPath};Workgroup=${WorkgroupName};Location=${/tmp/connection/};CredentialsLocation=${/tmp/connection/CredentialsFile.txt}

Tip
To see any errors that occur with the CData driver, you can optionally add the log parameters `Logfile=/tmp/example-log.log;Verbosity=5;` to the Connection URL.

Click Submit.
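
The per-connection folder setup described above can be scripted so that each connection reliably gets its own cache folder. The following is a minimal Python sketch. The helper name `prepare_cache_dirs` is hypothetical, the file name `CredentialsFile.txt` is taken from the URL template above, and the demo writes to a temporary directory rather than to /tmp so that it has no side effects. In practice you would pass `/tmp` as the base directory.

```python
import os
import tempfile
from typing import Dict, Iterable


def prepare_cache_dirs(base_dir: str,
                       connection_names: Iterable[str]) -> Dict[str, Dict[str, str]]:
    """Create one cache folder per connection and return its URL parameters."""
    names = list(connection_names)
    if len(set(names)) != len(names):
        # Connections must not share a folder, so names must be unique.
        raise ValueError("each connection needs its own folder name")
    result = {}
    for name in names:
        folder = os.path.join(base_dir, name)
        os.makedirs(folder, exist_ok=True)
        result[name] = {
            "Location": folder + os.sep,
            "CredentialsLocation": os.path.join(folder, "CredentialsFile.txt"),
        }
    return result


# Demo in a throwaway directory; use "/tmp" as base_dir on the Collibra DQ host.
with tempfile.TemporaryDirectory() as base:
    for name, params in prepare_cache_dirs(base, ["connA", "connB"]).items():
        suffix = ";".join(f"{k}={v}" for k, v in params.items())
        print(name, suffix)
```

Each printed suffix holds the `Location` and `CredentialsLocation` parameters for one connection and can be appended to that connection's URL after the `Workgroup` parameter.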

Required	Field	Description
Yes	Username	The username of your Athena account.
Yes	Password	The password of your Athena account.
Yes	Script	The file path containing the script file that the password manager uses to interact with and authenticate a user account. Example /tmp/keytab/athena_pwd_mgr.sh
No	Param $1	Optional. An additional parameter to authenticate your Athena connection.
No	Param $2	Optional. An additional parameter to authenticate your Athena connection.
No	Param $3	Optional. An additional parameter to authenticate your Athena connection.