Connecting to Amazon Athena
This section contains details for Amazon Athena connections.
General information
Field | Description |
---|---|
Data source | Amazon Athena |
Supported versions | |
Connection string | jdbc:awsathena:// or jdbc:cdata:amazonathena |
Packaged? | |
Certified? | Yes |
Supported features | |
Estimate job | Yes |
Analyze data | Yes |
Schedule | Yes |
Processing capabilities | |
Pushdown | Yes |
Spark agent | Yes |
Yarn agent | |
Parallel JDBC | No |
Java Platform version compatibility | |
JDK 8 | Yes |
JDK 11 | Yes |
Minimum user permissions
To bring your Athena data into Collibra Data Quality & Observability, you need the following permissions:
- Read access on your Glue catalog and S3 buckets.
- Write access on your S3 output location.
- ROLE_ADMIN assigned to your user in Collibra DQ.
{
"Version": "YYYY-MM-DD",
"Statement": [
{
"Sid": "VisualEditor0",
"Effect": "Allow",
"Action": [
"athena:StartQueryExecution",
"s3:ListBucketMultipartUploads",
"athena:GetQueryResultsStream",
"glue:GetTables",
"glue:GetPartitions",
"athena:GetQueryResults",
"glue:BatchGetPartition",
"s3:ListBucket",
"glue:GetDatabases",
"athena:ListQueryExecutions",
"s3:ListMultipartUploadParts",
"glue:GetTable",
"glue:GetDatabase",
"athena:GetWorkGroup",
"s3:PutObject",
"s3:GetObject",
"glue:GetPartition",
"glue:GetCatalogImportStatus",
"athena:StopQueryExecution",
"athena:GetQueryExecution",
"s3:GetBucketLocation",
"athena:BatchGetQueryExecution",
"athena:DeletePreparedStatement",
"athena:CreatePreparedStatement"
],
"Resource": [
"arn:aws:athena:*:<AWSAccountID>:workgroup/primary",
"arn:aws:s3:::<S3 bucket name>/*",
"arn:aws:s3:::<S3 bucket name>",
"arn:aws:glue:*:<AWSAccountID>:catalog",
"arn:aws:glue:*:<AWSAccountID>:database/<database name>",
"arn:aws:glue:*:<AWSAccountID>:table/<database name>/*"
]
}
]
}
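The Resource entries in the policy above contain placeholders for the AWS account ID, S3 bucket name, and Glue database name. As a minimal sketch, assuming you want to generate those ARNs rather than edit them by hand, the helper below fills in the placeholders. The function name `athena_policy_resources` and the sample values are hypothetical and not part of Collibra DQ or AWS tooling.

```python
import json
from typing import List


def athena_policy_resources(account_id: str, bucket: str, database: str,
                            workgroup: str = "primary") -> List[str]:
    """Return the Resource ARNs from the example policy with placeholders filled in."""
    return [
        f"arn:aws:athena:*:{account_id}:workgroup/{workgroup}",
        f"arn:aws:s3:::{bucket}/*",
        f"arn:aws:s3:::{bucket}",
        f"arn:aws:glue:*:{account_id}:catalog",
        f"arn:aws:glue:*:{account_id}:database/{database}",
        f"arn:aws:glue:*:{account_id}:table/{database}/*",
    ]


# Hypothetical values, for illustration only.
resources = athena_policy_resources("123456789012", "example-athena-results", "example_db")
print(json.dumps(resources, indent=2))
```

The printed list can be pasted into the "Resource" array of the policy. The workgroup defaults to `primary` because that is the workgroup named in the example policy.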
Recommended and required connection properties
Required | Connection Property | Type | Value |
---|---|---|---|
Yes | Name | Text | The unique name of your connection. Ensure that there are no spaces in your connection name. |
Yes | Connection URL | String | The connection string path of your Athena connection. Use the appropriate JDBC connection URLs in your business tool configuration according to your private DNS configuration for your endpoint. Note: When referring to the example below, replace the |
Yes | Driver Name | String | The driver class name of your Athena connection. |
Yes | Port | Integer | The port number to establish a connection to the datasource. The default port is |
No | Source Name | String | N/A |
No | Target Agent | Option | The Agent that submits your Spark job for processing. |
Yes | Auth Type | Option | The method to authenticate your connection. Note: The configuration requirements are different depending on the Auth Type you select. See Authentication for more details on available authentication types. |
No | Properties | String | The configurable driver properties for your connection. Multiple properties must be comma delimited. For example, abc=123,test=true |
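The Properties field expects comma-delimited key=value pairs such as abc=123,test=true. The sketch below shows one way to build and check such a string before pasting it into the field. The helper names `format_properties` and `parse_properties` are hypothetical, and this only illustrates the documented format. It does not describe how Collibra DQ itself parses the value.

```python
from typing import Dict


def format_properties(props: Dict[str, str]) -> str:
    """Join key/value pairs into the comma-delimited form, e.g. abc=123,test=true."""
    return ",".join(f"{key}={value}" for key, value in props.items())


def parse_properties(text: str) -> Dict[str, str]:
    """Split a comma-delimited properties string back into a dictionary."""
    result: Dict[str, str] = {}
    for item in text.split(","):
        item = item.strip()
        if not item:
            continue  # tolerate a trailing comma or empty segment
        key, sep, value = item.partition("=")
        if not sep:
            raise ValueError(f"property without '=': {item!r}")
        result[key.strip()] = value.strip()
    return result


print(format_properties({"abc": "123", "test": "true"}))
print(parse_properties("abc=123,test=true"))
```

Values are kept as strings throughout, so the round trip does not change how a value such as true or 123 is written.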
Your host can connect to Athena with either an Athena public service endpoint or an Athena private endpoint. For more information on setting the endpoint, see Command line options and Boto3 documentation.
Authentication
The authentication types described below are the currently supported authentication types for this data source.
Setting up a single role-based Athena connection using AWS EC2 roles for Instance Profile authentication
Important
This authentication option is only available for Athena CDATA driver versions 24.0.8994 and newer.
- Set the Connection URL to reference the required parameters.
- Click Submit.
jdbc:cdata:amazonathena:AuthScheme=AwsEC2Roles;MetadataDiscoveryMethod=Athena;AWSRoleARN=arn:aws:iam::${AccountID}:role/${RoleName};Database=${ConnectionName};DataSource=${DBName};AWSRegion=${Region};S3StagingDirectory=${S3DirectoryPath};Workgroup=${WorkgroupName}
Tip
You can optionally include the log parameters by adding Logfile=/tmp/example-log.log;Verbosity=5; to the Connection URL to see any errors that occur with the CDATA driver.
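The Connection URL above is a semicolon-delimited list of key=value pairs after the jdbc:cdata:amazonathena: prefix. As a rough sketch, assuming you want to assemble it from your own values instead of editing the ${...} placeholders by hand, the hypothetical helper `build_ec2_role_url` below produces the same parameter order and optionally appends the log parameters from the tip. All sample values are made up.

```python
from typing import Optional


def build_ec2_role_url(account_id: str, role_name: str, database: str,
                       data_source: str, region: str, s3_staging_dir: str,
                       workgroup: str, log_file: Optional[str] = None,
                       verbosity: int = 5) -> str:
    """Assemble the Instance Profile connection URL in the documented parameter order."""
    params = [
        ("AuthScheme", "AwsEC2Roles"),
        ("MetadataDiscoveryMethod", "Athena"),
        ("AWSRoleARN", f"arn:aws:iam::{account_id}:role/{role_name}"),
        ("Database", database),
        ("DataSource", data_source),
        ("AWSRegion", region),
        ("S3StagingDirectory", s3_staging_dir),
        ("Workgroup", workgroup),
    ]
    if log_file:
        # Optional driver logging, as described in the tip above.
        params += [("Logfile", log_file), ("Verbosity", str(verbosity))]
    return "jdbc:cdata:amazonathena:" + ";".join(f"{k}={v}" for k, v in params)


# Hypothetical values, for illustration only.
print(build_ec2_role_url("123456789012", "example-role", "example_connection",
                         "example_db", "us-east-1", "s3://example-bucket/staging/",
                         "primary", log_file="/tmp/example-log.log"))
```

Keeping the parameters in an ordered list makes it easy to compare the generated URL against the documented template.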
Setting up multiple role-based Athena connections using AWS EC2 roles for Instance Profile authentication
Important
This authentication option is only available for Athena CDATA driver versions 24.0.8994 and newer.
- On the host where Collibra DQ is running, create a cache location folder in the /tmp directory for each role-based Athena connection you want to use. CDATA adds the Instance Profile credential file to this folder. Multiple role-based Athena connections cannot share the same folder in the /tmp directory.
- Set the Connection URL to reference the required parameters.
- Click Submit.
Example
If you have two different role-based Athena connections, the folder for the first connection could be called /tmp/connA/ and the folder for the second could be called /tmp/connB/.
jdbc:cdata:amazonathena:AuthScheme=AwsEC2Roles;MetadataDiscoveryMethod=Athena;AWSRoleARN=arn:aws:iam::${AccountID}:role/${RoleName};Database=${ConnectionName};DataSource=${DBName};AWSRegion=${Region};S3StagingDirectory=${S3DirectoryPath};Workgroup=${WorkgroupName};Location=${/tmp/connection/};CredentialsLocation=${/tmp/connection/CredentialsFile.txt}
Tip
You can optionally include the log parameters by adding Logfile=/tmp/example-log.log;Verbosity=5; to the Connection URL to see any errors that occur with the CDATA driver.
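The folder-per-connection requirement above can be sketched as follows. The hypothetical helper `cache_params_for` creates one folder per connection name under a base directory, rejects duplicate names so that no two connections share a folder, and returns the Location and CredentialsLocation fragment to append to each Connection URL. The file name CredentialsFile.txt is taken from the example URL. Everything else is an assumption made for illustration.

```python
import os
from typing import Dict, List


def cache_params_for(connections: List[str], base_dir: str = "/tmp") -> Dict[str, str]:
    """Create one cache folder per connection and return its URL fragment."""
    if len(set(connections)) != len(connections):
        # Role-based connections must not share a cache folder.
        raise ValueError("each connection needs its own folder name")
    params: Dict[str, str] = {}
    for name in connections:
        folder = os.path.join(base_dir, name)
        os.makedirs(folder, exist_ok=True)  # no error if the folder already exists
        params[name] = (
            f"Location={folder}/;"
            f"CredentialsLocation={folder}/CredentialsFile.txt"
        )
    return params


# Example call (creates /tmp/connA and /tmp/connB on the host):
#   fragments = cache_params_for(["connA", "connB"])
#   url_a = base_url_a + ";" + fragments["connA"]
```

Here `base_url_a` stands for the connection URL up to and including the Workgroup parameter. Run the helper on the host where Collibra DQ is running so that the folders exist where the driver expects them.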
Required | Field | Description |
---|---|---|
Yes | Username | The username of your Athena account. |
Yes | Password | The password of your Athena account. |
Yes | Script | The file path containing the script file that the password manager uses to interact with and authenticate a user account. Example: /tmp/keytab/athena_pwd_mgr.sh |
No | Param $1 | Optional. An additional parameter to authenticate your Athena connection. |
No | Param $2 | Optional. An additional parameter to authenticate your Athena connection. |
No | Param $3 | Optional. An additional parameter to authenticate your Athena connection. |