Connecting to Amazon Athena

This section contains details for Amazon Athena connections.

Two driver classes are supported: the Simba driver (com.simba.athena.jdbc.Driver) and the CData driver (cdata.jdbc.amazonathena.AmazonAthenaDriver). Where a value differs between them, both are listed.

General information

Field               Simba driver        CData driver
Data source         Amazon Athena       Amazon Athena
Supported versions  2.0.35.1000         21.0.8214.0
Connection string   jdbc:awsathena://   jdbc:cdata:amazonathena
Packaged?           No                  Yes
Certified?          Yes                 Yes

Supported features
Estimate job   Yes
Analyze data   Yes
Schedule       Yes

Processing capabilities
Pushdown        Yes
Spark agent     Yes
Yarn agent      Yes (Simba driver), No (CData driver)
Parallel JDBC   No

Java Platform version compatibility
JDK 8    Yes
JDK 11   Yes

Minimum user permissions

To bring your Athena data into Collibra Data Quality & Observability, you need the following permissions:

  • Read access on your Glue catalog and S3 buckets.
  • Write access on your S3 output location.
  • ROLE_ADMIN assigned to your user in Collibra DQ.
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "VisualEditor0",
            "Effect": "Allow",
            "Action": [
                "athena:StartQueryExecution",
                "s3:ListBucketMultipartUploads",
                "athena:GetQueryResultsStream",
                "glue:GetTables",
                "glue:GetPartitions",
                "athena:GetQueryResults",
                "glue:BatchGetPartition",
                "s3:ListBucket",
                "glue:GetDatabases",
                "athena:ListQueryExecutions",
                "s3:ListMultipartUploadParts",
                "glue:GetTable",
                "glue:GetDatabase",
                "athena:GetWorkGroup",
                "s3:PutObject",
                "s3:GetObject",
                "glue:GetPartition",
                "glue:GetCatalogImportStatus",
                "athena:StopQueryExecution",
                "athena:GetQueryExecution",
                "s3:GetBucketLocation",
                "athena:BatchGetQueryExecution",
                "athena:DeletePreparedStatement",
                "athena:CreatePreparedStatement"
            ],
            "Resource": [
                "arn:aws:athena:*:<AWSAccountID>:workgroup/primary",
                "arn:aws:s3:::<S3 bucket name>/*",
                "arn:aws:s3:::<S3 bucket name>",
                "arn:aws:glue:*:<AWSAccountID>:catalog",
                "arn:aws:glue:*:<AWSAccountID>:database/<database name>",
                "arn:aws:glue:*:<AWSAccountID>:table/<database name>/*"
            ]
        }
    ]
}
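
The resource ARNs in the policy above follow a fixed pattern, so they can be generated instead of edited by hand. The following is a minimal Python sketch, not part of Collibra DQ or any AWS tool. The function name `athena_policy_resources` and its parameters are hypothetical, and the sketch assumes the `primary` workgroup used in the policy above unless another one is passed in.

```python
def athena_policy_resources(account_id, bucket, database, workgroup="primary"):
    """Return the Resource ARNs of the policy above with the placeholders filled in.

    All arguments are illustrative inputs: an AWS account ID, the S3 bucket
    that holds the Athena output, and the Glue database name.
    """
    return [
        f"arn:aws:athena:*:{account_id}:workgroup/{workgroup}",
        f"arn:aws:s3:::{bucket}/*",
        f"arn:aws:s3:::{bucket}",
        f"arn:aws:glue:*:{account_id}:catalog",
        f"arn:aws:glue:*:{account_id}:database/{database}",
        f"arn:aws:glue:*:{account_id}:table/{database}/*",
    ]
```

The returned list can be placed in the "Resource" element of the policy. The account ID, bucket, and database names you pass in are your own values; none are implied by this page.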

Recommended and required connection properties

Required Connection Property Type Value

Yes

Name Text The unique name of your connection. Ensure that there are no spaces in your connection name.

Yes

Connection URL String

The connection string path of your Athena connection.

In your business intelligence tool configuration, use the JDBC connection URL that matches the private DNS configuration of your endpoint.

Note
Athena's streaming API uses port 444 to stream query results to the JDBC/ODBC driver installed on the client host, so ensure that this port isn't blocked. If you use an AWS PrivateLink endpoint to connect to Athena, ensure that the security group attached to the endpoint is open to inbound traffic on port 444. If port 444 is blocked, the results aren't streamed back to your client host. Your business intelligence tool might then time out, stop responding, or fail to display query results, and you might receive an error message similar to "[Simba][AthenaJDBC](100123) An error has occurred. Exception during column initialization".

Also ensure that the security group attached to your VPC endpoint allows traffic from the host where you installed the JDBC/ODBC driver.

 

In the examples below, replace each ${value} placeholder and each value in angle brackets with your actual value.

Example (Simba driver) jdbc:awsathena://AwsRegion=${region};S3OutputLocation=s3://${output_location};MetadataRetrievalMethod=Query

Example (CData driver) jdbc:cdata:amazonathena:AwsRegion=northernvirginia;S3StagingDirectory=s3://<bucket name>;database=default;

Important database=default; must be included in the connection URL, as shown in the CData example above.

Yes

Driver Name String

The driver class name of your Athena connection.

com.simba.athena.jdbc.Driver (Simba driver)

cdata.jdbc.amazonathena.AmazonAthenaDriver (CData driver)

Yes

Port Integer

The port number used to establish a connection to the data source.

The default port is 0.

No

Source Name String N/A

No

Target Agent Option The Agent that submits your Spark job for processing.

Yes

Auth Type Option

The method to authenticate your connection.

Note The configuration requirements are different depending on the Auth Type you select. See Authentication for more details on available authentication types.

No

Driver Properties String

The configurable driver properties for your connection. Multiple properties must be comma delimited. For example, abc=123,test=true
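
The Connection URL and Driver Properties values in the table above are plain strings, so they can be assembled programmatically. The following is a minimal Python sketch under stated assumptions: it covers only the jdbc:awsathena:// form of the URL shown in the example, and the function names `simba_athena_url` and `format_driver_properties` are hypothetical helpers, not part of Collibra DQ or of either driver.

```python
def simba_athena_url(region, output_location):
    """Build a jdbc:awsathena:// URL in the form shown in the example above.

    `output_location` may be given with or without the s3:// prefix.
    """
    location = output_location
    if not location.startswith("s3://"):
        location = "s3://" + location
    return (
        f"jdbc:awsathena://AwsRegion={region};"
        f"S3OutputLocation={location};"
        "MetadataRetrievalMethod=Query"
    )


def format_driver_properties(props):
    """Join properties into the comma-delimited key=value form, e.g. abc=123,test=true."""
    parts = []
    for key, value in props.items():
        # Render Python booleans in lowercase, matching the test=true example.
        if isinstance(value, bool):
            value = str(value).lower()
        parts.append(f"{key}={value}")
    return ",".join(parts)
```

For example, `format_driver_properties({"abc": 123, "test": True})` produces the string `abc=123,test=true` used as the example in the Driver Properties row.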

Your host can connect to Athena with either an Athena public service endpoint or an Athena private endpoint. For more information on setting the endpoint, see Command line options and Boto3 documentation.
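
Because results are streamed over port 444, it can help to confirm that the port is reachable from the host that runs the driver before you troubleshoot the connection itself. The following is a minimal Python sketch of such a check that uses only the standard library. The function name `port_is_reachable` is hypothetical, and the host name in the usage comment assumes the public endpoint naming pattern athena.<region>.amazonaws.com; for a private endpoint, substitute its DNS name.

```python
import socket


def port_is_reachable(host, port=444, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS resolution failures.
        return False


# Example usage (assumed endpoint name; replace with your region or private endpoint):
# print(port_is_reachable("athena.us-east-1.amazonaws.com"))
```

A result of False only shows that the TCP connection could not be opened from this host. It does not identify whether a firewall, a security group, or DNS is the cause.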

Authentication

This data source supports the following authentication type:

Username/Password

Required  Field     Description
Yes       Username  The Access Key of your Athena service account.
Yes       Password  The Secret Key of your Athena service account.

Tip To use Instance Profile, leave the Username and Password input fields blank.