Connecting to Amazon Athena

This section contains details for Amazon Athena connections.

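This page covers two driver classes: the Simba driver (com.simba.athena.jdbc.Driver) and the CDATA driver (cdata.jdbc.amazonathena.AmazonAthenaDriver). Where a field has a different value for each driver, the Simba value is listed first.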

General information

Field                 Description
Data source           Amazon Athena
Supported versions    2.0.35.1000 (Simba driver), 21.0.8214.0 (CDATA driver)
Connection string     jdbc:awsathena:// (Simba driver), jdbc:cdata:amazonathena (CDATA driver)
Packaged?             No (Simba driver), Yes (CDATA driver)
Certified?            Yes

Supported features
Estimate job          Yes
Analyze data          Yes
Schedule              Yes

Processing capabilities
Pushdown              Yes
Spark agent           Yes
Yarn agent            Yes (Simba driver), No (CDATA driver)
Parallel JDBC         No

Java Platform version compatibility
JDK 8                 Yes
JDK 11                Yes

Minimum user permissions

To bring your Athena data into Collibra Data Quality & Observability, you need the following permissions.

  • Read access on your Glue catalog and S3 buckets.
  • Write access on your S3 output location.
  • ROLE_ADMIN assigned to your user in Collibra DQ.
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "VisualEditor0",
            "Effect": "Allow",
            "Action": [
                "athena:StartQueryExecution",
                "s3:ListBucketMultipartUploads",
                "athena:GetQueryResultsStream",
                "glue:GetTables",
                "glue:GetPartitions",
                "athena:GetQueryResults",
                "glue:BatchGetPartition",
                "s3:ListBucket",
                "glue:GetDatabases",
                "athena:ListQueryExecutions",
                "s3:ListMultipartUploadParts",
                "glue:GetTable",
                "glue:GetDatabase",
                "athena:GetWorkGroup",
                "s3:PutObject",
                "s3:GetObject",
                "glue:GetPartition",
                "glue:GetCatalogImportStatus",
                "athena:StopQueryExecution",
                "athena:GetQueryExecution",
                "s3:GetBucketLocation",
                "athena:BatchGetQueryExecution",
                "athena:DeletePreparedStatement",
                "athena:CreatePreparedStatement"
            ],
            "Resource": [
                "arn:aws:athena:*:<AWSAccountID>:workgroup/primary",
                "arn:aws:s3:::<S3 bucket name>/*",
                "arn:aws:s3:::<S3 bucket name>",
                "arn:aws:glue:*:<AWSAccountID>:catalog",
                "arn:aws:glue:*:<AWSAccountID>:database/<database name>",
                "arn:aws:glue:*:<AWSAccountID>:table/<database name>/*"
            ]
        }
    ]
}
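
The placeholders in the Resource list of the policy above can be filled in programmatically. The following is a minimal Python sketch, not part of Collibra DQ: the function name build_resources and the sample account ID, bucket, and database values are hypothetical and used for illustration only.

```python
import json
from typing import List


def build_resources(account_id: str, bucket: str, database: str) -> List[str]:
    """Return the Resource ARNs from the policy above with placeholders filled in."""
    return [
        f"arn:aws:athena:*:{account_id}:workgroup/primary",
        f"arn:aws:s3:::{bucket}/*",
        f"arn:aws:s3:::{bucket}",
        f"arn:aws:glue:*:{account_id}:catalog",
        f"arn:aws:glue:*:{account_id}:database/{database}",
        f"arn:aws:glue:*:{account_id}:table/{database}/*",
    ]


if __name__ == "__main__":
    # Hypothetical sample values; replace them with your own.
    resources = build_resources("123456789012", "my-athena-results", "default")
    print(json.dumps(resources, indent=4))
```

The printed list can be pasted into the "Resource" element of the policy.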

Recommended and required connection properties

Each property below lists whether it is required, its type, and its value.

Name
Required: Yes
Type: Text
Value: The unique name of your connection. Ensure that there are no spaces in your connection name.

Connection URL
Required: Yes
Type: String
Value: The connection string path of your Athena connection.

Use the appropriate JDBC connection URLs in your business tool configuration according to your private DNS configuration for your endpoint.

Note
Athena streams query results to the JDBC/ODBC driver installed on the client host over port 444, so ensure that this port isn't blocked. If you use an AWS PrivateLink endpoint to connect to Athena, ensure that the security group attached to the endpoint is open to inbound traffic on port 444. If port 444 is blocked, the results aren't streamed back to your client host: your business intelligence tool might time out, stop responding, or fail to display query results, and you might receive an error message similar to "[Simba][AthenaJDBC](100123) An error has occurred. Exception during column initialization".

Also ensure that the security group attached to your VPC endpoint allows traffic from the host where you installed the JDBC/ODBC driver.

 

When referring to the examples below, replace the ${value} sections of the connection URL with your actual value.

Example (Simba driver) jdbc:awsathena://AwsRegion=${region};S3OutputLocation=s3://${output_location};MetadataRetrievalMethod=Query

Example (CDATA driver) jdbc:cdata:amazonathena:AwsRegion=northernvirginia;S3StagingDirectory=s3://<bucket name>;database=default;

Important database=default; must be included in the connection URL, as shown in the CDATA example above.
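
The placeholder replacement described above can be scripted. The sketch below uses Python's string.Template, whose ${name} syntax happens to match the placeholders in the jdbc:awsathena:// form of the example URL. The function name build_url and the sample region and output location are hypothetical.

```python
from string import Template

# The jdbc:awsathena:// example URL from above, with its ${...} placeholders.
SIMBA_URL = Template(
    "jdbc:awsathena://AwsRegion=${region};"
    "S3OutputLocation=s3://${output_location};"
    "MetadataRetrievalMethod=Query"
)


def build_url(region: str, output_location: str) -> str:
    """Fill in the placeholders; substitute() raises KeyError if one is missing."""
    return SIMBA_URL.substitute(region=region, output_location=output_location)


if __name__ == "__main__":
    # Hypothetical sample values; replace them with your own.
    print(build_url("us-east-1", "my-bucket/athena-results"))
```

Using substitute() rather than safe_substitute() means an unfilled placeholder fails loudly instead of ending up in the Connection URL field.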

Driver Name
Required: Yes
Type: String
Value: The driver class name of your Athena connection.

com.simba.athena.jdbc.Driver (Simba driver)
cdata.jdbc.amazonathena.AmazonAthenaDriver (CDATA driver)

Port
Required: Yes
Type: Integer
Value: The port number to establish a connection to the datasource.

The default port is 0.

Source Name
Required: No
Type: String
Value: N/A

Target Agent
Required: No
Type: Option
Value: The Agent that submits your Spark job for processing.

Auth Type
Required: Yes
Type: Option
Value: The method to authenticate your connection.

Note The configuration requirements are different depending on the Auth Type you select. See Authentication for more details on available authentication types.

Properties
Required: No
Type: String
Value: The configurable driver properties for your connection. Multiple properties must be comma delimited. Example: abc=123,test=true
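
As an illustration of the comma-delimited format used by the Properties field, here is a small Python sketch that converts between a dictionary and the key=value,key=value string. The helper names are hypothetical, and the sketch assumes that neither keys nor values contain commas.

```python
from typing import Dict


def format_properties(props: Dict[str, str]) -> str:
    """Join properties into the comma-delimited form, e.g. abc=123,test=true."""
    return ",".join(f"{key}={value}" for key, value in props.items())


def parse_properties(text: str) -> Dict[str, str]:
    """Split a comma-delimited property string back into a dictionary."""
    result: Dict[str, str] = {}
    for pair in text.split(","):
        pair = pair.strip()
        if not pair:
            continue  # tolerate empty segments such as a trailing comma
        key, sep, value = pair.partition("=")
        if not sep:
            raise ValueError(f"property without '=': {pair!r}")
        result[key.strip()] = value.strip()
    return result


if __name__ == "__main__":
    print(parse_properties("abc=123,test=true"))
    print(format_properties({"abc": "123", "test": "true"}))
```

Parsing the string before saving the connection is a cheap way to catch a property that is missing its = sign.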

Your host can connect to Athena with either an Athena public service endpoint or an Athena private endpoint. For more information on setting the endpoint, see Command line options and Boto3 documentation.
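
Before troubleshooting the driver, it can help to check from the client host whether the endpoint accepts TCP connections on port 444. The following Python sketch only attempts a TCP connection, so a successful result indicates that the port is not blocked at the network level and says nothing about authentication or query execution. The function name port_is_reachable is hypothetical.

```python
import socket


def port_is_reachable(host: str, port: int = 444, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port can be opened within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS resolution failures.
        return False
```

For example, port_is_reachable("athena.us-east-1.amazonaws.com") would test the public endpoint for the us-east-1 Region. The hostname here is an assumption for illustration; substitute the public or private endpoint that your connection actually uses.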

Authentication

Select an authentication type from the dropdown menu. The options available in the dropdown menu are the currently supported authentication types for this data source.

Setting up a single role-based Athena connection using AWS EC2 roles for Instance Profile authentication

Important 
This authentication option is only available for Athena CDATA driver versions 24.0.8994 and newer.

  1. Set the Connection URL to reference the required parameters.
    jdbc:cdata:amazonathena:AuthScheme=AwsEC2Roles;MetadataDiscoveryMethod=Athena;AWSRoleARN=arn:aws:iam::${AccountID}:role/${RoleName};Database=${ConnectionName};DataSource=${DBName};AWSRegion=${Region};S3StagingDirectory=${S3DirectoryPath};Workgroup=${WorkgroupName}

    Tip 
    You can optionally include the log parameters by adding Logfile=/tmp/example-log.log;Verbosity=5; to the Connection URL to see any errors that occur with the CDATA driver.

  2. Click Submit.

Setting up multiple role-based Athena connections using AWS EC2 roles for Instance Profile authentication

Important 
This authentication option is only available for Athena CDATA driver versions 24.0.8994 and newer.

  1. For each role-based Athena connection you want to use, create a cache location folder in the /tmp directory on the host where Collibra DQ is running. CDATA adds the Instance Profile credential file to this folder. Multiple role-based Athena connections cannot share the same folder in the /tmp directory.
    Example 
    If you have 2 different role-based Athena connections, the folder for the first connection could be called /tmp/connA/ and the second could be called /tmp/connB/.

  2. Set the Connection URL to reference the required parameters.
    jdbc:cdata:amazonathena:AuthScheme=AwsEC2Roles;MetadataDiscoveryMethod=Athena;AWSRoleARN=arn:aws:iam::${AccountID}:role/${RoleName};Database=${ConnectionName};DataSource=${DBName};AWSRegion=${Region};S3StagingDirectory=${S3DirectoryPath};Workgroup=${WorkgroupName};Location=${/tmp/connection/};CredentialsLocation=${/tmp/connection/CredentialsFile.txt}

    Tip 
    You can optionally include the log parameters by adding Logfile=/tmp/example-log.log;Verbosity=5; to the Connection URL to see any errors that occur with the CDATA driver.

  3. Click Submit.
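
The folder creation and URL assembly in the steps above can be sketched in Python as follows. This is an illustrative helper, not a Collibra DQ or CDATA tool: the function name build_role_urls is hypothetical, and the placeholders ${Location} and ${CredentialsLocation} are renamed here from the path-style placeholders in the URL above so that they are valid template names. Run it on the host where Collibra DQ is running so that the folders are created in the right place.

```python
from pathlib import Path
from string import Template
from typing import Dict

# Connection URL from the steps above; the last two placeholders are renamed.
URL_TEMPLATE = Template(
    "jdbc:cdata:amazonathena:AuthScheme=AwsEC2Roles;MetadataDiscoveryMethod=Athena;"
    "AWSRoleARN=arn:aws:iam::${AccountID}:role/${RoleName};"
    "Database=${ConnectionName};DataSource=${DBName};AWSRegion=${Region};"
    "S3StagingDirectory=${S3DirectoryPath};Workgroup=${WorkgroupName};"
    "Location=${Location};CredentialsLocation=${CredentialsLocation}"
)


def build_role_urls(
    connections: Dict[str, Dict[str, str]], base_dir: str = "/tmp"
) -> Dict[str, str]:
    """Create one cache folder per connection and return its Connection URL.

    connections maps a folder name (for example "connA") to the values for
    AccountID, RoleName, ConnectionName, DBName, Region, S3DirectoryPath,
    and WorkgroupName. Dictionary keys are unique, so no two connections
    end up sharing a folder.
    """
    urls: Dict[str, str] = {}
    for name, params in connections.items():
        folder = Path(base_dir) / name
        folder.mkdir(parents=True, exist_ok=True)
        urls[name] = URL_TEMPLATE.substitute(
            params,
            Location=f"{folder}/",
            CredentialsLocation=str(folder / "CredentialsFile.txt"),
        )
    return urls
```

Calling build_role_urls with the keys "connA" and "connB" creates /tmp/connA/ and /tmp/connB/ and returns one URL per connection, each pointing at its own folder.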

Username
Required: Yes
Description: The username of your Athena account.

Password
Required: Yes
Description: The password of your Athena account.

Script
Required: Yes
Description: The file path containing the script file that the password manager uses to interact with and authenticate a user account.

Example /tmp/keytab/athena_pwd_mgr.sh

Param $1
Required: No
Description: Optional. An additional parameter to authenticate your Athena connection.

Param $2
Required: No
Description: Optional. An additional parameter to authenticate your Athena connection.

Param $3
Required: No
Description: Optional. An additional parameter to authenticate your Athena connection.