Prepare S3 file system for Edge

Collibra relies on AWS Glue and AWS Identity and Access Management (IAM) to ingest and synchronize data from Amazon S3.

AWS Glue

AWS Glue performs extract-transform-load (ETL) processes on the data stored in data sources such as Amazon S3. Collibra uses AWS Glue only to ingest data from Amazon S3. Other AWS Glue features, such as crawling other data sources or running ETL processes, are not integrated.

For more information about AWS Glue, go to the AWS Glue documentation.

AWS Glue has the following components:

Although you need an AWS account, you don't have to work in AWS Glue directly because Collibra does everything for you.

AWS Identity and Access Management

Collibra uses the AWS Identity and Access Management (IAM) service to manage access to Amazon S3 and AWS Glue. As with AWS Glue, you need an AWS account to use the IAM service. Once you have set up the required user and role, you don't have to work directly with IAM.

For more information about IAM, go to the AWS IAM documentation.

You need two things in IAM: a programmatic user and an IAM role.

Supported authentication types

Before you integrate an S3 file system via Edge, you need to prepare Amazon S3 by creating the required roles and permissions.

Two types of authentication are supported for Amazon S3: IAM and EC2. Each authentication type has its own prerequisites and preparation steps.

Prerequisites

IAM

EC2

Steps

To set up IAM or EC2 with role-based authentication, complete the steps for your authentication type. The first procedure covers IAM; the second covers EC2.

  1. Go to AWS Identity and Access Management.
  2. Create the programmatic user that you want to use to connect to AWS.
    1. During the user creation process, attach the permission policy: AWSGlueServiceRole.
    2. After user creation, open the user details and create an inline policy.
    3. Add the following JSON content for the inline policy:
      {
          "Version": "2012-10-17",
          "Statement": 
          [
              {
                  "Sid": "VisualEditor0",
                  "Effect": "Allow",
                  "Action": "iam:PassRole",
                  "Resource": "*"
              }
          ]
      }
    4. Use the following name for the inline policy: pass_role.

    When you create the AWS connection in Collibra, you need the programmatic user credentials and access keys. For information on access keys, go to the IAM documentation.

  3. Create an IAM role. You will need to use this IAM role when you add a capability in Collibra.
    1. During the role creation process, add the permission policy: AWSGlueServiceRole.
    2. (Optional) If you also want to access a private S3 bucket, add an additional permission policy: AmazonS3ReadOnlyAccess.
    3. Open the newly created role, and in Trust relationships, check that glue.amazonaws.com is added as a trust policy. This should have been added automatically based on the permission policy AWSGlueServiceRole.

  4. If you have enabled Data Lake Formation, complete additional steps.
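The two JSON documents involved in the steps above can be checked offline before you continue. The following is a minimal sketch using only the Python standard library. The function names are hypothetical, and the trust policy shown is an assumed example of what a Glue service role typically contains; compare it with the document shown in Trust relationships for your own role. The check matches action names exactly and does not expand wildcards such as iam:*.

```python
import json

# The inline policy from the steps above, reproduced as-is.
PASS_ROLE_POLICY = """
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "VisualEditor0",
            "Effect": "Allow",
            "Action": "iam:PassRole",
            "Resource": "*"
        }
    ]
}
"""

# Assumption: a typical trust policy for a role that AWS Glue can assume.
EXAMPLE_TRUST_POLICY = """
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"Service": "glue.amazonaws.com"},
            "Action": "sts:AssumeRole"
        }
    ]
}
"""


def _as_list(value):
    """IAM accepts a single value or a list in several fields; normalize to a list."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def _statements(policy_json):
    """Parse a policy document and return its statements as a list."""
    return _as_list(json.loads(policy_json).get("Statement"))


def allows_pass_role(policy_json):
    """Return True if any Allow statement names the iam:PassRole action."""
    return any(
        stmt.get("Effect") == "Allow"
        and "iam:PassRole" in _as_list(stmt.get("Action"))
        for stmt in _statements(policy_json)
    )


def trusts_glue(trust_policy_json):
    """Return True if glue.amazonaws.com is allowed to assume the role."""
    for stmt in _statements(trust_policy_json):
        principal = stmt.get("Principal", {})
        # Principal may also be the string "*"; only dict principals list services.
        services = _as_list(principal.get("Service")) if isinstance(principal, dict) else []
        if (
            stmt.get("Effect") == "Allow"
            and "glue.amazonaws.com" in services
            and "sts:AssumeRole" in _as_list(stmt.get("Action"))
        ):
            return True
    return False


if __name__ == "__main__":
    print(allows_pass_role(PASS_ROLE_POLICY))
    print(trusts_glue(EXAMPLE_TRUST_POLICY))
```

To check your own role, paste the JSON from Trust relationships into a string and pass it to trusts_glue.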

Note: EC2 has been validated only for bundled K3s installations of Edge.

If you use K3s-bundled Edge on an AWS EC2 instance that is configured with role-based authentication, you can connect to Amazon S3 without an access key ID and secret access key. Use the following steps to configure role-based Amazon S3 access control.

  1. Go to AWS Identity and Access Management.
  2. Create an IAM role. You will need to use this IAM role when you add a capability in Collibra.
    1. During the role creation process, add the permission policy: AWSGlueServiceRole.
    2. (Optional) If you also want to access a private S3 bucket, add an additional permission policy: AmazonS3ReadOnlyAccess.
    3. Open the newly created role, and in Trust relationships, check that glue.amazonaws.com is added as a trust policy. This should have been added automatically based on the permission policy AWSGlueServiceRole.

    4. After creating the role, open the role details and click Add permissions to create an inline policy.

    5. Use the following JSON content for the inline policy:
      {
          "Version": "2012-10-17",
          "Statement": 
          [
              {
                  "Sid": "VisualEditor0",
                  "Effect": "Allow",
                  "Action": "iam:PassRole",
                  "Resource": "*"
              }
          ]
      }
    6. Use the following name for the inline policy: pass_role.
  3. In the Amazon EC2 console, attach the IAM role you created to the Amazon EC2 instance.
    • If the credentials in the Amazon EC2 instance cannot be used to authenticate, you can create a credentials file instead and save it in the user_home/.aws/ folder. The credentials file should look like this:
      [default]
      aws_access_key_id = <access key ID>
      aws_secret_access_key = <secret access key>

      For more information, see the AWS developer guide.

      Warning: Do not use a credentials file unless absolutely necessary.
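If you do fall back to a credentials file, it can help to confirm that the file has the expected shape before you rely on it. The sketch below is an illustration only: it uses Python's standard configparser module, the function name missing_credential_keys is hypothetical, and the sample values are placeholders rather than real keys.

```python
import configparser

# Keys expected in the profile, as shown in the credentials file example above.
REQUIRED_KEYS = ("aws_access_key_id", "aws_secret_access_key")

# Placeholder content; never commit real keys to source control.
SAMPLE = """
[default]
aws_access_key_id = EXAMPLE_KEY_ID
aws_secret_access_key = EXAMPLE_SECRET
"""


def missing_credential_keys(text, profile="default"):
    """Return the required keys that are absent or empty in the given profile."""
    # Interpolation is disabled so a '%' in a value is read literally.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    if not parser.has_section(profile):
        return list(REQUIRED_KEYS)
    return [
        key
        for key in REQUIRED_KEYS
        if not parser.get(profile, key, fallback="").strip()
    ]


if __name__ == "__main__":
    # An empty list means both keys are present in the [default] profile.
    print(missing_credential_keys(SAMPLE))
```

To check a real file, read its contents into a string and pass that string to the function. The sketch only checks structure; it does not verify that the keys are valid in AWS.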

What's next

You can now go to Collibra to register your AWS regions and prepare your Edge site to continue with the Amazon S3 integration. For the steps, see Integrate an Amazon S3 file system.