Deploying Collibra Unstructured AI infrastructure

Deploying Unstructured AI infrastructure involves setting up the necessary resources in your cloud environment to support advanced AI applications. This page provides instructions for deploying Collibra Unstructured AI infrastructure on Amazon Web Services (AWS), covering the prerequisites, configuration, deployment steps, and troubleshooting needed for a successful setup.

Prerequisites

Before deployment, ensure the following resources and configurations are in place.

Expected final state

Resource | Description | Example value
AWS Account | Dedicated subaccount recommended. | 006352514257
IAM Role | Terraform provisioner role with required permissions. | UnstructuredTerraformProvisioner
S3 Bucket | For the Terraform state file. | unstructured-tf-state
Route53 Hosted Zone | DNS zone for ingress records. | env-id.company.com
ACM Certificate | TLS certificate for the application domain (must be in ISSUED status). | app.env-id.company.com
Cognito User Pool | User authentication (the app client must have no client secret). | us-east-1_xxxxxxxx
Service-Linked Roles | AWS service roles for EKS, RDS, and Auto Scaling. | See Service-linked roles.

Required tools

The following tools must be installed on your local machine:

  • Terraform version v1.12.2 or newer.
  • AWS CLI version 2.23.6 or newer.
  • kubectl for cluster verification.
  • Helm for troubleshooting Helm releases.
  • Go version 1.22 or newer for the configuration tool.

Optional AWS subaccount setup

It is recommended to deploy Collibra Unstructured AI infrastructure in a dedicated AWS subaccount for better resource isolation and management. If you choose to set up a subaccount, follow the AWS Organizations documentation to create a new account under your organization.

IAM provisioner role

Create an IAM role, such as UnstructuredTerraformProvisioner, that Terraform will assume. The role must:

  1. Trust your AWS account (or the specific user or role running Terraform):
    {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Effect": "Allow",
          "Principal": {
            "AWS": "arn:aws:iam::<YOUR_ACCOUNT_ID>:root"
          },
          "Action": "sts:AssumeRole"
        }
      ]
    }
  2. Have the following inline policy attached:

    Important Update the S3 statement Resource ARNs to match the actual Terraform state bucket name.

    Note The CoreInfrastructureAndNetworking and AllowCreationWithProjectTag statements use tag-based conditions. The Terraform AWS provider is configured with default_tags to automatically tag all resources with Project = "unstructured". For BYOVPC deployments, you must also tag your existing VPC and subnets with Project = "unstructured". See BYOVPC requirements for more information.

    View an example of the IAM policy JSON

    {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "CoreInfrastructureAndNetworking",
                "Effect": "Allow",
                "Action": [
                    "ec2:*",
                    "elasticloadbalancing:*",
                    "eks:*",
                    "rds:*",
                    "secretsmanager:*"
                ],
                "Resource": "*",
                "Condition": {
                  "StringEquals": {
                    "aws:ResourceTag/Project": "unstructured"
                  }
                }
            },
            {
                "Sid": "AllowCreationWithProjectTag",
                "Effect": "Allow",
                "Action": [
                    "ec2:*",
                    "elasticloadbalancing:*",
                    "eks:*",
                    "rds:*",
                    "secretsmanager:*"
                ],
                "Resource": "*",
                "Condition": {
                  "StringEquals": {
                    "aws:RequestTag/Project": "unstructured"
                  }
                }
            },
            {
                "Sid": "GlobalDiscoveryActions",
                "Effect": "Allow",
                "Action": [
                    "ec2:Describe*",
                    "eks:Describe*",
                    "eks:List*",
                    "iam:Get*",
                    "iam:List*",
                    "kms:List*",
                    "kms:Describe*",
                    "rds:Describe*",
                    "secretsmanager:ListSecrets",
                    "ssm:GetParameter"
                ],
                "Resource": "*"
            },
            {
                "Sid": "IAM",
                "Effect": "Allow",
                "Action": [
                    "sts:AssumeRole",
                    "iam:CreateOpenIDConnectProvider",
                    "iam:DeleteOpenIDConnectProvider",
                    "iam:TagOpenIDConnectProvider",
                    "iam:CreateServiceLinkedRole"
                ],
                "Resource": "*"
            },
            {
                "Sid": "IAMLimited",
                "Effect": "Allow",
                "Action": [
                    "iam:*"
                ],
                "Resource": [
                    "arn:aws:iam::*:role/unstructured-*",
                    "arn:aws:iam::*:role/KarpenterController-*",
                    "arn:aws:iam::*:policy/unstructured-*",
                    "arn:aws:iam::*:policy/KarpenterController-*",
                    "arn:aws:iam::*:instance-profile/unstructured-*"
                ]
            },
            {
                "Sid": "Misc",
                "Effect": "Allow",
                "Action": [
                    "ec2:*LaunchTemplate*",
                    "ec2:RunInstances",
                    "ec2:DisassociateAddress",
                    "iam:PassRole",
                    "autoscaling:*",
                    "acm:*",
                    "cognito-idp:DescribeUserPoolClient",
                    "wafv2:*",
                    "waf-regional:*",
                    "shield:*",
                    "route53:*",
                    "ecr:GetAuthorizationToken",
                    "elasticloadbalancing:*",
                    "sqs:*",
                    "events:*"
                ],
                "Resource": "*"
            },
            {
                "Sid": "ECRCrossAccountPull",
                "Effect": "Allow",
                "Action": [
                    "ecr:BatchGetImage",
                    "ecr:GetDownloadUrlForLayer",
                    "ecr:BatchCheckLayerAvailability",
                    "ecr:DescribeRepositories",
                    "ecr:ListImages",
                    "ecr:DescribeImages"
                ],
                "Resource": "arn:aws:ecr:us-east-1:139228973453:repository/release/*"
            },
            {
                "Sid": "S3",
                "Effect": "Allow",
                "Action": [
                    "s3:ListBucket",
                    "s3:GetBucketLocation",
                    "s3:GetObject",
                    "s3:PutObject",
                    "s3:DeleteObject"
                ],
                "Resource": [
                    "arn:aws:s3:::unstructured-tf-state",
                    "arn:aws:s3:::unstructured-tf-state/*"
                ]
            },
            {
                "Sid": "KMSKeyManagement",
                "Effect": "Allow",
                "Action": [
                    "kms:CreateKey",
                    "kms:DescribeKey",
                    "kms:GetKeyPolicy",
                    "kms:PutKeyPolicy",
                    "kms:TagResource",
                    "kms:ScheduleKeyDeletion",
                    "kms:ListResourceTags",
                    "kms:CreateAlias",
                    "kms:DeleteAlias",
                    "kms:ListAliases",
                    "kms:ListKeys"
                ],
                "Resource": "*"
            },
            {
                "Sid": "CloudWatchLogsManagement",
                "Effect": "Allow",
                "Action": [
                    "logs:CreateLogGroup",
                    "logs:DescribeLogGroups",
                    "logs:ListTagsForResource",
                    "logs:TagResource",
                    "logs:PutRetentionPolicy",
                    "logs:DeleteLogGroup"
                ],
                "Resource": "*"
            }
        ]
    }
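
If you manage the role with the AWS CLI rather than the console, you can create it from the two JSON documents above. This is a sketch; the local file names and the inline policy name are placeholders, not values this deployment requires:

```shell
# Create the provisioner role using the trust policy shown above
aws iam create-role \
  --role-name UnstructuredTerraformProvisioner \
  --assume-role-policy-document file://trust-policy.json

# Attach the inline permissions policy shown above
aws iam put-role-policy \
  --role-name UnstructuredTerraformProvisioner \
  --policy-name TerraformProvisionerPolicy \
  --policy-document file://provisioner-policy.json
```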

Service-linked roles

AWS service-linked roles must exist in your account before deployment. The IAM policy for the provisioner role scopes iam:* to unstructured-* resources, so it cannot create service-linked roles, which exist under aws-service-role/*.

Create service-linked roles manually before your first deployment:

aws iam create-service-linked-role --aws-service-name eks.amazonaws.com
aws iam create-service-linked-role --aws-service-name eks-nodegroup.amazonaws.com
aws iam create-service-linked-role --aws-service-name rds.amazonaws.com
aws iam create-service-linked-role --aws-service-name autoscaling.amazonaws.com
aws iam create-service-linked-role --aws-service-name elasticloadbalancing.amazonaws.com

Note If a service-linked role already exists, the command returns an error. You can safely ignore this error.
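
Because the existing-role error is safe to ignore, you can run the five commands in a loop that continues past failures. A small convenience sketch:

```shell
# Create each service-linked role; "|| true" skips roles that already exist
for svc in eks.amazonaws.com eks-nodegroup.amazonaws.com rds.amazonaws.com \
           autoscaling.amazonaws.com elasticloadbalancing.amazonaws.com; do
  aws iam create-service-linked-role --aws-service-name "$svc" || true
done
```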

Instructions to create prerequisite resources

S3 backend for Terraform state

Create an S3 bucket for storing Terraform state:

aws s3api create-bucket \
  --bucket unstructured-tf-state \
  --region us-east-1

aws s3api put-bucket-encryption \
  --bucket unstructured-tf-state \
  --server-side-encryption-configuration '{"Rules":[{"ApplyServerSideEncryptionByDefault":{"SSEAlgorithm":"AES256"}}]}'
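
Optionally, harden the state bucket. Enabling versioning lets you recover a previous state file after a bad apply, and blocking public access is a standard safeguard; neither is strictly required by this deployment:

```shell
# Keep prior versions of the Terraform state file
aws s3api put-bucket-versioning \
  --bucket unstructured-tf-state \
  --versioning-configuration Status=Enabled

# Block all public access to the state bucket
aws s3api put-public-access-block \
  --bucket unstructured-tf-state \
  --public-access-block-configuration \
    BlockPublicAcls=true,IgnorePublicAcls=true,BlockPublicPolicy=true,RestrictPublicBuckets=true
```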

Route53 hosted zone

A Route53 hosted zone must exist for the domain you plan to use. Terraform creates DNS records in this zone, but it does not create the zone itself.

aws route53 create-hosted-zone \
  --name env-id.company.com \
  --caller-reference "$(date +%s)"

After you create the zone, delegate your domain by updating the nameserver records at your registrar to the name servers that Route53 returns.
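
To look up the name servers to set at your registrar, query the zone (the hosted zone ID is returned by the create-hosted-zone command):

```shell
aws route53 get-hosted-zone \
  --id <hosted-zone-id> \
  --query 'DelegationSet.NameServers'
```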

ACM certificate

An ACM certificate must be issued for the application domain. It must be in the same region as your deployment and in ISSUED status.

aws acm request-certificate \
  --domain-name app.env-id.company.com \
  --validation-method DNS \
  --region us-east-1

Complete DNS validation by adding the CNAME record to your Route53 zone. Terraform handles the association of this certificate with the ingress load balancer.
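
To retrieve the validation CNAME and wait for issuance from the CLI, you can use describe-certificate and the built-in ACM waiter (the certificate ARN is returned by the request-certificate command):

```shell
# Show the CNAME name and value to add to the Route53 zone
aws acm describe-certificate \
  --certificate-arn <certificate-arn> \
  --region us-east-1 \
  --query 'Certificate.DomainValidationOptions[0].ResourceRecord'

# Block until the certificate reaches ISSUED status
aws acm wait certificate-validated \
  --certificate-arn <certificate-arn> \
  --region us-east-1
```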

Cognito user pool

Configure AWS Cognito for application authentication. This includes user pool creation, app client settings, custom user attributes, and optional SSO with a federated identity provider.

Prerequisites

  • AWS account with permissions to create and manage Amazon Cognito resources.
  • AWS CLI v2 installed and configured (required only if you use the CLI commands).
  • The application URL (for example, https://app.unstructured.<your domain>.com).

Expected final state

Resource | Description | Example value
User Pool | Cognito user pool for authentication. | unstructured-ai-pool
App Client | Public app client (no client secret). | unstructured-frontend
Custom Attributes | User properties for roles and tenancy. | custom:permission_level, custom:tenant_name, custom:group_memberships
Admin User | Initial admin user with permanent password. | Username: admin, permission level: admin
Hosted UI Domain (Optional) | Cognito domain for OAuth/SSO (if you use SSO). | your-company.auth.us-east-1.amazoncognito.com
SAML Identity Provider (Optional) | Federated IDP for SSO (if you use SSO). | Provider name: SAML

1. Create a user pool

Using the AWS console
  1. Navigate to the AWS Console.
  2. Enter "Cognito" in the search bar.
  3. Click Create user pool.
  4. Define your application:
    • Select Single-page application (SPA).
    • Name your application.
  5. Configure options:
    • For sign-in identifiers, select Username and Email.
    • For self-registration, clear Enable self-registration.
    • For required attributes for sign-up, select email in the dropdown menu.
  6. Skip the return URL section. You only need this for SSO, and you can configure it later if required.
  7. Click Create User Directory.
  8. On the next page, click Go to overview. You will land on the user pool overview page. Note the following values, which are required for application configuration:
    • User Pool ID: shown on the user pool overview page. The unique pool identifier (for example, us-east-1_AbCdEfGhI).
    • App Client ID: shown on the user pool overview page and on the App clients tab. The client identifier (for example, 7649pb0etcv84u0rskudhd5pel).
    • AWS Region: visible in the URL bar or in the User Pool ARN. The region where the pool was created (for example, us-east-1).

    Note The wizard automatically creates an app client. You will configure its settings in the Configure app client step.

  9. Add custom attributes: In your user pool, navigate to Sign-up experience (under Authentication in the left panel). Scroll to Custom attributes. Click Add custom attributes and enter the following:
    • custom:permission_level (String, mutable): the user's role: admin, contributor, or viewer.
    • custom:tenant_name (String, mutable): the tenant ID for the user's organization.
    • custom:group_memberships (String, mutable): a comma-separated list of group names.

    Important You cannot rename or delete custom attributes after creation. Verify the attribute names match exactly as shown.

View the CLI equivalent

# Set your desired region
export AWS_REGION="us-east-1"
export POOL_NAME="unstructured-ai-pool"

# Create the user pool with password policy and email configuration
aws cognito-idp create-user-pool \
  --pool-name "$POOL_NAME" \
  --username-attributes email \
  --auto-verified-attributes email \
  --schema \
    Name=email,Required=true,Mutable=true \
    Name=given_name,Required=false,Mutable=true \
    Name=family_name,Required=false,Mutable=true \
  --policies '{
    "PasswordPolicy": {
      "MinimumLength": 8,
      "RequireUppercase": true,
      "RequireLowercase": true,
      "RequireNumbers": true,
      "RequireSymbols": true
    }
  }' \
  --admin-create-user-config '{
    "AllowAdminCreateUserOnly": true
  }' \
  --user-attribute-update-settings '{
    "AttributesRequireVerificationBeforeUpdate": ["email"]
  }' \
  --region "$AWS_REGION"

# Save the User Pool ID from the output
export USER_POOL_ID="<user-pool-id-from-output>"

# Add custom attributes
aws cognito-idp add-custom-attributes \
  --user-pool-id "$USER_POOL_ID" \
  --custom-attributes \
    Name=permission_level,AttributeDataType=String,Mutable=true \
    Name=tenant_name,AttributeDataType=String,Mutable=true \
    Name=group_memberships,AttributeDataType=String,Mutable=true \
  --region "$AWS_REGION"

# Create the app client (SPA — no client secret)
aws cognito-idp create-user-pool-client \
  --user-pool-id "$USER_POOL_ID" \
  --client-name "unstructured-frontend" \
  --no-generate-secret \
  --explicit-auth-flows \
    ALLOW_USER_SRP_AUTH \
    ALLOW_REFRESH_TOKEN_AUTH \
  --region "$AWS_REGION"

# Save the Client ID from the output
export CLIENT_ID="<client-id-from-output>"

Note The create-user-pool CLI command does not support setting the feature plan (Essentials) or selecting the SPA app type. The CLI creates a standard user pool, and the app client is created separately. After creation, you can verify settings in the AWS Console.

2. Configure app client

The app client was automatically created during user pool creation. This section covers verifying and updating its settings.

Using the AWS console
  1. In your user pool, go to the App clients tab.
  2. Click the app client created during the wizard. It has the name you provided previously.
App client information

Verify the following settings:

Setting | Expected value
App client name | The name you provided during user pool creation.
Client secret | - (no client secret for the SPA app type).
Authentication flows

Verify that the following authentication flows are enabled. If they are not, click Edit and enable them:

Authentication flow | API name | Required | Description
Sign in with secure remote password (SRP) | ALLOW_USER_SRP_AUTH | Yes | Standard username and password authentication.
Get new user tokens from existing authenticated sessions | ALLOW_REFRESH_TOKEN_AUTH | Yes | Enables session continuity and token refresh.

Note You do not need other authentication flows such as ALLOW_USER_PASSWORD_AUTH, ALLOW_ADMIN_USER_PASSWORD_AUTH, ALLOW_CUSTOM_AUTH, or ALLOW_USER_AUTH. You can leave them cleared unless you have a specific need for them.

Authentication flow session duration
Setting | Recommended value
Authentication flow session duration | 3 minutes.
Token expiration
Token | Recommended value | Notes
Refresh token expiration | 5 days. | Used for automatic session renewal without re-login.
Access token expiration | 60 minutes. | Short-lived for security.
ID token expiration | 60 minutes. | Used by the application for API requests and role resolution.

Adjust token lifetimes according to the security requirements of your organization. Shorter refresh token expiration increases security but requires more frequent re-authentication.

Advanced authentication settings
Setting | Value
Enable token revocation | Yes (allows invalidating tokens on sign-out).
Enable prevent user existence errors | Yes (returns generic error messages to prevent username enumeration attacks).
Attribute read and write permissions

The app client needs Read and Write access to the attributes listed below. Click Edit under the attribute permissions section to verify and update the settings.

Required custom attributes:

Attribute | Read | Write | Used for
custom:permission_level | Yes | Yes | Role-based access control (admin, contributor, or viewer).
custom:tenant_name | Yes | Yes | Tenant identification.
custom:group_memberships | Yes | Yes | Group-based access control.

Required standard attributes:

Attribute | Read | Write | Used for
email | Yes | Yes | User identification and communication.

Recommended standard attributes:

Attribute | Read | Write | Used for
given_name | Yes | Yes | User's first name shown in the UI.
family_name | Yes | Yes | User's last name shown in the UI.

The application does not actively use other standard attributes such as address, birthdate, gender, locale, middle_name, name, nickname, or phone_number. You can leave them at defaults or disable them according to your requirements.

After verifying or updating the settings, click Save changes.

Important The app client must not have a client secret. The application is a browser-based SPA that uses Secure Remote Password (SRP) authentication, which requires a public client. The SPA app type selected during user pool creation ensures this by default.

View the CLI equivalent

export USER_POOL_ID="<your-user-pool-id>"
export CLIENT_ID="<client-id-from-pool-creation>"

# Update the existing app client settings
aws cognito-idp update-user-pool-client \
  --user-pool-id "$USER_POOL_ID" \
  --client-id "$CLIENT_ID" \
  --explicit-auth-flows \
    ALLOW_USER_SRP_AUTH \
    ALLOW_REFRESH_TOKEN_AUTH \
  --auth-session-validity 3 \
  --access-token-validity 60 \
  --id-token-validity 60 \
  --refresh-token-validity 5 \
  --token-validity-units '{
    "AccessToken": "minutes",
    "IdToken": "minutes",
    "RefreshToken": "days"
  }' \
  --enable-token-revocation \
  --prevent-user-existence-errors ENABLED \
  --read-attributes \
    email \
    given_name \
    family_name \
    custom:permission_level \
    custom:tenant_name \
    custom:group_memberships \
  --write-attributes \
    email \
    given_name \
    family_name \
    custom:permission_level \
    custom:tenant_name \
    custom:group_memberships \
  --region "$AWS_REGION"

Note The CLI command uses update-user-pool-client (not create) because the client already exists. When you use this command, you must specify all settings. Any omitted values will be reset to defaults.
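
To confirm the app client remains public after the update, you can query it and check that no client secret is returned:

```shell
# Prints "None" when the client has no secret, as required
aws cognito-idp describe-user-pool-client \
  --user-pool-id "$USER_POOL_ID" \
  --client-id "$CLIENT_ID" \
  --query 'UserPoolClient.ClientSecret' \
  --output text
```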

3. Create the initial admin user

Create your first admin user. This user will have full access to manage users, groups, and settings through the application.

Using the AWS console
  1. In your user pool, go to Users.
  2. Click Create user.
  3. Enter the details:
    • User name: admin (or your preferred admin username).
    • Email: the admin's email address.
    • Mark email as verified: Yes.
    • Temporary password: set a temporary password.
  4. Click Create user.
  5. After creation, go to the user detail page.
  6. Navigate to User attributes.
  7. Click Edit.
  8. Set custom attributes:
    • custom:permission_level: admin.
    • custom:tenant_name: your tenant ID (for example, default).
    • custom:group_memberships: leave empty (or set initial groups).
  9. Set a permanent password to move the user out of FORCE_CHANGE_PASSWORD status. See the CLI command below for an example.

View the CLI equivalent

export USER_POOL_ID="<your-user-pool-id>"
export ADMIN_USERNAME="admin"
export ADMIN_EMAIL="[email protected]"
export TENANT_NAME="default"
export TEMP_PASSWORD="TempPassword123!"
export PERMANENT_PASSWORD="YourSecurePassword123!"

# Create the user with custom attributes
aws cognito-idp admin-create-user \
  --user-pool-id "$USER_POOL_ID" \
  --username "$ADMIN_USERNAME" \
  --user-attributes \
    Name=email,Value="$ADMIN_EMAIL" \
    Name=email_verified,Value=true \
    Name=given_name,Value=Admin \
    Name=family_name,Value=User \
    Name=custom:permission_level,Value=admin \
    Name=custom:tenant_name,Value="$TENANT_NAME" \
  --temporary-password "$TEMP_PASSWORD" \
  --region "$AWS_REGION"

# Set a permanent password (moves user out of FORCE_CHANGE_PASSWORD)
aws cognito-idp admin-set-user-password \
  --user-pool-id "$USER_POOL_ID" \
  --username "$ADMIN_USERNAME" \
  --password "$PERMANENT_PASSWORD" \
  --permanent \
  --region "$AWS_REGION"

4. Optional SSO setup

4.1 Configure hosted UI domain

If you plan to use SSO with a federated identity provider, you must configure a hosted UI domain.

Using the AWS console
  1. In your user pool, go to Branding, then Domain in the left panel.
  2. Under Cognito domain, click Edit (or Create Cognito domain if none exists).
  3. Enter a domain prefix (for example, unstructured-ai-sso). This creates a domain in the following format:
    https://<your-prefix>.auth.<region>.amazoncognito.com

    For example: https://unstructured-ai-sso.auth.us-east-1.amazoncognito.com

  4. For Branding version, select Hosted UI (classic).
  5. Click Save.

    Note If you prefer to use your own domain, for example, auth.yourcompany.com, use the Custom domain section instead. This requires an ACM certificate in us-east-1. For most deployments, the Cognito domain is sufficient.

  6. Callback URLs: The application manages OAuth redirect URLs in its own configuration via environment variables. See Section 5. You do not need to configure callback or sign-out URLs on the Cognito app client.
  7. Note the Cognito domain, as it is required for application configuration:
    • Cognito domain prefix (for example, unstructured-ai-sso).
    • Full domain URL (for example, https://unstructured-ai-sso.auth.us-east-1.amazoncognito.com).

View the CLI equivalent

export COGNITO_DOMAIN_PREFIX="unstructured-ai-sso"

# Create the Cognito hosted UI domain
aws cognito-idp create-user-pool-domain \
  --user-pool-id "$USER_POOL_ID" \
  --domain "$COGNITO_DOMAIN_PREFIX" \
  --region "$AWS_REGION"

4.2 Add a SAML identity provider

To enable "Login with SSO" through a SAML 2.0 identity provider, such as Okta, Azure AD, OneLogin, or PingFederate:

Using the AWS console
  1. In your user pool, go to Authentication, then Social and external providers in the left panel.
  2. Click Add identity provider.
  3. Select SAML.
  4. Configure the provider:
    • Provider name: SAML.
    • Metadata source: upload a metadata file or provide the metadata URL from your IDP.
  5. Configure Attribute mapping:
    • email maps to email.
    • name maps to name.
  6. Click Save.

Important The provider name must be SAML. The application uses this name when it initiates SSO login redirects.

View the CLI equivalent using metadata URL

export IDP_METADATA_URL="https://your-idp.example.com/metadata.xml"

aws cognito-idp create-identity-provider \
  --user-pool-id "$USER_POOL_ID" \
  --provider-name "SAML" \
  --provider-type SAML \
  --provider-details '{
    "MetadataURL": "'"$IDP_METADATA_URL"'"
  }' \
  --attribute-mapping '{
    "email": "email",
    "name": "name"
  }' \
  --region "$AWS_REGION"

View the CLI equivalent using metadata file

export IDP_METADATA_FILE="/path/to/metadata.xml"

METADATA_CONTENT=$(cat "$IDP_METADATA_FILE")

aws cognito-idp create-identity-provider \
  --user-pool-id "$USER_POOL_ID" \
  --provider-name "SAML" \
  --provider-type SAML \
  --provider-details '{
    "MetadataFile": "'"$(echo "$METADATA_CONTENT" | sed 's/"/\\"/g' | tr -d '\n')"'"
  }' \
  --attribute-mapping '{
    "email": "email",
    "name": "name"
  }' \
  --region "$AWS_REGION"

4.3 Configure your identity provider

In your identity provider, create a SAML application with the following settings:

Setting | Value
SSO URL / ACS URL | https://<your-cognito-domain>/saml2/idpresponse
Audience URI / Entity ID | urn:amazon:cognito:sp:<your-user-pool-ID>
Name ID format | EmailAddress or Persistent
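
Both values can be derived mechanically from the Cognito domain and user pool ID you noted earlier. A small sketch using example values:

```shell
# Example values; substitute the Cognito domain and pool ID you noted earlier
COGNITO_DOMAIN="unstructured-ai-sso.auth.us-east-1.amazoncognito.com"
USER_POOL_ID="us-east-1_AbCdEfGhI"

# The two values your identity provider needs
ACS_URL="https://${COGNITO_DOMAIN}/saml2/idpresponse"
ENTITY_ID="urn:amazon:cognito:sp:${USER_POOL_ID}"

echo "ACS URL:   ${ACS_URL}"
echo "Entity ID: ${ENTITY_ID}"
```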

Okta-specific setup

  1. In the Okta Admin Console, go to Applications.
  2. Click Create App Integration.
  3. Select SAML 2.0.
  4. Click Next.
  5. Set:
    • Single sign-on URL: https://<your-cognito-domain>/saml2/idpresponse.
    • Audience URI (SP Entity ID): urn:amazon:cognito:sp:<your-user-pool-ID>.
  6. Under Attribute Statements, add:
    • email: user.email.
    • name: user.displayName.
  7. Complete the wizard. Note the Metadata URL from the Sign On tab, as you will need this for the Cognito SAML provider configuration.

4.4 Enable the IDP on the app client
Using the AWS console
  1. Go to Applications, then App clients.
  2. Click your app client.
  3. Select the Login pages tab.
  4. Click Edit.
  5. Under Identity providers, enable both:
    • Cognito user pool (for username and password login).
    • SAML (for SSO login).
  6. Click Save changes.

View the CLI equivalent

aws cognito-idp update-user-pool-client \
  --user-pool-id "$USER_POOL_ID" \
  --client-id "$CLIENT_ID" \
  --explicit-auth-flows \
    ALLOW_USER_SRP_AUTH \
    ALLOW_REFRESH_TOKEN_AUTH \
  --supported-identity-providers COGNITO SAML \
  --allowed-o-auth-flows code \
  --allowed-o-auth-scopes openid profile email \
  --allowed-o-auth-flows-user-pool-client \
  --region "$AWS_REGION"

When you use update-user-pool-client, you must re-specify all existing settings because the command replaces the full client configuration.

Deployment scenarios

Standard deployment

In a standard deployment, Terraform creates all networking resources, such as VPC, subnets, NAT gateways, internet gateway, and route tables. The ingress load balancer is internet-facing.

BYOVPC (bring your own VPC)

In a BYOVPC deployment, you provide an existing VPC and subnets. Terraform skips networking creation and deploys directly into your infrastructure. The ingress load balancer is automatically set to internal, meaning the application is only accessible via private network connectivity, for example, VPN, Direct Connect, or peering.

Note VPN or network connectivity is your responsibility and is managed outside of this Terraform deployment.

BYOVPC requirements

Your VPC and subnets must meet the following requirements:

Requirement | Details
DNS Support | VPC must have DNS support and DNS hostnames enabled.
Private Subnets (EKS) | Exactly 2 private subnets in different Availability Zones with outbound internet access (NAT gateway or equivalent).
Private Subnets (DB) | Exactly 2 additional private subnets in different Availability Zones for the RDS database.
Outbound Internet | Required for pulling container images from ECR, Helm charts, and other external dependencies.
Tagging | VPC and all subnets must be tagged with Project = "unstructured" (required by the tag-based conditions of the IAM policy).

Tag your VPC and subnets:

aws ec2 create-tags \
  --resources <vpc-id> <subnet-1> <subnet-2> <subnet-3> <subnet-4> \
  --tags Key=Project,Value=unstructured
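
Before running Terraform, you can spot-check the DNS attributes and the Project tags. These are standard describe calls; substitute your resource IDs:

```shell
# Both queries should return true
aws ec2 describe-vpc-attribute --vpc-id <vpc-id> \
  --attribute enableDnsSupport --query 'EnableDnsSupport.Value'
aws ec2 describe-vpc-attribute --vpc-id <vpc-id> \
  --attribute enableDnsHostnames --query 'EnableDnsHostnames.Value'

# Should list a Project=unstructured tag for the VPC and all four subnets
aws ec2 describe-tags \
  --filters "Name=resource-id,Values=<vpc-id>,<subnet-1>,<subnet-2>,<subnet-3>,<subnet-4>" \
            "Name=key,Values=Project"
```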

CIDR range requirements

When you use BYOVPC, the following CIDR ranges must not overlap:

CIDR range | Purpose | Default
VPC CIDR | Your VPC network. | Customer-provided.
Kubernetes Service CIDR | ClusterIP services. | 172.20.0.0/16 (configurable via eks_cluster_service_cidr).
VPN Client CIDR | VPN tunnel client IPs. | Depends on your VPN configuration.

BYOVPC configuration

To use your own VPC, include the networking.byovpc section in config.yaml:

networking:
  byovpc:
    vpc_id: "vpc-xxxxxxxxxxxxxxxxx"
    private_subnet_ids:
      - "subnet-xxxxxxxx"   # EKS subnet (AZ 1)
      - "subnet-yyyyyyyy"   # EKS subnet (AZ 2)
    db_subnet_ids:
      - "subnet-aaaaaaaa"   # RDS subnet (AZ 1)
      - "subnet-bbbbbbbb"   # RDS subnet (AZ 2)

Omit this section entirely to have the infrastructure create a new VPC automatically.

Deployment steps

1. Prerequisites

Before deployment, ensure you have:

  • AWS CLI configured with access to the target account.
  • Terraform v1.12.2 or newer installed (see Required tools).
  • Go >= 1.22 installed (for the configuration tool).
  • Service-linked roles created (EKS, RDS, Auto Scaling, ELB).
  • An IAM provisioner role with the permissions described in the IAM provisioner role section.
  • An S3 bucket for Terraform remote state.
  • A Cognito user pool and app client configured for authentication.
  • A Route53 hosted zone for DNS.
  • ACM Certificate issued in the deployment region.
  • ECR cross-account access (provide your AWS account ID to the Collibra team so your cluster can pull container images).
  • BYOVPC - VPC and subnets tagged with Project = "unstructured".

2. Configure

Copy the example configuration and enter your values:

cd iac
cp config.yaml.example config.yaml

Edit config.yaml with your environment-specific values. See config.yaml.example for a fully commented template.

Cross-account DNS: If your Route53 hosted zone is in a different AWS account, set create_a_record: false under the ingress: section. Terraform will not attempt to create the Route53 A record and will output the values you need to create it manually after deployment.
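
As a sketch, the setting lives under the ingress: section of config.yaml; check config.yaml.example for the exact key layout in your version:

```yaml
ingress:
  create_a_record: false  # Hosted zone is in another account; create the A record manually
```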

3. Generate Terraform files

Build and run the configuration tool. All commands run from iac/:

go build -C ../tools/configure -o configure .
../tools/configure/configure

The tool validates your configuration and generates the following:

  • terraform.auto.tfvars (all Terraform variable values).
  • backend.hcl (S3 backend configuration).

The validator performs offline checks, such as:

  • Required field presence.
  • Region consistency (the Cognito pool region matches the deployment region).
  • IAM role ARN format.
  • Cognito client ID format.
  • UUID format for observability site ID.
  • The TLS domain is a subdomain of the DNS zone.
  • BYOVPC subnet ID format, count, and uniqueness.

4. Deploy

terraform init -backend-config=backend.hcl
terraform apply

This command deploys the entire stack in a single apply:

  • VPC networking (or a BYOVPC).
  • EKS cluster and node groups.
  • Aurora PostgreSQL database.
  • IAM roles (API, workflow, Cognito access, and EBS CSI).
  • Linkerd service mesh (cert-manager, CRDs, and control plane).
  • AWS Load Balancer Controller and ingress.
  • Backend and frontend applications.
  • Argo Workflows and Events.
  • External Secrets Operator.
  • Cluster Autoscaler.
  • EBS CSI Driver.
  • OpenTelemetry Collector.

Deployment takes approximately 20 to 30 minutes.

5. Verify deployment

# Configure kubeconfig (use --role-arn since only the provisioner role has cluster access)
aws eks update-kubeconfig \
  --region <your-region> \
  --name unstructured-eks-cluster \
  --role-arn arn:aws:iam::<ACCOUNT_ID>:role/UnstructuredTerraformProvisioner

# Set AWS_PROFILE so kubectl/helm can authenticate
export AWS_PROFILE=<your-profile-name>

# Check all pods
kubectl get pods --all-namespaces

# Check ingress
kubectl get ingress -n unstructured

For standard deployments, access the application at https://app.env-id.company.com.

For BYOVPC deployments, ensure your VPN or private network connectivity is active. Then, access the application at the configured domain.

6. Post-deployment: Cross-account DNS

If you set create_a_record: false because your Route53 hosted zone is in a different AWS account, you must manually create the DNS record after deployment.

After terraform apply completes, retrieve the required values:

Copy
terraform output dns_record_config

Use the output values to create an A record (Alias) in your Route53 hosted zone, using the AWS Console or the AWS CLI:

Copy
# Get the hosted zone ID for your domain in the DNS account
ZONE_ID=$(aws route53 list-hosted-zones-by-name \
  --dns-name "<dns_zone_name from output>" \
  --query "HostedZones[0].Id" \
  --output text \
  --profile <dns-account-profile>)

# Create the alias A record
aws route53 change-resource-record-sets \
  --hosted-zone-id "$ZONE_ID" \
  --profile <dns-account-profile> \
  --change-batch '{
    "Changes": [{
      "Action": "UPSERT",
      "ResourceRecordSet": {
        "Name": "<record_name from output>",
        "Type": "A",
        "AliasTarget": {
          "DNSName": "<alias_target from output>",
          "HostedZoneId": "<alias_zone_id from output>",
          "EvaluateTargetHealth": true
        }
      }
    }]
  }'

Tip Copy values from the terraform output carefully. The --change-batch JSON is sensitive to formatting. Avoid trailing commas, ensure quotes are straight, and do not add a trailing period to the DNSName value.
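One way to reduce formatting mistakes is to substitute the output values into a pre-validated JSON skeleton instead of hand-editing the batch each time. A sketch, with placeholder values standing in for the terraform output:

```shell
# Placeholder values -- replace with the values from `terraform output dns_record_config`
RECORD_NAME="app.env-id.company.com"
ALIAS_TARGET="internal-alb-123.us-east-1.elb.amazonaws.com"
ALIAS_ZONE_ID="Z35SXDOTRQ7X7K"

# Substitute the values into a known-good JSON skeleton
CHANGE_BATCH=$(printf '{
  "Changes": [{
    "Action": "UPSERT",
    "ResourceRecordSet": {
      "Name": "%s",
      "Type": "A",
      "AliasTarget": {
        "DNSName": "%s",
        "HostedZoneId": "%s",
        "EvaluateTargetHealth": true
      }
    }
  }]
}' "$RECORD_NAME" "$ALIAS_TARGET" "$ALIAS_ZONE_ID")

echo "$CHANGE_BATCH"
```

You can then pass the result with --change-batch "$CHANGE_BATCH" instead of an inline JSON literal.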

Note You only need to repeat this step if the ALB hostname changes, such as after a full teardown and redeploy.

Upgrading

To upgrade to a newer version:

  1. Download and extract the latest Terraform tarball (.tgz) from the Collibra downloads page.
  2. Review the release notes for any breaking changes.
  3. Update config.yaml with any new parameters and re-run the configure tool.
  4. Run:
    Copy
    terraform init -backend-config=backend.hcl
    terraform apply

Teardown

Copy
cd iac
terraform destroy

Note Some resources may require manual cleanup after terraform destroy:

  • Secrets Manager secrets are scheduled for deletion with a recovery window rather than deleted immediately. Use aws secretsmanager delete-secret --force-delete-without-recovery if you need immediate deletion for redeployment.
  • KMS keys are scheduled for deletion with a waiting period.
  • Service-linked roles are not deleted by Terraform, as they are account-level resources.

Troubleshooting

Secrets Manager "already scheduled for deletion"

Error:

InvalidRequestException: You can't create this secret because a secret with this name is already scheduled for deletion.

Fix: Restore the existing secret or force-delete it:

Copy
# Option 1: Restore the secret
aws secretsmanager restore-secret --secret-id <secret-name> --region <region>

# Option 2: Force-delete and let Terraform recreate it
aws secretsmanager delete-secret \
  --secret-id <secret-name> \
  --force-delete-without-recovery \
  --region <region>

Helm provider OCI registry authentication errors

Error:

Failed to log in to OCI registry "oci://...": response status code 403: denied: Your authorization token has expired.

This error occurs due to a known bug in Helm provider v3.x where the repository_password is cached in Terraform state. When ECR authorization tokens expire after 12 hours, Terraform uses the expired token from state.

Fix: Remove and re-import all affected Helm releases:

Copy
terraform state rm module.frontend.helm_release.frontend
terraform import module.frontend.helm_release.frontend unstructured/unstructured-frontend

terraform state rm module.backend.helm_release.backend
terraform import module.backend.helm_release.backend unstructured/unstructured-backend

terraform state rm module.linkerd_certs.helm_release.linkerd_certs
terraform import module.linkerd_certs.helm_release.linkerd_certs linkerd/linkerd-certs

terraform state rm module.linkerd_certs.helm_release.cert_manager
terraform import module.linkerd_certs.helm_release.cert_manager cert-manager/cert-manager

terraform state rm module.linkerd_crds.helm_release.linkerd_crds
terraform import module.linkerd_crds.helm_release.linkerd_crds linkerd/linkerd-crds

terraform state rm module.linkerd.helm_release.linkerd
terraform import module.linkerd.helm_release.linkerd linkerd/linkerd

terraform state rm module.ingress.helm_release.aws_load_balancer_controller
terraform import module.ingress.helm_release.aws_load_balancer_controller kube-system/aws-load-balancer-controller

terraform state rm module.eks_workload_addons.helm_release.argo_events
terraform import module.eks_workload_addons.helm_release.argo_events unstructured/argo-events

terraform state rm module.eks_workload_addons.helm_release.argo_workflows
terraform import module.eks_workload_addons.helm_release.argo_workflows unstructured/argo-workflows

terraform state rm module.eks_workload_addons.helm_release.external_secrets_operator
terraform import module.eks_workload_addons.helm_release.external_secrets_operator unstructured/external-secrets

terraform state rm module.eks_workload_addons.helm_release.otel_collector
terraform import module.eks_workload_addons.helm_release.otel_collector unstructured/otel-collector

terraform state rm module.eks_workload_addons.helm_release.aws_ebs_csi_driver
terraform import module.eks_workload_addons.helm_release.aws_ebs_csi_driver kube-system/aws-ebs-csi-driver

terraform state rm module.eks_workload_addons.helm_release.cluster-autoscaler
terraform import module.eks_workload_addons.helm_release.cluster-autoscaler kube-system/cluster-autoscaler
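To avoid typing the pairs above by hand, the following sketch prints the same commands from a list so you can review them before piping the output to sh. The addresses and release names are copied from the commands above:

```shell
# Print the state rm / import command pairs; review, then pipe to `sh` to run.
print_reimport_cmds() {
  while read -r ADDR RELEASE; do
    printf 'terraform state rm %s\n' "$ADDR"
    printf 'terraform import %s %s\n' "$ADDR" "$RELEASE"
  done <<'EOF'
module.frontend.helm_release.frontend unstructured/unstructured-frontend
module.backend.helm_release.backend unstructured/unstructured-backend
module.linkerd_certs.helm_release.linkerd_certs linkerd/linkerd-certs
module.linkerd_certs.helm_release.cert_manager cert-manager/cert-manager
module.linkerd_crds.helm_release.linkerd_crds linkerd/linkerd-crds
module.linkerd.helm_release.linkerd linkerd/linkerd
module.ingress.helm_release.aws_load_balancer_controller kube-system/aws-load-balancer-controller
module.eks_workload_addons.helm_release.argo_events unstructured/argo-events
module.eks_workload_addons.helm_release.argo_workflows unstructured/argo-workflows
module.eks_workload_addons.helm_release.external_secrets_operator unstructured/external-secrets
module.eks_workload_addons.helm_release.otel_collector unstructured/otel-collector
module.eks_workload_addons.helm_release.aws_ebs_csi_driver kube-system/aws-ebs-csi-driver
module.eks_workload_addons.helm_release.cluster-autoscaler kube-system/cluster-autoscaler
EOF
}
print_reimport_cmds
```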

Then re-apply:

Copy
terraform apply

This error typically occurs:

  • After ECR authorization tokens expire (tokens are valid for 12 hours).
  • After extended periods between Terraform applies.
  • When you switch AWS profiles or credentials.