Deploying Collibra Unstructured AI infrastructure
Deploying Unstructured AI infrastructure involves setting up the necessary resources in your cloud environment to support advanced AI applications. This page provides instructions for deploying Collibra Unstructured AI infrastructure on Amazon Web Services (AWS). It guides you through the necessary prerequisites, configuration, deployment steps, and troubleshooting, ensuring a successful setup.
Prerequisites
Before deployment, ensure the following resources and configurations are in place.
Expected final state
| Resource | Description | Example value |
|---|---|---|
| AWS Account | Dedicated subaccount recommended. | 006352514257 |
| IAM Role | Terraform provisioner role with required permissions. | UnstructuredTerraformProvisioner |
| S3 Bucket | For Terraform state file. | unstructured-tf-state |
| Route53 Hosted Zone | DNS zone for ingress records. | env-id.company.com |
| ACM Certificate | TLS certificate for application domain (must be in ISSUED status). | app.env-id.company.com |
| Cognito User Pool | User authentication (app client must have no client secret). | us-east-1_xxxxxxxx |
| Service-Linked Roles | AWS service roles for EKS, RDS, and Auto Scaling. | See Service-linked roles. |
Required tools
The following tools must be installed on your local machine:
- Terraform v1.12.2 or newer.
- AWS CLI version 2.23.6 or newer.
- kubectl for cluster verification.
- Helm for troubleshooting Helm releases.
- Go for the configuration tool.
Optional AWS subaccount setup
It is recommended to deploy Collibra Unstructured AI infrastructure in a dedicated AWS subaccount for better resource isolation and management. If you choose to set up a subaccount, follow the AWS Organizations documentation to create a new account under your organization.
IAM provisioner role
Create an IAM role, such as UnstructuredTerraformProvisioner, that Terraform will assume. The role must:
- Trust your AWS account (or the specific user or role running Terraform):
```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::<YOUR_ACCOUNT_ID>:root"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}
```

- Have the following inline policy attached:
Important Update the S3 statement Resource ARNs to match the actual Terraform state bucket name.
Note The CoreInfrastructureAndNetworking and AllowCreationWithProjectTag statements use tag-based conditions. The Terraform AWS provider is configured with default_tags to automatically tag all resources with Project = "unstructured". For BYOVPC deployments, you must also tag your existing VPC and subnets with Project = "unstructured". See BYOVPC requirements for more information.
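As an illustration, a default_tags setup of the kind described above looks like the following provider block. This is a sketch only; the actual provider configuration is generated by the deployment tooling and may differ:

```hcl
provider "aws" {
  region = "us-east-1"

  # Every resource created through this provider is tagged automatically,
  # which is what the tag-based IAM policy conditions rely on.
  default_tags {
    tags = {
      Project = "unstructured"
    }
  }
}
```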
Service-linked roles
AWS service-linked roles must exist in your account before deployment. The IAM policy for the provisioner role scopes iam:* to unstructured-* resources, so it cannot create service-linked roles, which exist under aws-service-role/*.
Create service-linked roles manually before your first deployment:
```shell
aws iam create-service-linked-role --aws-service-name eks.amazonaws.com
aws iam create-service-linked-role --aws-service-name eks-nodegroup.amazonaws.com
aws iam create-service-linked-role --aws-service-name rds.amazonaws.com
aws iam create-service-linked-role --aws-service-name autoscaling.amazonaws.com
aws iam create-service-linked-role --aws-service-name elasticloadbalancing.amazonaws.com
```
Note If a service-linked role already exists, the command returns an error. You can safely ignore this error.
Instructions to create prerequisite resources
S3 backend for Terraform state
Create an S3 bucket for storing Terraform state:
```shell
aws s3api create-bucket \
  --bucket unstructured-tf-state \
  --region us-east-1

aws s3api put-bucket-encryption \
  --bucket unstructured-tf-state \
  --server-side-encryption-configuration '{"Rules":[{"ApplyServerSideEncryptionByDefault":{"SSEAlgorithm":"AES256"}}]}'
```

Note In regions other than us-east-1, aws s3api create-bucket also requires --create-bucket-configuration LocationConstraint=<region>.
Route53 hosted zone
A Route53 hosted zone must exist for the domain you plan to use. Terraform creates DNS records in this zone, but it does not create the zone itself.
```shell
aws route53 create-hosted-zone \
  --name env-id.company.com \
  --caller-reference "$(date +%s)"
```
After you create the zone, delegate your domain by updating the nameservers of your registrar to the ones returned by Route53.
ACM certificate
An ACM certificate must be issued for the application domain. It must be in the same region as your deployment and in ISSUED status.
```shell
aws acm request-certificate \
  --domain-name app.env-id.company.com \
  --validation-method DNS \
  --region us-east-1
```
Complete DNS validation by adding the CNAME record to your Route53 zone. Terraform handles the association of this certificate with the ingress load balancer.
Cognito user pool
Configure AWS Cognito for application authentication. This includes user pool creation, app client settings, custom user attributes, and optional SSO with a federated identity provider.
Prerequisites
- AWS account with permissions to create and manage Amazon Cognito resources.
- AWS CLI v2 installed and configured (required only if you use the CLI commands).
- The application URL (for example, https://app.unstructured.<your domain>.com).
Expected final state
| Resource | Description | Example value |
|---|---|---|
| User Pool | Cognito user pool for authentication. | unstructured-ai-pool |
| App Client | Public app client (no client secret). | unstructured-frontend |
| Custom Attributes | User properties for roles and tenancy. | custom:permission_level, custom:tenant_name, custom:group_memberships |
| Admin User | Initial admin user with permanent password. | Username: admin, permission level: admin |
| Hosted UI Domain (Optional) | Cognito domain for OAuth/SSO (if you use SSO). | your-company.auth.us-east-1.amazoncognito.com |
| SAML Identity Provider (Optional) | Federated IDP for SSO (if you use SSO). | Provider name: SAML |
1. Create a user pool
Using the AWS console
- Navigate to the AWS Console.
- Enter "Cognito" in the search bar.
- Click Create user pool.
- Define your application:
- Select Single-page application (SPA).
- Name your application.
- Configure options:
- For sign-in identifiers, select Username and Email.
- For self-registration, clear Enable self-registration.
- For required attributes for sign-up, select email in the dropdown menu.
- Skip the return URL section. You only need this for SSO, and you can configure it later if required.
- Click Create User Directory.
- On the next page, click Go to overview. You will land on the user pool overview page. Note the following values, which are required for application configuration:

| Value | Where to find | Description |
|---|---|---|
| User Pool ID | User pool overview page. | Unique identifier (for example, us-east-1_AbCdEfGhI). |
| App Client ID | User pool overview page, or the App clients tab. | Client identifier (for example, 7649pb0etcv84u0rskudhd5pel). |
| AWS Region | Visible in the URL bar or the User Pool ARN. | Region where the pool was created (for example, us-east-1). |

Note The wizard automatically creates an app client. You will configure its settings in the Configure app client step.
- Add custom attributes: In your user pool, navigate to Sign-up experience (under Authentication in the left panel). Scroll to Custom attributes. Click Add custom attributes and enter the following:
| Attribute name | Type | Mutable | Description |
|---|---|---|---|
| custom:permission_level | String | Yes | User's role: admin, contributor, or viewer. |
| custom:tenant_name | String | Yes | Tenant ID for the user's organization. |
| custom:group_memberships | String | Yes | Comma-separated list of group names. |

Important You cannot rename or delete custom attributes after creation. Verify the attribute names match exactly as shown.
Note The create-user-pool CLI command does not support setting the feature plan (Essentials) or selecting the SPA app type. The CLI creates a standard user pool, and the app client is created separately. After creation, you can verify settings in the AWS Console.
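To illustrate how an application might consume these three custom attributes, here is a hedged sketch of an application-side parser. The attribute names come from this guide; the parse_user_attributes helper and its defaults are hypothetical, not part of the product:

```python
# Illustrative sketch: mapping the Cognito custom attributes described above
# to application-level role, tenant, and group values.
VALID_LEVELS = {"admin", "contributor", "viewer"}

def parse_user_attributes(attrs: dict) -> dict:
    # Fall back to the least-privileged role if the attribute is absent.
    level = attrs.get("custom:permission_level", "viewer")
    if level not in VALID_LEVELS:
        raise ValueError(f"unknown permission level: {level}")
    # custom:group_memberships is a comma-separated list of group names.
    raw_groups = attrs.get("custom:group_memberships", "")
    groups = [g.strip() for g in raw_groups.split(",") if g.strip()]
    return {
        "permission_level": level,
        "tenant_name": attrs.get("custom:tenant_name", "default"),
        "groups": groups,
    }

print(parse_user_attributes({
    "custom:permission_level": "admin",
    "custom:tenant_name": "acme",
    "custom:group_memberships": "finance, legal",
}))
```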
2. Configure app client
The app client was automatically created during user pool creation. This section covers verifying and updating its settings.
Using the AWS console
- In your user pool, go to the App clients tab.
- Click the app client created during the wizard. It has the name you provided previously.
App client information
Verify the following settings:
| Setting | Expected value |
|---|---|
| App client name | The name you provided during user pool creation. |
| Client secret | - (no client secret for the SPA app type). |
Authentication flows
Verify that the following authentication flows are enabled. If they are not, click Edit and enable them:
| Authentication flow | API name | Required | Description |
|---|---|---|---|
| Sign in with secure remote password (SRP) | ALLOW_USER_SRP_AUTH | Yes | Standard username and password authentication. |
| Get new user tokens from existing authenticated sessions | ALLOW_REFRESH_TOKEN_AUTH | Yes | Enables session continuity and token refresh. |
Note You do not need other authentication flows such as ALLOW_USER_PASSWORD_AUTH, ALLOW_ADMIN_USER_PASSWORD_AUTH, ALLOW_CUSTOM_AUTH, or ALLOW_USER_AUTH. You can leave them cleared unless you have a specific need for them.
Authentication flow session duration
| Setting | Recommended value |
|---|---|
| Authentication flow session duration | 3 minutes. |
Token expiration
| Token | Recommended value | Notes |
|---|---|---|
| Refresh token expiration | 5 days. | Used for automatic session renewal without re-login. |
| Access token expiration | 60 minutes. | Short-lived for security. |
| ID token expiration | 60 minutes. | Used by the application for API requests and role resolution. |
Adjust token lifetimes according to the security requirements of your organization. Shorter refresh token expiration increases security but requires more frequent re-authentication.
Advanced authentication settings
| Setting | Value |
|---|---|
| Enable token revocation | Yes (allows invalidating tokens on sign-out). |
| Enable prevent user existence errors | Yes (returns generic error messages to prevent username enumeration attacks). |
Attribute read and write permissions
The app client needs Read and Write access to the attributes listed below. Click Edit under the attribute permissions section to verify and update the settings.
Required custom attributes:
| Attribute | Read | Write | Used for |
|---|---|---|---|
| custom:permission_level | Yes | Yes | Role-based access control (admin, contributor, or viewer). |
| custom:tenant_name | Yes | Yes | Tenant ID for tenant identification. |
| custom:group_memberships | Yes | Yes | Group-based access control. |
Required standard attributes:
| Attribute | Read | Write | Used for |
|---|---|---|---|
| email | Yes | Yes | User identification and communication. |
Recommended standard attributes:
| Attribute | Read | Write | Used for |
|---|---|---|---|
| given_name | Yes | Yes | User's first name shown in the UI. |
| family_name | Yes | Yes | User's last name shown in the UI. |
The application does not actively use other standard attributes such as address, birthdate, gender, locale, middle_name, name, nickname, or phone_number. You can leave them at defaults or disable them according to your requirements.
- Click Save changes after verifying or updating the settings.
Important The app client must not have a client secret. The application is a browser-based SPA that uses Secure Remote Password (SRP) authentication, which requires a public client. The SPA app type selected during user pool creation ensures this by default.
Note The CLI command uses update-user-pool-client (not create) because the client already exists. When you use this command, you must specify all settings. Any omitted values will be reset to defaults.
3. Create the initial admin user
Create your first admin user. This user will have full access to manage users, groups, and settings through the application.
Using the AWS console
- In your user pool, go to Users.
- Click Create user.
- Enter the details:
| Field | Value |
|---|---|
| User name | admin (or your preferred admin username). |
| Email address | The admin's email address. |
| Mark email as verified | Yes. |
| Temporary password | Set a temporary password. |

- Click Create user.
- After creation, go to the user detail page.
- Navigate to User attributes.
- Click Edit.
- Set custom attributes:
| Attribute | Value |
|---|---|
| custom:permission_level | admin. |
| custom:tenant_name | Your tenant ID (for example, default). |
| custom:group_memberships | Leave empty (or set initial groups). |

- Set a permanent password to move the user out of FORCE_CHANGE_PASSWORD status. See the CLI command below for an example.
4. Optional SSO setup
4.1 Configure hosted UI domain
If you plan to use SSO with a federated identity provider, you must configure a hosted UI domain.
Using the AWS console
- In your user pool, go to Branding, then Domain in the left panel.
- Under Cognito domain, click Edit (or Create Cognito domain if none exists).
- Enter a domain prefix (for example, unstructured-ai-sso). This creates a domain in the following format: https://<your-prefix>.auth.<region>.amazoncognito.com. For example: https://unstructured-ai-sso.auth.us-east-1.amazoncognito.com.
- For Branding version, select Hosted UI (classic).
- Click Save.
Note If you prefer to use your own domain, for example, auth.yourcompany.com, use the Custom domain section instead. This requires an ACM certificate in us-east-1. For most deployments, the Cognito domain is sufficient.
- Callback URLs: The application manages OAuth redirect URLs in its own configuration via environment variables. See Section 5. You do not need to configure callback or sign-out URLs on the Cognito app client.
- Note the Cognito domain, as it is required for application configuration:
| Value | Example |
|---|---|
| Cognito domain prefix | unstructured-ai-sso |
| Full domain URL | https://unstructured-ai-sso.auth.us-east-1.amazoncognito.com |
4.2 Add a SAML identity provider
To enable "Login with SSO" through a SAML 2.0 identity provider, such as Okta, Azure AD, OneLogin, or PingFederate:
Using the AWS console
- In your user pool, go to Authentication, then Social and external providers in the left panel.
- Click Add identity provider.
- Select SAML.
- Configure the provider:
- Configure the provider:

| Setting | Value |
|---|---|
| Provider name | SAML |
| Metadata source | Upload a metadata file or provide the metadata URL from your IDP. |

- Configure Attribute mapping:

| IDP attribute | Cognito attribute |
|---|---|
| email | email |
| name | name |

- Click Save.
Important The provider name must be SAML. The application uses this name when it initiates SSO login redirects.
4.3 Configure your identity provider
In your identity provider, create a SAML application with the following settings:
| Setting | Value |
|---|---|
| SSO URL / ACS URL | https://<your-cognito-domain>/saml2/idpresponse. |
| Audience URI / Entity ID | urn:amazon:cognito:sp:<your-user-pool-ID>. |
| Name ID format | EmailAddress or Persistent. |
Okta-specific setup
- In the Okta Admin Console, go to Applications.
- Click Create App Integration.
- Select SAML 2.0.
- Click Next.
- Set:
  - Single sign-on URL: https://<your-cognito-domain>/saml2/idpresponse
  - Audience URI (SP Entity ID): urn:amazon:cognito:sp:<your-user-pool-ID>
- Under Attribute Statements, add:
  - email → user.email
  - name → user.displayName
- Complete the wizard. Note the Metadata URL from the Sign On tab, as you will need this for the Cognito SAML provider configuration.
4.4 Enable the IDP on the app client
Using the AWS console
- Go to Applications, then App clients.
- Click your app client.
- Select the Login pages tab.
- Click Edit.
- Under Identity providers, enable both:
- Cognito user pool (for username and password login).
- SAML (for SSO login).
- Click Save changes.
When you use update-user-pool-client, you must re-specify all existing settings because the command replaces the full client configuration.
Deployment scenarios
Standard deployment
In a standard deployment, Terraform creates all networking resources, such as VPC, subnets, NAT gateways, internet gateway, and route tables. The ingress load balancer is internet-facing.
BYOVPC (bring your own VPC)
In a BYOVPC deployment, you provide an existing VPC and subnets. Terraform skips networking creation and deploys directly into your infrastructure. The ingress load balancer is automatically set to internal, meaning the application is only accessible via private network connectivity, for example, VPN, Direct Connect, or peering.
Note VPN or network connectivity is your responsibility and is managed outside of this Terraform deployment.
BYOVPC requirements
Your VPC and subnets must meet the following requirements:
| Requirement | Details |
|---|---|
| DNS Support | VPC must have DNS support and DNS hostnames enabled. |
| Private Subnets (EKS) | Exactly 2 private subnets in different Availability Zones with outbound internet access (NAT gateway or equivalent). |
| Private Subnets (DB) | Exactly 2 additional private subnets in different Availability Zones for the RDS database. |
| Outbound Internet | Required for pulling container images from ECR, Helm charts, and other external dependencies. |
| Tagging | VPC and all subnets must be tagged with Project = "unstructured" (required by the tag-based conditions of the IAM policy). |
Tag your VPC and subnets:
```shell
aws ec2 create-tags \
  --resources <vpc-id> <subnet-1> <subnet-2> <subnet-3> <subnet-4> \
  --tags Key=Project,Value=unstructured
```
CIDR range requirements
When you use BYOVPC, the following CIDR ranges must not overlap:
| CIDR range | Purpose | Default |
|---|---|---|
| VPC CIDR | Your VPC network. | Customer-provided. |
| Kubernetes Service CIDR | ClusterIP services. | 172.20.0.0/16 (you can configure this via eks_cluster_service_cidr). |
| VPN Client CIDR | VPN tunnel client IPs. | Depends on your VPN configuration. |
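The non-overlap requirement can be checked ahead of time with the standard ipaddress module. This is an illustrative sketch; the CIDR values below are placeholders to substitute with your own:

```python
# Check that the VPC, Kubernetes service, and VPN client CIDR ranges
# do not overlap before a BYOVPC deployment.
import ipaddress
from itertools import combinations

def find_overlaps(cidrs: dict) -> list:
    """Return pairs of names whose CIDR ranges overlap."""
    nets = {name: ipaddress.ip_network(cidr) for name, cidr in cidrs.items()}
    return [
        (a, b)
        for a, b in combinations(nets, 2)
        if nets[a].overlaps(nets[b])
    ]

ranges = {
    "vpc": "10.0.0.0/16",              # customer-provided VPC CIDR
    "eks_service": "172.20.0.0/16",    # default eks_cluster_service_cidr
    "vpn_clients": "10.100.0.0/22",    # depends on your VPN configuration
}
print(find_overlaps(ranges))  # [] means no conflicts
```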
BYOVPC configuration
To use your own VPC, include the networking.byovpc section in config.yaml:
```yaml
networking:
  byovpc:
    vpc_id: "vpc-xxxxxxxxxxxxxxxxx"
    private_subnet_ids:
      - "subnet-xxxxxxxx"  # EKS subnet (AZ 1)
      - "subnet-yyyyyyyy"  # EKS subnet (AZ 2)
    db_subnet_ids:
      - "subnet-aaaaaaaa"  # RDS subnet (AZ 1)
      - "subnet-bbbbbbbb"  # RDS subnet (AZ 2)
```
Omit this section entirely to have the infrastructure create a new VPC automatically.
Deployment steps
1. Prerequisites
Before deployment, ensure you have:
- AWS CLI configured with access to the target account.
- Terraform v1.12.2 or newer installed.
- Go >= 1.22 installed (for the configuration tool).
- Service-linked roles created (EKS, RDS, Auto Scaling, ELB).
- An IAM provisioner role with the permissions described in the IAM provisioner role section.
- An S3 bucket for Terraform remote state.
- A Cognito user pool and app client configured for authentication.
- A Route53 hosted zone for DNS.
- ACM Certificate issued in the deployment region.
- ECR cross-account access (provide your AWS account ID to the Collibra team so your cluster can pull container images).
- BYOVPC deployments only: VPC and subnets tagged with Project = "unstructured".
2. Configure
Copy the example configuration and enter your values:
```shell
cd iac
cp config.yaml.example config.yaml
```
Edit config.yaml with your environment-specific values. See config.yaml.example for a fully commented template.
Cross-account DNS: If your Route53 hosted zone is in a different AWS account, set create_a_record: false under the ingress: section. Terraform will not attempt to create the Route53 A record and will output the values you need to create it manually after deployment.
3. Generate Terraform files
Build and run the configuration tool. All commands run from iac/:
```shell
go build -C ../tools/configure -o configure .
../tools/configure/configure
```
The tool validates your configuration and generates the following:
- terraform.auto.tfvars (all Terraform variable values).
- backend.hcl (S3 backend configuration).
The validator performs offline checks, such as:
- Required field presence.
- Region consistency (the Cognito pool region matches the deployment region).
- IAM role ARN format.
- Cognito client ID format.
- UUID format for observability site ID.
- The TLS domain is a subdomain of the DNS zone.
- BYOVPC subnet ID format, count, and uniqueness.
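A few of the offline checks above can be sketched as follows. The regexes and rules here are approximations for illustration, not the configure tool's actual source:

```python
# Approximate versions of three of the validator's offline checks.
import re

def check_subnet_ids(ids: list) -> list:
    """BYOVPC subnet IDs: format, count, and uniqueness."""
    errors = []
    if len(ids) != 2:
        errors.append("exactly 2 subnet IDs are required")
    if len(set(ids)) != len(ids):
        errors.append("subnet IDs must be unique")
    for s in ids:
        if not re.fullmatch(r"subnet-[0-9a-f]{8,17}", s):
            errors.append(f"bad subnet ID format: {s}")
    return errors

def check_role_arn(arn: str) -> bool:
    """IAM role ARN format: 12-digit account ID and a role path."""
    return re.fullmatch(r"arn:aws:iam::\d{12}:role/[\w+=,.@-]+", arn) is not None

def check_tls_domain(domain: str, zone: str) -> bool:
    """The TLS domain must be a subdomain of the DNS zone."""
    return domain.endswith("." + zone)

print(check_subnet_ids(["subnet-0a1b2c3d", "subnet-9f8e7d6c"]))  # []
print(check_role_arn("arn:aws:iam::006352514257:role/UnstructuredTerraformProvisioner"))  # True
print(check_tls_domain("app.env-id.company.com", "env-id.company.com"))  # True
```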
4. Deploy
```shell
terraform init -backend-config=backend.hcl
terraform apply
```
This command deploys the entire stack in a single apply:
- VPC networking (or a BYOVPC).
- EKS cluster and node groups.
- Aurora PostgreSQL database.
- IAM roles (API, workflow, Cognito access, and EBS CSI).
- Linkerd service mesh (cert-manager, CRDs, and control plane).
- AWS Load Balancer Controller and ingress.
- Backend and frontend applications.
- Argo Workflows and Events.
- External Secrets Operator.
- Cluster Autoscaler.
- EBS CSI Driver.
- OpenTelemetry Collector.
Deployment takes approximately 20 to 30 minutes.
5. Verify deployment
```shell
# Configure kubeconfig (use --role-arn because only the provisioner role has cluster access)
aws eks update-kubeconfig \
  --region <your-region> \
  --name unstructured-eks-cluster \
  --role-arn arn:aws:iam::<ACCOUNT_ID>:role/UnstructuredTerraformProvisioner

# Set AWS_PROFILE so kubectl and helm can authenticate
export AWS_PROFILE=<your-profile-name>

# Check all pods
kubectl get pods --all-namespaces

# Check the ingress
kubectl get ingress -n unstructured
```
For standard deployments, access the application at https://app.env-id.company.com.
For BYOVPC deployments, ensure your VPN or private network connectivity is active. Then, access the application at the configured domain.
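As an illustrative aid for the pod check, the JSON output of kubectl can be inspected programmatically. The unhealthy_pods helper below is hypothetical (it relies only on the standard Kubernetes pod phases), and the sample data is made up:

```python
# Report pods that are not yet healthy, given the output of
# `kubectl get pods --all-namespaces -o json`.
import json

def unhealthy_pods(pods_json: str) -> list:
    data = json.loads(pods_json)
    bad = []
    for item in data.get("items", []):
        phase = item["status"]["phase"]
        # Running and Succeeded are the healthy terminal/steady phases.
        if phase not in ("Running", "Succeeded"):
            ns = item["metadata"]["namespace"]
            name = item["metadata"]["name"]
            bad.append(f"{ns}/{name}: {phase}")
    return bad

sample = json.dumps({"items": [
    {"metadata": {"name": "frontend-abc", "namespace": "unstructured"},
     "status": {"phase": "Running"}},
    {"metadata": {"name": "backend-xyz", "namespace": "unstructured"},
     "status": {"phase": "Pending"}},
]})
print(unhealthy_pods(sample))  # ['unstructured/backend-xyz: Pending']
```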
6. Post-deployment: Cross-account DNS
If you set create_a_record: false because your Route53 hosted zone is in a different AWS account, you must manually create the DNS record after deployment.
After terraform apply completes, retrieve the required values:
terraform output dns_record_config
Use the output values to create an A record (Alias) in your Route53 hosted zone via the AWS Console or via the AWS CLI:
# Get the hosted zone ID for your domain in the DNS account
```shell
# Get the hosted zone ID for your domain in the DNS account
ZONE_ID=$(aws route53 list-hosted-zones-by-name \
  --dns-name "<dns_zone_name from output>" \
  --query "HostedZones[0].Id" \
  --output text \
  --profile <dns-account-profile>)

# Create the alias A record
aws route53 change-resource-record-sets \
  --hosted-zone-id "$ZONE_ID" \
  --profile <dns-account-profile> \
  --change-batch '{
    "Changes": [{
      "Action": "UPSERT",
      "ResourceRecordSet": {
        "Name": "<record_name from output>",
        "Type": "A",
        "AliasTarget": {
          "DNSName": "<alias_target from output>",
          "HostedZoneId": "<alias_zone_id from output>",
          "EvaluateTargetHealth": true
        }
      }
    }]
  }'
```
Tip Copy values from the terraform output carefully. The --change-batch JSON is sensitive to formatting. Avoid trailing commas, ensure quotes are straight, and do not add a trailing period to the DNSName value.
Note You only need to repeat this step if the ALB hostname changes, such as after a full teardown and redeploy.
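One way to sidestep the formatting pitfalls in the tip above is to generate the --change-batch JSON programmatically instead of editing it by hand. This is an illustrative sketch; the alias_change_batch helper and the placeholder values are hypothetical:

```python
# Build the Route53 --change-batch JSON from the terraform output values.
import json

def alias_change_batch(record_name: str, alias_target: str, alias_zone_id: str) -> str:
    return json.dumps({
        "Changes": [{
            "Action": "UPSERT",
            "ResourceRecordSet": {
                "Name": record_name,
                "Type": "A",
                "AliasTarget": {
                    # Strip any trailing period from the alias DNS name,
                    # per the tip above.
                    "DNSName": alias_target.rstrip("."),
                    "HostedZoneId": alias_zone_id,
                    "EvaluateTargetHealth": True,
                },
            },
        }]
    })

# Placeholder values standing in for the terraform output fields.
print(alias_change_batch(
    "app.env-id.company.com",
    "internal-k8s-example.us-east-1.elb.amazonaws.com.",
    "Z2EXAMPLE",
))
```

Because json.dumps always emits valid JSON, trailing commas and curly-quote mistakes cannot occur.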
Upgrading
To upgrade to a newer version:
- Download and extract the latest Terraform tarball (.tgz) from the Collibra downloads page.
- Review the release notes for any breaking changes.
- Update config.yaml with any new parameters and re-run the configure tool.
- Run:

```shell
terraform init -backend-config=backend.hcl
terraform apply
```
Teardown
```shell
cd iac
terraform destroy
```
Note Some resources may require manual cleanup after terraform destroy:
- Secrets Manager secrets are scheduled for deletion with a recovery window and are not deleted immediately. Use aws secretsmanager delete-secret --force-delete-without-recovery if you need immediate deletion for redeployment.
- KMS keys are scheduled for deletion with a waiting period.
- Service-linked roles are not deleted by Terraform, as they are account-level resources.
Troubleshooting
Secrets Manager "already scheduled for deletion"
Error:
InvalidRequestException: You can't create this secret because a secret with this name is already scheduled for deletion.
Fix: Restore the existing secret or force-delete it:
```shell
# Option 1: Restore the secret
aws secretsmanager restore-secret --secret-id <secret-name> --region <region>

# Option 2: Force-delete and let Terraform recreate it
aws secretsmanager delete-secret \
  --secret-id <secret-name> \
  --force-delete-without-recovery \
  --region <region>
```
Helm provider OCI registry authentication errors
Error:
Failed to log in to OCI registry "oci://...": response status code 403: denied: Your authorization token has expired.
This error occurs due to a known bug in Helm provider v3.x where the repository_password is cached in Terraform state. When ECR authorization tokens expire after 12 hours, Terraform uses the expired token from state.
Fix: Remove and re-import all affected Helm releases:
```shell
terraform state rm module.frontend.helm_release.frontend
terraform import module.frontend.helm_release.frontend unstructured/unstructured-frontend

terraform state rm module.backend.helm_release.backend
terraform import module.backend.helm_release.backend unstructured/unstructured-backend

terraform state rm module.linkerd_certs.helm_release.linkerd_certs
terraform import module.linkerd_certs.helm_release.linkerd_certs linkerd/linkerd-certs

terraform state rm module.linkerd_certs.helm_release.cert_manager
terraform import module.linkerd_certs.helm_release.cert_manager cert-manager/cert-manager

terraform state rm module.linkerd_crds.helm_release.linkerd_crds
terraform import module.linkerd_crds.helm_release.linkerd_crds linkerd/linkerd-crds

terraform state rm module.linkerd.helm_release.linkerd
terraform import module.linkerd.helm_release.linkerd linkerd/linkerd

terraform state rm module.ingress.helm_release.aws_load_balancer_controller
terraform import module.ingress.helm_release.aws_load_balancer_controller kube-system/aws-load-balancer-controller

terraform state rm module.eks_workload_addons.helm_release.argo_events
terraform import module.eks_workload_addons.helm_release.argo_events unstructured/argo-events

terraform state rm module.eks_workload_addons.helm_release.argo_workflows
terraform import module.eks_workload_addons.helm_release.argo_workflows unstructured/argo-workflows

terraform state rm module.eks_workload_addons.helm_release.external_secrets_operator
terraform import module.eks_workload_addons.helm_release.external_secrets_operator unstructured/external-secrets

terraform state rm module.eks_workload_addons.helm_release.otel_collector
terraform import module.eks_workload_addons.helm_release.otel_collector unstructured/otel-collector

terraform state rm module.eks_workload_addons.helm_release.aws_ebs_csi_driver
terraform import module.eks_workload_addons.helm_release.aws_ebs_csi_driver kube-system/aws-ebs-csi-driver

terraform state rm module.eks_workload_addons.helm_release.cluster-autoscaler
terraform import module.eks_workload_addons.helm_release.cluster-autoscaler kube-system/cluster-autoscaler
```
Then re-apply:
terraform apply
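Since every pair above follows the same rm/import pattern, the command list can also be generated from a mapping of Terraform address to Helm release ID. A convenience sketch (the mapping below shows only a few of the releases listed above):

```python
# Generate the state-rm / import command pairs from a mapping of
# Terraform resource address -> "<namespace>/<release name>".
releases = {
    "module.frontend.helm_release.frontend": "unstructured/unstructured-frontend",
    "module.backend.helm_release.backend": "unstructured/unstructured-backend",
    "module.ingress.helm_release.aws_load_balancer_controller": "kube-system/aws-load-balancer-controller",
    # ... remaining releases as listed above
}

def reimport_commands(releases: dict) -> list:
    cmds = []
    for address, release_id in releases.items():
        cmds.append(f"terraform state rm {address}")
        cmds.append(f"terraform import {address} {release_id}")
    return cmds

for cmd in reimport_commands(releases):
    print(cmd)
```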
Related issues:
- GitHub issue: https://github.com/hashicorp/terraform-provider-helm/issues/1660
- Fix PR (pending merge): https://github.com/hashicorp/terraform-provider-helm/pull/1687
When this occurs:
- After ECR authorization tokens expire (tokens are valid for 12 hours).
- After extended periods between Terraform applies.
- When you switch AWS profiles or credentials.