Deploying Collibra Unstructured AI infrastructure on Azure
Deploying Unstructured AI infrastructure involves setting up the resources in your cloud environment that support advanced AI applications. This page explains how to deploy Unstructured AI infrastructure on Microsoft Azure and covers the prerequisites, configuration, deployment steps, and teardown.
Prerequisites
Before deployment, ensure the following resources and configurations are in place in your Azure subscription.
Expected final state
| Resource | Description | Example value |
|---|---|---|
| User/Service Principal | Must have sufficient Azure RBAC permissions. | See the required Azure permissions section below. |
| Storage Account + Container | For Terraform state file. | deasytfstate / tfstate |
| Resource Group (State) | Contains the storage account. | rg-deasy-mgmt-dev |
| Resource Group (Infrastructure) | Where AKS, DB, and networking are deployed. | rg-deasy-infra-dev |
| Resource Group (DNS) | Contains the DNS zone. | rg-deasy-dns |
| Azure AD Group | Grants AKS cluster admin permissions. | unstructured-aks-admins |
| DNS Zone | Azure DNS zone for ingress records. | env-id.company.com |
| Key Vault | Stores the TLS certificate. | unstructured-certs-kv |
| TLS Certificate | Stored as a Certificate object in Key Vault. | deasy-dev-app-cert |
Required Azure permissions
The provisioning identity (user or service principal running Terraform) requires different permissions depending on your deployment scenario.
Step 1: Identify your scenario
| Scenario | Description |
|---|---|
| Standard | Terraform creates all networking. |
| BYOVNet | You provide an existing VNet and subnets for VPN-only access. |
Step 2: Create custom roles (BYOVNet only)
Skip this step if you are not using BYOVNet. This custom role is only needed when deploying into a customer-provided VNet.
# Set subscription ID
SUBSCRIPTION_ID="<your-subscription-id>"
# Custom Role: Unstructured Network Integrator
# Allows joining subnets for AKS/DB and writing subnet config (for NSG association)
az role definition create --role-definition "{
\"Name\": \"Unstructured Network Integrator\",
\"Description\": \"Can join, read, and write subnets for AKS and database deployment\",
\"Actions\": [
\"Microsoft.Network/virtualNetworks/subnets/join/action\",
\"Microsoft.Network/virtualNetworks/subnets/read\",
\"Microsoft.Network/virtualNetworks/subnets/write\"
],
\"NotActions\": [],
\"AssignableScopes\": [\"/subscriptions/$SUBSCRIPTION_ID\"]
}"
Note If you prefer not to create a custom role, you can use the built-in Network Contributor role scoped to the AKS and DB subnets instead. This grants broader permissions but is limited to those specific subnets.
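Once the custom role exists, it still has to be assigned to the provisioning identity on each subnet. The following is a minimal sketch of that assignment, not the product's own tooling. The helper names, the resource group, the VNet and subnet names, and the principal ID are all hypothetical placeholders.

```shell
# Build the ARM resource ID of a subnet (helper name is illustrative).
# Usage: subnet_scope <subscription-id> <vnet-resource-group> <vnet-name> <subnet-name>
subnet_scope() {
  printf '/subscriptions/%s/resourceGroups/%s/providers/Microsoft.Network/virtualNetworks/%s/subnets/%s\n' \
    "$1" "$2" "$3" "$4"
}

# Assign the custom role to the provisioning identity on a single subnet.
# Usage: assign_network_integrator <principal-object-id> <subnet-scope>
assign_network_integrator() {
  az role assignment create \
    --assignee "$1" \
    --role "Unstructured Network Integrator" \
    --scope "$2"
}
```

For example, `assign_network_integrator "<principal-object-id>" "$(subnet_scope "$SUBSCRIPTION_ID" rg-customer-network vnet-customer snet-aks)"` would scope the role to one subnet; repeat for the DB subnet.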
Step 3: Grant permissions for your scenario
Permission summary by resource location
| Resource | Location | Role | Least-privilege method |
|---|---|---|---|
| AKS, DB, Key Vaults | Infrastructure RG | Contributor | Create, update, and delete resources. |
| Key Vault Secrets | Infrastructure RG | Key Vault Secrets Officer | Create secrets in API/DB Key Vaults. |
| Role assignments | Infrastructure RG | RBAC Admin (conditioned) | Constrained to Key Vault Secrets User/Officer, Reader, and Network Contributor. |
| Terraform state | State storage account | Reader | Read storage account properties for Azure AD authentication. |
| Terraform state | State container | Storage Blob Data Contributor | Scoped to specific container for state blobs. |
| DNS zone records | DNS Zone | DNS Zone Contributor | Create TXT validation record, CNAME (public mode) or A record (BYOVNet mode). |
| TLS Certificate | Infrastructure RG | Key Vault Certificate User | Scoped to specific certificate. |
| AKS Cluster | Infrastructure RG | Azure Kubernetes Service RBAC Cluster Admin | Required for Azure AD RBAC-enabled clusters; cannot be self-assigned. |
| BYOVNet AKS Subnet | Customer VNet | Unstructured Network Integrator | Custom role (subnet join for AKS). |
| BYOVNet DB Subnet | Customer VNet | Unstructured Network Integrator | Custom role (subnet join and NSG association). |
| BYOVNet AKS Subnet | Customer VNet | RBAC Admin (conditioned) | Constrained to Network Contributor only (for internal load balancer). |
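As an illustration of the container-scoped row in the table, the sketch below builds the scope string for the Terraform state container and assigns Storage Blob Data Contributor at that scope. It assumes the state storage account and container already exist. The helper names are hypothetical, and the principal ID is a placeholder.

```shell
# Build the ARM scope of a single blob container (helper name is illustrative).
# Usage: container_scope <subscription-id> <resource-group> <storage-account> <container>
container_scope() {
  printf '/subscriptions/%s/resourceGroups/%s/providers/Microsoft.Storage/storageAccounts/%s/blobServices/default/containers/%s\n' \
    "$1" "$2" "$3" "$4"
}

# Grant blob data access on the state container only, not the whole account.
# Usage: grant_state_access <principal-object-id> <container-scope>
grant_state_access() {
  az role assignment create \
    --assignee "$1" \
    --role "Storage Blob Data Contributor" \
    --scope "$2"
}
```

With the example values used on this page, the scope would come from `container_scope "$ARM_SUBSCRIPTION_ID" rg-deasy-mgmt-dev deasytfstate tfstate`.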
Instructions for creating prerequisite resources
Storage account for Terraform state
Terraform state must be stored in an Azure Storage Account with a blob container.
# Create resource group for management resources
az group create --name rg-deasy-mgmt-dev --location centralus
# Create storage account
az storage account create \
--name deasytfstate \
--resource-group rg-deasy-mgmt-dev \
--location centralus \
--sku Standard_LRS \
--encryption-services blob
# Create blob container for state file
az storage container create \
--name tfstate \
--account-name deasytfstate
Record these values in config.yaml under the state: block. The configure tool generates backend.hcl from them. Do not edit providers.tf directly.
state:
resource_group: "rg-deasy-mgmt-dev"
storage_account: "deasytfstate"
container: "tfstate"
key: "collibra/unstructured/terraform.tfstate"
Resource groups
Three resource groups are required:
| Resource group | Purpose |
|---|---|
| rg-deasy-mgmt-dev | Terraform state storage account. |
| rg-deasy-infra-dev | AKS cluster, database, networking, and Key Vaults. |
| rg-deasy-dns | DNS zone. |
az group create --name rg-deasy-mgmt-dev --location centralus
az group create --name rg-deasy-infra-dev --location centralus
az group create --name rg-deasy-dns --location centralus
Azure AD group for AKS admins
An Azure AD security group is required to grant cluster admin permissions to users.
# Create the unstructured-aks-admins group
az ad group create --display-name "unstructured-aks-admins" --mail-nickname "unstructured-aks-admins"
# Add users to the group (get user object ID first)
az ad user show --id <user-principal-name> --query id -o tsv
az ad group member add --group "unstructured-aks-admins" --member-id <user-object-id>
DNS zone
az network dns zone create \
--resource-group rg-deasy-dns \
--name env-id.company.com
Important After creating the zone, delegate your domain to Azure DNS by updating your domain registrar's nameservers to the Azure DNS nameservers:
az network dns zone show \
--resource-group rg-deasy-dns \
--name env-id.company.com \
--query nameServers
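To confirm that delegation has taken effect, you can compare the nameservers Azure assigned to the zone with what public DNS returns. This is a rough sketch that assumes dig is installed; the function names are illustrative.

```shell
# Normalize a nameserver list read from stdin: lowercase, strip the trailing
# dot, drop blank lines, and sort, so that two lists can be compared as text.
normalize_ns() {
  tr '[:upper:]' '[:lower:]' | sed -e 's/\.$//' -e '/^$/d' | sort
}

# Return 0 when public DNS reports the same nameservers as the Azure zone.
# Usage: check_delegation <dns-resource-group> <zone-name>
check_delegation() {
  azure_ns=$(az network dns zone show \
    --resource-group "$1" --name "$2" \
    --query nameServers -o tsv | normalize_ns)
  public_ns=$(dig +short NS "$2" | normalize_ns)
  [ -n "$azure_ns" ] && [ "$azure_ns" = "$public_ns" ]
}
```

For example, `check_delegation rg-deasy-dns env-id.company.com && echo "delegated"`. Registrar changes can take a while to become visible, so a mismatch shortly after the update is not necessarily an error.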
Key Vault with TLS certificate
A Key Vault containing the TLS certificate is required for HTTPS ingress. The certificate must meet the following requirements:
- Must be stored as a Certificate object in Key Vault.
- Must use PEM content type (application/x-pem-file), required by ESO's filterPEM for NGINX TLS secret sync.
- Must include the full certificate chain (leaf, intermediate, and root CA).
- Must match the domain specified in tls_cert_domain, for example app.env-id.company.com.
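Before importing, the chain and domain requirements above can be checked locally. The sketch below is one possible pre-check, assuming the leaf certificate is the first entry in the PEM file and that openssl is installed; the function names are illustrative.

```shell
# Count the certificates in a PEM bundle. A full chain (leaf, intermediate,
# root CA) should report at least 3.
# Usage: pem_cert_count <pem-file>
pem_cert_count() {
  grep -c -- '-----BEGIN CERTIFICATE-----' "$1"
}

# Ask openssl whether the first certificate in the file covers a host name.
# openssl x509 reads only the first certificate, assumed here to be the leaf.
# Usage: check_cert_host <pem-file> <host-name>
check_cert_host() {
  openssl x509 -in "$1" -noout -checkhost "$2"
}
```

For example, `pem_cert_count /path/to/certificate.pem` followed by `check_cert_host /path/to/certificate.pem app.env-id.company.com`.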
Set up Key Vault:
az keyvault create \
--name unstructured-certs-kv \
--resource-group rg-deasy-infra-dev \
--location centralus \
--enable-rbac-authorization true
Import certificate:
Import a PEM certificate from a CA (GoDaddy, DigiCert, Let's Encrypt, and so on):
# PEM file — import directly
az keyvault certificate import \
--vault-name unstructured-certs-kv \
--name app-tls-cert \
--file /path/to/certificate.pem
# PFX file — convert to PEM first, then import with PEM policy
openssl pkcs12 -in certificate.pfx -out certificate.pem -nodes -password pass:<pfx-password>
az keyvault certificate import \
--vault-name unstructured-certs-kv \
--name app-tls-cert \
--file certificate.pem \
--policy '{"secretProperties":{"contentType":"application/x-pem-file"}}'
AWS credentials for ECR access
Helm charts and container images are pulled from AWS ECR. AWS credentials with ECR read access are provided to you. The access key ID is non-sensitive and lives in config.yaml under aws.access_key. The secret access key is never written to disk; terraform apply prompts for it interactively at run time.
# config.yaml
aws:
access_key: "<provided-access-key>"
Note If byor: is set in config.yaml, Terraform does not prompt for AWS credentials.
Prerequisites
Before running the configure tool, verify that the following prerequisites are met:
- Azure CLI is installed and authenticated (az login).
- kubelogin is installed (required for AKS Azure AD RBAC authentication).
- Go is installed to build the configure binary.
- Subscription is set via export ARM_SUBSCRIPTION_ID="<subscription-id>".
- The provisioning identity has the required RBAC roles.
- Storage account and container exist for Terraform state.
- All three resource groups exist.
- The unstructured-aks-admins Azure AD group exists.
- DNS zone exists and the domain is delegated.
- Key Vault exists with the TLS certificate stored as a Certificate object.
- AWS access key ID is set in config.yaml under aws.access_key.
- config.yaml is populated from config.yaml.example.
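Part of this checklist can be automated. The following preflight sketch covers only the local tooling, the subscription variable, and the Azure login; it is not a substitute for the configure tool's own validation. The function names are hypothetical, and terraform and kubectl are included because later steps on this page use them.

```shell
# Print the name of every listed command that is not found on PATH.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Check local tools, the subscription variable, and the Azure CLI session.
preflight() {
  missing=$(missing_tools az kubelogin go terraform kubectl)
  if [ -n "$missing" ]; then
    printf 'Missing tools:\n%s\n' "$missing" >&2
    return 1
  fi
  if [ -z "${ARM_SUBSCRIPTION_ID:-}" ]; then
    echo "ARM_SUBSCRIPTION_ID is not set" >&2
    return 1
  fi
  # Fails with a non-zero status when there is no active az login session.
  az account show --output none
}
```

Running `preflight && echo "local prerequisites look OK"` gives a quick signal before you build and run configure.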
Azure Entra ID integration setup
Configure Microsoft Entra ID (formerly Azure AD) for application authentication, including app registration, API permissions, and custom user attributes.
Prerequisites
The provisioning identity must have the Global Administrator or Application Administrator role in your Microsoft Entra ID tenant.
Expected final state
| Resource | Description | Example value |
|---|---|---|
| App Registration | Entra ID application for authentication. | unstructured-ai |
| Client Secret | Application credential (time-limited). | 12-month expiry, stored securely. |
| API Permissions | Microsoft Graph permissions (admin-consented). | User.ReadWrite.All, Group.ReadWrite.All, Directory.ReadWrite.All |
| Extension Attributes | Custom user properties. | tenant_name, permission_level |
| Optional Claims | Token claim configuration. | Extension attributes included in ID tokens. |
1. App registration
- Navigate to the Azure Portal and search for "Microsoft Entra ID" in the search bar.
- In the left sidebar, expand "Manage" and click "App registrations".
- Click "+ New registration" at the top.
- Fill in the registration details:
  - Name: unstructured-ai (or your preferred application name).
  - Supported account types: Select "Accounts in this organizational directory only (<your-org-name> - Single tenant)".
  - Redirect URI (optional): Set the platform to "Single-page application (SPA)" and enter your URL.
- Click "Register".
- After registration, note the following values from the "Overview" page:
| Value | Description |
|---|---|
| Application (client) ID | Unique identifier for the application (used in app configuration). |
| Directory (tenant) ID | Your Azure AD tenant identifier. |
| Object ID | Application object identifier used in Graph API calls (different from the client ID). |
Note The Object ID and Application (client) ID are different values. Graph API extension endpoints use the Object ID, while your application configuration uses the client ID.
2. Configure redirect URIs
- In your App Registration, go to "Authentication" under "Manage".
- Under "Single-page application", add your redirect URIs. Add your production URL as needed, for example https://app.<your-domain>.
- Under "Implicit grant and hybrid flows", ensure "ID tokens" is selected (required for OpenID Connect sign-in).
- Click "Save".
3. Create client secret
- In your App Registration, go to "Certificates & secrets" under "Manage".
- Click the "Client secrets" tab.
- Click "+ New client secret".
- Fill in the details:
  - Description: unstructured-ai-secret.
  - Expires: 365 days (12 months) or your preferred expiration.
- Click "Add".
- Copy the "Value" immediately after creation.
| Field | Value |
|---|---|
| Secret ID | <auto-generated> |
| Value | <copy this immediately> |
| Expires | <selected expiration date> |
Important Copy the client secret Value immediately after creation. It will be permanently hidden once you navigate away from this page. Store it securely and never commit it to version control.
4. Request API permissions
- In your App Registration, go to "API permissions" under "Manage".
- Click "+ Add a permission".
- Select "Microsoft Graph" > "Application permissions".
- Add the following permissions:
| Permission | Type | Purpose |
|---|---|---|
| User.ReadWrite.All | Application | Create, read, update, and delete users. |
| Group.ReadWrite.All | Application | Create, read, update, and delete groups. |
| Directory.ReadWrite.All | Application | Manage directory extension attributes (tenant_name, permission_level, group_memberships). |
- Click "Add permissions" after selecting each permission.
- Click "Grant admin consent for <your-org-name>" to approve the application permissions.
Note If you are not an admin, ask a Global Administrator or Privileged Role Administrator to grant admin consent on your behalf.
5. Register extension attributes
Extension attributes allow you to add custom properties to user objects (for example, tenant_name and permission_level) that can be included as claims in tokens.
5.1 Register extension attributes via Microsoft Graph API
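As a hedged sketch of what this registration can look like, the helper below posts to the Microsoft Graph extensionProperties endpoint of the app registration using az rest. It assumes String-typed attributes that target User objects, which this page does not confirm; the function names are illustrative.

```shell
# Build the JSON body for one extension property.
# Assumption: the attribute is a String and applies to User objects.
# Usage: extension_body <attribute-name>
extension_body() {
  printf '{"name":"%s","dataType":"String","targetObjects":["User"]}\n' "$1"
}

# Register one extension property on the app registration.
# Note that the endpoint takes the application Object ID, not the client ID.
# Usage: register_extension <application-object-id> <attribute-name>
register_extension() {
  az rest --method POST \
    --url "https://graph.microsoft.com/v1.0/applications/$1/extensionProperties" \
    --headers "Content-Type=application/json" \
    --body "$(extension_body "$2")"
}
```

For example, `register_extension "<object-id>" tenant_name` and `register_extension "<object-id>" permission_level`.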
5.2 Set extension attribute values on users
6. Configure token claims (optional)
To include extension attributes in ID tokens issued to users:
- In your App Registration, go to "Token configuration" under "Manage".
- Click "+ Add optional claim".
- Select the ID token type.
- Add claims or configure via "Manifest".
Alternatively, edit the "Manifest" directly to add:
{
"optionalClaims": {
"idToken": [
{
"name": "extension_<client_id_no_hyphens>_tenant_name",
"source": "user",
"essential": false
},
{
"name": "extension_<client_id_no_hyphens>_permission_level",
"source": "user",
"essential": false
}
]
}
}
Note Replace <client_id_no_hyphens> with your Application (client) ID with hyphens removed (see the naming convention in step 5.2).
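The claim name can be derived mechanically from the client ID. A small sketch, with an illustrative function name and a made-up client ID in the usage example:

```shell
# Build the full extension claim name from a client ID and attribute name
# by removing the hyphens from the client ID.
# Usage: extension_claim_name <application-client-id> <attribute-name>
extension_claim_name() {
  printf 'extension_%s_%s\n' "$(printf '%s' "$1" | sed 's/-//g')" "$2"
}
```

For example, `extension_claim_name 11111111-2222-3333-4444-555555555555 tenant_name` prints `extension_11111111222233334444555555555555_tenant_name`.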
Deployment
Once all prerequisites are in place, deployment is driven by the configure tool in tools/configure/azure. It reads config.yaml, validates inputs, and generates terraform.auto.tfvars and backend.hcl. Both files are auto-generated and must not be edited by hand. For BYOR deployments, it also mirrors images from the source registry into your customer registry.
Run all commands below from iac/azure/:
cd iac/azure
1. Authenticate and set environment
# Log in to Azure
az login
# Set subscription (environment variable is more reliable than az account set)
export ARM_SUBSCRIPTION_ID="<subscription-id>"
2. Populate config.yaml
cp config.yaml.example config.yaml
# Edit config.yaml with your values (see config.yaml.example for field documentation)
3. Build and run configure
go build -C ../../tools/configure -o ../../iac/azure/configure ./azure
./configure
This generates terraform.auto.tfvars and backend.hcl. If a customer registry is configured, it also mirrors images.
4. Initialize and apply
terraform init -backend-config=backend.hcl
terraform plan
terraform apply
terraform apply prompts for client_secret (Entra app secret) and aws_secret_access_key. Neither is stored on disk.
The deployment takes approximately 15 to 20 minutes and creates:
- AKS cluster with NGINX ingress and Azure Front Door (standard mode) or internal load balancer (BYOVNet mode).
- PostgreSQL Flexible Server.
- Networking (VNet, subnets, NSGs).
- Linkerd service mesh.
- Backend and frontend applications.
- DNS records managed by Terraform.
Note After terraform apply completes, Azure Front Door may take up to an additional 30 minutes to fully propagate the managed TLS certificate to all edge PoPs. Until propagation completes, the site returns a 404 or a certificate mismatch warning. This is part of Azure's CDN pipeline, not an infrastructure issue.
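Rather than refreshing the browser while Front Door propagates, you can poll the site. The sketch below assumes curl is installed and treats any 2xx or 3xx response as ready; the function names and defaults (60 attempts, 30 seconds apart, roughly 30 minutes) are illustrative choices.

```shell
# Succeed for HTTP 2xx and 3xx status codes, fail for anything else.
is_ready_status() {
  case "$1" in
    2[0-9][0-9]|3[0-9][0-9]) return 0 ;;
    *) return 1 ;;
  esac
}

# Poll a URL until it answers with a ready status or the attempts run out.
# A TLS certificate mismatch makes curl report status 000, which counts as not ready.
# Usage: wait_for_site <url> [attempts] [delay-seconds]
wait_for_site() {
  url="$1"; attempts="${2:-60}"; delay="${3:-30}"
  i=1
  while [ "$i" -le "$attempts" ]; do
    code=$(curl -s -o /dev/null -w '%{http_code}' "$url" || true)
    if is_ready_status "$code"; then
      echo "Ready after $i attempt(s): HTTP $code"
      return 0
    fi
    echo "Attempt $i/$attempts: HTTP ${code:-none}, retrying in ${delay}s"
    sleep "$delay"
    i=$((i + 1))
  done
  return 1
}
```

For example, `wait_for_site https://app.env-id.company.com`.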
5. Verify deployment
# Get AKS credentials
az aks get-credentials \
--resource-group rg-deasy-infra-dev \
--name <aks-cluster-name>
# Check pod status
kubectl get pods -n unstructured
# Verify ingress
kubectl get ingress -n unstructured
Your application should be accessible at https://app.env-id.company.com or your configured domain.
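To turn the pod check into a pass or fail signal, the kubectl output can be filtered. This sketch looks only at the STATUS column, so a pod that is Running but not yet passing readiness probes still counts as healthy here; the function names are illustrative.

```shell
# Read `kubectl get pods --no-headers` output on stdin and print the name of
# every pod whose STATUS column is neither Running nor Completed.
not_ready_pods() {
  awk '$3 != "Running" && $3 != "Completed" { print $1 }'
}

# Report pods that are not Running or Completed in a namespace.
# Usage: check_pods [namespace]
check_pods() {
  namespace="${1:-unstructured}"
  pending=$(kubectl get pods -n "$namespace" --no-headers | not_ready_pods)
  if [ -n "$pending" ]; then
    printf 'Pods not ready in %s:\n%s\n' "$namespace" "$pending" >&2
    return 1
  fi
  echo "All pods in $namespace are Running or Completed"
}
```

Run `check_pods` after `az aks get-credentials` has populated your kubeconfig.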
Teardown
Key Vault lock
If the TLS certificate Key Vault has a resource lock, you must remove it before destruction:
# List locks on the Key Vault
az lock list --resource-group rg-deasy-infra-dev \
--resource-name unstructured-certs-kv \
--resource-type Microsoft.KeyVault/vaults
# Delete the lock (replace <lock-name> with the actual name)
az lock delete --name <lock-name> \
--resource-group rg-deasy-infra-dev \
--resource-name unstructured-certs-kv \
--resource-type Microsoft.KeyVault/vaults
To destroy all resources, use the destroy subcommand of the configure tool. It uninstalls workloads, drains Karpenter nodes, and then runs terraform destroy. Run from iac/azure/:
./configure destroy
The destroy flow prompts for the Entra app client secret, which is not stored on disk. Azure provider authentication uses your existing az login session and ARM_* environment variables.
Note Helm may warn about "kept CRDs" during teardown. These are deleted when the cluster is destroyed.