Cross-account crawling
You can make an S3 bucket accessible to crawlers from AWS accounts other than the account in which the bucket is located. To access the external S3 bucket, the programmatic user and the IAM crawling role must be defined in the main AWS account.
Policy
A policy must be attached to the external S3 bucket to allow:
- the AWS Glue crawler to access and perform S3 actions on the external S3 bucket from another AWS account.
- Data Catalog to execute the S3 GetBucketLocation API on the external S3 bucket via the programmatic user.
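You can use the following JSON content, replacing <enter_id> with the ID of the main AWS account and crawler-name with the name of the external S3 bucket: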
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "collibra-jobserver-access",
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::<enter_id>:role/collibra-jobserver-s3-role"
      },
      "Action": "s3:*",
      "Resource": [
        "arn:aws:s3:::crawler-name",
        "arn:aws:s3:::crawler-name/*"
      ]
    },
    {
      "Sid": "collibra-jobserver-bucket-location",
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::<enter_id>:user/collibra-jobserver"
      },
      "Action": "s3:GetBucketLocation",
      "Resource": [
        "arn:aws:s3:::*"
      ]
    }
  ]
}
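As an illustration of how such a policy could be attached, the following Python sketch builds the same two statements and applies them to the bucket with boto3's put_bucket_policy call. It is only a sketch under stated assumptions: the function names build_bucket_policy and attach_bucket_policy are hypothetical, the account ID and bucket name are passed in as parameters, each statement is given its own Sid, and the credentials in use are assumed to belong to the account that owns the external S3 bucket.

```python
import json


def build_bucket_policy(account_id: str, bucket_name: str) -> dict:
    """Build the cross-account bucket policy as a dict (hypothetical helper)."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Lets the IAM crawling role perform S3 actions on the bucket.
                "Sid": "collibra-jobserver-access",
                "Effect": "Allow",
                "Principal": {
                    "AWS": f"arn:aws:iam::{account_id}:role/collibra-jobserver-s3-role"
                },
                "Action": "s3:*",
                "Resource": [
                    f"arn:aws:s3:::{bucket_name}",
                    f"arn:aws:s3:::{bucket_name}/*",
                ],
            },
            {
                # Lets the programmatic user call GetBucketLocation.
                # The Sid is an assumed name, chosen to differ from the first.
                "Sid": "collibra-jobserver-bucket-location",
                "Effect": "Allow",
                "Principal": {
                    "AWS": f"arn:aws:iam::{account_id}:user/collibra-jobserver"
                },
                "Action": "s3:GetBucketLocation",
                "Resource": ["arn:aws:s3:::*"],
            },
        ],
    }


def attach_bucket_policy(account_id: str, bucket_name: str) -> None:
    """Attach the policy to the bucket (hypothetical helper).

    Assumes boto3 is installed and that the default credentials belong to
    the AWS account that owns the external S3 bucket.
    """
    import boto3  # imported here so the policy can be built without boto3

    s3 = boto3.client("s3")
    policy = build_bucket_policy(account_id, bucket_name)
    # put_bucket_policy expects the policy document as a JSON string.
    s3.put_bucket_policy(Bucket=bucket_name, Policy=json.dumps(policy))
```

Calling attach_bucket_policy with the ID of the main AWS account and the name of the external bucket would replace any policy already attached to that bucket, so an existing policy may need to be merged with these statements first.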