Working with Amazon S3

Amazon S3 is an online object storage service hosted by Amazon. For more information about Amazon S3, see the Amazon S3 documentation.

In Collibra Data Intelligence Cloud, you can synchronize with Amazon S3 in multiple ways.

Synchronization method: S3 file system integration
  Advantage: The resulting assets represent the folder structure by means of S3 Bucket, Directory, File, Table and Column assets.
  Disadvantage: You can't profile and classify columns and tables.
  More information: Amazon S3 file system integration

Synchronization method: Catalog connector
  Advantage: You can profile and classify the columns and tables in your S3 buckets.
  Disadvantage: The folder structure of your S3 bucket isn't represented in Data Catalog.
  More information: Jobserver, Edge
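
Amazon S3 stores objects under flat keys, and a "folder" is only a key prefix ending in a slash. As a rough illustration of how a folder structure can be derived from such keys, the following Python sketch groups a list of object keys into directories and files. It is not Collibra's implementation: the function name build_hierarchy, the bucket name and the sample keys are all hypothetical, and Table and Column assets are not modeled.

```python
def build_hierarchy(bucket, keys):
    """Derive directories and files from flat S3 object keys.

    Hypothetical helper for illustration only; it does not call AWS
    or Collibra and works purely on the key strings it is given.
    """
    directories = set()
    files = []
    for key in keys:
        # Split the key on "/" and ignore empty segments.
        parts = [part for part in key.split("/") if part]
        if not parts:
            continue
        if key.endswith("/"):
            # A key ending in "/" is a folder placeholder object.
            dir_parts = parts
        else:
            # Anything else is a file; its parents are directories.
            files.append(key)
            dir_parts = parts[:-1]
        # Register every ancestor prefix as a directory.
        for depth in range(1, len(dir_parts) + 1):
            directories.add("/".join(dir_parts[:depth]))
    return {
        "bucket": bucket,
        "directories": sorted(directories),
        "files": sorted(files),
    }


if __name__ == "__main__":
    # Sample keys are made up for the example.
    sample_keys = [
        "sales/2023/orders.csv",
        "sales/2024/orders.csv",
        "readme.txt",
        "logs/",
    ]
    print(build_hierarchy("example-bucket", sample_keys))
```

The sketch only shows why a bucket can be presented as nested directories even though S3 itself has no real folders.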

Register an Amazon S3 data source using the AWS Glue Catalog connector

  1. Set up an Edge site.
  2. Create a JDBC connection to your Amazon S3 data source by means of the AWS Glue Catalog connector.
  3. Add the following capabilities to the Edge site: Catalog JDBC ingestion and JDBC Profiling.
  4. Register the Amazon S3 data source via Edge.
  5. Synchronize your Amazon S3 data source.