Integrated Amazon S3 data

After you have synchronized the data, the integration of the Amazon S3 file system is complete.

Synchronization results

After synchronization, the resulting assets are in the domain that was specified in the crawler. By default, the assets get the Candidate status.

Warning: Do not move the assets to another domain. Doing so may lead to errors during future synchronizations. This is a known limitation.

By default, the assets are shown in a plain list, but you can enable a multi-path hierarchy to show them in a tree structure. For the best result, we recommend that you use the following relations:

  1. Storage Container contains Storage Container
  2. Directory contains Directory
  3. Storage Container contains File
  4. Directory contains File Group
  5. File contains Table
  6. File Group contains Table
  7. Table contains Column

The following image shows the resulting hierarchical table.
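
As a minimal sketch of how the recommended relations produce a tree, the following Python example nests a few sample assets and checks each parent and child pair against the relations listed above. The asset names (example-bucket, orders.csv, and so on) and the render_tree helper are hypothetical and only illustrate the idea. This is not how Collibra builds the hierarchy internally.

```python
# Recommended "contains" relations from the list above,
# written as (parent asset type, child asset type) pairs.
RECOMMENDED_RELATIONS = {
    ("Storage Container", "Storage Container"),
    ("Directory", "Directory"),
    ("Storage Container", "File"),
    ("Directory", "File Group"),
    ("File", "Table"),
    ("File Group", "Table"),
    ("Table", "Column"),
}

# Hypothetical sample assets: asset name -> asset type.
ASSET_TYPES = {
    "example-bucket": "Storage Container",
    "orders.csv": "File",
    "orders": "Table",
    "order_id": "Column",
    "amount": "Column",
}

# Hypothetical "contains" edges: parent name -> child names.
CONTAINS = {
    "example-bucket": ["orders.csv"],
    "orders.csv": ["orders"],
    "orders": ["order_id", "amount"],
}


def render_tree(name, depth=0):
    """Return indented lines for the asset and everything it contains."""
    lines = ["  " * depth + f"{name} ({ASSET_TYPES[name]})"]
    for child in CONTAINS.get(name, []):
        pair = (ASSET_TYPES[name], ASSET_TYPES[child])
        if pair not in RECOMMENDED_RELATIONS:
            raise ValueError(f"'{pair[0]} contains {pair[1]}' is not a recommended relation")
        lines.extend(render_tree(child, depth + 1))
    return lines


tree_lines = render_tree("example-bucket")
print("\n".join(tree_lines))
```

Each level of indentation in the printed output corresponds to one "contains" relation, which is the nesting that the multi-path hierarchy displays.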

Note: If a synchronization is only partially successful because of a temporary communication issue, the status of the assets that cannot be synchronized is set to Missing from source. During the next fully successful synchronization, these assets are either removed or restored to their previous status, depending on their actual status in the source system.
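
The status handling described in this note can be sketched as a small state transition. The example below is an illustration only: the function names, the dictionary layout, and the sample statuses other than Candidate and Missing from source are assumptions, and the code does not call or reproduce any Collibra API.

```python
MISSING = "Missing from source"


def apply_partial_sync(assets, unsynchronized):
    """Mark assets that could not be synchronized and remember their old status."""
    for name in unsynchronized:
        asset = assets[name]
        if asset["status"] != MISSING:
            asset["previous_status"] = asset["status"]
            asset["status"] = MISSING


def apply_full_sync(assets, names_in_source):
    """After a fully successful run, restore or remove the marked assets."""
    for name in list(assets):
        asset = assets[name]
        if asset["status"] != MISSING:
            continue
        if name in names_in_source:
            # Still present in the source: restore the previous status.
            asset["status"] = asset.pop("previous_status")
        else:
            # No longer present in the source: remove the asset.
            del assets[name]


# Hypothetical sample assets; "Accepted" is an assumed example status.
assets = {
    "orders.csv": {"status": "Accepted"},
    "old.csv": {"status": "Candidate"},
}

apply_partial_sync(assets, ["orders.csv", "old.csv"])
statuses_after_partial = {name: a["status"] for name, a in assets.items()}

apply_full_sync(assets, names_in_source={"orders.csv"})
print(assets)
```

In this sketch, orders.csv still exists in the source and gets its previous status back, while old.csv no longer exists in the source and is removed.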

Synchronized metadata per asset type

This table shows the metadata for each Amazon S3 asset type.

Asset type   Synchronized metadata                                                   Public ID
S3 Bucket    URL                                                                     Url
             Location                                                                Location
             File Storage contains / is part of Storage Container                    FileContainerContainsFileContainer
Directory    URL                                                                     Url
             Storage Container contains / is part of Storage Container               FileContainerContainsFileContainer
             Directory contains / is part of Directory                               DirectoryContainsDirectory
File Group   URL                                                                     Url
             File Type                                                               FileType
             Document Size                                                           DocumentSize
             Number of Files                                                         NumberOfFiles
             Directory contains / is part of File Group                              DirectoryContainsFileGroup
File         URL                                                                     Url
             File Type                                                               FileType
             Document Size                                                           DocumentSize
             Storage Container contains / contained in File                          FileContainerContainsFile
Table        Glue database name                                                      GlueDatabaseName
             Glue table name                                                         GlueTableName
             Description from source (available only if you integrate via Edge) [1]  DescriptionFromSourceSystem
             Table type (available only if you integrate via Edge)                   TableType
             File contains / is part of Table                                        FileContainsTable
             File Group contains / is part of Table                                  FileGroupContainsTable
Column       Technical Data Type                                                     TechnicalDataType
             Column Position (available only if you integrate via Edge) [2]          ColumnPosition
             Description from source (available only if you integrate via Edge) [3]  DescriptionFromSourceSystem
             Column is part of / contains Table                                      ColumnIsPartOfTable

[1] The description is taken from the Glue database directly. We look at the description for Table. Note: You can't integrate the descriptions from source directly.
[2] Note: The column position is not included when columns in the same table have identical names.
[3] The description is taken from the Glue database directly. We look at the comment for Column. Note: You can't integrate the descriptions from source directly.
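
If you need to work with these public IDs in a script, a lookup structure can help. The following Python sketch encodes only the attribute rows of the table (not the relations) and flags the ones that are available only if you integrate via Edge. The SYNCED_ATTRIBUTES name, the tuple layout, and the public_ids helper are assumptions made for illustration and are not part of any Collibra API.

```python
# Attribute rows from the table above:
# asset type -> list of (attribute name, public ID, available only via Edge).
SYNCED_ATTRIBUTES = {
    "S3 Bucket": [
        ("URL", "Url", False),
        ("Location", "Location", False),
    ],
    "Directory": [
        ("URL", "Url", False),
    ],
    "File Group": [
        ("URL", "Url", False),
        ("File Type", "FileType", False),
        ("Document Size", "DocumentSize", False),
        ("Number of Files", "NumberOfFiles", False),
    ],
    "File": [
        ("URL", "Url", False),
        ("File Type", "FileType", False),
        ("Document Size", "DocumentSize", False),
    ],
    "Table": [
        ("Glue database name", "GlueDatabaseName", False),
        ("Glue table name", "GlueTableName", False),
        ("Description from source", "DescriptionFromSourceSystem", True),
        ("Table type", "TableType", True),
    ],
    "Column": [
        ("Technical Data Type", "TechnicalDataType", False),
        ("Column Position", "ColumnPosition", True),
        ("Description from source", "DescriptionFromSourceSystem", True),
    ],
}


def public_ids(asset_type, via_edge):
    """Return the public IDs of the attributes expected for an asset type."""
    return [
        public_id
        for _name, public_id, edge_only in SYNCED_ATTRIBUTES[asset_type]
        if via_edge or not edge_only
    ]


table_ids_without_edge = public_ids("Table", via_edge=False)
column_ids_with_edge = public_ids("Column", via_edge=True)
print(table_ids_without_edge)
print(column_ids_with_edge)
```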