Warning Jobserver and all related Jobserver integrations are end of life as of October 2024, except for Public Sector customers using GovCloud or on-prem environments.
For information on the integration of S3 via Edge, go to Integrating an Amazon S3 file system via Edge.
Synchronizing Amazon S3
When you synchronize Amazon S3, the content of your Amazon S3 repository is analyzed and represented as assets with their characteristics.
You can synchronize manually, or you can automate synchronization by adding a schedule defined with a cron expression.
- You can only synchronize one S3 File System at a time. If a synchronization job is in progress and a synchronization of another S3 File System is triggered, manually or automatically, the new job is queued.
- If a synchronization job is still running and a new synchronization of the same S3 File System is triggered, manually or automatically, the running synchronization continues and the new request is ignored.
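The two rules above can be pictured with a small Python model. This is only an illustrative sketch, not Collibra code: the class `SyncScheduler` and its methods are hypothetical names, and it reads the first rule as applying to a request for a different S3 File System than the one being synchronized. What happens when a request arrives for a file system that is already waiting in the queue is not stated here, so the sketch simply queues it.

```python
from collections import deque


class SyncScheduler:
    """Toy model of the queueing rules (hypothetical, for illustration only)."""

    def __init__(self):
        self.running = None   # S3 File System currently being synchronized
        self.queue = deque()  # S3 File Systems waiting for their turn

    def trigger(self, file_system):
        """Handle a manual or scheduled trigger and report what happened."""
        if self.running is None:
            self.running = file_system
            return "started"
        if file_system == self.running:
            # The same file system is already synchronizing: request is dropped.
            return "ignored"
        # Assumption: any other request waits in line, even a repeated one.
        self.queue.append(file_system)
        return "queued"

    def finish(self):
        """Mark the running job as done and start the next queued one, if any."""
        self.running = self.queue.popleft() if self.queue else None
        return self.running


scheduler = SyncScheduler()
print(scheduler.trigger("fs-a"))  # started
print(scheduler.trigger("fs-a"))  # ignored
print(scheduler.trigger("fs-b"))  # queued
print(scheduler.finish())         # fs-b
```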
Technically, the synchronization happens in several steps:
- Collibra creates crawlers in AWS Glue, based on the crawlers defined in Collibra.
- If AWS Glue contains databases with metadata from a previous synchronization, the databases are deleted.
- Each AWS Glue crawler crawls a location in Amazon S3 based on its include path. For each domain assigned to one or more crawlers, AWS Glue creates a database with the crawling results.
- Collibra ingests those databases and creates assets, attributes and relations as required to match the metadata.
The resulting assets are in the domain that was specified in the crawler.
Warning Do not move the assets to another domain. Doing so may lead to errors during future synchronizations. This is a known limitation.
- The AWS Glue crawlers are deleted.
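To make the order of the steps above concrete, here is a rough Python sketch written against the AWS Glue API as exposed by a boto3 Glue client. It is not how Collibra implements synchronization. The function name `synchronize`, the crawler dictionary keys, the `ingest` callback and the `sync_<domain id>` database naming are all assumptions made for illustration. Pagination of the Glue responses and error handling are left out.

```python
import time


def synchronize(glue, crawlers, role_arn, ingest, poll_seconds=10):
    """Walk through the described steps with a Glue client.

    glue: an object with the boto3 Glue client methods used below.
    crawlers: list of dicts with hypothetical keys "name", "include_path"
    and "domain_id". ingest: callback receiving (domain_id, tables).
    """
    # Assumption: one Glue database per domain, named after the domain id.
    databases = {c["domain_id"]: f"sync_{c['domain_id']}" for c in crawlers}

    # 1. Create the crawlers in AWS Glue.
    for c in crawlers:
        glue.create_crawler(
            Name=c["name"],
            Role=role_arn,
            DatabaseName=databases[c["domain_id"]],
            Targets={"S3Targets": [{"Path": c["include_path"]}]},
        )

    # 2. Delete databases left over from a previous synchronization.
    existing = {d["Name"] for d in glue.get_databases()["DatabaseList"]}
    for name in sorted(set(databases.values()) & existing):
        glue.delete_database(Name=name)

    # 3. Run every crawler and wait until it is idle again.
    for c in crawlers:
        glue.start_crawler(Name=c["name"])
    for c in crawlers:
        while glue.get_crawler(Name=c["name"])["Crawler"]["State"] != "READY":
            time.sleep(poll_seconds)

    # 4. Hand the crawled metadata of each domain to the ingestion step.
    for domain_id, database in databases.items():
        tables = glue.get_tables(DatabaseName=database)["TableList"]
        ingest(domain_id, tables)

    # 5. Delete the crawlers.
    for c in crawlers:
        glue.delete_crawler(Name=c["name"])
```

With real AWS credentials, `glue` could be `boto3.client("glue")` and `role_arn` an IAM role that the crawlers are allowed to assume. Passing the client in as a parameter also makes it possible to try the sequence with a stand-in object.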
Naming convention
Synchronizing Amazon S3 relies on a naming convention to match assets during the synchronization process. We highly recommend that you not change the S3 File System asset's full name.
Warning Editing the full name of an S3 File System asset may lead to errors during the synchronization process.