Synchronize Amazon S3 manually

Important 

In Collibra 2024.02, we've launched a new user interface (UI) in beta for Collibra Data Intelligence Platform! You can learn more about this latest UI in the UI overview.

You can manually start a synchronization job of an S3 File System asset. This can be useful if you want to test your crawlers, or if you want to synchronize immediately.

Tip You can also add a synchronization schedule to synchronize automatically.

Prerequisites

  • You have registered an Amazon S3 file system.
  • You have a programmatic AWS user and IAM role with the required permissions.
  • You have connected an S3 File System asset to Amazon S3.
  • You have created one or more crawlers.
  • You have a global role with the View Edge connections and capabilities global permission, for example, Edge integration engineer.
  • You have a global role with the Catalog global permission, for example, Catalog Author.
  • You have a resource role with the Configure external system resource permission on the community or domain that contains the S3 File System, for example, Owner.
  • You have a role with the following resource permissions on the S3 community you created when you registered an Amazon S3 file system:
    • Asset: add
    • Attribute: add
    • Domain: add
    • Attachment: add

Steps

  1. Open an S3 File System asset page.
  2. In the tab bar (or the tab pane in the classic UI), click Configuration.
  3. In the Crawlers section, click Synchronize now.

    A notification indicates that the synchronization has started.

    The synchronization job appears in the Activities list as a bulk synchronization.

    The Synchronization Schedule section displays the time of the last synchronization.

    When the synchronization is complete, you can view a summary of the results from the Activities list and view the synchronized assets in their domain. For more information, go to Integrated Amazon S3 data.

Note In case of a partial synchronization caused by a temporary communication issue, the status of the assets that cannot be synchronized is set to Missing from source. During the next fully successful synchronization, the assets are removed or their previous status is restored, depending on their actual status in the source system.
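
The status handling described in the note can be pictured with a small model. The following Python sketch is illustrative only and is not Collibra's implementation: the Asset class, the two functions, and the example status values are hypothetical names introduced here, and the sketch assumes that the previous status of an asset is remembered when it is flagged.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Set

MISSING = "Missing from source"


@dataclass
class Asset:
    """Hypothetical stand-in for a synchronized asset."""
    name: str
    status: str
    previous_status: Optional[str] = None


def apply_partial_sync(assets: Dict[str, Asset], unreachable: Set[str]) -> None:
    """Flag assets that could not be synchronized and remember their status."""
    for name in unreachable:
        asset = assets.get(name)
        if asset is not None and asset.status != MISSING:
            asset.previous_status = asset.status
            asset.status = MISSING


def apply_full_sync(assets: Dict[str, Asset], present_in_source: Set[str]) -> None:
    """Resolve flagged assets after a fully successful synchronization."""
    for name in list(assets):
        asset = assets[name]
        if asset.status != MISSING:
            continue
        if name in present_in_source:
            # The object still exists in the source: restore the earlier status.
            if asset.previous_status is not None:
                asset.status = asset.previous_status
            asset.previous_status = None
        else:
            # The object is really gone from the source: remove the asset.
            del assets[name]


# Example run with placeholder names and statuses.
assets = {
    "reports/2023.csv": Asset("reports/2023.csv", "Candidate"),
    "logs/old.json": Asset("logs/old.json", "Accepted"),
}
apply_partial_sync(assets, {"reports/2023.csv", "logs/old.json"})
apply_full_sync(assets, {"reports/2023.csv"})
```

In this run, both assets are flagged as Missing from source after the partial synchronization. After the full synchronization, the asset that still exists in the source gets its previous status back, and the other asset is removed.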