Warning Jobserver and all related Jobserver integrations are end of life starting October 2024, with the exception of Public Sector customers using GovCloud or on-prem environments.
For information on the integration of S3 via Edge, go to Integrating an Amazon S3 file system via Edge.

Synchronize Amazon S3 manually

Important In Collibra 2024.05, we launched a new user interface (UI) for Collibra Data Intelligence Platform. You can learn more about this latest UI in the UI overview.

You can manually start a synchronization job for an S3 File System asset. This is useful if you want to test your crawlers or synchronize immediately instead of waiting for a scheduled synchronization.

Tip You can also add a synchronization schedule to synchronize automatically.

Prerequisites

  • You have registered an Amazon S3 file system.
  • You have configured one or more Jobservers in Collibra Console. If there is no available Jobserver, the Register data source actions will be grayed out in the global create menu of Collibra Data Intelligence Platform.
  • You have a programmatic AWS user and IAM role with the required permissions.
  • You have connected an S3 File System asset to Amazon S3.
  • You have created one or more crawlers.
  • You have a global role with the Catalog global permission, for example, Catalog Author.
  • You have a resource role with the Configure external system resource permission on the community or domain that contains the S3 File System asset, for example, Owner.
  • You have a role with the following resource permissions on the S3 community you created when you registered an Amazon S3 file system:
    • Asset: add
    • Attribute: add
    • Domain: add
    • Attachment: add
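
The resource permissions listed above can be treated as a simple checklist. The following is a minimal sketch, assuming you have already exported the permissions granted to your role as a list of strings; the function name and the input format are hypothetical and are not part of any Collibra API.

```python
# Hypothetical helper: compares a role's granted permissions against the
# four resource permissions required on the S3 community.
REQUIRED_PERMISSIONS = {
    "Asset: add",
    "Attribute: add",
    "Domain: add",
    "Attachment: add",
}


def missing_permissions(granted):
    """Return the required permissions that are absent from `granted`, sorted by name."""
    return sorted(REQUIRED_PERMISSIONS - set(granted))


if __name__ == "__main__":
    # Example input is made up for illustration.
    print(missing_permissions(["Asset: add", "Domain: add"]))
```

An empty result means that the role covers every permission in the list.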

Steps

  1. Open an S3 File System asset page.
  2. In the tab bar (tab pane in the classic UI), click Configuration.
  3. In the Crawlers section, click Synchronize now.

    A notification indicates synchronization has started.

    The synchronization job appears in the Activities list as a bulk synchronization.

    The Synchronization Schedule section displays the time of the last synchronization.

    After the synchronization is complete, you can view a summary of the results in the Activities list and view the assets in their domain. For more information, go to Integrated Amazon S3 data.

Note If a synchronization is only partially successful because of a temporary communication issue, the status of the assets that could not be synchronized is set to Missing from source. During the next fully successful synchronization, these assets are either removed or restored to their previous status, depending on their actual status in the source system.
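
The behavior described in this note can be pictured with a small model. The following is a minimal sketch, not Collibra's implementation: the `Asset` class, the two functions, and the example statuses `Candidate` and `Accepted` are all assumptions made for illustration. Only the `Missing from source` status and the remove-or-restore rule come from the note.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Set

MISSING = "Missing from source"


@dataclass
class Asset:
    name: str
    status: str
    # Status the asset had before it was marked as missing (hypothetical field).
    previous_status: Optional[str] = None


def apply_partial_sync(assets: Dict[str, Asset], unreachable: Set[str]) -> None:
    """Mark every asset that could not be synchronized as Missing from source."""
    for name in unreachable:
        asset = assets.get(name)
        if asset is not None and asset.status != MISSING:
            asset.previous_status = asset.status
            asset.status = MISSING


def apply_full_sync(assets: Dict[str, Asset], present_in_source: Set[str]) -> None:
    """Resolve missing assets after a fully successful synchronization."""
    for name in list(assets):
        asset = assets[name]
        if asset.status != MISSING:
            continue
        if name in present_in_source:
            # Still exists in the source: restore the remembered status.
            asset.status = asset.previous_status or asset.status
            asset.previous_status = None
        else:
            # No longer exists in the source: remove the asset.
            del assets[name]


if __name__ == "__main__":
    assets = {
        "sales.csv": Asset("sales.csv", "Candidate"),
        "old.csv": Asset("old.csv", "Accepted"),
    }
    apply_partial_sync(assets, {"sales.csv", "old.csv"})
    apply_full_sync(assets, {"sales.csv"})
    for asset in assets.values():
        print(asset.name, asset.status)
```

In this model, an asset that still exists in the source gets its earlier status back, while an asset that was deleted from the source is removed after the next fully successful synchronization.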