
Add the S3 synchronization capability

After you have enabled the settings to integrate S3 and you have an S3 connection, you need to add the S3 synchronization capability to the Edge site.

Before you begin

You have created and installed an Edge site.

Required permissions

You have a global role that has the System administration global permission.
You have a global role that has the Manage connections and capabilities global permission, for example, Edge integration engineer.

Steps

Open an Edge site.
1. On the main menu, click , and then click Settings.
  The Collibra settings page opens.
2. In the tab pane, click Edge.
  The Sites tab opens and shows a table with an overview of the Edge sites.
3. In the table, click the name of the Edge site whose status is Healthy.
  The Edge site page opens.
In the Capabilities section, click Add capability.
The Add capability page is shown.

Enter the required information.

Field	Description	Required
Capability	This section contains general information about the capability.
Name	The name of the Edge capability.	Yes
Description	The description of the Edge capability.	No
Capability template	The capability template. The value that you select in this field determines which sections appear on the page. Select the following Edge capability: `S3 synchronization`	Yes
S3 service account	This section contains information about how to connect to Amazon S3.
AWS Connection	The AWS connection to be used.	Yes
IAM role	The IAM role to be used by the AWS Glue crawlers.	Yes
Encryption options	Select the type of encryption used to store the IAM role. Default: To be encrypted by Edge management server.	Yes
Delete Glue database left after previous synchronization of the file system	Select the checkbox if you want the capability to delete the Glue databases created by its previous runs before it starts the synchronization. If you clear this checkbox, the Glue databases created by previous runs are kept, which can be useful for troubleshooting. By default, this checkbox is selected.	No
Save input metadata	Select the checkbox if you want to save the input metadata extracted from the data source in ZIP files. The files can be useful for troubleshooting. Select this option only at the request of Collibra Support. The Collibra Support team can provide the location of the saved ZIP files after the S3 synchronization. By default, this checkbox is not selected.	No
Finalization Strategy	Define what happens to an asset in Collibra if it has been deleted from the S3 data source after an initial synchronization. The possible values are: Change Status (default): we update the status of the asset in Collibra to "Missing from source". Remove Resources: we remove the asset from Collibra. Ignore: we don't change anything for the asset in Collibra.	Yes
Custom parameter	Define additional parameters for the synchronization.	No
Advanced Configuration	This section contains configuration options that can help when investigating issues with the capability. Important: Only complete the fields Logging configuration, Memory (MiB), and JVM arguments at the request of, or together with, Collibra Support.	No
Glue database configuration	Text in JSON format that defines the Glue database names, regions, and domain IDs that you want to integrate. Tip: Use this parameter if the current S3 synchronization crawler configuration doesn't meet your needs. With this parameter, you can integrate an AWS Glue database for which you defined crawlers in AWS Glue itself, so you can use all crawler options from the AWS Glue Console. Important: If you use this parameter, any crawlers you create in Collibra are not taken into account during the S3 synchronization. However, you still need to create a dummy crawler in Collibra to start the synchronization. A dummy crawler is a crawler with an invalid include path, such as s3://dummy. In a future release, we'll remove the need for a dummy crawler. The text must be in JSON format and can contain a block per database that you want to integrate. You can use any JSON validator to verify the format. Collibra is not responsible for the privacy, confidentiality, or protection of the data you submit to such JSON validators, and has no liability for such use. In a block, you can specify the Glue database name, region, and domain ID that must be ingested, using these keys: `"glueDbName"`: the name of the AWS Glue database. `"glueDbRegion"`: the region of the AWS Glue database. `"dgcDomainId"`: the ID of the Collibra domain to which the assets of the AWS Glue database must be added. If you don't add the domain ID, the assets are added to the same domain as the S3 File System asset. Example: `[{"glueDbName": "integrations-auto-1", "glueDbRegion": "eu-west-1", "dgcDomainId": "a3fe0607-65af-43d6-bc2c-7c3adae6e162"}, {"glueDbName": "integrations-auto-2", "glueDbRegion": "eu-west-1"}]` In this example, assets from the AWS Glue database "integrations-auto-1" are ingested into the domain with ID "a3fe0607-65af-43d6-bc2c-7c3adae6e162", and assets from the AWS Glue database "integrations-auto-2" are ingested into the same domain as the S3 File System asset.	No
Logging	Define if you want to create debug logs for the synchronization. If the value is True, the debug logs appear in the Collibra logs.	Yes
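Because the Glue database configuration is free text, it can help to check the value before you paste it into the field. The following Python sketch parses the value with the standard `json` module, checks each block, and reports which Collibra domain each Glue database would go to (`None` meaning the domain of the S3 File System asset). The rules it enforces are assumptions drawn from the field description in the table: `glueDbName` and `glueDbRegion` are treated as required, `dgcDomainId` as optional and UUID-shaped, and any other key is rejected to catch typos. Collibra itself may accept more or less than this. The function `check_glue_db_config` is hypothetical and not part of any Collibra tool.

```python
import json
import re

# Assumption: domain IDs use the 8-4-4-4-12 hexadecimal UUID form, as in the example.
UUID_PATTERN = re.compile(
    r"[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}"
)
# Assumption: name and region are required; the domain ID is optional.
REQUIRED_KEYS = ("glueDbName", "glueDbRegion")
ALLOWED_KEYS = set(REQUIRED_KEYS) | {"dgcDomainId"}


def check_glue_db_config(text):
    """Hypothetical helper: validate a Glue database configuration value.

    Returns a dict mapping each Glue database name to its target domain ID,
    or to None when the assets go to the S3 File System asset's domain.
    Raises ValueError on the first problem found.
    """
    try:
        blocks = json.loads(text)
    except json.JSONDecodeError as exc:
        # For example, curly quotes used instead of straight quotes.
        raise ValueError(f"not valid JSON: {exc}") from exc
    if not isinstance(blocks, list):
        raise ValueError("the value must be a JSON array of blocks")

    targets = {}
    for i, block in enumerate(blocks):
        if not isinstance(block, dict):
            raise ValueError(f"block {i} is not a JSON object")
        missing = [key for key in REQUIRED_KEYS if key not in block]
        if missing:
            raise ValueError(f"block {i} is missing {missing}")
        # Stricter than documented: unknown keys are rejected to catch typos
        # such as "dgcDomainID", which might otherwise go unnoticed.
        unknown = sorted(set(block) - ALLOWED_KEYS)
        if unknown:
            raise ValueError(f"block {i} has unexpected keys {unknown}")
        name = block["glueDbName"]
        if not isinstance(name, str) or not name:
            raise ValueError(f"block {i} has an empty or non-string glueDbName")
        if name in targets:
            raise ValueError(f"Glue database {name!r} is listed more than once")
        domain = block.get("dgcDomainId")
        if domain is not None and not (
            isinstance(domain, str) and UUID_PATTERN.fullmatch(domain)
        ):
            raise ValueError(f"block {i} has a malformed dgcDomainId: {domain!r}")
        targets[name] = domain
    return targets


# The example value from the Glue database configuration row.
EXAMPLE = """[
  {"glueDbName": "integrations-auto-1", "glueDbRegion": "eu-west-1",
   "dgcDomainId": "a3fe0607-65af-43d6-bc2c-7c3adae6e162"},
  {"glueDbName": "integrations-auto-2", "glueDbRegion": "eu-west-1"}
]"""

if __name__ == "__main__":
    print(check_glue_db_config(EXAMPLE))
```

Run against the example from the table, the sketch prints `{'integrations-auto-1': 'a3fe0607-65af-43d6-bc2c-7c3adae6e162', 'integrations-auto-2': None}`, which matches the documented behavior: the first database goes to the given domain and the second to the domain of the S3 File System asset.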

Click Create.
The capability is added to the Edge site.
The fields become read-only.

What's next?

The Edge preparations are completed. You can now continue with setup steps to integrate an Amazon S3 file system via Edge.