Add the GCS synchronization capability

Important 

In Collibra 2024.05, we've launched a new user interface (UI) for Collibra Data Intelligence Platform! You can learn more about this latest UI in the UI overview.

After you create a connection to Google Cloud Platform (GCP) on your Edge site, you must add the GCS synchronization capability to the connection.

Before you start

Required permissions

Steps

  1. Open an Edge site.
    1. On the main toolbar, click the Products icon, and then click Settings (the cogwheel icon).
      The Collibra settings page opens.
    2. In the tab pane, click Edge.
      The Sites tab opens and shows a table with an overview of the Edge sites.
    3. In the table, click the name of the Edge site whose status is Healthy.
      The Edge site page opens.
  2. In the Capabilities section, click Add capability.
    The Add capability page appears.
  3. Select the GCS synchronization capability template.
  4. Enter the required information.
    | Field | Description | Required |
    | --- | --- | --- |
    | Capability | This section contains general information about the capability. | |
    | Name | The name of the Edge capability. | Yes |
    | Description | The description of the Edge capability. | No |
    | Capability template | The capability template. The value that you select in this field determines which sections appear on the page. Select the following Edge capability: GCS synchronization. | Yes |
    | GCP service account | This section contains information on how to connect to Google Cloud Storage. | |
    | GCP Connection | The GCP connection to be used. | Yes |
    | Configuration | This section contains information on the configuration of the crawlers. | |
    | Maximum number of files per crawler | The maximum number of files that can be registered per crawler. The default value is 1,000. | Yes |
    | Save input metadata | Select the checkbox if you want to save the input metadata extracted from the data source in ZIP files. The files can be useful for troubleshooting. Select this option only at the request of Collibra Support. The Collibra Support team can provide the location of the saved ZIP files after the synchronization. This checkbox is not selected by default. | No |
    | Integrate Schemas from Dataplex | Select the checkbox if you want to integrate the schemas from Dataplex based on the crawler path specified in the GCS integration configuration. If the checkbox is not selected, no Dataplex data is ingested. This checkbox is selected by default. | No |
    | Project IDs | Add a comma-separated list of the Project IDs where Dataplex is enabled. The capability searches these projects for schemas based on the crawler path specified in the GCS integration configuration. If the Project IDs field is empty, the integration searches the project included in the provided GCP Service Account Credentials JSON. | No |
    | Advanced Configuration (Logging configuration, Memory, JVM arguments) | These configuration options help when investigating issues with the capability. Important: Only complete the fields Logging configuration, Memory (MiB), and JVM arguments at the request of, or together with, Collibra Support. | No |
    | Debug | This setting is not valid for this integration. It should be set to false. An option to automatically send Edge infrastructure log files to Collibra Data Intelligence Platform. By default, this option is set to false. Note: We strongly recommend sending Edge infrastructure log files to Collibra Data Intelligence Platform only when you have issues with Edge. If you set it to true, it automatically reverts to false after 24 hours. | No |
    | Log level | This setting is not valid for this integration. It should be set to No logging. An option to determine the verbosity level of Catalog connector log files. By default, this option is set to No logging. | No |

  5. Click Create.
    The capability is added to the Edge site.
    The fields become read-only.
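
The Project IDs field in step 4 takes a comma-separated list and falls back to the project in the service account credentials when left empty. The following Python sketch illustrates that documented behavior so you can check a value before you paste it into the field. It is not Collibra's implementation: the function names `parse_project_ids` and `projects_to_search` are hypothetical, trimming whitespace around each ID is an assumption, and the pattern reflects Google Cloud's general project ID format (6 to 30 characters, lowercase letters, digits, and hyphens, starting with a letter and not ending with a hyphen). The fallback reads the `project_id` key that a GCP service account key file contains.

```python
import json
import re

# General Google Cloud project ID format: 6-30 characters, lowercase letters,
# digits, and hyphens; starts with a letter and does not end with a hyphen.
_PROJECT_ID = re.compile(r"[a-z][a-z0-9-]{4,28}[a-z0-9]")


def parse_project_ids(field_value: str) -> list[str]:
    """Split a comma-separated Project IDs value into a list of IDs.

    Assumption: whitespace around each ID is ignored and empty entries
    are dropped. Raises ValueError if an ID does not match the format.
    """
    ids = [part.strip() for part in field_value.split(",") if part.strip()]
    invalid = [pid for pid in ids if not _PROJECT_ID.fullmatch(pid)]
    if invalid:
        raise ValueError(f"Invalid project IDs: {invalid}")
    return ids


def projects_to_search(field_value: str, credentials_json: str) -> list[str]:
    """Return the projects that would be searched for Dataplex schemas.

    Mirrors the documented rule: use the listed Project IDs, or, if the
    field is empty, the project named in the service account credentials.
    """
    ids = parse_project_ids(field_value)
    if ids:
        return ids
    # Hypothetical fallback: read "project_id" from the key file contents.
    return [json.loads(credentials_json)["project_id"]]
```

For example, `projects_to_search("", credentials)` returns only the project from the credentials, while a non-empty field value takes precedence over it.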

What's next?

You can now register a GCS file system.