Configure Collibra Connect

We have decided to transition away from Collibra Connect to provide customers with a wider range of integration options.

Our native Collibra integrations (connectors) will be easier to implement and maintain, provide a better return on investment, and allow you to grow with and derive greater value from Collibra:

  • Collibra integrations and Spring Boot based frameworks will replace Collibra Connect as options to build integrations going forward.
  • You can choose any ESB or integration method for your use case.
  • Our intention is to enable Collibra connectors to support ingestion as well as use cases for data profiling, data classification and other cloud functionalities.
  • If you have an enterprise MuleSoft license, you can easily switch to it. For details on how to switch from Connect licenses to MuleSoft licenses, see this Collibra Support article.

Rest assured that Connect templates are and will remain compatible with our product. Contact us with any Connect-related questions. Only support and upgrades for these products will be discontinued.

Note As of September 2022, you will need a MuleSoft Community Edition license or your own proprietary paid license to run Connect templates.

Resources:

With Collibra Connect, you can connect to Collibra Platform with a tool of your own choice. Collibra Connect acts as the gateway between your tool and Collibra. For more details on Collibra Connect, consult the Collibra Connect user guide.

In this section, you can learn how to set the credentials to access Collibra Connect with your own tool.

Depending on your environment, follow this procedure either on the Services Configuration tab of the Collibra settings or in Collibra Console:

Important You can't edit the service configuration from the Settings page in the latest UI. If you use the latest UI, you can edit the service configuration only in Collibra Console. For more information, go to Collibra service configuration settings.

Prerequisites

Steps

  1. Open the Services Configuration page.
    1. On the main toolbar, click the Products icon, then click the cogwheel icon (Settings).
      The Collibra settings page opens.
    2. Click Services Configuration.
    3. Click Edit configuration.
    Or, in Collibra Console, open the DGC service settings for editing:
    1. Open Collibra Console.
      Collibra Console opens with the Infrastructure page.
    2. In the tab pane, expand an environment to show its services.
    3. In the tab pane, click the Collibra Platform service of that environment.
    4. Click Configuration.
    5. Click Edit configuration.
  2. Go to the Data Classification section.
  3. Enter the required information:

    Setting

    Description

    Machine Learning platform URL

    This setting requires the SUPER role.

    The address of the machine learning platform that will classify your data.

    Requester Name

    This setting requires the SUPER role.

    The unique name that identifies the client when using the Machine Learning platform.

    API key

    This setting requires the SUPER role.

    The API key that authorizes the requester when connecting to the Machine Learning platform.

    Enable Data Classification

    • True: Enable Collibra's data classification technology.
    • False (default): Do not use Collibra's data classification technology.

    This setting is no longer in use. For information, go to About Data Classification.

    Classification job execution timeout

    The maximum amount of time (in seconds) that a data classification job can run before it is canceled.

    The default value is 604,800, which is 1 week. This is also the maximum value.

    Tip This is a global timeout limit for the entire classification process. In your Edge classification capability, you can also configure timeout limits for the individual stages of a job. Timeout limits that you configure in your Edge capability cannot exceed the value that you set in this Console setting.

    For more information, go to Set up Unified Data Classification.

    Unified Classification enabled

    Enables the Unified Data Classification method on Edge.

    • True (default): The environment uses the new classification method, Unified Data Classification. This has an impact on the available data classes, the required capabilities, and the way you classify data.

      Note All existing data classes and classifications become unavailable.

      Tip A migration process is available through the Unified Classification migration tool enabled setting.

    • False: The feature is not enabled.

    Unified Classification migration tool enabled

    Enables the Unified Data Classification migration process.

    • True: You can manually start the migration process in Unified Data Classification. The migration process:
      • Copies classification information from the old classification methods (the old Edge method and the Cloud Data Classification Platform) into the Unified Data Classification method.
      • Creates data classes in the Unified Data Classification method for existing Advanced Data Types (ADTs). ADTs are supported only for Jobserver, which reaches end of life on September 30, 2024.

      For more information about the process, go to Migrating to Unified Data Classification.

    • False (default): The migration process is not available.
  4. If needed, configure the automatic classification acceptance and rejection.

    Setting

    Description

    Enable automatic classification acceptance and rejection

    True: The automatic acceptance and rejection of data classification suggestions is active.

    False (default): Data classification suggestions are not automatically accepted or rejected.

    Tip Start by manually accepting and rejecting a suggested data class. Only activate the automatic acceptance and rejection feature if you are comfortable with the data classification results.

    Automatic acceptance threshold

    The confidence level percentage at or above which data classification suggestions are accepted automatically.
    If you set this value to 75, then the classification suggestions with a confidence level of 75% or higher are automatically accepted.

    If multiple classification suggestions for a column meet the threshold, only the suggestion with the highest confidence level is accepted automatically, and only if no other suggestion has that same confidence level.

    Example 

    You set the automatic acceptance threshold to 85%. You classify a table with 2 columns.

    • For column A, three classification suggestions are possible, one with confidence level 93%, one with 92%, and one with 90%.
    • For column B, two classification suggestions are possible. Their confidence level is the same, 86%.

    The results of the automatic acceptance will be:

    • For column A, the classification suggestion with 93% will be accepted automatically.
    • For column B, nothing is accepted automatically because the two suggestions are tied. Both suggestions remain visible.

    The default acceptance threshold is 90.

    Automatic rejection threshold

    The confidence level percentage at or below which data classification suggestions are rejected automatically. If you set this value to 49, then all data classification suggestions with a confidence level of 49% or lower are automatically rejected.

    The default rejection threshold is 10.

    Note If the acceptance threshold and rejection threshold are set to the same value, and a data classification suggestion has this confidence level percentage, the classification suggestion will be rejected.

  5. Click Save all.
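
The automatic acceptance and rejection rules described in step 4 can be illustrated with a short Python sketch. This is not Collibra's implementation; the function name decide_suggestions and its parameters are hypothetical and exist only to make the rules concrete. The sketch assumes that rejection is evaluated before acceptance, which is one way to reproduce the documented behavior when both thresholds are set to the same value. Suggestions that are neither accepted nor rejected are simply returned as "untouched"; what the product does with them is not modeled here.

```python
def decide_suggestions(suggestions, accept_threshold=90, reject_threshold=10):
    """Illustrative only: split one column's suggestions into
    (accepted, rejected, untouched) lists of data class names.

    suggestions: dict mapping a data class name to a confidence level (0-100).
    The defaults mirror the documented default thresholds (90 and 10).
    """
    # Assumption: rejection is checked first, so a suggestion that sits
    # exactly on an equal acceptance/rejection threshold is rejected.
    rejected = [name for name, conf in suggestions.items()
                if conf <= reject_threshold]
    remaining = {name: conf for name, conf in suggestions.items()
                 if conf > reject_threshold}

    # Only suggestions at or above the acceptance threshold are candidates.
    candidates = {name: conf for name, conf in remaining.items()
                  if conf >= accept_threshold}

    accepted = []
    if candidates:
        top = max(candidates.values())
        winners = [name for name, conf in candidates.items() if conf == top]
        # Accept the highest suggestion only if it is not tied with another.
        if len(winners) == 1:
            accepted = winners

    untouched = [name for name in remaining if name not in accepted]
    return accepted, rejected, untouched


# The example from step 4, with the acceptance threshold set to 85.
column_a = {"A1": 93, "A2": 92, "A3": 90}
column_b = {"B1": 86, "B2": 86}
print(decide_suggestions(column_a, accept_threshold=85))
print(decide_suggestions(column_b, accept_threshold=85))
```

For column A, the sketch accepts only the 93% suggestion. For column B, it accepts nothing because the two suggestions are tied, which matches the example in step 4.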