Configure Cloud Data Classification Platform

Before you can use the Cloud Data Classification Platform in Data Catalog, you must configure it.

Prerequisites

  • You have the ADMIN or SUPER role in Collibra Console.

Steps

  1. Open the DGC service settings for editing:
    1. Open Collibra Console.
      Collibra Console opens with the Infrastructure page.
    2. In the tab pane, expand an environment to show its services.
    3. In the tab pane, click the Data Governance Center service of that environment.
    4. Click Configuration.
    5. Click Edit configuration.
  2. Go to the Data Classification section.
  3. Enter the required information:

    Setting

    Description

    Machine Learning platform URL

    This setting requires the SUPER role.

    The address of the machine learning platform that will classify your data.

    Requester Name

    This setting requires the SUPER role.

    The unique name that identifies the client when using the Machine Learning platform.

    API key

    This setting requires the SUPER role.

    The API key that authorizes the requester when connecting to the Machine Learning platform.

    Enable Data Classification

    • True: Enable Collibra's data classification technology.
    • False (default): Do not use Collibra's data classification technology.

    This setting is no longer in use. For more information, go to About Data Classification.

    Classification job execution timeout

    The maximum amount of time (in seconds) that a data classification job can run before it is canceled.

    The default value is 604,800, which is 1 week. This is also the maximum value.

    Tip This is a global timeout limit for the entire classification process. In your Edge classification capability, you can also configure timeout limits for the individual stages of a job. The timeout limits that you configure in your Edge capability cannot exceed the value that you set in this Console setting.

    For more information, go to Set up Unified Data Classification.

    Unified Classification enabled

    Enables the Unified Data Classification method on Edge.

    • False: The feature is not enabled.

    Unified Classification migration tool enabled

    Enables the Unified Data Classification migration process.

    • True: You can manually start the migration process in Unified Data Classification. The migration process:
      • Copies classification information from the old classification methods (the old Edge method and Cloud Data Classification Platform) into the Unified Data Classification method.
      • Creates data classes in the Unified Data Classification method for existing Advanced Data Types (ADTs). ADTs are supported only for Jobserver, which will be end of life on September 30, 2024.
    • False (default): The migration process is not available.

    Maximum column length sum per classification capability request

    This setting specifies the maximum total length of all column names in a classification request. For example, if a classification request includes 3 columns, "CustomerID" (10 characters), "OrderDate" (9 characters), and "ProductCode" (11 characters), the total column length sum is 30 characters. The classification capability uses this setting value to check whether the total length exceeds the defined limit. If it does, an error message appears.

    The default value is 10,000. You can enter a value between 200 and 20,000.

    Adjusting this setting can help mitigate issues when classifying at the Schema or Database level, where many columns are included in a single request.
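The column length check described above can be illustrated with a minimal sketch. This is not Collibra's implementation; the function names `column_length_sum` and `exceeds_limit` are hypothetical and only reproduce the arithmetic from the example, assuming that a sum exactly equal to the limit is still allowed.

```python
def column_length_sum(column_names):
    """Return the total number of characters across all column names."""
    return sum(len(name) for name in column_names)


def exceeds_limit(column_names, max_length_sum=10_000):
    """Return True if the total column name length exceeds the limit.

    Assumption: a sum exactly equal to the limit does not count as exceeding it.
    The default of 10,000 mirrors the default value of the Console setting.
    """
    return column_length_sum(column_names) > max_length_sum


# Column names taken from the example in the setting description.
columns = ["CustomerID", "OrderDate", "ProductCode"]
print(column_length_sum(columns))  # 30
print(exceeds_limit(columns))      # False
```

A schema with thousands of long column names can pass the limit in a single request, which is when raising the setting value (up to 20,000) may help.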

  4. If needed, configure the automatic classification acceptance and rejection.

    Setting

    Description

    Enable automatic classification acceptance and rejection

    True: The automatic acceptance and rejection of data classification suggestions is active.

    False (default): Data classification suggestions are not automatically accepted or rejected.

    Tip Start by manually accepting and rejecting a suggested data class. Only switch to automatic acceptance and rejection if you are comfortable with the data classification results.

    Automatic acceptance threshold

    The percentage that determines when data classification suggestions are automatically accepted.
    For example, if you set this value to 75%, classification suggestions with a confidence level of 75% or higher are automatically accepted.

    If multiple classification suggestions meet the threshold for a column, the suggestion with the highest confidence level percentage is accepted automatically, as long as this suggestion is the only one to have that confidence level percentage. If two or more suggestions have the same confidence level, none are accepted automatically, and all remain visible.

    Example

    You set the automatic acceptance threshold to 85% and classify a table with 2 columns.

    • For column A, there are 3 classification suggestions with confidence levels of 93%, 92%, and 90%.
    • For column B, there are 2 classification suggestions with the same confidence level of 86%.

    The results of the automatic acceptance will be:

    • For column A, the classification suggestion with 93% is accepted automatically.
    • For column B, both suggestions remain visible; neither is accepted automatically.

    The default acceptance threshold is 90.

    Automatic rejection threshold

    The percentage that determines when data classification suggestions are automatically rejected. For example, if you set this value to 49%, data classification suggestions with a confidence level of 49% or lower are automatically rejected.

    The default rejection threshold is 10.

    Note If the acceptance threshold and rejection threshold are set to the same value, and a data classification suggestion has this confidence level percentage, the classification suggestion will be rejected.

  5. Click Save all.
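
The automatic acceptance and rejection rules from step 4 can be sketched as follows. This is an illustration of the documented behavior, not Collibra's implementation: the function `resolve_suggestions`, the status labels, and the data class names are hypothetical. The sketch assumes that suggestions that are neither accepted nor rejected simply remain visible as suggestions, and it does not cover a rejection threshold set higher than the acceptance threshold.

```python
def resolve_suggestions(confidences, accept_threshold=90, reject_threshold=10):
    """Decide the outcome for the classification suggestions of one column.

    confidences: dict mapping a data class name to its confidence level (0-100).
    Returns a dict mapping each data class name to "accepted", "rejected",
    or "suggested" (still visible, awaiting a manual decision).
    """
    outcome = {}
    for name, confidence in confidences.items():
        # Rejection is checked first, so it wins when both thresholds are equal.
        if confidence <= reject_threshold:
            outcome[name] = "rejected"
        else:
            outcome[name] = "suggested"

    # Candidates for acceptance: not rejected and at or above the threshold.
    candidates = {
        name: confidence
        for name, confidence in confidences.items()
        if outcome[name] == "suggested" and confidence >= accept_threshold
    }
    if candidates:
        best = max(candidates.values())
        winners = [n for n, c in candidates.items() if c == best]
        # Accept only when exactly one suggestion has the highest confidence.
        if len(winners) == 1:
            outcome[winners[0]] = "accepted"
    return outcome


# Values from the example in step 4, with hypothetical data class names.
column_a = {"Email": 93, "Name": 92, "City": 90}
column_b = {"Phone": 86, "Fax": 86}
print(resolve_suggestions(column_a, accept_threshold=85))
print(resolve_suggestions(column_b, accept_threshold=85))
```

With these values, the 93% suggestion for column A is marked as accepted, and both 86% suggestions for column B stay as suggestions because they are tied.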