Anonymize data via Jobserver

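You can enable or disable anonymization of the content of columns with data type Text or Geo when you profile via Jobserver. When it is enabled, this content is anonymized after profiling and before the profiling results are sent to Collibra Data Intelligence Cloud.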

Tip: If you profile and classify via Edge, data in columns with data type Text or Geo is automatically anonymized before it is sent to Collibra Data Intelligence Cloud.

Warning: Currently, if you enable data anonymization, you can no longer use automatic data classification via the Data Classification Platform. However, you can still classify and anonymize profiling results if you use Edge.
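To make the effect of the Anonymize data setting more concrete, the following Python sketch replaces the values of Text and Geo columns with salted SHA-256 hashes before a profiling result would be sent anywhere. It is only an illustration and not Collibra's implementation. The function name `anonymize_profile`, the dictionary layout of the profiling result, the `Decimal` data type, and the choice of SHA-256 with a random per-run salt are all hypothetical. The sketch covers only the replacement variant, although the setting can also remove content instead.

```python
import hashlib
import secrets

# Data types whose content is anonymized, per the "Anonymize data" setting.
ANONYMIZED_TYPES = {"Text", "Geo"}


def anonymize_profile(columns, salt=None):
    """Return a copy of a hypothetical profiling result with Text/Geo values hashed.

    Assumption: `columns` maps a column name to a dict with a "data_type"
    and a list of "values". This layout is illustrative only.
    """
    # A fresh random salt per run (hypothetical choice) makes equal values
    # hash differently across runs. Pass `salt` to get reproducible output.
    salt = salt if salt is not None else secrets.token_bytes(16)
    result = {}
    for name, column in columns.items():
        if column["data_type"] in ANONYMIZED_TYPES:
            hashed = [
                hashlib.sha256(salt + str(value).encode("utf-8")).hexdigest()
                for value in column["values"]
            ]
            result[name] = {**column, "values": hashed}
        else:
            # Columns of other data types are copied unchanged.
            result[name] = {**column, "values": list(column["values"])}
    return result


profile = {
    "city": {"data_type": "Text", "values": ["Brussels", "Ghent"]},
    "amount": {"data_type": "Decimal", "values": [12.5, 7.0]},
}
anonymized = anonymize_profile(profile)
print(anonymized["amount"]["values"])  # [12.5, 7.0]
```

The input dictionary is not modified; only the returned copy contains hashes.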

Depending on your environment, follow this procedure either in the Services Configuration section of the Collibra settings or in Collibra Console:

Prerequisites

Steps

  1. Open the configuration for editing. Depending on your environment, do one of the following:
    • In the Collibra settings, open the Services Configuration page:
      1. On the main menu, click , and then click Settings.
        The Collibra settings page opens.
      2. In the tab pane, click Services Configuration.
      3. Click Edit configuration.
    • In Collibra Console, open the DGC service settings for editing:
      1. Open Collibra Console.
        Collibra Console opens with the Infrastructure page.
      2. In the tab pane, expand an environment to show its services.
      3. In the tab pane, click the Data Governance Center service of that environment.
      4. Click Configuration.
      5. Click Edit configuration.
  2. In the Data Profiling section, enter the required information:
    Maximum number of samples

    The maximum number of samples to collect for a data source. The default value is 100 and the maximum value is 1,000.

    This setting applies only to sample data.

    Maximum value length

    The maximum length of a value extracted during profiling or sampling. Characters beyond this length are trimmed.

    Default date pattern

    The default format used to decode dates. It is used to detect dates when the Date Pattern and/or Time Pattern attribute is not specified in Column assets.

    Default time pattern

    The default format used to decode times. It is used to detect times when the Date Pattern and/or Time Pattern attribute is not specified in Column assets.

    Default combined date and time pattern

    The default format used to decode combined dates and times. It is used to detect combined dates and times when the Date Pattern and/or Time Pattern attribute is not specified in Column assets.

    Empty values

    A comma-separated list of strings enclosed in double quotes, for example "", "na", "none". A value that matches one of these strings is considered an empty value.

    A database null value is always considered an empty value.

    Data type detection threshold

    The minimum fraction of Column values that must match an Advanced Data Type for that type to be considered a possible Data Type for the Column, expressed as a value between 0.0 and 1.0.

    Anonymize data

    An option to anonymize sensitive data.

    • True: Content in columns with data type Text or Geo is removed or replaced by a random hash value before the profiling results are sent to the cloud.
    • False (default): No content is removed or replaced by a random hash value.

    Tip: If you profile and classify via Edge, the data in columns with data type Text or Geo is automatically anonymized before it is sent to Collibra Data Intelligence Cloud.

    Database profiling via Edge

    An option to enable profiling and classifying synchronized metadata via Edge instead of Jobserver.

    • True: Profile and classify via Edge.
    • False: Profile via Jobserver and classify via the Data Classification Platform.

    Note: You can only enable Database profiling via Edge if you have also enabled Database registration via Edge.

    Parallel database profiling via Edge

    The maximum number of databases that Edge can profile and classify at the same time.

    Note: Schemas in a database are always processed sequentially.

    The default value is 1, which means that Edge processes one profiling job at a time. The maximum value is 4.
    If you change this setting, you must restart Collibra.

  3. Click Save all.
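The Empty values and Data type detection threshold settings can be illustrated together. The following Python sketch parses an Empty values string such as `"", "na", "none"`, always treats a database null (`None` here) as empty, and checks whether the share of values that match a pattern reaches the threshold. It is a sketch under assumptions, not Collibra's detection logic: the function names, the email regular expression standing in for an Advanced Data Type, exact case-sensitive matching of empty values, and excluding empty values before computing the ratio are all hypothetical.

```python
import re


def parse_empty_values(setting):
    """Parse a setting like '"", "na", "none"' into a set of strings."""
    # Each entry is enclosed in double quotes; "" yields the empty string.
    return set(re.findall(r'"([^"]*)"', setting))


def is_empty(value, empty_values):
    # A database null (None here) is always empty; other values must match
    # an entry exactly (case-sensitive matching is an assumption).
    return value is None or value in empty_values


def matches_type(values, pattern, threshold, empty_values):
    """Return True if the fraction of non-empty values matching `pattern`
    is at least `threshold` (a value between 0.0 and 1.0).

    Assumption: empty values are excluded before the ratio is computed.
    """
    candidates = [v for v in values if not is_empty(v, empty_values)]
    if not candidates:
        return False
    matching = sum(1 for v in candidates if re.fullmatch(pattern, v))
    return matching / len(candidates) >= threshold


empty = parse_empty_values('"", "na", "none"')
emails = ["a@example.com", "b@example.com", "na", None, "not-an-email"]
# Hypothetical stand-in for an Advanced Data Type definition.
email_pattern = r"[^@\s]+@[^@\s]+\.[a-z]+"
print(matches_type(emails, email_pattern, 0.6, empty))  # True (2 of 3 match)
```

With a threshold of 0.8, the same column would not qualify, because only two of the three non-empty values match.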
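The Parallel database profiling via Edge setting limits how many databases are profiled at the same time, while the schemas inside one database are always processed one after another. The following Python sketch models that scheduling rule with a thread pool whose size is the setting value (1 by default, at most 4). The functions `profile_schema`, `profile_database`, and `profile_all` and the `ConcurrencyTracker` helper are hypothetical stand-ins used only to illustrate the rule; this is not how Edge is implemented.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

DEFAULT_PARALLEL = 1  # documented default
MAX_PARALLEL = 4  # documented maximum


class ConcurrencyTracker:
    """Hypothetical helper that records how many databases run at once."""

    def __init__(self):
        self._lock = threading.Lock()
        self.active = 0
        self.peak = 0

    def __enter__(self):
        with self._lock:
            self.active += 1
            self.peak = max(self.peak, self.active)
        return self

    def __exit__(self, exc_type, exc, tb):
        with self._lock:
            self.active -= 1
        return False


def profile_schema(database, schema):
    # Hypothetical placeholder for the real profiling and classification work.
    time.sleep(0.01)
    return f"{database}.{schema}"


def profile_database(database, schemas, tracker):
    with tracker:
        # Schemas in one database are processed sequentially.
        return [profile_schema(database, schema) for schema in schemas]


def profile_all(databases, parallel=DEFAULT_PARALLEL, tracker=None):
    if not 1 <= parallel <= MAX_PARALLEL:
        raise ValueError(f"parallel must be between 1 and {MAX_PARALLEL}")
    tracker = tracker if tracker is not None else ConcurrencyTracker()
    # At most `parallel` databases are profiled at the same time.
    with ThreadPoolExecutor(max_workers=parallel) as pool:
        futures = {
            db: pool.submit(profile_database, db, schemas, tracker)
            for db, schemas in databases.items()
        }
        return {db: future.result() for db, future in futures.items()}


databases = {"sales": ["public", "staging"], "hr": ["public"], "finance": ["core"]}
tracker = ConcurrencyTracker()
results = profile_all(databases, parallel=2, tracker=tracker)
print(results["sales"])  # ['sales.public', 'sales.staging']
print(tracker.peak <= 2)  # True
```

Calling `profile_all` with the default `parallel=1` reproduces the default behavior of one profiling job at a time.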