Configure data profiling behavior

In Collibra Data Intelligence Cloud, you can create a data profile when you register a data source. You can configure the behavior of data profiling.

Depending on your environment, follow this procedure either in the Services Configuration section of the Collibra settings or in Collibra Console.

Prerequisites

Steps

  1. Open the Services Configuration page.
    1. In the main menu, click , then Settings.
      The Collibra settings page opens.
    2. In the tab pane, click Services Configuration.
    Alternatively, in Collibra Console, open the DGC service settings for editing:
    1. Open Collibra Console.
      Collibra Console opens with the Infrastructure page.
    2. In the tab pane, expand an environment to show its services.
    3. In the tab pane, click the Data Governance Center service of that environment.
    4. Click Configuration.
    5. Click Edit configuration.
  2. In the section Data Profiling, make the necessary changes.
    Settings and their descriptions:

    Maximum number of samples

    The maximum number of rows taken as a sample during profiling.

    Maximum value length

    The maximum length of a value extracted during profiling or sampling. Additional characters are trimmed.

    Default date pattern

    The default format used to decode dates. It is the default pattern used for detecting dates when the Date Pattern and/or Time Pattern attribute is not specified in Column assets.

    Default time pattern

    The default format used to decode times. It is the default pattern used for detecting times when the Date Pattern and/or Time Pattern attribute is not specified in Column assets.

    Default combined date and time pattern

    The default format used to decode combined dates and times. It is the default pattern used for detecting combined dates and times when the Date Pattern and/or Time Pattern attribute is not specified in Column assets.

    Empty values

    A comma-separated list of strings enclosed in double quotes, for example "", "na" and "none". A value that matches one of those strings is considered an empty value.

    Note that a database null value is always considered an empty value.

    Data type detection threshold

    The share of matching Column values that must be reached for an Advanced Data Type to be considered a possible Data Type for that Column. This is expressed as a value between 0.0 and 1.0.

    Anonymize data

    An option to anonymize sensitive data.

    • True: Content in columns with data type Text or Geo is removed or replaced by a random hash value before the profiling results are sent to the cloud.
    • False (default): No content is removed or replaced by a random hash value.
    Database profiling via Edge

    An option to enable profiling and classifying synchronized metadata via Edge instead of Jobserver.

    • True: Profile and classify via Edge.
    • False: Profile via Jobserver and classify via the Data Classification Platform.

    Note: You can only enable Database profiling via Edge if you have also enabled Database registration via Edge.

  3. Click the green Save all button.
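
To make the effect of some Data Profiling settings more concrete, the following Python sketch illustrates how the Empty values, Maximum value length and Data type detection threshold settings could interact. It is not Collibra code. All function names are hypothetical, and two points are assumptions: that empty values are matched exactly (case-sensitive), and that empty values are excluded before the share of matching values is calculated.

```python
import csv
import io
from datetime import datetime


def parse_empty_values(setting):
    """Parse a comma-separated list of double-quoted strings, such as '"", "na", "none"'."""
    reader = csv.reader(io.StringIO(setting), skipinitialspace=True)
    return set(next(reader, []))


def is_empty(value, empty_values):
    """A database null (None here) is always empty; other values must match the list."""
    # Assumption: exact, case-sensitive matching.
    return value is None or value in empty_values


def trim_value(value, max_length):
    """Keep at most max_length characters; additional characters are trimmed."""
    return value[:max_length]


def meets_threshold(values, matcher, threshold, empty_values):
    """Return True if the share of matching values reaches the threshold (0.0 to 1.0)."""
    # Assumption: empty values are ignored when calculating the share.
    non_empty = [v for v in values if not is_empty(v, empty_values)]
    if not non_empty:
        return False
    matching = sum(1 for v in non_empty if matcher(v))
    return matching / len(non_empty) >= threshold


def looks_like_iso_date(value):
    """Hypothetical matcher standing in for a date pattern check."""
    try:
        datetime.strptime(value, "%Y-%m-%d")
        return True
    except ValueError:
        return False


empty = parse_empty_values('"", "na", "none"')
column = ["2021-01-05", "2021-02-17", "na", None, "not a date"]

# Two of the three non-empty values are dates, so the share is about 0.67.
print(meets_threshold(column, looks_like_iso_date, 0.6, empty))  # True
print(meets_threshold(column, looks_like_iso_date, 0.7, empty))  # False
print(trim_value("a very long cell value", 6))  # a very
```

In this sketch, a lower threshold makes it easier for a data type to be suggested for a column, while a higher threshold requires more of the sampled values to match.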