Anonymize data via Jobserver
You can enable or disable the option to anonymize the content of columns with data type TEXT and GEO after the profiling process via Jobserver.
Tip If you profile and classify via an Edge site, the profiling results are automatically anonymized.
Warning Currently, if you enable the data anonymization process, you can no longer use automatic data classification via the Data Classification Platform. However, you can still classify and anonymize profiling results if you use Edge.
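To make the effect of anonymization concrete, the following is a minimal sketch of replacing the content of TEXT and GEO columns with a random hash value before results leave the environment. It is an illustration only, not Collibra's implementation: the function names, the dictionary layout of the profiling results, and the choice of salted SHA-256 are all assumptions.

```python
import hashlib
import secrets

# Data types whose content is anonymized, as stated in the text above.
ANONYMIZED_TYPES = {"TEXT", "GEO"}


def anonymize_value(value, salt):
    """Replace a value with a salted SHA-256 digest (hash choice is an assumption)."""
    if value is None:
        return None
    return hashlib.sha256(salt + str(value).encode("utf-8")).hexdigest()


def anonymize_profile(columns, enabled):
    """Return the profiling results with TEXT and GEO sample values hashed.

    `columns` is a hypothetical structure: a list of dicts with the keys
    "name", "data_type" and "samples".
    """
    if not enabled:
        return columns
    # A random salt per run keeps the hashes from being precomputed.
    salt = secrets.token_bytes(16)
    result = []
    for column in columns:
        if column["data_type"] in ANONYMIZED_TYPES:
            samples = [anonymize_value(v, salt) for v in column["samples"]]
        else:
            samples = list(column["samples"])
        result.append({**column, "samples": samples})
    return result


profile = [
    {"name": "city", "data_type": "TEXT", "samples": ["Brussels", "Ghent"]},
    {"name": "age", "data_type": "NUMBER", "samples": [31, 47]},
]
anonymized = anonymize_profile(profile, enabled=True)
```

In this sketch the `city` samples become hexadecimal digests, while the numeric `age` samples are passed through unchanged.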
Depending on your environment, follow this procedure either in the Services Configuration section of the Collibra settings or in Collibra Console.
Prerequisites
- You have the ADMIN role in Collibra Console.
- You have a global role that has the System administration global permission.
- Platform configuration in the Collibra settings is enabled.
Steps
- Depending on your environment, do one of the following:
  - Open the Services Configuration page (Collibra settings):
    - In the main menu, click [icon], then Settings.
      The Collibra settings page appears.
    - In the tab pane, click Services Configuration.
  - Open the DGC service settings for editing (Collibra Console):
    - Open Collibra Console.
      Collibra Console opens with the Infrastructure page.
    - In the tab pane, expand an environment to show its services.
    - In the tab pane, click the Data Governance Center service of that environment.
    - Click Configuration.
    - Click Edit configuration.
- In the Data Profiling section, enter the required information:
  | Setting | Description |
  | --- | --- |
  | Maximum number of samples | The maximum number of rows taken as a sample during profiling. |
  | Maximum value length | The maximum length of a value extracted during profiling or sampling. Additional characters are trimmed. |
  | Default date pattern | The default format used to decode dates. It is used to detect dates when the Date Pattern and/or Time Pattern attribute is not specified in Column assets. |
  | Default time pattern | The default format used to decode times. It is used to detect times when the Date Pattern and/or Time Pattern attribute is not specified in Column assets. |
  | Default combined date and time pattern | The default format used to decode combined dates and times. It is used to detect combined dates and times when the Date Pattern and/or Time Pattern attribute is not specified in Column assets. |
  | Empty values | A comma-separated list of strings enclosed in double quotes, for example "", "na" and "none". A value that matches one of these strings is considered an empty value. A database null value is always considered an empty value. |
  | Data type detection threshold | The fraction of Column values that must match an Advanced Data Type for it to be considered a possible Data Type for that Column. This is expressed as a value between 0.0 and 1.0. |
  | Anonymize data | An option to anonymize sensitive data. True: Content in columns with data type Text or Geo is removed or replaced by a random hash value before the profiling results are sent to the cloud. False (default): No content is removed or replaced by a random hash value. |
  | Database profiling via Edge | An option to enable profiling and classifying synchronized metadata via Edge instead of Jobserver. True: Profile and classify via Edge. False: Profile via Jobserver and classify via the Data Classification Platform. Note: You can only enable Database profiling via Edge if you also enabled Database registration via Edge. |
- Click the green Save all button.
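As a rough illustration of how the Maximum value length, Empty values and Data type detection threshold settings could interact, here is a small sketch. It is not Collibra's implementation: the constant and function names are hypothetical, exact string matching for empty values is an assumption, and so is computing the match ratio over non-empty values only.

```python
import csv
import io

# Hypothetical example values for three of the Data Profiling settings.
MAX_VALUE_LENGTH = 10
EMPTY_VALUES = '"", "na", "none"'
DETECTION_THRESHOLD = 0.8


def parse_empty_values(setting):
    """Parse a comma-separated list of double-quoted strings into a set."""
    reader = csv.reader(io.StringIO(setting), skipinitialspace=True)
    return set(next(reader))


def is_empty(value, empty_values):
    """A database null (None) is always empty; otherwise compare with the list."""
    return value is None or value in empty_values


def trim(value, max_length):
    """Drop any characters beyond the maximum value length."""
    return value[:max_length]


def meets_threshold(values, matcher, empty_values, threshold):
    """True if the share of non-empty values accepted by `matcher` reaches the threshold."""
    non_empty = [v for v in values if not is_empty(v, empty_values)]
    if not non_empty:
        return False
    matches = sum(1 for v in non_empty if matcher(v))
    return matches / len(non_empty) >= threshold


empty_values = parse_empty_values(EMPTY_VALUES)
column_values = ["12", "7", "na", None, "abc", "42", "5"]

# Four of the five non-empty values are digits, so the ratio is 0.8.
looks_numeric = meets_threshold(
    column_values, str.isdigit, empty_values, DETECTION_THRESHOLD
)
trimmed = trim("abcdefghijklmnop", MAX_VALUE_LENGTH)
```

With a threshold of 0.8, the sample column just qualifies as numeric; raising the threshold to 0.9 would exclude it because of the single non-matching value "abc".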