Configure data profiling behavior
In Collibra Data Intelligence Cloud, you can create a data profile when you register a data source. You can also configure how data profiling behaves.
Depending on your environment, you follow this procedure either in the Services Configuration section of the Collibra settings or in Collibra Console.
Prerequisites
- Collibra Console: You have the ADMIN role in Collibra Console.
- Services Configuration: You have a global role that has the System administration global permission, and Services Configuration in the Collibra settings is enabled.
Steps
- Open the data profiling settings for editing, using one of the following options:
  - Open the Services Configuration page:
    - In the main menu, click [icon], then Settings. The Collibra settings page opens.
    - In the tab pane, click Services Configuration.
  - Open the DGC service settings for editing:
    - Open Collibra Console. Collibra Console opens with the Infrastructure page.
    - In the tab pane, expand an environment to show its services.
    - In the tab pane, click the Data Governance Center service of that environment.
    - Click Configuration.
    - Click Edit configuration.
- In the Data Profiling section, make the necessary changes:
  - Maximum number of samples: The maximum number of rows taken as a sample during profiling.
  - Maximum value length: The maximum length of a value extracted during profiling or sampling. Characters beyond this length are trimmed.
  - Default date pattern: The default format used to decode dates. It is used to detect dates when the Date Pattern and/or Time Pattern attribute is not specified in Column assets.
  - Default time pattern: The default format used to decode times. It is used to detect times when the Date Pattern and/or Time Pattern attribute is not specified in Column assets.
  - Default combined date and time pattern: The default format used to decode combined dates and times. It is used to detect combined dates and times when the Date Pattern and/or Time Pattern attribute is not specified in Column assets.
  - Empty values: A comma-separated list of strings enclosed in double quotes, for example "", "na", "none". A value that matches one of these strings is considered an empty value. A database null value is always considered an empty value.
  - Data type detection threshold: The proportion of matching Column values that must be reached for an Advanced Data Type to be considered a possible Data Type for that Column, expressed as a value between 0.0 and 1.0.
  - An option to anonymize sensitive data:
    - True: Content in columns with data type Text or Geo is removed or replaced by a random hash value before the profiling results are sent to the cloud.
    - False (default): No content is removed or replaced by a random hash value.
  - Database profiling via Edge: An option to enable profiling and classifying synchronized metadata via Edge instead of Jobserver.
    - True: Profile and classify via Edge.
    - False: Profile via Jobserver and classify via the Data Classification Platform.
    Note: You can only enable Database profiling via Edge if you have also enabled Database registration via Edge.
- Click the green Save all button.
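To illustrate how an Empty values setting such as "", "na", "none" can be interpreted, the following Python sketch parses the setting and checks column values against it. It is a hypothetical illustration, not Collibra's implementation: the function names are made up, a database null is represented as None, and matching is assumed to be exact and case-sensitive.

```python
import csv
import io
from typing import Optional, Set


def parse_empty_values(setting: str) -> Set[str]:
    """Parse an Empty values string such as '"", "na", "none"' into a set of strings."""
    # The csv module understands double-quoted fields; skipinitialspace
    # ignores the space that usually follows each comma.
    reader = csv.reader(io.StringIO(setting), skipinitialspace=True)
    return {field for row in reader for field in row}


def is_empty_value(value: Optional[str], empty_values: Set[str]) -> bool:
    """Return True if a column value counts as empty (hypothetical helper)."""
    # A database null (represented here as None) is always empty.
    if value is None:
        return True
    # Assumption: exact, case-sensitive comparison against the configured strings.
    return value in empty_values


empty = parse_empty_values('"", "na", "none"')
print(is_empty_value(None, empty), is_empty_value("na", empty), is_empty_value("Brussels", empty))
# True True False
```

Using a CSV parser rather than splitting on commas keeps the sketch correct for quoted entries, including the empty string "".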
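The Maximum number of samples and Maximum value length settings both limit how much data profiling reads. The following Python sketch shows the combined effect on a column, assuming the first rows are taken as the sample; how Collibra actually selects sample rows is not stated here, and the function name is hypothetical.

```python
from itertools import islice
from typing import Iterable, List


def sample_values(values: Iterable[str], max_samples: int, max_value_length: int) -> List[str]:
    """Take at most max_samples values and trim each to max_value_length characters."""
    if max_samples < 0 or max_value_length < 0:
        raise ValueError("limits must be non-negative")
    # Assumption: the sample is simply the first max_samples rows.
    # Characters beyond max_value_length are trimmed, not rejected.
    return [v[:max_value_length] for v in islice(values, max_samples)]


print(sample_values(["alpha", "beta", "gamma", "delta"], max_samples=3, max_value_length=4))
# ['alph', 'beta', 'gamm']
```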
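The Data type detection threshold can be pictured as follows: for each candidate Advanced Data Type, compute the proportion of column values that match it, and keep the type if that proportion reaches the threshold. In this Python sketch, the detectors (a simple email and integer check) and all names are hypothetical stand-ins; it also assumes that reaching the threshold exactly is sufficient.

```python
import re
from typing import Callable, Dict, List, Sequence

# Hypothetical detectors standing in for Advanced Data Types.
DETECTORS: Dict[str, Callable[[str], bool]] = {
    "Email": lambda v: re.fullmatch(r"[^@\s]+@[^@\s]+\.[a-z]+", v) is not None,
    "Integer": lambda v: re.fullmatch(r"-?\d+", v) is not None,
}


def candidate_types(
    values: Sequence[str],
    detectors: Dict[str, Callable[[str], bool]],
    threshold: float,
) -> List[str]:
    """Return the names of detectors matched by at least `threshold` of the values."""
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must be between 0.0 and 1.0")
    if not values:
        return []
    candidates = []
    for name, matches in detectors.items():
        ratio = sum(1 for v in values if matches(v)) / len(values)
        # Assumption: reaching the threshold exactly is enough (>=).
        if ratio >= threshold:
            candidates.append(name)
    return candidates


column = ["a@example.com", "b@example.com", "not an email", "c@example.com"]
print(candidate_types(column, DETECTORS, 0.7))  # ['Email'] (3 of 4 values match)
print(candidate_types(column, DETECTORS, 0.8))  # []
```

A lower threshold makes detection more tolerant of dirty values; a threshold of 1.0 requires every sampled value to match.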
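The default date, time, and combined date and time patterns act as fallbacks when a Column asset has no pattern of its own. The following Python sketch shows that fallback for dates. It uses Python strptime syntax purely for illustration; the pattern syntax Collibra expects may differ, and the default pattern and function name shown here are hypothetical. The same logic would apply to time and combined patterns.

```python
from datetime import datetime
from typing import Optional

# Hypothetical default, written in Python strptime syntax for illustration only.
DEFAULT_DATE_PATTERN = "%Y-%m-%d"


def parse_date(
    value: str,
    column_pattern: Optional[str] = None,
    default_pattern: str = DEFAULT_DATE_PATTERN,
) -> Optional[datetime]:
    """Decode a date using the column's own pattern, or the default if none is set."""
    pattern = column_pattern or default_pattern
    try:
        return datetime.strptime(value, pattern)
    except ValueError:
        # The value does not match the pattern, so it is not detected as a date.
        return None


print(parse_date("2024-03-15"))                # uses the default pattern
print(parse_date("15/03/2024"))                # None: does not match the default
print(parse_date("15/03/2024", "%d/%m/%Y"))    # uses the column-specific pattern
```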
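When the anonymization option is set to True, content in Text and Geo columns is removed or replaced by a random hash value before profiling results leave your environment. The following Python sketch shows one way such a replacement could work, using SHA-256 with a random salt generated per run. The salting scheme, function name, and remove flag are assumptions for illustration, not a description of Collibra's actual hashing.

```python
import hashlib
import secrets
from typing import List, Optional, Sequence

SENSITIVE_TYPES = {"Text", "Geo"}


def anonymize_column(
    values: Sequence[Optional[str]],
    data_type: str,
    salt: bytes,
    remove: bool = False,
) -> List[Optional[str]]:
    """Remove or hash values of Text and Geo columns; leave other columns unchanged."""
    if data_type not in SENSITIVE_TYPES:
        return list(values)
    if remove:
        # Content removed entirely.
        return [None for _ in values]
    # Assumption: a salted SHA-256 digest; the same value hashes identically
    # within one run (same salt) but not across runs with different salts.
    return [
        None if v is None else hashlib.sha256(salt + v.encode("utf-8")).hexdigest()
        for v in values
    ]


salt = secrets.token_bytes(16)  # random salt, generated once per profiling run
print(anonymize_column(["Berlin", "Berlin"], "Geo", salt))
print(anonymize_column(["2024-03-15"], "Date", salt))  # unchanged
```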
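The Database profiling via Edge setting depends on Database registration via Edge and determines where profiling and classification run. The following Python sketch models that rule as a validation check. The dataclass and its field names are hypothetical; they only mirror the settings described on this page.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class DataProfilingConfig:
    # Hypothetical field names mirroring the settings on this page.
    database_registration_via_edge: bool = False
    database_profiling_via_edge: bool = False


def validate(cfg: DataProfilingConfig) -> List[str]:
    """Return a list of configuration errors (empty if the configuration is valid)."""
    errors = []
    if cfg.database_profiling_via_edge and not cfg.database_registration_via_edge:
        errors.append("Database profiling via Edge requires Database registration via Edge.")
    return errors


def profiling_route(cfg: DataProfilingConfig) -> Tuple[str, str]:
    """Return (where profiling runs, where classification runs)."""
    if cfg.database_profiling_via_edge:
        return ("Edge", "Edge")
    return ("Jobserver", "Data Classification Platform")


cfg = DataProfilingConfig(database_profiling_via_edge=True)
print(validate(cfg))  # one error: Edge registration is not enabled
```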