About profiling via Edge
Data profiling creates a summary of a data source that is registered with Data Catalog. The summary consists mainly of statistics and graphics that give you an idea of what the registered data contains.
After registering a data source via Edge and synchronizing a schema using an Edge site that also has a JDBC profiling capability, you can profile and classify the metadata. The Edge site then initiates the profiling and classification process and sends the results to Collibra Data Intelligence Cloud.
Note: Collibra Data Intelligence Cloud only has access to synchronized metadata, anonymized profiling results, and classification suggestions. It does not have access to the actual data in your data source.
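To make the idea of a statistical summary more concrete, the following Python sketch computes a few aggregate statistics for a single column. It is purely illustrative: the function name `profile_column` and the chosen statistics are assumptions, and the sketch does not reflect which statistics Edge actually computes or how profiling results are anonymized.

```python
from statistics import mean


def profile_column(values):
    """Return aggregate statistics for one column (hypothetical helper)."""
    non_null = [v for v in values if v is not None]
    summary = {
        "row_count": len(values),
        "null_count": len(values) - len(non_null),
        "distinct_count": len(set(non_null)),
    }
    # Only add numeric statistics when every non-null value is a number.
    numeric = [
        v for v in non_null
        if isinstance(v, (int, float)) and not isinstance(v, bool)
    ]
    if numeric and len(numeric) == len(non_null):
        summary["min"] = min(numeric)
        summary["max"] = max(numeric)
        summary["mean"] = mean(numeric)
    return summary


ages = [34, 51, None, 34, 27]
age_profile = profile_column(ages)
```

A profile of this kind describes the shape of a column (how many rows, how many nulls, how many distinct values) rather than listing its rows.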
Limitations
Profiling via Edge has the following limitations:
- Advanced data types are not supported.
- Not all data sources are certified for Edge.
Profiling options
Before you create a data profile of registered metadata, you have to indicate whether you want to profile everything or only a sample. You can also enable an option to automatically profile and classify synchronized metadata.
| Option | Description |
|---|---|
| Automatically run when a metadata extraction is synchronized | Enable to automatically create a data profile and classify columns every time the synchronization process of one or more schemas finishes. This may take a long time. You can also add a schedule to profile and classify at regular intervals. |
| Full scan | Select to profile and classify based on all synchronized metadata. |
| Partial scan | Select to profile and classify based on a sample of the synchronized metadata. When you select Partial scan, you can enter the maximum number of rows that you want to use for profiling and classification. By default, the maximum number of rows is 20000. Tip: Edge uses push down sampling to create a random sample of the metadata. This option is only available for data sources that support push down sampling. |
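Push down sampling generally means that the data source itself selects the sample, so that at most the configured number of rows leaves the database. The following Python sketch illustrates that idea with an in-memory SQLite database. It is an assumption-laden illustration, not how Edge is implemented: the `sample_rows` helper, the `customers` table, and the `ORDER BY RANDOM() LIMIT` query are all hypothetical, and real data sources may use their own sampling syntax.

```python
import sqlite3


def sample_rows(conn, table, max_rows=20000):
    """Ask the database for a random sample of at most max_rows rows."""
    # Table names cannot be bound as parameters, so quote the identifier.
    quoted = '"' + table.replace('"', '""') + '"'
    # The database shuffles and truncates; only the sample is returned.
    query = f"SELECT * FROM {quoted} ORDER BY RANDOM() LIMIT ?"
    return conn.execute(query, (max_rows,)).fetchall()


# Hypothetical demo data: 100 rows in an in-memory table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER, country TEXT)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [(i, "BE") for i in range(100)],
)

sample = sample_rows(conn, "customers", max_rows=10)
```

When the table holds fewer rows than the maximum, the query simply returns all of them, which mirrors the way a row limit acts as an upper bound rather than a fixed sample size.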
Settings
Before you can profile via Edge, you have to configure the following settings in the Services Configuration section of the Collibra settings or in Collibra Console.
| Section | Setting | Description |
|---|---|---|
| Register a data source | Database registration via Edge | An option to enable database registration via Edge. Note: Enabling data source registration via Edge does not prevent you from registering a data source via Jobserver as well. |
| Data profiling | | This setting is no longer relevant. All profiled data is automatically anonymized. |
| Data profiling | Database profiling via Edge | An option to enable profiling and classifying synchronized metadata via Edge instead of Jobserver. Note: You can only enable Database profiling via Edge if you also enabled Database registration via Edge. |
| Cloud Data Classification configuration | Enable data classification | Ensure the Enable data classification option in Cloud Data Classification configuration is set to false. If the option is set to true, the Classify button is available on Column and Table asset pages. This button allows you to classify data via the Data Classification Platform. However, when you profile and classify via Edge, you no longer need the Data Classification Platform. |
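The dependencies between these settings can be summarized as a small check. The Python sketch below is only a way to reason about the rules in the table: the dictionary keys are hypothetical names invented for this example and do not correspond to actual configuration keys in Collibra Console or the Collibra settings.

```python
def check_edge_profiling_settings(settings):
    """Return a list of problems found in a (hypothetical) settings dict."""
    problems = []
    # Profiling via Edge depends on registration via Edge being enabled.
    if settings.get("database_profiling_via_edge") and not settings.get(
        "database_registration_via_edge"
    ):
        problems.append(
            "Database profiling via Edge requires Database registration via Edge."
        )
    # Classification via Edge does not need the Data Classification Platform.
    if settings.get("enable_data_classification"):
        problems.append(
            "Enable data classification should be set to false."
        )
    return problems


# Hypothetical example: all settings match the table.
ok_settings = {
    "database_registration_via_edge": True,
    "database_profiling_via_edge": True,
    "enable_data_classification": False,
}
issues = check_edge_profiling_settings(ok_settings)
```

An empty list means the settings are consistent with the table. Each string in a non-empty list names one rule that is not met.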