Profiling via Edge

If you register a data source via an Edge site that has a JDBC profiling capability, you can start the profiling and classification process from the Configuration tab page of the Database asset page. The Edge site then runs the profiling and classification process and sends the results to Collibra Data Intelligence Cloud.

Note Collibra Data Intelligence Cloud has access only to synchronized metadata, anonymized profiling results and classification suggestions, not to the actual data in your data source.

Warning Edge is only available in preview mode. If you want to profile and classify your data via Edge, you have to create a support ticket to request access to Edge.

Limitations

Currently, profiling via Edge has a few limitations:

Profiling options

Before you create a data profile of registered metadata, you have to indicate whether you want to profile everything or only a sample. You can also enable an option to automatically profile and classify synchronized metadata.

Option | Description
Automatically run when a metadata extraction is synchronized | Enable to automatically create a data profile and classify columns every time the synchronization process of one or more schemas finishes.
Full scan | Select to profile and classify based on all synchronized metadata.
Partial scan | Select to profile and classify based on a sample of the synchronized metadata. When you select Partial scan, you can enter the maximum number of rows that you want to use for profiling and classification. By default, the maximum number of rows is 20000.

Tip Edge uses push-down sampling to create a random sample of the metadata.
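
The difference between a full scan and a partial scan with push-down sampling can be illustrated with a minimal sketch. This is not how Edge is implemented: the sample_rows helper, the customers table and the use of an in-memory SQLite database are all assumptions made for illustration. The only value taken from this page is the default maximum of 20000 rows. The point of the sketch is that the random sample is drawn by the database itself, so only the sampled rows leave the data source.

```python
import sqlite3

DEFAULT_MAX_ROWS = 20000  # default maximum number of rows for a partial scan


def sample_rows(conn, table, max_rows=DEFAULT_MAX_ROWS):
    """Return up to max_rows randomly chosen rows, sampled inside the database.

    Hypothetical helper: the table name is interpolated into the query,
    so it must come from a trusted source.
    """
    query = f'SELECT * FROM "{table}" ORDER BY RANDOM() LIMIT ?'
    return conn.execute(query, (max_rows,)).fetchall()


# An in-memory SQLite database stands in for the registered data source.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, country TEXT)")
conn.executemany(
    "INSERT INTO customers (country) VALUES (?)",
    [("BE",) if i % 2 else ("US",) for i in range(1000)],
)

# Partial scan: the database selects the random sample.
partial = sample_rows(conn, "customers", max_rows=100)

# Full scan: every row is read.
full = conn.execute('SELECT * FROM "customers"').fetchall()

print(len(partial), len(full))
```

With the default maximum of 20000 rows, a table that holds fewer rows than the maximum is read in full, so a partial scan and a full scan cover the same rows in that case.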

Settings

Before you profile via Edge, review the following settings.

Section | Setting | Description
Data profiling | Anonymize data | This setting is no longer relevant. All profiled data is automatically anonymized.
Cloud Classification configuration | Enable data classification | If the Enable data classification option is set to True, profiling and classification via Edge are disabled. In that case, you can classify data via the Data Classification Platform by clicking the Classify button on Column and Table asset pages.