About profiling via Edge

Data profiling creates a summary of a data source that is registered with Data Catalog. The summary mainly contains statistics and graphics that give users an idea of what the registered data contains.

After registering a data source via Edge and synchronizing a schema using an Edge site that also has a JDBC profiling capability, you can profile and classify the metadata. The Edge site then initiates the profiling and classification process and sends the results to Collibra Data Intelligence Cloud.

Note: Collibra Data Intelligence Cloud only has access to the synchronized metadata, the anonymized profiling results, and the classification suggestions. It does not have access to the actual data in your data source.

Limitations

Profiling via Edge has the following limitations:

Profiling options

Before you create a data profile of registered metadata, you have to indicate whether you want to profile everything or only a sample. You can also enable an option to automatically profile and classify synchronized metadata.

The following options are available.
Automatically run when a metadata extraction is synchronized

Enable to automatically create a data profile and classify columns every time the synchronization process of one or more schemas finishes.

This may take a long time. You can also add a schedule to profile and classify at regular intervals.

Full scan

Select to profile and classify based on all synchronized metadata.

Partial scan

Select to profile and classify based on a sample of the synchronized metadata. When you select Partial scan, you can enter the maximum number of rows that you want to use for profiling and classification. By default, the maximum number of rows is 20000.

Tip: Edge uses push-down sampling to create a random sample of the metadata. The Partial scan option is only available for data sources that support push-down sampling.
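
To make the difference between a full scan and a partial scan more concrete, here is a minimal Python sketch. It assumes a SQL data source and uses hypothetical names (`ProfilingOptions`, `build_profile_query`) that are not part of any Collibra API. It only illustrates the push-down idea: for a partial scan, the random sampling and the row cap are expressed in the query itself, so the database returns just the sample instead of every row. The `ORDER BY RANDOM() LIMIT n` form is an assumption chosen because it runs on SQLite and PostgreSQL; Edge may generate different SQL.

```python
import sqlite3
from dataclasses import dataclass

# Hypothetical model of the profiling options; not a Collibra API.
DEFAULT_MAX_ROWS = 20000  # default maximum number of rows for a partial scan


@dataclass
class ProfilingOptions:
    auto_run_on_sync: bool = False    # profile and classify after each synchronization
    partial_scan: bool = False        # False means a full scan
    max_rows: int = DEFAULT_MAX_ROWS  # only used for a partial scan

    def __post_init__(self) -> None:
        # A partial scan needs a positive row cap.
        if self.partial_scan and self.max_rows <= 0:
            raise ValueError("max_rows must be positive for a partial scan")


def build_profile_query(table: str, options: ProfilingOptions) -> str:
    """Build the query sent to the data source (illustrative only).

    `table` is assumed to be a trusted identifier taken from synchronized
    metadata; it is not escaped here.
    """
    if not options.partial_scan:
        # Full scan: read every row.
        return f"SELECT * FROM {table}"
    # Partial scan: the database itself picks a random sample and caps it.
    return f"SELECT * FROM {table} ORDER BY RANDOM() LIMIT {options.max_rows}"


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (id INTEGER)")
    conn.executemany("INSERT INTO sales VALUES (?)", [(i,) for i in range(50)])

    print(build_profile_query("sales", ProfilingOptions()))  # SELECT * FROM sales
    sample = ProfilingOptions(partial_scan=True, max_rows=10)
    rows = conn.execute(build_profile_query("sales", sample)).fetchall()
    print(len(rows))  # 10: only the sampled rows leave the database
```

In this sketch, a partial scan never transfers more than `max_rows` rows, because the cap is enforced inside the database rather than after the rows have been fetched.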

Settings

Before you can profile via Edge, you have to configure the following settings in the Services Configuration section of the Collibra settings or in Collibra Console.

Section: Register a data source
Setting: Database registration via Edge

An option to enable data source registration via Edge.

  • True: Register a data source via Edge.
  • False: Register a data source via Jobserver only.

Note: Enabling data source registration via Edge does not prevent you from also registering a data source via Jobserver.

Section: Data profiling
Setting: Anonymize data

This setting is no longer relevant. All profiled data is automatically anonymized.

Section: Data profiling
Setting: Database profiling via Edge

An option to enable profiling and classifying synchronized metadata via Edge instead of Jobserver.

  • True: Profile and classify via Edge.
  • False: Profile via Jobserver and classify via the Data Classification Platform.

Note: You can only enable Database profiling via Edge if you have also enabled Database registration via Edge.

Section: Cloud Data Classification configuration
Setting: Enable data classification

Ensure that the Enable data classification option in Cloud Data Classification configuration is set to false.

If this option is set to true, the Classify button is available on Column and Table asset pages. This button allows you to classify data via the Data Classification Platform. However, when you profile and classify via Edge, you no longer need the Data Classification Platform.
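
These settings depend on each other: Database profiling via Edge requires Database registration via Edge, and Enable data classification should be false when Edge does the classification. The following Python sketch encodes those rules as a simple consistency check. The `EdgeSettings` fields and the `check_settings` and `resolve_engines` helpers are hypothetical; they mirror the setting names described above but are not actual Collibra configuration keys or APIs.

```python
from dataclasses import dataclass
from typing import List, Tuple


# Hypothetical mirror of the settings above; field names are illustrative only.
@dataclass
class EdgeSettings:
    database_registration_via_edge: bool
    database_profiling_via_edge: bool
    enable_data_classification: bool  # Cloud Data Classification configuration


def check_settings(s: EdgeSettings) -> List[str]:
    """Return the rule violations for a combination of settings."""
    problems: List[str] = []
    if s.database_profiling_via_edge and not s.database_registration_via_edge:
        problems.append("Database profiling via Edge requires Database registration via Edge.")
    if s.database_profiling_via_edge and s.enable_data_classification:
        problems.append("Set Enable data classification to false when profiling via Edge.")
    return problems


def resolve_engines(s: EdgeSettings) -> Tuple[str, str]:
    """Return which component profiles and which one classifies."""
    if s.database_profiling_via_edge:
        return ("Edge", "Edge")
    return ("Jobserver", "Data Classification Platform")


if __name__ == "__main__":
    edge = EdgeSettings(True, True, False)
    print(check_settings(edge))   # []
    print(resolve_engines(edge))  # ('Edge', 'Edge')
```

A check like this returns every violation at once instead of stopping at the first one, so a misconfigured environment can be corrected in a single pass.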