About profiling via Edge

Important In Collibra 2024.05, we launched a new user interface (UI) for Collibra Data Intelligence Platform. You can learn more about this latest UI in the UI overview.

Data profiling creates a summary of a data source that is registered with Data Catalog and determines the data type of each column in the data source. The summary consists mainly of statistics and graphics that give the user an idea of what the registered data contains.
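To make the idea of a profiling summary concrete, the following is a minimal sketch of profiling a single column. It is not Collibra's implementation: the function names `infer_type` and `profile_column`, the set of statistics, and the type rules are all assumptions chosen for illustration.

```python
from collections import Counter
from datetime import datetime


def infer_type(values):
    """Guess a simple data type from a list of non-empty string values.

    Hypothetical rules for illustration only: integer, decimal,
    ISO date (YYYY-MM-DD), otherwise text.
    """
    def all_match(parse):
        try:
            for value in values:
                parse(value)
            return True
        except ValueError:
            return False

    if not values:
        return "unknown"
    if all_match(int):
        return "integer"
    if all_match(float):
        return "decimal"
    if all_match(lambda v: datetime.strptime(v, "%Y-%m-%d")):
        return "date"
    return "text"


def profile_column(raw_values):
    """Summarize one column of string values (None or "" counts as empty)."""
    non_empty = [v for v in raw_values if v not in (None, "")]
    counts = Counter(non_empty)
    return {
        "row_count": len(raw_values),
        "empty_count": len(raw_values) - len(non_empty),
        "distinct_count": len(counts),
        "data_type": infer_type(non_empty),
        "most_frequent": counts.most_common(3),
    }


# Example column with one empty value.
print(profile_column(["34", "41", "", "34", "29"]))
```

The summary describes the column (counts, detected type, frequent values) rather than listing its rows, which is the kind of overview a profile is meant to give.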

Important Advanced data types are not taken into account when profiling via Edge.

Note The Unified Data Classification process does not automatically run at the same time as profiling. You need to activate the classification process separately.

After you have registered a data source via Edge and created a profiling capability, you can profile the data from the Database asset page of the registered data source.
Edge profiles the data on the Edge site itself and only sends the results to Collibra Data Intelligence Platform. The profiling results are automatically anonymized based on your anonymization configuration before they are sent to Collibra Data Intelligence Platform.

As a result, if you register and profile a data source via Edge:

  • Data Catalog has access to synchronized metadata and profiling results.
  • Data Catalog doesn't have access to the actual data from your data source.
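The separation between source data and profiling results can be pictured with a small sketch. Everything here is hypothetical: the function `build_results_payload`, the payload fields, and the anonymization rule (withholding minimum and maximum values) are assumptions for illustration and do not describe how Edge or your anonymization configuration actually behaves.

```python
import json
import statistics


def build_results_payload(column_name, values, anonymize=True):
    """Compute aggregates locally and return only those as JSON.

    The raw values never appear in the returned payload.
    """
    summary = {
        "column": column_name,
        "row_count": len(values),
        "mean": statistics.mean(values),
        "min": min(values),
        "max": max(values),
    }
    if anonymize:
        # Hypothetical rule: withhold statistics that repeat
        # an individual value from the source data.
        summary["min"] = None
        summary["max"] = None
    return json.dumps(summary)


# The rows stay where they are; only the summary is serialized.
local_values = [10.0, 20.0, 30.0]
print(build_results_payload("amount", local_values))
```

In this sketch, the receiving side can show the row count and the mean but has no way to reconstruct the individual values.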

Profiling steps in Edge

Before you start:

  • Enable profiling via Edge.
  • Create an Edge site with a JDBC connection, a JDBC ingestion capability, and a JDBC profiling capability.
  • Register a data source via Edge.

Steps:

  1. Synchronize one or more schemas.
  2. Configure the profiling options for the synchronized schemas.
  3. Profile the data. The Edge site initiates the profiling process and sends the results to Collibra Data Intelligence Platform.

Tip You can trigger the profiling job manually, set up a schedule, or trigger it after synchronizing a schema.

Data used to create profiling results via Edge

To create the profiling results, Data Catalog uses a representative set of the data from the data source.

Note This data is not the same as the sample data that can be available for an asset.

Edge profiles the data on the Edge site itself and only sends the profiling results to Collibra Data Intelligence Platform.

  • If you use all rows, Edge profiles every row in a data source table, without limit.
  • If you use a random set of rows, the data source randomly selects data and sends it to Edge for profiling.

    Warning Only some data sources support the use of random rows. To check whether your data source supports it, go to Collibra-provided JDBC drivers.
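The difference between the two options can be sketched with an in-memory SQLite database standing in for the data source. This is an illustration under stated assumptions, not the queries that Edge or any JDBC driver issues: the table `orders`, the function `fetch_rows_for_profiling`, and the `ORDER BY RANDOM()` technique are all chosen for the example, and random-row syntax differs between databases.

```python
import sqlite3

# Stand-in data source with 1000 rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL)")
conn.executemany(
    "INSERT INTO orders (amount) VALUES (?)",
    [(float(i),) for i in range(1, 1001)],
)


def fetch_rows_for_profiling(conn, table, use_all_rows, sample_size=100):
    """Return the rows a profiler would read.

    The table name is interpolated into the SQL, so pass only
    trusted identifiers.
    """
    if use_all_rows:
        # All rows: no limit is applied.
        query = f"SELECT * FROM {table}"
        params = ()
    else:
        # Random rows: the database itself picks the rows to return.
        query = f"SELECT * FROM {table} ORDER BY RANDOM() LIMIT ?"
        params = (sample_size,)
    return conn.execute(query, params).fetchall()


all_rows = fetch_rows_for_profiling(conn, "orders", use_all_rows=True)
sampled = fetch_rows_for_profiling(conn, "orders", use_all_rows=False)
print(len(all_rows), len(sampled))
```

In the random case the selection happens inside the database, which is why the option depends on what the data source supports.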

For more information, go to Configure the profiling options via Edge.