Profile data via Edge

After you have configured the profiling options, you can start the profiling process for the schemas in the data source.

Important Advanced data types are not taken into account when profiling via Edge.

Tip Collibra Platform only has access to synchronized metadata and profiling results, not to the actual data from your data source.

Prerequisites

You have configured the profiling options.
Your Edge site has a global role with the global permission Catalog.
Your Edge site has a global role with the global permission Register Profiling Information.
You have a global role with the Catalog global permission, for example, Catalog Author.
You have a global role with the View Edge connections and capabilities global permission, for example, Edge integration engineer.
You have a global role that has the Register profiling information global permission.
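
The permission prerequisites above can be read as a checklist. The following is a minimal sketch, not a Collibra API: it assumes you have already collected the global permission names granted to your user and to the Edge site as plain sets of strings. The function name `missing_permissions` and the data model are hypothetical.

```python
# Permission names are copied from the prerequisites above. Representing
# them as plain sets of strings is a hypothetical simplification.
EDGE_SITE_REQUIRED = {"Catalog", "Register Profiling Information"}
USER_REQUIRED = {
    "Catalog",
    "View Edge connections and capabilities",
    "Register profiling information",
}


def missing_permissions(granted, required):
    """Return the required permissions that are not granted, ignoring case."""
    granted_lower = {name.lower() for name in granted}
    return sorted(name for name in required if name.lower() not in granted_lower)


# Example: a user who lacks the permission to register profiling information.
user_permissions = {"Catalog", "View Edge connections and capabilities"}
print(missing_permissions(user_permissions, USER_REQUIRED))
```

An empty list means that every listed permission is present in the set you supplied.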

Steps

You can start profiling in one of the following ways:

Manually profile
Automatically profile after each synchronization
Profile based on a schedule

Manually profile

Open the Database asset page of a registered database.
In the tab bar, click Configuration.
Click the Profiling tab.
The options open.
Tip Only the synchronized schemas are available in the list.
Important If you want to profile only specific schemas, ensure that the default profiling option is set to Do Not Profile (unless specified in the schema-specific rule), and that you define a specific rule only for the relevant schemas.
On the Profiling tab page, click Run Profiling.
Data Catalog triggers the Edge or Collibra Cloud site to start a profiling job.
Depending on your profiling options, the Edge or Collibra Cloud site profiles all or some schemas and tables, based on all synchronized metadata or on a subset.
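
The interaction between the default rule and schema-specific rules can be sketched as follows. This is an illustration only, under the assumption that a schema-specific rule, when present, overrides the default rule for that schema. The function `schemas_to_profile` and the boolean representation of rules are hypothetical and do not reflect how Data Catalog stores profiling options.

```python
def schemas_to_profile(synchronized_schemas, default_profile, schema_rules):
    """Return the schemas that a profiling run would include.

    default_profile: False stands for the default option "Do Not Profile",
        True for a default rule that profiles schemas.
    schema_rules: hypothetical mapping of schema name to True or False that
        overrides the default rule for that schema only.
    """
    selected = []
    for schema in synchronized_schemas:
        # Fall back to the default rule when no schema-specific rule exists.
        if schema_rules.get(schema, default_profile):
            selected.append(schema)
    return selected


# Profile only "sales": default is "Do Not Profile", with one specific rule.
schemas = ["sales", "hr", "finance"]
print(schemas_to_profile(schemas, default_profile=False, schema_rules={"sales": True}))
# prints ['sales']
```

With the default left at a profiling option instead, every synchronized schema without its own rule would be included, which is why the default must be Do Not Profile when you want to limit profiling to a few schemas.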

Automatically profile after each synchronization

Open the Database asset page of a registered database.
In the tab bar, click Configuration.
Click the Profiling tab.
The options open.
Tip Only the synchronized schemas are available in the list.
In the Default Rule section, click Edit.
Select Automatically Profile after Metadata Synchronization.
Synchronize one or more schemas.
When the schemas are synchronized, Data Catalog automatically triggers the Edge or Collibra Cloud site to start a profiling job.
Depending on your profiling options, the Edge or Collibra Cloud site profiles all or some schemas and tables, based on all synchronized metadata or on a subset.

Profile based on a schedule

Open the Database asset page of a registered database.
In the tab bar, click Configuration.
Click the Profiling tab.
The options open.
Tip Only the synchronized schemas are available in the list.
In Synchronization Schedule, click Add Schedule to add a new schedule, or to edit an existing schedule.
The Edit Schedule dialog box appears.

Enter the required information.

Field	Description
Repeat	The interval at which you want to synchronize automatically. The possible values are: Daily, Weekly, Monthly, and Cron expression.
Cron	The Quartz Cron expression that determines when the synchronization takes place. This field is only visible if you select `Cron expression` in the Repeat field.
Every	The day on which you want to synchronize, for example, Sunday. This field is only visible if you select `Weekly` in the Repeat field.
Every first	The day of the month on which you want to synchronize, for example, Tuesday. This field is only visible if you select `Monthly` in the Repeat field.
At	The time at which you want to synchronize automatically, for example, 14:00. You can only schedule on the hour. For example, you can add a synchronization schedule at 8:00, but not at 8:45. This field is only visible if you select `Daily`, `Weekly`, or `Monthly` in the Repeat field.
Time zone	The time zone for the schedule.

Click Save.
The profiling job starts according to the schedule.
Depending on your profiling options, the Edge or Collibra Cloud site profiles all or some schemas and tables, based on all synchronized metadata or on a subset.
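
If you select `Cron expression` in the Repeat field, it can help to see how the other schedule options would look as Quartz Cron expressions. The sketch below is an illustration, not Collibra's implementation: the helper names `parse_hour` and `quartz_expression` are hypothetical, and it assumes that `Monthly` means the first occurrence of the chosen weekday in each month. Quartz expressions use the field order seconds, minutes, hours, day of month, month, day of week, and number the weekdays from 1 (Sunday) to 7 (Saturday).

```python
# Quartz numbers the days of the week from 1 (Sunday) to 7 (Saturday).
QUARTZ_DAY_NUMBERS = {
    "Sunday": 1, "Monday": 2, "Tuesday": 3, "Wednesday": 4,
    "Thursday": 5, "Friday": 6, "Saturday": 7,
}


def parse_hour(at):
    """Parse an 'HH:MM' time and reject anything that is not on the hour."""
    hour_text, minute_text = at.split(":")
    hour, minute = int(hour_text), int(minute_text)
    if not 0 <= hour <= 23:
        raise ValueError(f"hour out of range: {at}")
    if minute != 0:
        raise ValueError(f"schedules must be on the hour: {at}")
    return hour


def quartz_expression(repeat, at, day=None):
    """Build a Quartz Cron expression equivalent to a simple schedule."""
    hour = parse_hour(at)
    if repeat == "Daily":
        return f"0 0 {hour} * * ?"
    if repeat not in ("Weekly", "Monthly"):
        raise ValueError(f"unsupported repeat value: {repeat}")
    day_number = QUARTZ_DAY_NUMBERS[day]
    if repeat == "Weekly":
        return f"0 0 {hour} ? * {day_number}"
    # Assumption: Monthly means the first such weekday of the month (#1).
    return f"0 0 {hour} ? * {day_number}#1"


print(quartz_expression("Daily", "14:00"))
print(quartz_expression("Weekly", "14:00", day="Sunday"))
print(quartz_expression("Monthly", "8:00", day="Tuesday"))
```

Calling `parse_hour("8:45")` raises a `ValueError`, mirroring the rule that the At field only accepts times on the hour. A Cron expression entered directly may not be subject to the same restriction; check the behavior in your environment.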

What's next?

The Edge or Collibra Cloud site completes the profiling process and sends the results to Collibra Platform.

You can see the profiling job in the list of activities.
When the activity is completed, the results page gives an overview of the profiled data.
If something goes wrong, the job is reported as failed. By default, the capability will try to collect the data, calculate the statistics, and send the results two times with each attempt taking 30 minutes. You can change this in the capability configuration.
You can find the profiling results and charts in the Table and Column asset pages.
Note Columns mapped to the following java.sql.Types are excluded from the profiling queries: ARRAY, BINARY, BLOB, CLOB, DATALINK, DISTINCT, JAVA_OBJECT, LONGVARBINARY, NCLOB, NULL, OTHER, REF, REF_CURSOR, ROWID, SQLXML, STRUCT, VARBINARY.
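
To anticipate which columns will have no profiling results, you can compare their JDBC types against the excluded list. The following is a minimal sketch that assumes you already know the `java.sql.Types` name of each column, for example from your JDBC driver's metadata. The function name and the sample columns are hypothetical, and the check covers only the types named in the note above; other exclusions, such as advanced data types, are not modeled.

```python
# Type names copied from the note above (java.sql.Types constants).
EXCLUDED_JDBC_TYPES = frozenset({
    "ARRAY", "BINARY", "BLOB", "CLOB", "DATALINK", "DISTINCT",
    "JAVA_OBJECT", "LONGVARBINARY", "NCLOB", "NULL", "OTHER", "REF",
    "REF_CURSOR", "ROWID", "SQLXML", "STRUCT", "VARBINARY",
})


def is_excluded_from_profiling(jdbc_type_name):
    """Return True if the JDBC type name is on the excluded list."""
    return jdbc_type_name.strip().upper() in EXCLUDED_JDBC_TYPES


# Hypothetical columns of a table, mapped to their JDBC type names.
columns = {"id": "INTEGER", "photo": "BLOB", "notes": "CLOB", "name": "VARCHAR"}
not_excluded = [
    column for column, jdbc_type in columns.items()
    if not is_excluded_from_profiling(jdbc_type)
]
print(not_excluded)  # prints ['id', 'name']
```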

Note The Data Classification process does not automatically run at the same time as profiling. You need to activate the classification process separately.