Profile and classify data via Edge
After synchronizing schemas, you can start the profiling and classification process.
Note: Collibra Data Intelligence Cloud only has access to synchronized metadata, anonymized profiling results and classification suggestions. It has no access to the actual data in your data source.
Prerequisites
- You have created a support ticket to request access to Edge.
- You have created and installed an Edge site.
- Your Edge site has a global role with the following global permissions:
  - Data Catalog
  - Register Profiling Information
- Your Edge site has a JDBC profiling capability.
- You have enabled data source registration via Edge.
- You have enabled profiling and classification via Edge.
- You have registered a data source via Edge.
- You have synchronized one or more schemas of a registered database.
- You have configured the profiling options.
Steps
You can profile and classify data in one of the following ways:
- Manually profile and classify
- Automatically profile and classify after each synchronization
- Automatically profile and classify based on a schedule
Manually profile and classify
- Open the Database asset page of a registered database.
- In the tab pane, click Configuration.
- Click the Profiling and Classification tab.
- On the Profiling and Classification tab page, click Run profiling and classification.
Data Catalog triggers the Edge site to start a profiling and classification job.
Depending on your profiling options, the Edge site profiles and classifies based on all synchronized metadata or on a sample.
Automatically profile and classify after each synchronization
- Open the Database asset page of a registered database.
- In the tab pane, click Configuration.
- Click the Profiling and Classification tab.
- In the Profiling options section, click Edit.
- Select Automatically run when a metadata extraction is synchronized.
- Synchronize one or more schemas.
When the schemas are synchronized, Data Catalog automatically triggers the Edge site to start a profiling and classification job.
Automatically profile and classify based on a schedule
- Open the Database asset page of a registered database.
- In the tab pane, click Configuration.
- Click the Profiling and Classification tab.
- In Synchronization schedule, click Add schedule to add a new schedule, or click the edit button next to an existing schedule to edit it.
The Edit scheduling dialog box appears.
- Enter the required information.
  - Repeat: The interval at which you want to synchronize the schemas automatically, for example daily, weekly or based on a Cron expression.
  - Cron: The Quartz Cron expression that determines when the synchronization takes place. This field is only visible if you select Cron expression in the Repeat field.
  - Every: The day on which you want to synchronize the schemas, for example Sunday. This field is only visible if you select Weekly in the Repeat field.
  - Every first: The weekday whose first occurrence in the month determines when you want to synchronize the schemas, for example Tuesday for the first Tuesday of each month. This field is only visible if you select Monthly in the Repeat field.
  - At: The time at which you want to synchronize the schemas automatically, for example 14:00. This field is only visible if you select Daily, Weekly or Monthly in the Repeat field.
  - Timezone: The time zone for the schedule.
- Click Save.
All synchronized schemas are profiled and classified according to the schedule.
Depending on your profiling options, the Edge site profiles and classifies based on all synchronized metadata or on a sample.
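If you choose Cron expression in the Repeat field, you have to write the Quartz Cron expression yourself. The following Python sketch shows Quartz expressions that would roughly match the Daily, Weekly and Monthly options, plus a basic field check. It is an illustration only: the helper names `quartz_cron` and `check_quartz_fields` are hypothetical, and the mapping from the Repeat options to expressions is an assumption, not how Collibra stores schedules internally. A Quartz expression has 6 or 7 space-separated fields (seconds, minutes, hours, day-of-month, month, day-of-week, optional year), and Quartz requires `?` in exactly one of day-of-month and day-of-week. The expression carries no time zone; that comes from the Timezone field.

```python
from typing import Optional

# Quartz numbers weekdays from 1 (Sunday) to 7 (Saturday).
QUARTZ_DAYS = {"SUN": 1, "MON": 2, "TUE": 3, "WED": 4, "THU": 5, "FRI": 6, "SAT": 7}


def quartz_cron(repeat: str, at: str, day: Optional[str] = None) -> str:
    """Hypothetical helper: build a Quartz expression resembling a Repeat option.

    repeat is "daily", "weekly" or "monthly"; at is "HH:MM"; day is a weekday
    name such as "Sunday" (required for weekly and monthly schedules).
    """
    hour, minute = (int(part) for part in at.split(":"))
    if not (0 <= hour <= 23 and 0 <= minute <= 59):
        raise ValueError(f"invalid time: {at}")
    if repeat == "daily":
        # Every day; '?' means "no specific value" for day-of-week.
        return f"0 {minute} {hour} * * ?"
    if day is None or day.upper()[:3] not in QUARTZ_DAYS:
        raise ValueError("weekly and monthly schedules need a weekday")
    dow = day.upper()[:3]
    if repeat == "weekly":
        # Every week on the given weekday.
        return f"0 {minute} {hour} ? * {dow}"
    if repeat == "monthly":
        # 'N#1' = first occurrence of weekday N in the month (the "Every first" field).
        return f"0 {minute} {hour} ? * {QUARTZ_DAYS[dow]}#1"
    raise ValueError(f"unsupported repeat value: {repeat}")


def check_quartz_fields(expression: str) -> None:
    """Basic structural check only; it does not validate every field's syntax."""
    fields = expression.split()
    if len(fields) not in (6, 7):
        raise ValueError("a Quartz expression has 6 or 7 fields")
    day_of_month, day_of_week = fields[3], fields[5]
    if (day_of_month == "?") == (day_of_week == "?"):
        raise ValueError("use '?' in exactly one of day-of-month and day-of-week")


if __name__ == "__main__":
    for expr in (
        quartz_cron("daily", "14:00"),
        quartz_cron("weekly", "14:00", "Sunday"),
        quartz_cron("monthly", "14:00", "Tuesday"),
    ):
        check_quartz_fields(expr)
        print(expr)
```

For example, a weekly schedule on Sunday at 14:00 corresponds to `0 0 14 ? * SUN`, and the first Tuesday of every month at 14:00 to `0 0 14 ? * 3#1`. Check the expression you enter against the Quartz documentation, because this sketch does not cover every Quartz feature.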
What's next?
The Edge site starts the profiling and classification process and sends the results to Collibra Data Intelligence Cloud. You can see the profiling and classification job in the list of activities. Click the Result button to open the data profiling results.