Profile and classify data via Edge

After you have configured the profiling and classification options, you can start the profiling and classification process for the schemas in the data source.

Tip Collibra Data Intelligence Cloud only has access to synchronized metadata, anonymized profiling results and classification suggestions, not to the actual data from your data source.

Prerequisites

Steps

Run manually

  1. Open the Database asset page of a registered database.
  2. In the tab pane, click Configuration.
  3. Click the Profiling and classification tab.
    The Profiling and classification options open.

    Tip Only the synchronized schemas are available in the list.

    Important If you want to profile and classify only specific schemas, ensure that the default profiling and classification option is set to Don't profile unless specified in the schema-specific rules, and that you have defined schema-specific rules only for the relevant schemas.
  4. On the Profiling and classification tab page, click Run profiling and classification.
    Data Catalog triggers the Edge site to start a profiling and classification job.
    Depending on your profiling and classification options, the Edge site profiles and classifies all or some schemas, based on all synchronized metadata or on a sample.
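
The interaction between the default option and the schema-specific rules can be pictured with a small sketch. This is only an illustration of the selection logic described in the Important note, not Collibra's implementation: the `DefaultRule` enum, the `PROFILE_ALL` label, the `schemas_to_profile` function and the idea that a schema-specific rule is a simple on/off flag are all assumptions made for the example.

```python
from enum import Enum


class DefaultRule(Enum):
    # Hypothetical model of the default option. Only the second label
    # is quoted from the steps above; PROFILE_ALL is an assumed name.
    PROFILE_ALL = "Profile all schemas"
    DONT_PROFILE_UNLESS_SPECIFIED = (
        "Don't profile unless specified in the schema-specific rules"
    )


def schemas_to_profile(synchronized, default_rule, schema_rules):
    """Return the synchronized schemas that a job would cover.

    synchronized: list of synchronized schema names (only these are listed).
    schema_rules: dict of schema name -> True (profile) or False (skip);
                  an assumed simplification of schema-specific rules.
    """
    selected = []
    for name in synchronized:
        if name in schema_rules:
            # A schema-specific rule takes precedence over the default.
            if schema_rules[name]:
                selected.append(name)
        elif default_rule is DefaultRule.PROFILE_ALL:
            selected.append(name)
    return selected


schemas = ["sales", "hr", "finance"]
only_hr = schemas_to_profile(
    schemas, DefaultRule.DONT_PROFILE_UNLESS_SPECIFIED, {"hr": True}
)
print(only_hr)  # ['hr']
```

With the default set to Don't profile unless specified in the schema-specific rules, only the schemas that have their own rule are selected, which is the setup the Important note recommends for profiling a subset.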

Run automatically after synchronization

  1. Open the Database asset page of a registered database.
  2. In the tab pane, click Configuration.
  3. Click the Profiling and classification tab.
    The Profiling and classification options open.

    Tip Only the synchronized schemas are available in the list.

  4. In the Default profiling and classification rule section, click Edit.
  5. Select Automatically run when a metadata extraction is synchronized.
  6. Synchronize one or more schemas.
    When the schemas are synchronized, Data Catalog automatically triggers the Edge site to start a profiling and classification job.
    Depending on your profiling and classification options, the Edge site profiles and classifies all or some schemas, based on all synchronized metadata or on a sample.

Run according to a schedule

  1. Open the Database asset page of a registered database.
  2. In the tab pane, click Configuration.
  3. Click the Profiling and classification tab.
    The Profiling and classification options open.

    Tip Only the synchronized schemas are available in the list.

  4. In Synchronization schedule, click Add Schedule to add a new schedule, or to edit an existing schedule.
    The Edit schedule dialog box appears.
  5. Enter the required information.
    Field | Description
    Repeat | The interval at which you want to synchronize the schemas automatically, for example daily, weekly or based on a Cron expression.
    Cron | The Quartz Cron expression that determines when the synchronization takes place. This field is only visible if you select Cron expression in the Repeat field.
    Every | The day on which you want to synchronize the schemas, for example Sunday. This field is only visible if you select Weekly in the Repeat field.
    Every first | The day on which you want to synchronize the schemas each month, for example Tuesday for the first Tuesday of the month. This field is only visible if you select Monthly in the Repeat field.
    At | The time at which you want to synchronize the schemas automatically, for example 14:00. This field is only visible if you select Daily, Weekly or Monthly in the Repeat field.
    Time zone | The time zone for the schedule.
  6. Click Save.
    The profiling and classification job starts according to the schedule.
    Depending on your profiling and classification options, the Edge site profiles and classifies all or some schemas, based on all synchronized metadata or on a sample.
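
If you choose Cron expression in the Repeat field, it can help to see how the other Repeat options would look in Quartz Cron syntax. The sketch below is an illustration only: the `quartz_cron` function and its parameter names are hypothetical and are not part of Collibra, and it does not claim that the product builds its schedules this way. It relies only on the standard Quartz field order (seconds, minutes, hours, day of month, month, day of week), the Quartz day numbering from 1 (Sunday) to 7 (Saturday), and the `#` notation for the nth weekday of the month.

```python
# Quartz numbers the days of the week from 1 (Sunday) to 7 (Saturday).
QUARTZ_DAYS = {
    "Sunday": 1, "Monday": 2, "Tuesday": 3, "Wednesday": 4,
    "Thursday": 5, "Friday": 6, "Saturday": 7,
}


def quartz_cron(repeat, at="00:00", day=None):
    """Build a Quartz Cron expression for a Daily, Weekly or Monthly repeat.

    repeat: "Daily", "Weekly" or "Monthly" (mirrors the Repeat field).
    at:     time as "HH:MM" in 24-hour notation (mirrors the At field).
    day:    weekday name, required for Weekly (Every) and Monthly (Every first).
    """
    hour, minute = (int(part) for part in at.split(":"))
    if not (0 <= hour <= 23 and 0 <= minute <= 59):
        raise ValueError(f"invalid time: {at}")
    time_part = f"0 {minute} {hour}"  # seconds, minutes, hours

    if repeat == "Daily":
        # Every day of the month; day of week is left unspecified with '?'.
        return f"{time_part} * * ?"
    if repeat in ("Weekly", "Monthly"):
        if day not in QUARTZ_DAYS:
            raise ValueError(f"unknown day: {day}")
        number = QUARTZ_DAYS[day]
        if repeat == "Weekly":
            return f"{time_part} ? * {number}"
        # '#1' selects the first occurrence of that weekday in the month.
        return f"{time_part} ? * {number}#1"
    raise ValueError(f"unsupported repeat value: {repeat}")


print(quartz_cron("Daily", "14:00"))               # 0 0 14 * * ?
print(quartz_cron("Weekly", "14:00", "Sunday"))    # 0 0 14 ? * 1
print(quartz_cron("Monthly", "14:00", "Tuesday"))  # 0 0 14 ? * 3#1
```

Note that Quartz expressions start with a seconds field, unlike the five-field Unix cron format, and that exactly one of the day-of-month and day-of-week fields must be `?`.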

What's next?

The Edge site completes the profiling and classification process and sends the results to Collibra Data Intelligence Cloud.

  • You can see the profiling and classification job in the list of activities.
  • You can find the profiling information and charts on the Data Profiling tab page of Table and Column asset pages.
  • You can find the suggested data classes and provide feedback on them via the Database, Schema and Table asset pages.
  • On the Configuration tab of the Database asset page, a check symbol next to the schema name indicates that the schema was profiled and classified. If the profiling or classification of a schema failed, an exclamation mark is shown instead.