Profile and classify data via Edge

After you have configured the profiling and classification options, you can start the profiling and classification process for the schemas in the data source.

Tip Collibra Data Intelligence Platform only has access to synchronized metadata, profiling results, and classification suggestions, not to the actual data from your data source.

Important Advanced data types are not taken into account when profiling via Edge.

Important If you are using the Unified Data Classification method, the classification process does not automatically run at the same time as profiling. When you start the profiling and classification activity, only profiling results will be collected. You need to activate the classification process separately.

Steps

To start profiling and classification manually:

  1. Open the Database asset page of a registered database.
  2. In the tab bar, click Configuration.
  3. Click the Profiling and Classification tab.
    The options open.

    Tip Only the synchronized schemas are available in the list.

    Important If you want to profile and classify only one or more specific schemas, ensure that the default profiling and classification option is set to Do not Profile and Classify (unless specified in the schema-specific rule), and that you define a specific rule only for the relevant schemas.
  4. On the Profiling and Classification tab page, click Run Profiling and Classification.
    Data Catalog triggers the Edge site to start a profiling and classification job.
    Depending on your profiling and classification options, the Edge site profiles and classifies all or some schemas and tables, based on all synchronized metadata or on a subset.
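
The interaction between the default rule and schema-specific rules described in the Important note above can be modeled roughly as follows. This is an illustrative sketch only, not Collibra's implementation: the function `schemas_to_profile`, the constant names, and the "Profile and Classify" label are hypothetical, and the sketch assumes that a schema-specific rule simply overrides the default rule for that schema.

```python
# Hypothetical model of rule resolution; these names are not a Collibra API.
PROFILE = "Profile and Classify"          # assumed label for the enabling option
SKIP = "Do not Profile and Classify"      # option name taken from the text above


def schemas_to_profile(synchronized_schemas, default_rule, schema_rules):
    """Return the schemas a job would cover, in their original order.

    Assumption: a schema-specific rule overrides the default rule.
    """
    selected = []
    for schema in synchronized_schemas:
        rule = schema_rules.get(schema, default_rule)
        if rule == PROFILE:
            selected.append(schema)
    return selected


schemas = ["sales", "hr", "finance"]
# The default rule skips everything; only "sales" has a specific rule enabling it.
print(schemas_to_profile(schemas, SKIP, {"sales": PROFILE}))  # ['sales']
```

With the default set to skip, only schemas that have their own enabling rule are selected, which is the setup the note recommends when you want to limit a run to a few schemas.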
To start profiling and classification automatically after each synchronization:

  1. Open the Database asset page of a registered database.
  2. In the tab bar, click Configuration.
  3. Click the Profiling and Classification tab.
    The options open.

    Tip Only the synchronized schemas are available in the list.

  4. In the Default Profiling and Classification Rule section, click Edit.
  5. Select Automatically run when a metadata extraction is synchronized.
  6. Synchronize one or more schemas.
    When the schemas are synchronized, Data Catalog automatically triggers the Edge site to start a profiling and classification job.
    Depending on your profiling and classification options, the Edge site profiles and classifies all or some schemas and tables, based on all synchronized metadata or on a subset.
To start profiling and classification on a schedule:

  1. Open the Database asset page of a registered database.
  2. In the tab bar, click Configuration.
  3. Click the Profiling and Classification tab.
    The options open.

    Tip Only the synchronized schemas are available in the list.

  4. In Synchronization Schedule, click Add Schedule to add a new schedule, or to edit an existing schedule.
    The Edit Schedule dialog box appears.
  5. Enter the required information.
    Field         Description
    Repeat        The interval at which you want to synchronize automatically. The possible values are Daily, Weekly, Monthly, and Cron expression.
    Cron          The Quartz Cron expression that determines when the synchronization takes place. This field is only visible if you select Cron expression in the Repeat field.
    Every         The day of the week on which you want to synchronize, for example, Sunday. This field is only visible if you select Weekly in the Repeat field.
    Every first   The day on which you want to synchronize each month, for example, Tuesday for the first Tuesday of the month. This field is only visible if you select Monthly in the Repeat field.
    At            The time at which you want to synchronize automatically, for example, 14:00. This field is only visible if you select Daily, Weekly, or Monthly in the Repeat field.
                  You can only schedule on the hour. For example, you can add a synchronization schedule at 8:00, but not at 8:45. If you try to add it at 8:45, it defaults to 8:00. Use a Cron expression if you don't want to schedule on the hour.
    Time zone     The time zone for the schedule.
  6. Click Save.
    The profiling and classification job starts according to the schedule.
    Depending on your profiling and classification options, the Edge site profiles and classifies all or some schemas and tables, based on all synchronized metadata or on a subset.
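
To illustrate the difference between the At field and the Cron field, the sketch below builds Quartz-style Cron expressions for times that are not on the hour. It relies only on the general Quartz field order (seconds, minutes, hours, day of month, month, day of week). The helper names are hypothetical, and whether a given expression is accepted by the Cron field is an assumption you should verify in your environment.

```python
from datetime import time


def round_down_to_hour(t: time) -> time:
    """Mimic the documented behavior of the At field: 8:45 becomes 8:00."""
    return t.replace(minute=0, second=0, microsecond=0)


def quartz_daily(t: time) -> str:
    """Every day at the given time."""
    return f"0 {t.minute} {t.hour} * * ?"


def quartz_weekly(t: time, day: str) -> str:
    """Every week on a Quartz day name such as SUN or TUE."""
    return f"0 {t.minute} {t.hour} ? * {day}"


def quartz_first_weekday_of_month(t: time, day: str) -> str:
    """First occurrence of the given weekday each month (Quartz '#' syntax)."""
    return f"0 {t.minute} {t.hour} ? * {day}#1"


requested = time(8, 45)
print(round_down_to_hour(requested))                    # 08:00:00
print(quartz_daily(requested))                          # 0 45 8 * * ?
print(quartz_weekly(requested, "SUN"))                  # 0 45 8 ? * SUN
print(quartz_first_weekday_of_month(requested, "TUE"))  # 0 45 8 ? * TUE#1
```

The first line shows what the At field would effectively schedule for 8:45, while the Cron expressions keep the 45-minute offset.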

What's next?

The Edge site completes the profiling and classification process and sends the results to Collibra Data Intelligence Platform.

  • You can see the profiling and classification job in the list of activities.
    When the activity is completed, the results page gives an overview of the profiled and classified data.
    If something goes wrong, the job is reported as failed. By default, the capability tries twice to collect the data, calculate the statistics, and send the results, with each attempt taking 30 minutes. You can change this in the capability configuration.
  • You can find the profiling results and charts in the Table and Column asset pages.

    Note Columns mapped to the following java.sql.Types are excluded from the profiling queries: ARRAY, BINARY, BLOB, CLOB, DATALINK, DISTINCT, JAVA_OBJECT, LONGVARBINARY, NCLOB, NULL, OTHER, REF, REF_CURSOR, ROWID, SQLXML, STRUCT, VARBINARY.

  • You can find the suggested data classes and provide feedback on them via the Table asset page (in the Columns tab page), the Column asset page (in the Data Profiling tab page), or the Physical Data Connector.

  • In the Configuration tab page of the Database asset, if a schema is profiled and classified, you see a check symbol next to the schema name. If the profiling or classification of a schema failed, an exclamation mark is shown.
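
If you want to predict which columns will have no profiling results, the exclusion list from the note above can be checked with a short script. This is a minimal sketch under stated assumptions: it expects column types expressed as java.sql.Types constant names, the helper `is_profiled` and the sample columns are hypothetical, and it models only the exclusion list from the note, not any other reason a column might be skipped.

```python
# Type names copied from the note above (java.sql.Types constants).
EXCLUDED_SQL_TYPES = frozenset({
    "ARRAY", "BINARY", "BLOB", "CLOB", "DATALINK", "DISTINCT",
    "JAVA_OBJECT", "LONGVARBINARY", "NCLOB", "NULL", "OTHER", "REF",
    "REF_CURSOR", "ROWID", "SQLXML", "STRUCT", "VARBINARY",
})


def is_profiled(jdbc_type_name: str) -> bool:
    """Return False for types that the note lists as excluded from profiling."""
    return jdbc_type_name.upper() not in EXCLUDED_SQL_TYPES


# Hypothetical table: column name -> java.sql.Types constant name.
columns = {"id": "INTEGER", "photo": "BLOB", "notes": "CLOB", "name": "VARCHAR"}
profiled = [name for name, type_name in columns.items() if is_profiled(type_name)]
print(profiled)  # ['id', 'name']
```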