Configure the profiling and classification options via Edge

Through the profiling and classification options, you can determine:

  • whether you want to start the profiling and classification process automatically after each synchronization.
  • the default profiling behavior for the schemas, such as whether the profiling is based on all data or on a random subset of the data.
  • whether specific schemas override the default behavior with their own rule.
  • which schemas you want to profile and classify.

Prerequisites

Steps

  1. Open a Database asset page.
  2. In the tab pane, click Configuration.
  3. Click the Profiling and classification tab.
    The Profiling and classification options open.

    Tip: Only synchronized schemas are available in the list.

  4. In the Default profiling and classification rule section, click Edit.
  5. Enter the required information.
    Option / Description

    Automatically run when a metadata extraction is synchronized
      Enable to automatically create a data profile and classify columns each time the synchronization of one or more schemas finishes.
      The process may take a long time. You can also add a schedule to profile and classify at regular intervals.

    Don't profile unless specified in the schema-specific rules
      Select if you don't want to define a default profiling behavior for the schemas.
      Important: Use this option if you only want to profile and classify some of the schemas. If you select this option, Collibra only profiles and classifies the schemas for which a specific profiling and classification rule has been defined.

    Full scan
      Select to profile the schemas based on all data by default.

    Partial scan
      Select to profile the schemas based on a subset of the data by default.
      If you select this option, the Maximum number of rows field becomes available, where you can enter the maximum number of rows to use for profiling. By default, the maximum number of rows is 20 000.
      Note: This option is only available for some data sources.

  6. Click Save.
  7. If you want to define a specific profiling and classification rule for a schema:
    1. In the Schema profiling and classification rules section, select the schema.
      The schema-specific information opens.
    2. Do one of the following:
      • To create a new table rule, click Add table rule.
      • To edit an existing table rule, click Edit.
    3. Enter the required information.
      Option / Description

      Full scan
        Select to profile the schema based on all data.

      Partial scan
        Select to profile the schema based on a subset of the data.
        If you select this option, the Number of rows scanned (max) field becomes available, where you can enter the maximum number of rows to use for profiling and classification. By default, the maximum number of rows is 20 000.
        Note: This option is only available for some data sources.

    4. Click Save.
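
To illustrate how the default rule and the schema-specific rules combine, the following Python sketch models the behavior described in the steps above. It is not Collibra code and does not call any Collibra API: the names Scan, Rule, effective_rule and schemas_to_profile are hypothetical, and the sketch assumes that a schema-specific rule always takes precedence over the default rule. Only the 20 000 row default is taken from this page.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, Iterable, Optional

# Default row limit for a partial scan, as stated on this page.
DEFAULT_MAX_ROWS = 20_000


class Scan(Enum):
    """Hypothetical names for the three options in the rule dialogs."""
    NONE = "Don't profile unless specified in the schema-specific rules"
    FULL = "Full scan"
    PARTIAL = "Partial scan"


@dataclass(frozen=True)
class Rule:
    """Illustrative model of one profiling and classification rule."""
    scan: Scan
    max_rows: Optional[int] = None  # only meaningful for a partial scan

    def row_limit(self) -> Optional[int]:
        """Return the row cap for a partial scan, or None when no cap applies."""
        if self.scan is Scan.PARTIAL:
            return DEFAULT_MAX_ROWS if self.max_rows is None else self.max_rows
        return None


def effective_rule(schema: str, default: Rule, schema_rules: Dict[str, Rule]) -> Rule:
    """Assumption: a schema-specific rule overrides the default rule."""
    return schema_rules.get(schema, default)


def schemas_to_profile(
    schemas: Iterable[str], default: Rule, schema_rules: Dict[str, Rule]
) -> Dict[str, Rule]:
    """Map each schema that would be profiled to the rule that applies to it."""
    plan = {}
    for schema in schemas:
        rule = effective_rule(schema, default, schema_rules)
        if rule.scan is not Scan.NONE:
            plan[schema] = rule
    return plan


if __name__ == "__main__":
    # Default: profile nothing; only schemas with their own rule are included.
    default_rule = Rule(Scan.NONE)
    rules = {"sales": Rule(Scan.PARTIAL), "hr": Rule(Scan.FULL)}
    for name, rule in schemas_to_profile(["sales", "hr", "tmp"], default_rule, rules).items():
        print(name, rule.scan.value, rule.row_limit())
```

In this sketch, selecting "Don't profile unless specified in the schema-specific rules" as the default means that only schemas with their own rule ("sales" and "hr" in the example) end up in the plan, which mirrors the Important note in step 5.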

What's next?

You can now profile and classify the data manually or automatically, or add a schedule to profile and classify at regular intervals.