Enable profiling for Edge

To enable Edge profiling of synchronized metadata in Data Catalog, you need to run a command and enable multiple settings.

Depending on your environment, follow this procedure either on the Services Configuration tab of the Collibra settings or in Collibra Console:

Important You can't edit the Services Configuration from the Settings page in the latest UI. If you use the latest UI, you can configure settings only in Collibra Console. For more information, go to DGC service configuration settings.

Before you begin

You have enabled Database registration via Edge.

Required permissions

Steps

  1. Open the DGC service configuration for editing in one of the following ways, depending on your environment:
    • In the Collibra settings:
      1. On the main toolbar, click the Products icon, and then click the Cogwheel icon (Settings).
        The Collibra settings page opens.
      2. Click Services Configuration.
      3. Click Edit configuration.
    • In Collibra Console:
      1. Open Collibra Console.
        Collibra Console opens with the Infrastructure page.
      2. In the tab pane, expand an environment to show its services.
      3. In the tab pane, click the Data Governance Center service of that environment.
      4. Click Configuration.
      5. Click Edit configuration.
  2. In the Data profiling section, configure the following settings:

    Database profiling via Edge

    An option to enable profiling of synchronized metadata via Edge instead of Jobserver.

    • True: Profiling via Edge is active.
    • False: Profiling via Edge is not active.

    Note You can enable Database profiling via Edge only if you also enabled Database registration via Edge.

    Maximum duration of a profiling Edge job

    The maximum duration, in minutes, that a profiling Edge job can run before Data Profiling stops the job.

    The default value is 20,160 minutes (2 weeks).
    You can increase this limit to a maximum of 4 weeks (40,320 minutes).

    Parallel schema profiling via Edge

    The maximum number of schemas that Edge can profile at the same time.

    The default value is 4, which means that Edge profiles up to four schemas at the same time. Profiling schemas in parallel can considerably shorten the total duration of the profiling activity.
    You can increase this number to a maximum of 16.

    Note
    • If you set this number higher than 4, make sure that your Edge site has enough resources to handle the additional concurrent profiling jobs.
    • If you lower this number while more jobs are running than the new limit allows, no running job is canceled. Instead, no new job is scheduled until the number of running jobs drops below the new limit.

    Example

    The Parallel schema profiling via Edge setting is set to 4.

    • For 1 database that contains 3 schemas, Edge profiles all 3 schemas at the same time.
    • For 2 databases that contain 4 schemas in total, Edge profiles all 4 schemas at the same time.
    • For 1 database that contains 8 schemas, Edge starts by profiling 4 schemas, and then starts profiling the next schema each time a running job is completed.

    Anonymize profiling data for all data types in Edge

    Enable this option to anonymize all Edge profiling results stored in Collibra.

    • True: Profiling results via Edge are anonymized for all columns.
    • False (default): Profiling results via Edge are anonymized only for columns with the Text or Geo data type.
    Note 

    You don't need to enable the Anonymize data (Jobserver) setting because this setting is not relevant for Edge.

  3. Click Save all.
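The constraints on the Data profiling settings can be checked before you click Save all. The following Python sketch is a minimal, hypothetical pre-check, not a Collibra API: the DataProfilingSettings class, its field names, and the lower bound of 1 for each numeric setting are assumptions. It encodes the dependency on Database registration via Edge, the cap of 16 parallel schemas, and an upper bound of 4 weeks (40,320 minutes) for the maximum job duration. That bound assumes the documented cap is meant in weeks, because a 4-day cap would be lower than the 20,160-minute (14-day) default.

```python
from dataclasses import dataclass
from typing import List

MINUTES_PER_DAY = 24 * 60
DEFAULT_MAX_DURATION_MINUTES = 20_160               # 14 days (2 weeks), the documented default
MAX_DURATION_CAP_MINUTES = 4 * 7 * MINUTES_PER_DAY  # assumption: 4 weeks = 40,320 minutes
DEFAULT_PARALLEL_SCHEMAS = 4
MAX_PARALLEL_SCHEMAS = 16


@dataclass
class DataProfilingSettings:
    """Hypothetical mirror of the Data profiling section; field names are illustrative."""
    registration_via_edge: bool                     # Database registration via Edge
    profiling_via_edge: bool = False                # Database profiling via Edge
    max_job_duration_minutes: int = DEFAULT_MAX_DURATION_MINUTES
    parallel_schemas: int = DEFAULT_PARALLEL_SCHEMAS


def validate(settings: DataProfilingSettings) -> List[str]:
    """Return human-readable problems; an empty list means no problem was found."""
    problems: List[str] = []
    if settings.profiling_via_edge and not settings.registration_via_edge:
        problems.append("Database profiling via Edge requires Database registration via Edge.")
    # Assumption: 1 is the lowest meaningful value for both numeric settings.
    if not 1 <= settings.max_job_duration_minutes <= MAX_DURATION_CAP_MINUTES:
        problems.append(
            f"Maximum duration must be 1 to {MAX_DURATION_CAP_MINUTES} minutes.")
    if not 1 <= settings.parallel_schemas <= MAX_PARALLEL_SCHEMAS:
        problems.append(
            f"Parallel schema profiling must be 1 to {MAX_PARALLEL_SCHEMAS} schemas.")
    return problems


if __name__ == "__main__":
    # Profiling enabled without registration: exactly one problem is reported.
    print(validate(DataProfilingSettings(registration_via_edge=False, profiling_via_edge=True)))
```

Keeping the checks as a list of messages, rather than raising on the first error, lets you see every inconsistent value at once.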
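To see how the Parallel schema profiling via Edge limit shapes the order in which schemas are profiled, the following Python sketch simulates a simplified job queue. It is a model built on assumptions, not Collibra's scheduler: every schema is one job, jobs from all databases share one queue, durations are made-up time units, and limit_at is a hypothetical function that returns the limit in effect at a given time. Running jobs leave the queue only when they finish, so lowering the limit never cancels them.

```python
import heapq
from collections import deque
from typing import Callable, Dict, List, Tuple


def simulate(schemas: List[str], limit_at: Callable[[int], int],
             duration: Dict[str, int]) -> Tuple[Dict[str, int], int]:
    """Simulate parallel schema profiling under a (possibly changing) limit.

    schemas:  schema names in queue order, across all databases.
    limit_at: returns the parallel limit in effect at time `now`.
    duration: job duration per schema, in arbitrary time units.
    Returns the start time of each schema and the peak number of concurrent jobs.
    """
    queue = deque(schemas)
    running: List[Tuple[int, str]] = []  # min-heap of (finish_time, schema)
    start_times: Dict[str, int] = {}
    now, peak = 0, 0
    while queue or running:
        # Start queued jobs only while fewer jobs run than the current limit.
        while queue and len(running) < limit_at(now):
            schema = queue.popleft()
            start_times[schema] = now
            heapq.heappush(running, (now + duration[schema], schema))
        peak = max(peak, len(running))
        if not running:
            raise ValueError("parallel limit must be at least 1")
        # Advance to the next completion; running jobs are never cancelled.
        now, _ = heapq.heappop(running)
    return start_times, peak


if __name__ == "__main__":
    eight = [f"db1.s{i}" for i in range(1, 9)]
    starts, peak = simulate(eight, lambda t: 4, {s: 10 for s in eight})
    # db1.s1 to db1.s4 start at 0; db1.s5 to db1.s8 start at 10; peak is 4.
    print(starts, peak)
```

With a limit of 4 and eight equally long jobs, the result matches the 8-schema case in the example. If you instead use `lambda t: 4 if t < 5 else 2` with durations a=10, b=20, c=30, d=40, e=10, f=10, the four running jobs all finish normally, and e only starts at time 30, when the number of running jobs has dropped below the new limit of 2.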
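The effect of Anonymize profiling data for all data types in Edge can be summarized as a rule over column data types. The following Python sketch is illustrative only: the function name, the column names, and the data type labels other than Text and Geo are hypothetical, and the function only decides which columns would have anonymized profiling results; it does not perform any anonymization.

```python
from typing import Dict, List

# Data types whose Edge profiling results are anonymized even when the option is False.
ALWAYS_ANONYMIZED = {"Text", "Geo"}


def columns_to_anonymize(column_types: Dict[str, str], anonymize_all: bool) -> List[str]:
    """Return the columns whose Edge profiling results would be anonymized."""
    if anonymize_all:
        # True: results are anonymized for every column.
        return list(column_types)
    # False (default): only Text and Geo columns are anonymized.
    return [name for name, dtype in column_types.items() if dtype in ALWAYS_ANONYMIZED]


if __name__ == "__main__":
    # Hypothetical columns and data type labels.
    cols = {"customer_name": "Text", "location": "Geo", "order_total": "Decimal"}
    print(columns_to_anonymize(cols, anonymize_all=False))  # ['customer_name', 'location']
```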

What's next?

Continue the configuration for profiling.