Set up Unified Data Classification
Before you can start configuring and using Unified Data Classification data classes, you have to set up your environment.
Prerequisites
- You have registered a data source via Edge and have synchronized one or more schemas.
  - For Databricks Unity Catalog, you have set up and integrated Databricks Unity Catalog data sources.
  - For Dataplex Universal Catalog, you have set up and integrated Dataplex Universal Catalog data sources.
- You have a global role with the Product Rights > System administration global permission.
- You have a global role that has the Manage connections and capabilities global permission, for example, Edge integration engineer.
Steps
- Ensure that the Unified Classification enabled setting is enabled and that the Classification job execution timeout and Maximum column length sum per classification capability request settings are defined as desired.
Depending on your environment, follow this procedure either in Collibra Console or on the Services Configuration tab of the Collibra settings.
Important: You can't edit the service configuration from the Settings page in the latest UI. If you use the latest UI, you can edit the service configuration only in Collibra Console. For more information, go to DGC service configuration settings.
Prerequisites
- You have the ADMIN or SUPER role in Collibra Console.
- You have a global role with the Product Rights > System administration global permission.
- The Services Configuration tab is available in the Collibra settings.
Steps
In Collibra settings, open the Services Configuration tab:
- On the main toolbar, click → Settings.
The Settings page opens.
- Click Services Configuration.
- Click Edit configuration.
In Collibra Console, open the DGC service settings for editing:
- Open Collibra Console.
Collibra Console opens with the Infrastructure page.
- In the tab pane, expand an environment to show its services.
- In the tab pane, click the Data Governance Center service of that environment.
- Click Configuration.
- Click Edit configuration.
- In the Data Classification configuration section, verify that the Unified Classification enabled setting is enabled.
Setting: Unified Classification enabled
Description: Enables the Unified Data Classification method on Edge.
- True (default): The environment uses the new classification method, Unified Data Classification. This has an impact on the available data classes, the required capabilities, and the way you classify data.
  - All existing data classes and classifications become unavailable.
  - A migration process is available via the Unified Classification migration tool enabled setting.
- False: The feature is not enabled.
Note: You don't need to enable the Enable Data Classification setting in the Data Classification configuration section. This setting is no longer applicable because it relates to a deprecated Data Classification method.
- If needed, update the Classification job execution timeout setting.
This timeout is the maximum amount of time (in seconds) that a data classification job can run before it is canceled. The default value is 604,800 (1 week). This is also the maximum value.
You can still change timeout limits for various stages in the classification job when you configure a Catalog Data Classification Edge capability.
- If needed, update the Maximum column length sum per classification capability request setting.
This setting specifies the maximum total length of all column names in a classification request. For example, if a classification request includes 3 columns, "CustomerID" (10 characters), "OrderDate" (9 characters), and "ProductCode" (11 characters), the total column length sum is 30 characters. The classification capability uses this setting value to check whether the total length exceeds the defined limit. If it does, an error message appears.
The default value is 10,000. You can enter a value between 200 and 20,000.
Adjusting this setting can help mitigate issues when classifying at the Schema or Database level, where many columns are included in a single request.
- Click Save all.
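The column length check described for the Maximum column length sum per classification capability request setting can be illustrated with a short Python sketch. This is not Collibra's implementation: the function names are hypothetical, and the exact way the limit is enforced is an assumption. The sketch only reproduces the arithmetic from the example (the sum of the column name lengths compared with the configured limit).

```python
# Illustrative sketch only: these helpers are hypothetical and not part of any Collibra API.
DEFAULT_MAX_COLUMN_LENGTH_SUM = 10_000  # documented default value of the setting
MIN_LIMIT, MAX_LIMIT = 200, 20_000      # documented allowed range for the setting


def column_length_sum(column_names):
    """Return the total number of characters across all column names."""
    return sum(len(name) for name in column_names)


def fits_in_request(column_names, limit=DEFAULT_MAX_COLUMN_LENGTH_SUM):
    """Return True if the total column name length does not exceed the limit."""
    if not MIN_LIMIT <= limit <= MAX_LIMIT:
        raise ValueError(f"limit must be between {MIN_LIMIT} and {MAX_LIMIT}")
    return column_length_sum(column_names) <= limit


columns = ["CustomerID", "OrderDate", "ProductCode"]
print(column_length_sum(columns))  # 10 + 9 + 11 = 30
print(fits_in_request(columns))    # True: 30 does not exceed the default limit
```

A check like this can help estimate whether a schema-level or database-level request, which includes many columns, is likely to stay under the configured limit.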
- For each data source that you want to classify, add the Catalog Data Classification capability to the Edge connection.
Note: If you are classifying data from a Databricks Unity Catalog or Dataplex Universal Catalog integration, the Catalog Data Classification capability is created automatically during synchronization. Therefore, you don't need to create one manually.
Open a site:
- On the main toolbar, click → Settings.
The Settings page opens.
- In the tab pane, click Edge.
The Sites tab opens and shows a table with an overview of your sites.
- In the table, click the name of the site whose status is Healthy.
The site page opens.
- In the Capabilities section, click Add capability.
The Add capability page opens.
- Enter the required information.
Capability
This section contains general information about the capability.
- Name (required): The name of the capability.
- Description (optional): The description of the capability.
- Capability template (required): The capability template. The value that you select in this field determines which sections appear on the page. Select the following capability: Catalog Data Classification.
Connection
This section contains information to connect to the data source.
- JDBC connection (required)
Other Settings (optional)
This section can contain additional capability properties. Currently, you can use it to configure timeout limits for the data classification job. Click + Add Other Settings to add settings.
By default, timeout limits are defined for the various stages in the data classification job. The default value for each stage is 86,400 seconds, which is 1 day. In the capability, you can change these timeouts.
Timeout limit for the entire classification process
The relevant parameter names are:
- table-classification-timeout
- table-classification-timeout-unit
The maximum value is 604,800 seconds, which is 1 week.
Example:
- Type: Text, Value type: Plaintext, Name: table-classification-timeout, Example value: 2
- Type: Text, Value type: Plaintext, Name: table-classification-timeout-unit, Example value: DAYS
Query timeout limit for getting samples from the data source
The relevant parameter names are:
- table-sampling-queries-timeout
- table-sampling-queries-timeout-unit
The maximum value is 604,800 seconds, which is 1 week.
Example:
- Type: Text, Value type: Plaintext, Name: table-sampling-queries-timeout, Example value: 43200
- Type: Text, Value type: Plaintext, Name: table-sampling-queries-timeout-unit, Example value: SECONDS
Query timeout limit for processing the samples
The relevant parameter names are:
- table-sampling-processing-timeout
- table-sampling-processing-timeout-unit
Example:
- Type: Text, Value type: Plaintext, Name: table-sampling-processing-timeout, Example value: 18
- Type: Text, Value type: Plaintext, Name: table-sampling-processing-timeout-unit, Example value: HOURS
For information on the process, go to Understanding the automatic data classification process.
For any of the timeout limits, you have to add 2 settings:
- The first setting is an amount, expressed as an integer, for example 1, 12, or 10,000.
- The second setting is the unit of time, for example DAYS, HOURS, or SECONDS.
You can specify any of the following units of time:
- NANOS: Nanoseconds
- MICROS: Microseconds
- MILLIS: Milliseconds
- SECONDS: Seconds
- MINUTES: Minutes
- HOURS: Hours
- DAYS: Days
Important:
- Timeout limits that you configure in the Edge capability can't exceed the value that is set in the Classification job execution timeout setting in Collibra Console.
- The unit value you enter must be in uppercase, for example SECONDS.
General
This section contains general information about logging.
- Debug (optional): An option to automatically send Edge infrastructure log files to Collibra Platform. By default, this option is set to false.
Note: We highly recommend that you send Edge infrastructure log files to Collibra Platform only when you have issues with Edge. If you set this option to true, it automatically reverts to false after 24 hours.
- Log level (optional): An option to determine the verbosity level of Catalog connector log files. By default, this option is set to No logging.
- Click Create.
The capability is added to the Edge or Collibra Cloud site.
The fields become read-only.
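The timeout settings in the Other Settings section come in pairs (an amount and a unit), and they can't exceed the Classification job execution timeout. The following Python sketch shows one way to sanity-check such pairs before entering them. It is an illustration under assumptions, not Collibra's validation logic: the helper names are hypothetical, and comparing the values after converting them to seconds is an assumed interpretation of the documented limit.

```python
# Illustrative sketch only: hypothetical helpers, not part of any Collibra API.
from fractions import Fraction

MAX_TIMEOUT_SECONDS = 604_800  # 1 week, the documented maximum

# Length of each supported unit in seconds, keyed by the uppercase unit name.
UNIT_SECONDS = {
    "NANOS": Fraction(1, 1_000_000_000),
    "MICROS": Fraction(1, 1_000_000),
    "MILLIS": Fraction(1, 1_000),
    "SECONDS": Fraction(1),
    "MINUTES": Fraction(60),
    "HOURS": Fraction(3_600),
    "DAYS": Fraction(86_400),
}


def timeout_in_seconds(amount, unit):
    """Convert an integer amount and an uppercase unit name to seconds."""
    if unit not in UNIT_SECONDS:
        raise ValueError(f"unknown unit {unit!r}; use uppercase, for example SECONDS")
    return amount * UNIT_SECONDS[unit]


def within_limit(amount, unit, job_timeout_seconds=MAX_TIMEOUT_SECONDS):
    """Return True if the timeout does not exceed the job execution timeout."""
    return timeout_in_seconds(amount, unit) <= job_timeout_seconds


# The example values from the Other Settings section.
examples = {
    "table-classification-timeout": (2, "DAYS"),
    "table-sampling-queries-timeout": (43200, "SECONDS"),
    "table-sampling-processing-timeout": (18, "HOURS"),
}
for name, (amount, unit) in examples.items():
    print(name, timeout_in_seconds(amount, unit), within_limit(amount, unit))
```

Exact fractions are used for the unit lengths so that the sub-second units convert without floating-point rounding.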
- Give the Edge site user the following global permissions:
  - Classification > Data Classes > Read
  - Classification > Data Classes > List Values
  - View Permissions > View All
- Give your data stewards the global permissions they need. For more information, go to Required permissions.
The available classification permissions are:
  - Classification > Data Classes > Classify
  - Classification > Data Classes > Read
  - Classification > Data Classes > Add
  - Classification > Data Classes > Update
  - Classification > Data Classes > Remove
Users with the correct permissions can now start configuring the data classes.
If another classification method was used before, you can migrate the old data classes and classifications to Unified Data Classification. For more information, go to Migrating to Unified Data Classification.