Set up Unified Data Classification
Before you can start configuring and using Unified Data Classification data classes, you have to set up your environment.
Prerequisites
- You have created and installed an Edge site.
- You have registered a data source via Edge.
- You have synchronized one or more schemas.
- You have a global role that has the System administration global permission.
- You have a global role that has the Manage connections and capabilities global permission, for example, Edge integration engineer.
Steps
- Ensure that the Unified Classification enabled setting is enabled and that the Classification job execution timeout setting is defined as desired.
Depending on your environment, follow this procedure either on the Services Configuration tab of the Collibra settings or in Collibra Console:
Important You can't edit the service configuration from the Settings page in the latest UI. If you use the latest UI, you can edit the service configuration only in Collibra Console. For more information, go to Collibra service configuration settings.
Requirements and permissions
- You have the ADMIN or SUPER role in Collibra Console.
- You have a global role that has the System administration global permission.
- The Services Configuration tab is available in the Collibra settings.
Steps
- Open the service configuration for editing, using one of the following methods.
In the Collibra settings, open the Services Configuration page:
- On the main toolbar, click → Settings.
The Collibra settings page opens.
- Click Services Configuration.
- Click Edit configuration.
In Collibra Console, open the DGC service settings for editing:
- Open Collibra Console.
Collibra Console opens with the Infrastructure page.
- In the tab pane, expand an environment to show its services.
- In the tab pane, click the Collibra Platform service of that environment.
- Click Configuration.
- Click Edit configuration.
- In the Beta features section, verify that the Unified Classification enabled setting is enabled.
Setting: Unified Classification enabled
Description: Enables the new Unified Data Classification method on Edge.
- True (default): The environment uses the new classification method, Unified Data Classification. This affects the available data classes, the required capabilities, and the way you classify data.
Note All existing data classes and classifications become unavailable.
Tip A migration process is available via the Unified Classification migration tool enabled setting.
- False: The feature is not enabled.
Note You don't need to enable the Enable Data Classification setting in the Data Classification configuration section. This setting is no longer applicable because it relates to a deprecated Data Classification method.
- In the Data Classification configuration section, update the Classification job execution timeout if needed.
This timeout is the maximum amount of time (in seconds) that a data classification job can run before it is canceled. The default value is 604,800, which is 1 week. This is also the maximum value.
Note You can still change timeout limits for various stages in the classification job when you configure a Catalog Data Classification Edge capability.
- Click Save all.
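Because the Classification job execution timeout is entered in seconds, it can help to compute the value from a more readable duration first. The following is a minimal sketch in Python, not part of the product: the function name `job_timeout_seconds` is hypothetical, and rejecting non-positive values is an assumption, since only the 604,800-second maximum is documented.

```python
from datetime import timedelta

# Documented maximum (and default) of the setting: 604,800 seconds, or 1 week.
MAX_JOB_TIMEOUT = timedelta(weeks=1)


def job_timeout_seconds(duration: timedelta) -> int:
    """Return the number of seconds to enter for the given duration."""
    if duration <= timedelta(0):
        # Assumption: a zero or negative timeout is treated as invalid here.
        raise ValueError("the timeout must be a positive duration")
    if duration > MAX_JOB_TIMEOUT:
        raise ValueError("the timeout can't exceed 1 week (604,800 seconds)")
    return int(duration.total_seconds())


print(job_timeout_seconds(timedelta(days=3)))   # 259200
print(job_timeout_seconds(timedelta(weeks=1)))  # 604800
```

A duration of 3 days, for example, corresponds to 3 × 86,400 = 259,200 seconds.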
- For each data source that you want to classify, add the Catalog Data Classification capability to the Edge connection.
Note If you used the old Edge classification method before, it's possible that Collibra already created the capabilities for you with the upgrade to 2024.07. For more information, go to Migrating to Unified Data Classification.
- Open an Edge site:
- On the main toolbar, click → Settings.
The Collibra settings page opens.
- In the tab pane, click Edge.
The Sites tab opens and shows a table with an overview of the Edge sites.
- In the table, click the name of the Edge site whose status is Healthy.
The Edge site page opens.
- In the Capabilities section, click Add capability.
The Add capability page is shown.
- Enter the required information.
Capability
This section contains general information about the capability.
- Name (required): The name of the Edge capability.
- Description (optional): The description of the Edge capability.
- Capability template (required): The capability template. The value that you select in this field determines which sections appear on the page. Select the following Edge capability: Catalog Data Classification.
Connection
This section contains information to connect to the data source.
- JDBC connection (required)
Other Settings
This section can contain additional capability properties.
Currently, you can use it to configure timeout limits for the data classification job. Click + Add Other Settings to add settings.
By default, timeout limits are defined for the various stages in the data classification job. The default value for each stage is 86,400 seconds, which is 1 day. In the capability, you can change these timeouts.
Timeout limit for the entire classification process
The relevant parameter names are:
- table-classification-timeout
- table-classification-timeout-unit
The maximum value is 604,800, which is 1 week.
Example
Type | Value Type | Name | Example value
Text | Plaintext | table-classification-timeout | 2
Text | Plaintext | table-classification-timeout-unit | DAYS
Query timeout limit for getting samples from the data source
The relevant parameter names are:
- table-sampling-queries-timeout
- table-sampling-queries-timeout-unit
The maximum value is 604,800, which is 1 week.
Example
Type | Value Type | Name | Example value
Text | Plaintext | table-sampling-queries-timeout | 43200
Text | Plaintext | table-sampling-queries-timeout-unit | SECONDS
Query timeout limit for processing the samples
The relevant parameter names are:
- table-sampling-processing-timeout
- table-sampling-processing-timeout-unit
Example
Type | Value Type | Name | Example value
Text | Plaintext | table-sampling-processing-timeout | 18
Text | Plaintext | table-sampling-processing-timeout-unit | HOURS
For information on the process, go to Understanding the automatic data classification process.
For any of the timeout limits, you have to add 2 settings:
- The first setting is an amount, expressed as an integer, for example 1, 12, or 10,000.
- The second setting is the unit of time, for example DAYS, HOURS, or SECONDS.
You can specify any of the following units of time:
- NANOS: Nanoseconds
- MICROS: Microseconds
- MILLIS: Milliseconds
- SECONDS: Seconds
- MINUTES: Minutes
- HOURS: Hours
- DAYS: Days
Important
- Timeout limits that you configure in the Edge capability can't exceed the value that is set in the Classification job execution timeout setting in Console.
- The unit value you enter must be in uppercase, for example SECONDS.
Other Settings are optional.
General
This section contains general information about logging.
- Debug (optional): An option to automatically send Edge infrastructure log files to Collibra Platform. By default, this option is set to false.
Note We highly recommend that you send Edge infrastructure log files to Collibra Platform only when you have issues with Edge. If you set this option to true, it automatically reverts to false after 24 hours.
- Log level (optional): An option to determine the verbosity level of Catalog connector log files. By default, this option is set to No logging.
- Click Create.
The capability is added to the Edge site.
The fields become read-only.
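The timeout rules for the Other Settings section (an integer amount, an uppercase unit, and a total that can't exceed the Classification job execution timeout) can be checked before you enter the values. The following is a minimal sketch in Python under stated assumptions: the function names are hypothetical, the check runs locally and doesn't call any Collibra API, and rejecting non-positive amounts is an assumption that the documentation doesn't state.

```python
from fractions import Fraction

# Seconds per unit, for the unit names listed for the capability (uppercase only).
SECONDS_PER_UNIT = {
    "NANOS": Fraction(1, 1_000_000_000),
    "MICROS": Fraction(1, 1_000_000),
    "MILLIS": Fraction(1, 1_000),
    "SECONDS": Fraction(1),
    "MINUTES": Fraction(60),
    "HOURS": Fraction(3_600),
    "DAYS": Fraction(86_400),
}

# Documented default and maximum of the Classification job execution timeout.
CONSOLE_JOB_TIMEOUT_SECONDS = 604_800


def timeout_in_seconds(amount, unit):
    """Convert an amount/unit pair to seconds, rejecting malformed input."""
    if isinstance(amount, bool) or not isinstance(amount, int):
        raise ValueError(f"amount must be an integer, got {amount!r}")
    if amount <= 0:
        # Assumption: non-positive timeouts are treated as invalid here.
        raise ValueError(f"amount must be positive, got {amount}")
    if unit not in SECONDS_PER_UNIT:
        raise ValueError(f"unknown unit {unit!r}; unit names must be uppercase")
    return amount * SECONDS_PER_UNIT[unit]


def check_capability_timeout(amount, unit,
                             console_timeout=CONSOLE_JOB_TIMEOUT_SECONDS):
    """Return the timeout in seconds, or raise if it exceeds the Console limit."""
    seconds = timeout_in_seconds(amount, unit)
    if seconds > console_timeout:
        raise ValueError(
            f"{amount} {unit} exceeds the job execution timeout "
            f"of {console_timeout} seconds"
        )
    return seconds


# The example amount/unit pairs given for the three timeout limits.
for name, amount, unit in [
    ("table-classification-timeout", 2, "DAYS"),
    ("table-sampling-queries-timeout", 43200, "SECONDS"),
    ("table-sampling-processing-timeout", 18, "HOURS"),
]:
    print(name, check_capability_timeout(amount, unit))
```

If the job execution timeout in Console is set lower than the default, pass that value as `console_timeout` so that the check reflects your environment.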
- Give the Edge Site user the following global permissions:
- Classification > Data Classes > Read
- Classification > Data Classes > List Values
- View Permissions > View All
- Give your data stewards the global permissions they need. For more information, go to Required permissions.
Tip The available classification permissions are:
- Classification > Data Classes > Read
- Classification > Data Classes > Add
- Classification > Data Classes > Update
- Classification > Data Classes > Remove
- Classification > Data Classes > Classify
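When you review a role, a quick way to spot gaps is to compare the permissions it has against the three global permissions that the Edge Site user needs. The following is a minimal sketch in Python: the `missing_permissions` helper is hypothetical, the granted list is assumed to be copied by hand from the role's settings, and no Collibra API is called.

```python
# Global permissions that the Edge Site user needs, as listed in the steps above.
EDGE_SITE_USER_PERMISSIONS = frozenset({
    "Classification > Data Classes > Read",
    "Classification > Data Classes > List Values",
    "View Permissions > View All",
})


def missing_permissions(granted):
    """Return the required permissions that are absent from `granted`, sorted."""
    return sorted(EDGE_SITE_USER_PERMISSIONS - set(granted))


# Hypothetical input: permissions copied from the role's settings page.
granted = [
    "Classification > Data Classes > Read",
    "View Permissions > View All",
]
print(missing_permissions(granted))
# ['Classification > Data Classes > List Values']
```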
What's next?
Users with the correct permissions can now start configuring the data classes.
Tip If another classification method was used before, you can migrate the old data classes and classifications to Unified Data Classification. For more information, go to Migrating to Unified Data Classification.