Automatically associating assets with Data Categories via Data Classification

Collibra Data Classification provides an automated data category association tool via Guided Stewardship. This tool can automatically create a relation between a Column asset and a Data Category asset based on the classification of the column.

About automatic Data Category association via Data Classification

If you have defined Data Category assets in Collibra, you can link them to one or more data classes. As a result, if a column is classified with a data class that is linked to a data category, we'll automatically create a relation between the column and the data category. This way, through Data Classification, you can create a list of the columns that contain specific data categories such as PII.

The process works as follows:

  1. Create Data Category assets and connect data classes with the Data Category assets.
    You can do this via the Data Classification page or from the Data Category asset page.

    Tip: You can link up to 100 Data Category assets to one data class. You can link multiple data classes to one Data Category asset.

  2. Classify the columns and accept the classification manually or automatically.
    The tool works for both manual and automatic data classification, as long as the data classes are defined in Collibra and the data classification is accepted.
  3. For columns with a data classification that is linked to a data category, a relation is automatically created between the Column asset and the Data Category asset.
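The three steps above can be sketched as a small in-memory model. This is only an illustration of the described behavior, not Collibra code or a Collibra API: the class `AssociationModel`, its methods, and the sample column name are hypothetical, and the limit of 100 is taken from the tip above.

```python
from dataclasses import dataclass, field

# Limit stated in the tip above: up to 100 Data Category assets per data class.
MAX_CATEGORIES_PER_DATA_CLASS = 100


@dataclass
class AssociationModel:
    """Hypothetical model of the association process; not a Collibra API."""

    # data class name -> set of linked Data Category names (step 1)
    links: dict = field(default_factory=dict)
    # (column, data category) pairs created by accepted classifications (step 3)
    relations: set = field(default_factory=set)

    def link(self, data_class: str, category: str) -> None:
        """Step 1: connect a data class with a Data Category asset."""
        categories = self.links.setdefault(data_class, set())
        if category not in categories and len(categories) >= MAX_CATEGORIES_PER_DATA_CLASS:
            raise ValueError(f"{data_class!r} already has the maximum number of categories")
        categories.add(category)

    def accept_classification(self, column: str, data_class: str) -> set:
        """Steps 2 and 3: an accepted classification creates the relations."""
        created = {(column, category) for category in self.links.get(data_class, set())}
        self.relations |= created
        return created


model = AssociationModel()
model.link("Email", "PII")
created = model.accept_classification("customer_email", "Email")
print(created)
```

A classification with a data class that has no linked data category creates no relation in this model, which matches the condition stated in step 3.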
Important 

If you have enabled the automatic acceptance and rejection feature, the automatic association process can increase the classification duration. This is because, while we accept data classes, we also create the required relations. Also, while you can always edit a classification, enabling the automatic acceptance of classifications removes a step in which you could confirm their accuracy, including, for instance, the accuracy of a personal data (PII) classification.

Tip 
  • When you remove a classification or a link between a data class and a Data Category asset, we also remove the relation between any columns and the related data category. Similarly, when you delete an asset or data class, we clean up any existing relations.
  • The automatic creation of relations runs as a background system job. Only administrators can follow its progress on the Activities page in the Collibra General settings.
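One way to picture the cleanup behavior in the first tip is to treat the relations as derived from two inputs: the links between data classes and data categories, and the accepted classifications. The sketch below is a hypothetical model, not how Collibra implements it; `derive_relations` and the sample names are assumptions made for illustration.

```python
def derive_relations(links, accepted):
    """Return the (column, data category) pairs implied by the current state.

    links:    data class name -> set of linked Data Category names
    accepted: set of (column, data class) pairs with an accepted classification
    """
    return {
        (column, category)
        for column, data_class in accepted
        for category in links.get(data_class, set())
    }


links = {"Email": {"PII"}}
accepted = {("customer_email", "Email")}
before = derive_relations(links, accepted)

# Removing the link between the data class and the data category
# removes the relation for every column classified with that data class.
links["Email"].discard("PII")
after_unlink = derive_relations(links, accepted)

# Restoring the link and removing the classification instead has the same effect.
links["Email"].add("PII")
accepted.discard(("customer_email", "Email"))
after_unclassify = derive_relations(links, accepted)

print(before, after_unlink, after_unclassify)
```

In this model, a column that reaches the same data category through a second data class would keep its relation when only one link is removed. The source text does not say how Collibra handles that case, so treat it as an open assumption.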

Required permissions for automatic Data Category association

Action: Connect a data class with a Data Category asset from the Data Classification page or the Data Category asset page.
  Global permissions (*):
  • Product rights > Catalog.
  • Product rights > Guided Stewardship.
  • If you use Unified Data Classification: Classification > Data Classes > Update.
  Resource permissions (**):
  • Asset > Attribute > Add on the Data Category asset.

Action: View the connections between data classes and Data Category assets in the Data Classification page or the Data Category asset page.
  Global permissions (*):
  • Product rights > Catalog.
  • Product rights > Guided Stewardship.
  • If you use Unified Data Classification: Classification > Data Classes > Read.
  View permission:
  • View permission on the Data Category asset.

(*) As a user, you need a role that has these global permissions.

(**) As a user, you need a role that has these resource permissions.
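The permission requirements above can be expressed as a simple checklist. The sketch below is hypothetical: it only compares permission names as plain strings copied from this page, and `missing_permissions` is an illustrative helper, not a Collibra function.

```python
# Permissions required regardless of the classification setup (strings from this page).
REQUIRED_PERMISSIONS = {
    "connect": {
        "Product rights > Catalog",
        "Product rights > Guided Stewardship",
        "Asset > Attribute > Add",
    },
    "view": {
        "Product rights > Catalog",
        "Product rights > Guided Stewardship",
        "View permission on the Data Category asset",
    },
}

# Additional permission that applies only if you use Unified Data Classification.
UNIFIED_PERMISSIONS = {
    "connect": "Classification > Data Classes > Update",
    "view": "Classification > Data Classes > Read",
}


def missing_permissions(action, granted, unified_classification=False):
    """Return the permissions from the checklist that `granted` does not cover."""
    required = set(REQUIRED_PERMISSIONS[action])
    if unified_classification:
        required.add(UNIFIED_PERMISSIONS[action])
    return required - set(granted)


granted = {"Product rights > Catalog", "Product rights > Guided Stewardship"}
print(missing_permissions("connect", granted, unified_classification=True))
```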

Example of automatic Data Category association

Prerequisites

In this case, we have registered a column called “pem” that contains customer email addresses. We also created a Data Category asset called “Personal Identifiable Information”. We have set up Unified Data Classification and have imported the out-of-the-box data classes “Email” and “Email (personal)” on the Data Classification page. We have enabled the automatic classification acceptance feature in the classification settings.

Steps

  1. Link the data class with the Data Category asset.
    1. On the main toolbar, click , then Stewardship.
    2. Click Data Classification.
    3. For the “Email” data class, double-click the Data Category cell in the table.
    4. Select the “Personal Identifiable Information” Data Category asset and confirm your selection.
    5. Do the same for the “Email (personal)” data class.

  2. Start the data classification process.
    In our case:

    1. Navigate to the “pem” asset page.

    2. Click Classify.
      When the classification job is completed, the “pem” column is classified as “Email (personal)” and automatically accepted by the automatic classification acceptance feature.

      A relation is automatically created between the column and the data category.

  3. Refresh the “pem” asset page and check that an “is categorized by data category” relation is available between the asset and the “Personal Identifiable Information” data category.
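The expected outcome of this example can be written down as a small check. This is a hypothetical in-memory replay of the steps above, using the asset and data class names from the example; it does not query Collibra.

```python
RELATION_NAME = "is categorized by data category"

# Step 1: both data classes are linked to the same Data Category asset.
links = {
    "Email": {"Personal Identifiable Information"},
    "Email (personal)": {"Personal Identifiable Information"},
}

# Step 2: the "pem" column is classified as "Email (personal)" and auto-accepted.
accepted_class = "Email (personal)"
relations = {("pem", RELATION_NAME, category) for category in links[accepted_class]}

# Step 3: the relation that should be visible on the "pem" asset page.
expected = ("pem", RELATION_NAME, "Personal Identifiable Information")
print(expected in relations)
```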