Anonymization of profiling results via Edge

Important 

In Collibra 2024.02, we've launched a new user interface (UI) in beta for Collibra Data Intelligence Platform! You can learn more about this latest UI in the UI overview.

Because profiling results are stored in Collibra, anonymization can be an important tool for meeting your company's security requirements. Anonymization happens at the end of the profiling process via Edge, before the results are sent to Collibra.
Anonymization means that values are replaced by hash values. Identical values in a column get the same hash value, so you can still recognize them as identical without seeing the actual values.
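This page does not say which hash function Edge uses. As a minimal sketch, assuming a keyed hash (HMAC-SHA256 from Python's standard library) and a hypothetical secret key, the following shows why identical values stay recognizable after anonymization: the same input with the same key always produces the same hash value. The function name `anonymize_column` is hypothetical and not part of any Collibra API.

```python
import hashlib
import hmac
import secrets
from collections import Counter


def anonymize_column(values: list[str], key: bytes) -> list[str]:
    """Replace each value with a keyed SHA-256 hash (HMAC).

    The same key and the same input always give the same output, so
    identical values in a column map to identical hash values.
    """
    return [
        hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()
        for value in values
    ]


# Hypothetical secret key. Someone without it cannot confirm a guessed
# value by hashing the guess themselves.
key = secrets.token_bytes(32)

column = ["Berlin", "Paris", "Berlin", "Madrid"]
hashed = anonymize_column(column, key)

# Frequency-based results such as Mode still work on the hashed values.
mode_hash, mode_count = Counter(hashed).most_common(1)[0]
print(mode_hash == hashed[0], mode_count)  # True 2
```

Using a keyed hash rather than a plain hash is a design choice in this sketch: a plain SHA-256 of a short text value could be reversed by hashing likely candidates and comparing the results.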

Collibra provides two types of anonymization of profiling results via Edge:

  • By default, profiling results are automatically anonymized for columns with the Text or Geo data type.

    Note Edge detects the data type of a column during profiling and only anonymizes the results if the data type attribute is Text or Geo. If Edge detects a data type that does not match the actual data type, some data may not be anonymized, or data may be anonymized when it does not need to be. To solve this, manually change the column's data type and profile the column again.

    In this case, we:

    • Anonymize data distribution charts.
    • Anonymize Mode attributes.
    • Anonymize Percentiles.
  • An administrator can also choose to anonymize the profiling results for all columns, regardless of their data type. This helps ensure that all sensitive data is anonymized.
    In that case, we:
    • Anonymize frequency charts, categories, and Mode attributes.
    • Don't store or show data distribution charts.
    • Don't store or show Percentiles.
    • Don't store or show basic statistics such as Mean, Median, Variance, Standard deviation, Minimum value, and Maximum value.

    To activate this, enable the "Anonymize Edge profiling results for all data types" setting.
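The two types of anonymization above can be summarized as a small decision function. This is a hypothetical model written for illustration, not Collibra's code: `profiling_treatment`, `Treatment`, and the result names are assumptions, and the model only covers the profiling results named on this page.

```python
from enum import Enum


class Treatment(Enum):
    ANONYMIZE = "anonymize"  # values are replaced by hash values
    DROP = "drop"            # the result is not stored or shown


# Data types whose profiling results are anonymized by default.
DEFAULT_ANONYMIZED_TYPES = {"Text", "Geo"}

BASIC_STATISTICS = (
    "Mean", "Median", "Variance", "Standard deviation",
    "Minimum value", "Maximum value",
)


def profiling_treatment(data_type: str, anonymize_all: bool) -> dict[str, Treatment]:
    """Return how each profiling result is handled before it is sent to Collibra.

    `anonymize_all` models the "Anonymize Edge profiling results for all
    data types" setting. Results missing from the returned dict are not
    covered by this page and are left untouched in this model.
    """
    if anonymize_all:
        # The detected data type does not matter in this mode.
        treatment = {
            "Frequency chart": Treatment.ANONYMIZE,
            "Categories": Treatment.ANONYMIZE,
            "Mode": Treatment.ANONYMIZE,
            "Data distribution chart": Treatment.DROP,
            "Percentiles": Treatment.DROP,
        }
        treatment.update({stat: Treatment.DROP for stat in BASIC_STATISTICS})
        return treatment
    if data_type in DEFAULT_ANONYMIZED_TYPES:
        return {
            "Data distribution chart": Treatment.ANONYMIZE,
            "Mode": Treatment.ANONYMIZE,
            "Percentiles": Treatment.ANONYMIZE,
        }
    # Default mode, any other data type: nothing is anonymized.
    return {}


print(profiling_treatment("Text", anonymize_all=False)["Mode"])  # Treatment.ANONYMIZE
print(profiling_treatment("Text", anonymize_all=True)["Percentiles"])  # Treatment.DROP
```

In this model, the `data_type` argument is ignored when `anonymize_all` is true, which mirrors why enabling the setting avoids the data type detection problem described in the note above.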

Important 

Sample data via Edge is always displayed in full. Sample data is not anonymized because:

  • Access to sample data via Edge is based on permissions.
  • If you have permission to view sample data, the examples are collected and shown only for a limited amount of time. The sample data is not stored in the Cloud or in Collibra. It is collected on the Edge site and cached there for a maximum of 48 hours. For more details, go to the Sample data documentation.
Example 

You have profiled and classified a column with Text data via Edge.
If you go to the Summary, Overview, or Data Profiling tab, all profiling results are removed or replaced by hash values.

Important 

However, you still see the sample data in full, because sample data via Edge is not anonymized. Access to it is based on permissions, and it is only cached on the Edge site for a maximum of 48 hours.