Databricks examples

This documentation contains examples of how Databricks behaves with respect to certain data protection standards and data access rules.

Example 

Suppose that:

  • The Personally Identifiable Information (PII) and Personal Information (PI) data categories exist in Databricks. Both data categories contain a column named DOB.
  • A standard that applies to the HR group has been created. This standard requires hashing for the PII data category.
  • A standard that applies to the Marketing group has been created. This standard requires default masking for the PI data category.
Behavior

When the standards are synchronized and active, a function is created in Databricks for each standard and linked to the DOB column. A single column masking policy that combines the protection defined in both standards is then created and applied to the DOB column:

CASE
  WHEN (
    current_user() == 'HR'
    OR is_account_group_member('HR')
  ) THEN hash(val)
  WHEN (
    current_user() == 'Marketing'
    OR is_account_group_member('Marketing')
  ) THEN 0
  ELSE val
END
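
The order of the branches in the CASE expression matters: a user who matches both conditions is handled by the first WHEN clause. The following is a minimal sketch in plain Python that models this evaluation. It is not code that is generated in Databricks. The names mask_dob and stand_in_hash are hypothetical, and the SHA-256 digest only stands in for the Databricks hash function.

```python
import hashlib


def stand_in_hash(val):
    """Illustrative stand-in only; this is NOT the Databricks hash() function."""
    return hashlib.sha256(str(val).encode("utf-8")).hexdigest()


def mask_dob(val, user, groups):
    """Model the combined CASE expression for the DOB column.

    user   -- stands in for current_user()
    groups -- the user's account groups, standing in for
              is_account_group_member()
    """
    def matches(name):
        return user == name or name in groups

    if matches("HR"):         # first WHEN: hashing (HR standard)
        return stand_in_hash(val)
    if matches("Marketing"):  # second WHEN: default masking (Marketing standard)
        return 0
    return val                # ELSE: the value as stored in the table


# Hypothetical users and sample value, for illustration only.
print(mask_dob("1990-04-12", "alice@example.com", {"Marketing"}))  # 0
print(mask_dob("1990-04-12", "bob@example.com", set()))            # 1990-04-12
```

In this model, a user who is a member of both HR and Marketing receives the hashed value, because the HR branch is evaluated first.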
Example 

Suppose that:

  • The Personally Identifiable Information (PII) data category exists in Databricks.
  • The Employee Data data set exists in Databricks. This data set contains PII.
  • A standard that applies to the following groups has been created: Everyone, Human Resources, Marketing, and Sales. This standard requires default masking for the PII data category.

    Image of the standard

  • A rule that applies to the Human Resources group has been created. This rule does not require any masking for the PII columns in the Employee Data asset.

    Image of the rule

Behavior

When the standard is synchronized and active, masking policies are created in Databricks, one policy for each column. The masking functions are named collibra_masking_policy_<asset ID>.

Databricks masking policies

The following image shows a masking policy for the STRING data type. The data that is shown in the policy depends on the masking level selected in the standard and rule. In the policy, val indicates the value as it is stored in the table.

Masking policy for STRING

According to the standard, the Everyone, Human Resources, Marketing, and Sales groups have masked access to the data. However, according to the rule, the Human Resources group has unmasked access to the data. As a result:

  • The column is not masked for the Human Resources group.
  • The column is masked for the Everyone, Marketing, and Sales groups.
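
The outcome above can be sketched as a small decision function. This is a plain-Python model of the precedence described in this example, not the policy that is created in Databricks. The function name effective_access and the "not covered" result for users outside all listed groups are assumptions made for illustration.

```python
# Groups named in the standard (default masking for PII).
STANDARD_GROUPS = {"Everyone", "Human Resources", "Marketing", "Sales"}

# Groups named in the rule (no masking for PII columns in Employee Data).
RULE_UNMASKED_GROUPS = {"Human Resources"}


def effective_access(user_groups):
    """Return the access a user gets to a PII column in Employee Data."""
    groups = set(user_groups)
    if groups & RULE_UNMASKED_GROUPS:
        # The rule takes precedence over the standard for its groups.
        return "unmasked"
    if groups & STANDARD_GROUPS:
        return "masked"
    # Assumption: the example does not describe users outside these groups.
    return "not covered"


print(effective_access({"Everyone", "Human Resources"}))  # unmasked
print(effective_access({"Everyone", "Sales"}))            # masked
```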
Example 

Consider the rule from the previous example with the following row filter added: Show rows where the Salary data classification has the code set value of 1000.

Row filter in rule

Behavior

Functions

CREATE OR REPLACE FUNCTION protect_dev_catalog.tpch_dev.COLLIBRA_ROW_ACCESS_POLICY_9ba9f188_3247_4837_a14a_dae2b48ae287(SALARY decimal(10, 0))
RETURN IF(
  (
    (
      current_user() == 'HR'
      OR is_account_group_member('HR')
    )
    AND SALARY IN (1000)
  ),
  true,
  false
)
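
To make the effect of this function concrete, the following plain-Python sketch models the same condition and applies it to a few rows. It is a model for illustration, not Databricks code. The names row_is_visible and visible_rows, the sample rows, and the user names are hypothetical.

```python
def row_is_visible(salary, user, groups):
    """Model the row access function: True keeps the row, False hides it."""
    is_hr = user == "HR" or "HR" in groups  # current_user() / is_account_group_member()
    return is_hr and salary in (1000,)      # SALARY IN (1000)


def visible_rows(rows, user, groups):
    """Return only the rows for which the modeled function is true."""
    return [row for row in rows if row_is_visible(row["salary"], user, groups)]


# Hypothetical sample data, for illustration only.
rows = [
    {"employee": "A", "salary": 1000},
    {"employee": "B", "salary": 2500},
]

print(visible_rows(rows, "dana@example.com", {"HR"}))     # only employee A
print(visible_rows(rows, "eve@example.com", {"Sales"}))   # no rows
```

Read this way, the function as written returns false for every user outside the HR group, so in this model those users see no rows at all.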

The row access functions are named collibra_row_access_policy_<asset ID>. The masking and row access policy functions are created at the schema level in Databricks.
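
Judging from the function shown in this example, the asset ID appears in the function name with underscores rather than hyphens. The following Python sketch derives a function name on that basis. It assumes that the asset ID is a hyphenated UUID and that hyphens are replaced with underscores; the helper policy_function_name is hypothetical and is not part of any Collibra or Databricks API.

```python
# Name prefixes taken from this documentation.
PREFIXES = {
    "masking": "collibra_masking_policy_",
    "row_access": "collibra_row_access_policy_",
}


def policy_function_name(kind, asset_id):
    """Build a function name from the policy kind and an asset ID.

    Assumption: hyphens in the asset ID are replaced with underscores,
    as the function name in the example suggests.
    """
    return PREFIXES[kind] + asset_id.replace("-", "_")


print(policy_function_name("row_access", "9ba9f188-3247-4837-a14a-dae2b48ae287"))
```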

Note: Protect for Databricks supports Databricks external tables.