Databricks examples

This topic contains examples of how Databricks behaves with respect to certain data protection standards and data access rules.

Example

Suppose that:

The Personally Identifiable Information (PII) and Personal Information (PI) data categories exist in Databricks. These two data categories contain a column named DOB.
A standard is set for the HR group. This standard requires hashing for the PII data category.
A standard is set for the Marketing group. This standard requires default masking for the PI data category.

Behavior

When the standards are synchronized and active, a function is created in Databricks for each standard and linked to the DOB column. Because DOB belongs to both data categories, a single column masking policy that combines the protection from both standards is then created and applied to the DOB column.

CASE
  WHEN (
    current_user() == 'HR'
    OR is_account_group_member('HR')
  ) THEN hash(val)
  WHEN (
    current_user() == 'Marketing'
    OR is_account_group_member('Marketing')
  ) THEN 0
  ELSE val
END
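For orientation, the following is a minimal sketch of how a combined mask of this shape could be created and attached to a column by hand in Databricks SQL. It is not the function that Collibra generates. The catalog `main`, schema `hr`, table `employees`, and function name `dob_mask` are hypothetical, and the sketch assumes that DOB is stored as STRING, so the casts and the `'0'` literal are added only to make every branch return the same type.

```sql
-- Sketch only. Hypothetical names: main.hr.employees, main.hr.dob_mask.
-- Assumption: DOB is a STRING column, so every CASE branch returns STRING.
CREATE OR REPLACE FUNCTION main.hr.dob_mask(val STRING)
RETURNS STRING
RETURN CASE
  -- HR standard: hashing
  WHEN current_user() == 'HR' OR is_account_group_member('HR')
    THEN CAST(hash(val) AS STRING)
  -- Marketing standard: default masking
  WHEN current_user() == 'Marketing' OR is_account_group_member('Marketing')
    THEN '0'
  -- Anyone else sees the value as stored
  ELSE val
END;

-- Attach the function as the mask for the DOB column.
ALTER TABLE main.hr.employees ALTER COLUMN DOB SET MASK main.hr.dob_mask;
```

The WHEN branches are evaluated in order, so a user who belongs to both groups gets the result of the first matching branch.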

Example

Suppose that:

The Personally Identifiable Information (PII) data category exists in Databricks.
The Employee Data data set exists in Databricks. This data set contains PII.
A standard is set for the following groups: Everyone, Human Resources, Marketing, and Sales. This standard requires default masking for the PII data category.
A rule is set for the Human Resources group. This rule does not require any masking for the PII columns in the Employee Data asset.

Behavior

When the standard is synchronized and active, masking policies are created in Databricks—one policy for each column. The masking functions are named collibra_masking_policy_<asset ID>.

[Image: Databricks masking policies]

The following image shows a masking policy for the STRING data type. The data that is shown in the policy depends on the masking level selected in the standard and rule. In the policy, val indicates the value as it is stored in the table.

[Image: Masking policy for STRING]

According to the standard, the Everyone, Human Resources, Marketing, and Sales groups have masked access to the data. However, according to the rule, the Human Resources group has unmasked access to the data. As a result:

The column is not masked for the Human Resources group.
The column is masked for the Everyone, Marketing, and Sales groups.
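The outcome above can be sketched as a masking function in which the rule is checked before the standard. This is an illustration under stated assumptions, not the policy that Collibra creates: the catalog `main`, schema `hr`, and the `example_asset_id` suffix are hypothetical, the group names are taken from the example as written, and `'0'` is only a placeholder for the default-masked STRING value.

```sql
-- Sketch only. Hypothetical names: catalog main, schema hr, suffix example_asset_id.
-- The literal '0' is a placeholder, not Collibra's actual default-masked output.
CREATE OR REPLACE FUNCTION main.hr.collibra_masking_policy_example_asset_id(val STRING)
RETURNS STRING
RETURN CASE
  -- Rule: Human Resources sees the value as stored.
  -- Checked first so that it takes precedence over the standard.
  WHEN is_account_group_member('Human Resources') THEN val
  -- Standard: the remaining listed groups get the masked value.
  WHEN is_account_group_member('Everyone')
    OR is_account_group_member('Marketing')
    OR is_account_group_member('Sales') THEN '0'
  -- Mirrors the ELSE branch of the first example.
  ELSE val
END;
```

Calling the function directly, for example `SELECT main.hr.collibra_masking_policy_example_asset_id('sample');`, returns either the input or the placeholder, depending on the group membership of the user who runs the query.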

Example

Consider the rule from the previous example, with the following row filter added: Show rows where the Salary data classification has the code set value of 1000.

[Image: Row filter in rule]

Behavior

Functions

CREATE OR REPLACE FUNCTION protect_dev_catalog.tpch_dev.COLLIBRA_ROW_ACCESS_POLICY_9ba9f188_3247_4837_a14a_dae2b48ae287(SALARY decimal(10, 0))
RETURN IF(
  (
    (
      current_user() == 'HR'
      OR is_account_group_member('HR')
    )
    AND SALARY IN (1000)
  ),
  true,
  false
)

The row access functions are named collibra_row_access_policy_<asset ID>. The masking and row access policy functions are created at the schema level in Databricks.
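As a rough illustration of how a row access function of this kind relates to a table, the sketch below attaches the function from this example as a row filter and then inspects it. The table name `employee_data` is hypothetical. The catalog, schema, and function name are taken from the example above. Protect applies its policies itself, so these statements only show the manual equivalent in Databricks SQL.

```sql
-- Sketch only. The table name employee_data is hypothetical.
-- Attach the row access function as a row filter, passing the SALARY column to it.
ALTER TABLE protect_dev_catalog.tpch_dev.employee_data
  SET ROW FILTER protect_dev_catalog.tpch_dev.COLLIBRA_ROW_ACCESS_POLICY_9ba9f188_3247_4837_a14a_dae2b48ae287
  ON (SALARY);

-- Inspect the function definition that lives at the schema level.
DESCRIBE FUNCTION EXTENDED
  protect_dev_catalog.tpch_dev.COLLIBRA_ROW_ACCESS_POLICY_9ba9f188_3247_4837_a14a_dae2b48ae287;
```

With the filter in place, a query on the table returns only the rows for which the function evaluates to true for the current user.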

Note: Protect for Databricks supports Databricks external tables.