Databricks data types

Databricks provides several built-in functions for transforming data. This page describes which Databricks functions and expressions are used to transform data for each Protect masking type.

  • Default masking: Databricks does not natively support this masking type. Protect nevertheless uses it to protect a wide range of data types by applying a default masking value to each column based on the column's data type.
  • Hashing: Uses the following Databricks functions and expressions:
    • SHA2 (for strings)
    • HASH (for numbers)
    • right(hash(value), precision - scale) (for decimals, where precision and scale are those of the column's DECIMAL type)
  • Show last: Uses the following expressions:
    • right(value,n) (for strings)
    • mod(value, cast(power(10,n) AS INT)) (for integers)
    • regexp_replace(substr(string(value), length(value) - (n-1), n), '^$', '0') (for floating-point numbers and decimals)
      Tip: In these expressions, value is the column value and n is the number of trailing characters (or, for integers, digits) to show.
  • No masking: Returns the raw content.
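To make the Hashing and Show last expressions above more concrete, the following minimal Python sketch mimics each of them on plain Python values. It is illustrative only, not how Protect or Databricks executes them, and it rests on several assumptions: SHA2 is called with a 256-bit length (the page does not state the bit length), the output of Databricks hash() is passed in as a number rather than reproduced, and Python's str() stands in for the SQL string cast. All function names are hypothetical.

```python
import hashlib


def hash_string(value: str) -> str:
    """Mimic sha2(value, 256): hex-encoded SHA-256 of the UTF-8 bytes.

    Assumption: a 256-bit length; the page only names SHA2.
    """
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


def sql_right(s: str, n: int) -> str:
    """Mimic SQL right(): the last n characters, or '' when n <= 0."""
    return s[-n:] if n > 0 else ""


def hash_decimal(hash_value: int, precision: int, scale: int) -> str:
    """Mimic right(hash(value), precision - scale).

    hash_value stands in for the result of Databricks hash(), which this
    sketch does not reproduce; right() treats it as a string.
    """
    return sql_right(str(hash_value), precision - scale)


def sql_substr(s: str, pos: int, length: int) -> str:
    """Mimic SQL substr(): pos is 1-based, 0 means the start, negative counts from the end."""
    if pos > 0:
        start = pos - 1
    elif pos < 0:
        start = len(s) + pos
    else:
        start = 0
    end = start + length
    start = max(start, 0)
    return s[start:end] if start < end else ""


def show_last_string(value: str, n: int) -> str:
    # right(value, n)
    return sql_right(value, n)


def show_last_integer(value: int, n: int) -> int:
    # mod(value, cast(power(10, n) AS INT)); assumes 10**n fits in an INT.
    # SQL mod keeps the sign of the dividend, unlike Python's %, so the
    # remainder is computed on abs(value) and the sign is restored.
    base = 10 ** n
    remainder = abs(value) % base
    return -remainder if value < 0 else remainder


def show_last_fractional(value, n: int) -> str:
    # regexp_replace(substr(string(value), length(value) - (n-1), n), '^$', '0')
    # Python's str() may render some values differently from the SQL cast
    # (for example, in scientific notation).
    text = str(value)
    shown = sql_substr(text, len(text) - (n - 1), n)
    return shown if shown else "0"  # '^$' turns an empty result into '0'


if __name__ == "__main__":
    print(hash_string("abc")[:8])               # ba7816bf
    print(hash_decimal(-1234567890, 10, 2))     # 34567890
    print(show_last_string("ABCDEFG", 3))       # EFG
    print(show_last_integer(-123456, 3))        # -456
    print(show_last_fractional(123.45, 2))      # 45
```

Note that for integers the result is a number rather than a string, so leading zeros are not kept: showing the last three digits of 123007 yields 7.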
Note 
  • You can apply the Hashing and Show last masking types only to the following Databricks data types: BIGINT, DECIMAL, DOUBLE, FLOAT, INT, SMALLINT, STRING, and TINYINT.
  • If a selected masking type cannot be applied to a data type (for example, the Hashing masking type on a DATE column), the Default masking type is applied instead to guarantee protection.
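The fallback rule in the note amounts to a simple lookup. The following Python sketch is illustrative only: the function and constant names are hypothetical, and it assumes that Default masking and No masking can be applied to every data type, so only Hashing and Show last are checked against the list of supported types.

```python
# Data types that accept the Hashing and Show last masking types (from the note).
HASHING_AND_SHOW_LAST_TYPES = frozenset(
    {"BIGINT", "DECIMAL", "DOUBLE", "FLOAT", "INT", "SMALLINT", "STRING", "TINYINT"}
)
RESTRICTED_MASKING_TYPES = frozenset({"Hashing", "Show last"})


def effective_masking_type(requested: str, data_type: str) -> str:
    """Return the masking type that ends up applied to a column.

    Assumption: Default masking and No masking apply to any data type;
    only Hashing and Show last are restricted.
    """
    # Drop type parameters such as DECIMAL(10,2) -> DECIMAL before the lookup.
    base_type = data_type.split("(", 1)[0].strip().upper()
    if requested in RESTRICTED_MASKING_TYPES and base_type not in HASHING_AND_SHOW_LAST_TYPES:
        return "Default masking"  # fall back so the column is still protected
    return requested


if __name__ == "__main__":
    print(effective_masking_type("Hashing", "DATE"))             # Default masking
    print(effective_masking_type("Show last", "DECIMAL(10,2)"))  # Show last
```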