Databricks data types

Databricks provides several functions for transforming data. This page describes which Databricks function, expression, or value is used to transform the data for each Protect masking type.

Default masking: Databricks does not support this masking type. Protect, however, uses the default masking type to apply protection to a wide range of data types. A default masking value is applied to each column according to the data type of the column.

Default masking values for data types

Column data type	Databricks data type	Default masking value
NUMERIC	BIGINT	bigint('0')
BIGNUMERIC	BIGINT	bigint('0')
BYTEINT	BIGINT	bigint('0')
BIGINT	BIGINT	bigint('0')
BINARY	BINARY	binary('00')
VARBINARY	BINARY	binary('00')
BYTES	BINARY	binary('00')
BOOLEAN	BOOLEAN	false
DATE	DATE	1970-01-01
DATETIME	DATE	1970-01-01
DECIMAL	DECIMAL(p,s)	decimal('0.0')
DOUBLE	DOUBLE	double('0.0')
DOUBLE PRECISION	DOUBLE	double('0.0')
REAL	DOUBLE	double('0.0')
FLOAT	FLOAT	float('0.0')
FLOAT4	FLOAT	float('0.0')
FLOAT8	FLOAT	float('0.0')
INT	INT	int('0')
NUMBER	NUMBER	int('0')
BIT	INT	int('0')
INTEGER	INT	int('0')
SMALLINT	SMALLINT	smallint('0')
STRING	STRING	mask('S','*')
CHAR	STRING	mask('S','*')
CHARACTER	STRING	mask('S','*')
VARCHAR	VARCHAR	mask('S','*')
TEXT	STRING	mask('S','*')
TIMESTAMP	TIMESTAMP	1970-01-01 00:00:00.000
TIME	TIMESTAMP	1970-01-01 00:00:00.000
TIMESTAMP_NTZ	TIMESTAMP	1970-01-01 00:00:00.000
TIMESTAMP_LTZ	TIMESTAMP	1970-01-01 00:00:00.000
TIMESTAMP_TZ	TIMESTAMP	1970-01-01 00:00:00.000
TINYINT	TINYINT	tinyint('0')
ARRAY	ARRAY<elementType>	array()
MAP	MAP<keyType, valueType>	map()
STRUCT	STRUCT<[fieldName: fieldType [NOT NULL] [COMMENT str] [, …]]>	struct(0) or struct(0,0)

Tip: The default masking value for STRUCT is dynamic; it depends on how many fields are defined for the STRUCT data type.
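
As an illustration of the dynamic STRUCT value, the following Python sketch builds the default masking expression as text for a given number of fields. It assumes that the value contains one 0 per field, which matches the two documented forms, struct(0) and struct(0,0); the behavior for more than two fields is an extrapolation, and the function name struct_default_expression is hypothetical.

```python
def struct_default_expression(field_count: int) -> str:
    """Build the text of a STRUCT default masking value.

    Assumption: one 0 per field, as in the documented struct(0) and
    struct(0,0) forms. Larger field counts are extrapolated.
    """
    if field_count < 1:
        raise ValueError("a STRUCT needs at least one field")
    return "struct(" + ",".join(["0"] * field_count) + ")"


print(struct_default_expression(1))  # struct(0)
print(struct_default_expression(2))  # struct(0,0)
```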

Hashing: Uses the following Databricks functions:
- SHA2 (for strings)
- HASH (for numbers)
- right(hash(value), (precision - scale)) (for decimals)
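
The following Python sketch emulates two of the hashing steps outside Databricks; it is an illustration, not the code that Protect runs. It assumes a 256-bit digest for SHA2 (the bit length is not stated here) and uses a hypothetical integer in place of a real HASH result, because the HASH function itself is not reproduced. The helper names are illustrative.

```python
import hashlib


def sha2_hex(value: str) -> str:
    """Hex digest of a string. Assumption: SHA-2 with a 256-bit digest."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


def right(text: str, n: int) -> str:
    """Last n characters of text, like right(text, n)."""
    return text[-n:] if n > 0 else ""


def hashed_decimal_text(hash_value: int, precision: int, scale: int) -> str:
    """Emulate right(hash(value), (precision - scale)) for a decimal column.

    hash_value is a stand-in for the result of HASH; it is not computed here.
    """
    return right(str(hash_value), precision - scale)


print(len(sha2_hex("alice@example.com")))      # 64 hexadecimal characters
print(hashed_decimal_text(1234567890, 10, 2))  # 34567890
```

For a DECIMAL(10,2) column, precision - scale is 8, so only the last eight characters of the hash text are kept.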
Show last: Uses the following expressions:
- right(value,n) (for strings)
- mod(value, cast(power(10,n) AS INT)) (for integers)
- regexp_replace(substr(string(value), length(value) - (n-1), n), '^$', '0') (for floating-point numbers and decimals)
  Tip: In these expressions, value is the column content and n is the number of characters to show.
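
To make the three Show last expressions easier to follow, here is a minimal Python emulation of each one. It is a sketch under stated assumptions, not Databricks code: the function names are hypothetical, the integer variant covers only non-negative values, and the floating-point variant takes the text form of the number (standing in for string(value)) and covers only n between 0 and the length of that text.

```python
def show_last_string(value: str, n: int) -> str:
    """Emulate right(value, n): the last n characters of a string."""
    return value[-n:] if n > 0 else ""


def show_last_integer(value: int, n: int) -> int:
    """Emulate mod(value, cast(power(10, n) AS INT)) for non-negative integers."""
    if value < 0:
        raise ValueError("this sketch only covers non-negative values")
    return value % (10 ** n)


def show_last_decimal(value_text: str, n: int) -> str:
    """Emulate regexp_replace(substr(string(value), length(value) - (n-1), n), '^$', '0').

    value_text stands in for string(value).
    """
    if not 0 <= n <= len(value_text):
        raise ValueError("this sketch only covers 0 <= n <= len(value_text)")
    start = len(value_text) - (n - 1)             # 1-based start position, as in substr
    tail = value_text[start - 1:start - 1 + n]    # n characters from that position
    return tail if tail else "0"                  # '^$' -> '0': empty text becomes "0"


print(show_last_string("123456789", 4))  # 6789
print(show_last_integer(123456789, 4))   # 6789
print(show_last_integer(1000045, 4))     # 45
print(show_last_decimal("123.45", 3))    # .45
```

Two consequences of the expressions are visible here: the integer form returns a number, so leading zeros are dropped (1000045 yields 45, not 0045), and the floating-point form counts the decimal point as one of the n characters.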
No masking: Returns the raw content.

Note

You can apply the Hashing and Show last masking types only to the following Databricks data types: BIGINT, DECIMAL, DOUBLE, FLOAT, INT, SMALLINT, STRING, and TINYINT.
If the selected masking type cannot be applied to a data type (for example, Hashing on the DATE data type), the Default masking type is applied instead so that the data is still protected.
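
The fallback rule in this note can be sketched as a small Python function. The set of supported data types comes from the note itself; the function name, the masking type labels as strings, and the handling of parameterized types such as DECIMAL(10,2) are assumptions made for illustration.

```python
# Data types that accept Hashing and Show last, as listed in the note.
SUPPORTED_TYPES = frozenset(
    {"BIGINT", "DECIMAL", "DOUBLE", "FLOAT", "INT", "SMALLINT", "STRING", "TINYINT"}
)
# Masking types that are limited to the supported data types.
RESTRICTED_MASKING_TYPES = frozenset({"Hashing", "Show last"})


def effective_masking_type(masking_type: str, databricks_type: str) -> str:
    """Return the masking type that ends up applied to a column."""
    # Assumption: compare on the base type name, so DECIMAL(10,2) counts as DECIMAL.
    base_type = databricks_type.split("(")[0].strip().upper()
    if masking_type in RESTRICTED_MASKING_TYPES and base_type not in SUPPORTED_TYPES:
        return "Default masking"
    return masking_type


print(effective_masking_type("Hashing", "DATE"))           # Default masking
print(effective_masking_type("Show last", "STRING"))       # Show last
print(effective_masking_type("Hashing", "DECIMAL(10,2)"))  # Hashing
```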