Protect FAQ
Why does Protect for BigQuery require a connection separate from the one used for Catalog ingestion?
Protect uses GCP APIs for specific data protection tasks such as creating taxonomies, tags, and data policies. These tasks cannot be accomplished with the JDBC connection used for Catalog ingestion. Therefore, Protect for BigQuery requires a separate GCP connection.
If I delete a standard or rule, is the corresponding policy also deleted from the data source?
Yes, the corresponding policy is also deleted from the data source in the next synchronization cycle.
If I have a standard and a rule that affect the same Protect group, which of the two takes precedence?
The rule takes precedence over the standard.
If I protect a table via a standard or rule, would the view created on the table also inherit that standard or rule?
Yes. While Protect does not directly support views, if a view is created on the table you protect, the view is also protected.
What happens if I have my own policy tags assigned to columns in BigQuery and I start using Protect?
BigQuery allows only a single policy tag per column. Protect creates and assigns its own policy tags to the columns it protects, replacing your existing policy tags on those columns. However, Protect does not alter or delete any other policy tags.
Does Protect support referential integrity (preserving the integrity of data)?
Yes. Protect supports referential integrity for hashing.
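As a rough illustration of why hashing can preserve referential integrity: a deterministic hash maps equal inputs to equal outputs, so a key hashed in two tables still joins. The sketch below uses SHA-256 and made-up tables purely as assumptions; it is not a description of how Protect or any data source actually implements hashing.

```python
import hashlib


def hash_value(value: str) -> str:
    """Deterministic hash: the same input always yields the same digest.
    SHA-256 is an assumption for illustration only."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


# Hypothetical tables that share a customer_id key.
customers = [
    {"customer_id": "C001", "name": "Ada"},
    {"customer_id": "C002", "name": "Grace"},
]
orders = [
    {"order_id": 1, "customer_id": "C001"},
    {"order_id": 2, "customer_id": "C002"},
    {"order_id": 3, "customer_id": "C001"},
]

# Hash the key column in both tables.
hashed_customers = [{**c, "customer_id": hash_value(c["customer_id"])} for c in customers]
hashed_orders = [{**o, "customer_id": hash_value(o["customer_id"])} for o in orders]


def join_count(left, right, key):
    """Count rows in `left` whose key value also appears in `right`."""
    right_keys = {row[key] for row in right}
    return sum(1 for row in left if row[key] in right_keys)


# The number of matching rows is the same before and after hashing.
plain_matches = join_count(orders, customers, "customer_id")
hashed_matches = join_count(hashed_orders, hashed_customers, "customer_id")
```

Because every occurrence of a given key becomes the same digest, the orders still match their customers after hashing, even though the original key values are no longer visible.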
Is hashing irreversible?
Yes. For information about how hashing is implemented for each data source, go to the respective documentation about data types in Protect data sources.
When a Protect policy that granted access to Databricks or Snowflake data is deleted, why doesn’t Protect automatically revoke the access as it would with AWS Lake Formation and BigQuery?
Protect can't determine whether the access was granted by Protect itself or through another source. Although Protect removes any masking or row filtering, users can still access the data until the access is manually revoked in Databricks or Snowflake.
In AWS Lake Formation and BigQuery, data protection and access control are integrated, whereas in Databricks and Snowflake, they are separate. The "Grant access to the data linked to the assets" checkbox in the Protect rule applies only to Databricks and Snowflake, reflecting this distinction.
If I remove a column from a data classification or category path, is the protection removed from the column?
Yes, the protection is removed from the column in the next synchronization cycle. Synchronization runs every hour by default, and the interval is configurable.
What happens when a standard or rule has columns from multiple data sources, but its group or groups are mapped to only one of the sources?
If a standard protects both BigQuery and Snowflake columns, Protect expects group mappings for both data sources. If one of the group mappings is omitted, the standard fails and the tag is not applied to either source.
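The mapping requirement can be pictured as a simple pre-check: every data source that contributes columns must also have a group mapping. The following is a minimal sketch with hypothetical names (`missing_group_mappings`, `ANALYST_ROLE`); it does not reflect Protect's internal logic or API.

```python
def missing_group_mappings(column_sources, group_mappings):
    """Return the data sources that have protected columns but no group mapping.

    column_sources: set of data source names the standard or rule covers.
    group_mappings: dict of data source name -> mapped group (hypothetical shape).
    """
    return sorted(set(column_sources) - set(group_mappings))


# A standard that covers columns in two sources, with only one mapping defined.
column_sources = {"BigQuery", "Snowflake"}
group_mappings = {"Snowflake": "ANALYST_ROLE"}  # hypothetical group name

missing = missing_group_mappings(column_sources, group_mappings)
```

Here `missing` lists BigQuery, which is the mapping to add before the standard can be expected to apply.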
Do I need to create a custom path for a data category that follows the path Data Category > Data Attribute > Column?
Yes. Protect supports the following path: Data Category > Data Set > Data Attribute > Column. Therefore, you can relate the data attribute to a data set using the contains relation and relate that data set to the data category.
Can I apply a personalized masking level, such as showing data as GDPR instead of 0?
Yes, but only with Databricks and Snowflake because they allow for customization. You can create your own masking function in Databricks or Snowflake, register the function in Protect, and then select that function in a standard or rule.
I applied masking and row-filtering to a Snowflake column via Protect. But in Snowflake, the row-filtering is not applied. Why?
Snowflake does not allow the application of both masking and row-filtering to the same column. If you have a row filter, you cannot mask the column that's being used in that row filter.
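One way to spot this situation before synchronizing is to compare the columns you intend to mask with the columns referenced by your row filters. The sketch below is illustrative only; the function and column names are hypothetical and are not part of Protect or Snowflake.

```python
def conflicting_columns(masked_columns, row_filter_columns):
    """Return columns that are both masked and used in a row filter."""
    return sorted(set(masked_columns) & set(row_filter_columns))


# Hypothetical policy design for one Snowflake table.
masked_columns = ["EMAIL", "COUNTRY"]
row_filter_columns = ["COUNTRY"]

conflicts = conflicting_columns(masked_columns, row_filter_columns)
```

Any column returned here (COUNTRY in this example) should carry either the mask or the row filter, not both.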
Can Protect synchronize existing policies from a data source and create them as standards or rules in Collibra?
No. Protect can't import or synchronize policies created directly in the data source. Synchronization is one-way: you must define your policies (standards and rules) in Collibra, which are then enforced on the data source.