Collibra DQ improves your data lake
Warning: This documentation is archived and is no longer maintained.
- Data and Privacy in Place. Data never has to move for a DQ check. Operating in place reduces latency, adds hybrid flexibility, and preserves privacy, which together enable many use cases that were not possible before. It also removes consolidation done only for its own sake: DQ doesn't have to start by first moving the data into a data lake.
- DQ or Any Rules Applied in the Stream. The rules that DQ learns can be applied back at the source, on data in the stream. Other, non-DQ rules learned in the data lake can also be added to the DQ check.
- Self-Service and DQ Push-Down Fix. DQ can enable a self-service push-down fix (through a recommendation engine) for anything flagged at the source. The best time to fix a DQ issue is when and where the problem started. This enables tighter integration with Data Governance tools, because DQ is maintained once at the source rather than downstream, where corruption can extend beyond the data itself.
- Multi-cloud/On-prem/Hybrid. Collibra DQ can scan, alert, and report at the source, or operate natively on the target data lake, such as Databricks Delta on Azure, Snowflake on AWS, or Qubole on GCP. Why compromise DQ just because your data is not in one place? Why settle for a DQ strategy that only works if the data is first migrated or moved?
- DQ Dashboards. Many DQ problems result from business rules about the data being observed improperly or too slowly. Whatever manual visual inspection or a potentially outdated hand-written rule misses can only be flagged by AI and machine learning. Conversely, whatever does get flagged should be easy to triage and then fix promptly with the aid of AI. The most important metric for a DQ dashboard is the time to fix, not simply the overall DQ score.
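To illustrate why time to fix tells you something an overall score does not, here is a minimal Python sketch that computes the mean flag-to-fix duration over a list of flagged issues. It assumes a hypothetical `Issue` record with `flagged_at` and `fixed_at` timestamps; these names and the helper functions are illustrative only and are not part of the Collibra DQ API.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import mean
from typing import Optional


@dataclass
class Issue:
    """Hypothetical record of one flagged DQ issue (not a Collibra DQ type)."""
    flagged_at: datetime
    fixed_at: Optional[datetime] = None  # None means the issue is still open


def mean_time_to_fix(issues: list[Issue]) -> Optional[timedelta]:
    """Average time from flag to fix, computed only over fixed issues."""
    durations = [
        (i.fixed_at - i.flagged_at).total_seconds()
        for i in issues
        if i.fixed_at is not None
    ]
    if not durations:
        return None  # nothing has been fixed yet, so there is no average
    return timedelta(seconds=mean(durations))


def open_issue_count(issues: list[Issue]) -> int:
    """Number of flagged issues that have not been fixed yet."""
    return sum(1 for i in issues if i.fixed_at is None)


if __name__ == "__main__":
    t0 = datetime(2024, 1, 1, 9, 0)
    issues = [
        Issue(t0, t0 + timedelta(hours=2)),
        Issue(t0, t0 + timedelta(hours=6)),
        Issue(t0),  # still open
    ]
    print(mean_time_to_fix(issues))  # 4:00:00
    print(open_issue_count(issues))  # 1
```

Reporting the flag-to-fix duration (as a mean here; a median or percentile works the same way) next to the count of still-open issues shows how quickly problems are actually resolved, which a single aggregate score cannot reveal.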