Shapes (automatic)

Collibra DQ automatically detects inconsistencies in data formats. Data scientists spend an enormous amount of time cleaning up exactly these inconsistencies before they can build an ML model. Many reports have documented that over 80% of the time it takes to build a credible model goes into first understanding all the different formats in the data and then writing munging or ETL-style code to clean them before processing. And what about the patterns that neither the process nor the person even knows about?

Drill into any shape anomaly to see a visual example.

See an itemized list of the most infrequent or unusual shapes in your datasets.
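This page does not describe how Collibra DQ derives a shape, so the following is only a minimal sketch of the general idea, assuming a common generalization: every letter becomes `A`, every digit becomes `9`, and punctuation and spaces are kept. Values are grouped by shape, and any shape whose share of the column is at or below a cutoff is listed, rarest first, with one example value for drilling in. The function names `shape_of` and `rare_shapes` and the 5% default cutoff are hypothetical, not part of the product.

```python
from collections import Counter


def shape_of(value: str) -> str:
    """Generalize a value into a shape: letters -> 'A', digits -> '9', others kept.

    This mapping is an illustrative assumption, not Collibra DQ's documented rule.
    """
    return "".join(
        "A" if ch.isalpha() else "9" if ch.isdigit() else ch
        for ch in value
    )


def rare_shapes(values, max_occurrence_pct=5.0):
    """Return (shape, count, percent, example) for every shape whose share of
    the column is at most max_occurrence_pct, rarest first.

    The 5% default is a hypothetical threshold chosen for the example.
    """
    counts = Counter()
    examples = {}
    for v in values:
        s = shape_of(v)
        counts[s] += 1
        examples.setdefault(s, v)  # keep the first value seen for each shape
    total = sum(counts.values())
    rare = [
        (shape, n, 100.0 * n / total, examples[shape])
        for shape, n in counts.items()
        if 100.0 * n / total <= max_occurrence_pct
    ]
    rare.sort(key=lambda row: (row[1], row[0]))  # fewest rows first, then by shape
    return rare


# Hypothetical phone-number column: 19 dashed values and one parenthesized value.
phones = ["555-123-4567"] * 19 + ["(555) 123-4567"]
for shape, n, pct, example in rare_shapes(phones):
    print(f"{shape!r}: {n} row(s), {pct:.1f}%, e.g. {example!r}")
```

Here the dashed format covers 95% of rows, so only the single parenthesized number, with shape `(999) 999-9999`, is reported as an odd shape.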

Shapes tab

Shape Tuning

Shape detection is on by default. To tune the Shape Settings manually, open them from the Findings page and drag the sliders to adjust Occurrence %, Format per Column, and Character Length.
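The page names the three sliders but does not define them, so the sketch below rests on an assumed reading of each one: Occurrence % as the largest share of rows a shape may have and still be flagged, Format per Column as the maximum number of flagged shapes reported for one column, and Character Length as the longest value that is shape-checked. The `ShapeSettings` class, its field names, its defaults, and `flag_shapes` are all hypothetical stand-ins, meant only to show how three such thresholds could interact.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class ShapeSettings:
    # Hypothetical stand-ins for the three sliders; names, meanings, and
    # defaults are assumptions, not Collibra DQ's actual settings.
    occurrence_pct: float = 5.0  # flag shapes covering at most this % of rows
    max_formats: int = 10        # report at most this many flagged shapes per column
    max_length: int = 50         # ignore values longer than this many characters


def shape_of(value: str) -> str:
    """Generalize a value: letters -> 'A', digits -> '9', others kept (assumed rule)."""
    return "".join(
        "A" if ch.isalpha() else "9" if ch.isdigit() else ch
        for ch in value
    )


def flag_shapes(values, settings: ShapeSettings):
    """Return the rare shapes of one column under the given (hypothetical) settings."""
    # Only shape-check values within the assumed character-length limit.
    candidates = [v for v in values if len(v) <= settings.max_length]
    counts = Counter(shape_of(v) for v in candidates)
    total = sum(counts.values())
    flagged = [
        (shape, n)
        for shape, n in counts.items()
        if 100.0 * n / total <= settings.occurrence_pct
    ]
    flagged.sort(key=lambda item: (item[1], item[0]))  # rarest first
    return [shape for shape, _ in flagged[: settings.max_formats]]


# Hypothetical ZIP-code column: mostly 5-digit, plus one 4-digit and one ZIP+4 value.
zips = ["02134"] * 18 + ["2134", "02134-0001"]
print(flag_shapes(zips, ShapeSettings()))               # both rare shapes
print(flag_shapes(zips, ShapeSettings(max_formats=1)))  # only the rarest one
print(flag_shapes(zips, ShapeSettings(max_length=5)))   # ZIP+4 ignored, so 1/19 > 5%
```

The last call shows why the sliders are worth tuning together: excluding long values shrinks the row count, which raises every remaining shape's share and can push it back above the occurrence cutoff.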

Shapes tuning

TopN Shape

The TopN Shape chart sits beneath the TopN Values chart on the Profile page. For a particular column, it charts the exceptional shapes listed on the Shapes tab.

TopN Shapes