Data profiling information
When you create a data profile of registered data, Collibra generates data profiling information.
The information shown depends on the profiling options that you selected when you registered the data source and on the profiling method that you used, either Jobserver or Edge.
| Column | Profiling option | Supported | Statistics | Description |
|---|---|---|---|---|
| Original Name | No | No | N/A | Column name of the registered table. |
| Data Type | Store Data Profile. To have advanced data types detected, also select Detect advanced data types. | Yes | N/A | Data type of the column, as detected by the profiling process. This can differ from the Technical Data Type value. For example, if a database column has text as its data type but contains only integer values, the profiling process sets the Whole Number data type instead of Text. If you enable the Anonymize data option in Collibra Console, Collibra anonymizes data in Column assets that have the Text or Geo data type. If the profiling process detects a wrong data type, you can update it afterwards. |
| Row Count | Store Data Profile | Yes | Exact | The number of rows in the source. |
| Empty Values Count | Store Data Profile | Yes | Exact | The number of empty values in the column. |
| Number of distinct values | Store Data Profile | Yes | Exact or approximate depending on column cardinality | The number of unique values in the column. |
| Chart | Store Data Profile | Yes | Depending on chart type | Indicates whether a chart was generated. The chart type varies per data type. Three charts are available: Frequency, Distribution - Histogram, and Distribution - Probability distribution curve. |
| Frequency | Store Data Profile | Yes | Exact or approximate depending on column cardinality | A bar chart showing frequency data. |
| Distribution - Histogram | Store Data Profile | Yes | Approximate | A histogram showing the distribution of numerical data. |
| Distribution - Probability distribution curve | Store Data Profile | Yes | Approximate | A curve showing the probability distribution of numerical data. |
| Technical Data Type | No | No | N/A | Data type of the column as defined in the source. This value can differ from the Data Type value. |
| Descriptive statistics (decile, percentile, quartiles) | Store Data Profile | Yes | Approximate | The value of the calculated statistic of the registered data. |
| Categorical Data | Store Data Profile | Yes | Exact or approximate depending on column cardinality | Indication whether the data in the column is categorical or not. For example, if 100 000 rows are registered and there are only five distinct values, then the data is considered to be categorical. |
| Categories | Store Data Profile | Yes | Exact or approximate depending on column cardinality | List of detected categories. This column has only values if the data is categorical. |
| Char octet Length | No | No | N/A | Maximum number of bytes in a character-type column. |
| Column Position | No | No | N/A | The index of the column in the source table. |
| Is Auto Incremented | No | No | N/A | Indication whether the data in the column is auto-incremented or not. |
| Is Generated | No | No | N/A | Indication whether the data in the column is generated or not. |
| Is Nullable | No | No | N/A | Indication whether the column can store NULL values or not. |
| Is Primary Key | No | No | N/A | Indication whether the column is a primary key or not. |
| Maximum Text Length | Store Data Profile | Yes | Exact | The length of the longest text value in the column, including white spaces. |
| Maximum Value | Store Data Profile | Yes | Exact | The maximum value in the column. |
| Mean | Store Data Profile | Yes | Exact | The mean of all the values in the column, excluding empty rows. |
| Median | Store Data Profile | Yes | Exact | The median value of the column. |
| Minimum Text Length | Store Data Profile | Yes | Exact | The length of the shortest text value in the column. |
| Minimum Value | Store Data Profile | Yes | Exact | The minimum value in the column. |
| Mode | Store Data Profile | Yes | Exact or approximate depending on column cardinality | The value with the highest frequency for categorical data. |
| Number Of Fractional Digits | No | No | N/A | The number of fractional digits. |
| Original Column Name | No | No | N/A | The column name as defined in the source. |
| Primary Key Name | No | No | N/A | The name of the primary key that the column is part of. |
| Size | No | No | N/A | The size of the column in the table. |
| Standard Deviation | Store Data Profile | Yes | Exact | The statistical standard deviation of numeric values. |
| Variance | Store Data Profile | Yes | Exact | The statistical variance of numeric values. |
| Sample | Store Sample Data | No | N/A | A random sample of the data set that represents the entire data set. |
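
To make some of the definitions in the table concrete, the following Python sketch computes a few of the listed values for a single column held in memory. It is illustrative only and does not describe how profiling via Jobserver or Edge is implemented. The function names, the rule that treats blank strings as empty, the `max_categories` threshold for categorical data, and the use of population variance are all assumptions of this sketch.

```python
import re
from collections import Counter
from statistics import mean, median, pstdev, pvariance

WHOLE_NUMBER = re.compile(r"[+-]?[0-9]+")


def is_empty(value):
    # Assumption: None and whitespace-only strings count as empty values.
    return value is None or str(value).strip() == ""


def detect_data_type(non_empty):
    # Report "Whole Number" only if every non-empty value is an integer literal,
    # mirroring the text-column-with-integers example in the table.
    if non_empty and all(WHOLE_NUMBER.fullmatch(str(v).strip()) for v in non_empty):
        return "Whole Number"
    return "Text"


def profile_column(values, max_categories=10):
    # max_categories is a hypothetical threshold, not a documented setting.
    non_empty = [v for v in values if not is_empty(v)]
    data_type = detect_data_type(non_empty)
    profile = {
        "Row Count": len(values),
        "Empty Values Count": len(values) - len(non_empty),
        "Data Type": data_type,
    }
    if data_type == "Whole Number":
        typed = [int(str(v).strip()) for v in non_empty]
        profile.update({
            "Minimum Value": min(typed),
            "Maximum Value": max(typed),
            "Mean": mean(typed),          # empty values are already excluded
            "Median": median(typed),
            "Standard Deviation": pstdev(typed),  # population form (assumption)
            "Variance": pvariance(typed),
        })
    else:
        typed = [str(v) for v in non_empty]
        if typed:
            lengths = [len(s) for s in typed]  # white space counts toward length
            profile["Minimum Text Length"] = min(lengths)
            profile["Maximum Text Length"] = max(lengths)
    counts = Counter(typed)
    profile["Number of distinct values"] = len(counts)
    categorical = 0 < len(counts) <= max_categories
    profile["Categorical Data"] = categorical
    if categorical:
        profile["Categories"] = sorted(counts)
        profile["Mode"] = counts.most_common(1)[0][0]
    return profile


if __name__ == "__main__":
    # A text column that holds only integers, plus two empty values.
    print(profile_column(["3", "1", "", "3", None, "2", "3", "1"]))
```

This sketch computes everything exactly. The table marks several statistics as approximate for high-cardinality columns, which the sketch does not attempt to reproduce.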