Data profiling information
If you create a data profile of registered data, profiling results are generated in the Table and Column assets.
- If you use Jobserver to register the data source, data profiling information depends on the profile options that you selected when you registered the data source.
- If you use Edge to register the data source, most information is available only after you have explicitly profiled the data. For an overview of the data that becomes available after you register a data source via Edge, see Data source registration information.
Column attribute | Profiling option | Statistics | Description | Retrieved from JDBC property |
---|---|---|---|---|
Column Name | No option selected | N/A | The column name of the registered table. | COLUMN_NAME |
Data Type | To have the Advanced Data Type detected, select Detect advanced data types. | N/A | The data type of the column, as detected by the profiling process. This data type can differ from the Technical Data Type value. For data type detection via Edge: by default, the data type is based on the technical data type, and Edge doesn't take Advanced data types into account. By default, Collibra anonymizes profiling results in Column assets that have data with the Text or Geo data type. However, it is possible to anonymize the profiling results for all columns. For more information, go to Anonymization via Edge. Note: The Data Type is available in the Metadata tab of the descriptive statistics. If the profiling process has detected a wrong data type, you can update the data type there. If you enable the Anonymize data option in Collibra Console, Collibra anonymizes data in Column assets that have the Text or Geo data type. | |
Description from Source | No option selected | N/A | The description of the column in the data source. | REMARKS |
Row Count | Store Data Profile | Exact | The number of rows in the data source. | |
Empty Values Count | Store Data Profile | Exact | The number of rows that are empty. | |
Number of distinct values | Store Data Profile | Exact or approximate depending on column cardinality | The number of unique values in the column. | |
Descriptive Statistics Chart | Store Data Profile | Depending on chart type | This column indicates whether charts were generated for the column: an icon is shown if a chart is available, and no icon otherwise. Click the icon to open the chart in a dialog box, where you can zoom in, hover over a data point, and so on. If you hover over the icon, a preview of the chart appears. If you hover over a data point in the preview, extra data appears for the data point. The chart type varies per data type. The following charts can be shown: Note: Charts are never available for the following data types: | |
Frequency | Store Data Profile | Exact or approximate depending on column cardinality | Note: This chart is available only if Categorical Data = true. | |
Distribution - Histogram | Store Data Profile | Approximate | A histogram showing the representation of the distribution of numerical data. | |
Distribution - Probability distribution curve | Store Data Profile | Approximate | A curve showing the representation of the probability distribution of numerical data. | |
Technical Data Type | No option selected | N/A | The data type of the column as defined in the source. This value can differ from the Data Type value. | TYPE_NAME |
| Store Data Profile | Approximate | The value of the calculated statistic of the registered data. | |
Categorical Data | Store Data Profile | Exact or approximate depending on column cardinality | Indicates whether the data in the column is categorical. For example, if 100 000 rows are registered and there are only five distinct values, the data is considered to be categorical. | |
Category | Store Data Profile | Exact or approximate depending on column cardinality | The list of detected categories. This column has values only if the data is categorical. | |
Char octet Length | No option selected | N/A | The maximum number of bytes in a character-type column. | CHAR_OCTET_LENGTH |
Column Position | No option selected | N/A | The index of the column in the source table. | ORDINAL_POSITION |
Is Auto Incremented | No option selected | N/A | Indicates whether the data in the column is auto-incremented. | IS_AUTOINCREMENT |
Is Generated | No option selected | N/A | Indicates whether the data in the column is generated. | IS_GENERATEDCOLUMN |
Is Nullable | No option selected | N/A | Indicates whether the column can store NULL values. | IS_NULLABLE |
Is Primary Key | No option selected | N/A | Indicates whether the column is a primary key. | True if the primary keys result set contains the COLUMN_NAME |
Maximum Text Length | Store Data Profile | Exact | The length of the longest text value in the column, including white spaces. | |
Maximum Value | Store Data Profile | Exact | The maximum value in the column. | |
Mean | Store Data Profile | Exact | The mean of all the values in the column, excluding empty rows. | |
Median | Store Data Profile | Exact | The median value of the column. | |
Minimum Text Length | Store Data Profile | Exact | The length of the shortest text value in the column. | |
Minimum Value | Store Data Profile | Exact | The minimum value in the column. | |
Mode | Store Data Profile | Exact or approximate depending on column cardinality | The value with the highest frequency for categorical data. | |
Number Of Fractional Digits | No option selected | N/A | The number of fractional digits. | DECIMAL_DIGITS |
Primary Key Name | No option selected | N/A | The name of the primary key that the column is part of. | PK_NAME |
Size | No option selected | N/A | The size of the column in the table. | COLUMN_SIZE |
Standard Deviation | Store Data Profile | Exact | The statistical standard deviation of numeric values. | |
Variance | Store Data Profile | Exact | The statistical variance of numeric values. | |
Sample | Store Sample Data | N/A | A random sample of the data set that represents the entire data set. Note: In Edge, viewing sample data is not linked to the profiling feature. See sample data. | |
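
The statistics in the column table can be approximated for a single in-memory column with a short Python sketch. This is an illustration only, not Collibra's profiling implementation. The function name `profile_column`, the `categorical_ratio` threshold, and the use of population (rather than sample) variance are all assumptions; the documentation does not state how Collibra computes these values or where it draws the line for categorical data.

```python
import statistics
from collections import Counter


def profile_column(values, categorical_ratio=0.01):
    """Compute some statistics from the column table for one in-memory column.

    None or "" marks an empty row. `categorical_ratio` is a hypothetical
    threshold (distinct values / non-empty rows), not a Collibra setting.
    """
    non_empty = [v for v in values if v is not None and v != ""]
    frequency = Counter(non_empty)
    profile = {
        "Row Count": len(values),
        "Empty Values Count": len(values) - len(non_empty),
        "Number of distinct values": len(frequency),
    }
    if not non_empty:
        return profile

    # Assumed heuristic: few distinct values relative to the number of rows.
    categorical = len(frequency) / len(non_empty) <= categorical_ratio
    profile["Categorical Data"] = categorical
    if categorical:
        profile["Category"] = list(frequency)  # order of first appearance
        profile["Mode"] = frequency.most_common(1)[0][0]

    def is_number(v):
        return isinstance(v, (int, float)) and not isinstance(v, bool)

    if all(is_number(v) for v in non_empty):
        profile["Minimum Value"] = min(non_empty)
        profile["Maximum Value"] = max(non_empty)
        profile["Mean"] = statistics.mean(non_empty)  # empty rows excluded
        profile["Median"] = statistics.median(non_empty)
        # Population formulas are an assumption; the source does not specify.
        profile["Variance"] = statistics.pvariance(non_empty)
        profile["Standard Deviation"] = statistics.pstdev(non_empty)
    elif all(isinstance(v, str) for v in non_empty):
        lengths = [len(v) for v in non_empty]  # white space counts
        profile["Minimum Text Length"] = min(lengths)
        profile["Maximum Text Length"] = max(lengths)
    return profile


if __name__ == "__main__":
    for name, value in profile_column([1, 2, 2, None, 5]).items():
        print(f"{name}: {value}")
```

With the default threshold, a column of 100 000 rows and five distinct values has a ratio of 0.00005 and is treated as categorical, which matches the example in the Categorical Data row. All values here are exact; the table notes that some statistics may be approximate depending on column cardinality.
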
Table attribute | Profiling option | Statistics | Description | Retrieved from JDBC property |
---|---|---|---|---|
Table Name | No option selected | N/A | The table name in the data source. | TABLE_NAME |
Table Type | No option selected | N/A | The table type in the data source, such as TABLE or VIEW. | TABLE_TYPE |
Description from Source | No option selected | N/A | The description of the table in the data source. | REMARKS |
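
The JDBC property columns in the tables above can be read as a simple mapping from driver metadata to asset attributes. The sketch below illustrates that mapping with plain Python dictionaries standing in for metadata rows. The property names come from the tables and match the result-set columns of JDBC's `DatabaseMetaData.getColumns` and `getPrimaryKeys`; whether Collibra calls exactly those methods is an assumption, and the function name `column_attributes` and the sample values are hypothetical.

```python
# JDBC property -> Column attribute, as listed in the column table.
JDBC_TO_COLUMN_ATTRIBUTE = {
    "COLUMN_NAME": "Column Name",
    "REMARKS": "Description from Source",
    "TYPE_NAME": "Technical Data Type",
    "CHAR_OCTET_LENGTH": "Char octet Length",
    "ORDINAL_POSITION": "Column Position",
    "IS_AUTOINCREMENT": "Is Auto Incremented",
    "IS_GENERATEDCOLUMN": "Is Generated",
    "IS_NULLABLE": "Is Nullable",
    "DECIMAL_DIGITS": "Number Of Fractional Digits",
    "COLUMN_SIZE": "Size",
}


def column_attributes(column_row, primary_key_rows):
    """Build Column asset attributes from dict stand-ins for JDBC metadata rows.

    `column_row` models one row of column metadata; `primary_key_rows` models
    the primary keys result set for the same table. Both are hypothetical
    in-memory shapes, not a Collibra API.
    """
    attributes = {
        attribute: column_row.get(jdbc_property)
        for jdbc_property, attribute in JDBC_TO_COLUMN_ATTRIBUTE.items()
    }
    # "Is Primary Key" is true if the primary keys result set contains
    # this column's COLUMN_NAME.
    key_row = next(
        (row for row in primary_key_rows
         if row["COLUMN_NAME"] == column_row["COLUMN_NAME"]),
        None,
    )
    attributes["Is Primary Key"] = key_row is not None
    attributes["Primary Key Name"] = key_row["PK_NAME"] if key_row else None
    return attributes


if __name__ == "__main__":
    # Hypothetical metadata for illustration only.
    column = {"COLUMN_NAME": "id", "TYPE_NAME": "INTEGER", "ORDINAL_POSITION": 1}
    keys = [{"COLUMN_NAME": "id", "PK_NAME": "pk_customer"}]
    for name, value in column_attributes(column, keys).items():
        print(f"{name}: {value}")
```

Properties that the driver does not return are left as `None` in this sketch, so a missing `REMARKS` value simply yields an empty Description from Source.
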