Data profiling information

When you profile data, profiling results are created for Table and Column assets.

Column attribute | Accuracy of statistics after profiling based on all rows | Accuracy of statistics after profiling based on random rows | Description
Column Name

 

  The column name in the table.
Data Type

Edge tries to detect the data type by looking at the first 10,000 rows.

Edge tries to detect the data type by looking at the randomly selected rows, up to a maximum of 10,000 rows.

The data type of the column, as detected by the profiling process. This data type can differ from the Technical Data Type value.
For example, if a column's technical data type is text but the column contains only integer values, the profiling process sets the data type to Whole Number instead of Text.

By default, Collibra anonymizes profiling results in Column assets that have data with the Text or Geo data type. However, it is possible to anonymize the profiling results for all columns. For more information, go to Anonymization via Edge.

Note  The Data Type is available in the Metadata tab of the descriptive statistics. If the profiling process has detected a wrong data type, you can update the data type there.

If you enable the Anonymize data option in Collibra Console, Collibra anonymizes data in Column assets that have the Text or Geo data type.

If the profiling process has detected a wrong data type, you can update it afterwards.

Description from Source N/A   The description of the column in the data source.
Row Count

Exact

Exact The total number of rows in the table.
Profiled Row Count Exact Exact The number of rows used to create statistics.
Empty Values Count

Exact

Exact or approximate The number of rows in which the column value is empty.
Number of distinct values

Exact or approximate depending on column cardinality

Exact or approximate depending on column cardinality The number of unique values in the column.

Descriptive Statistics

Chart

Depending on chart type

Depending on chart type

This column shows a chart icon if charts were generated for the column. If no charts were generated, no icon appears.

Click the icon to open the chart in a dialog box. There you can zoom in, hover over a data point, and so on.

If you hover over the icon, a preview of the chart appears. If you hover over a data point in the preview, extra data appears for the data point.

The chart type varies per data type. The following charts can be shown:

  • Frequency chart
  • Distribution chart (histogram) that shows distribution and a probability distribution curve
Frequency

Exact

Exact or approximate depending on the number of profiled rows

A frequency chart, showing how many times each distinct value appears.

Note A frequency chart is available only if Categorical Data = true.

Distribution - Histogram

Approximate

Approximate

A histogram showing the representation of the distribution of numerical data.

Note A distribution chart is available only if the data type is Whole Number or Decimal Number.

Distribution - Probability distribution curve

Approximate

Approximate

A curve showing the representation of the probability distribution of numerical data.

Note A distribution chart is available only if the data type is Whole Number or Decimal Number.

Technical Data Type

N/A

 

Data type of the column as defined in the source. This value can differ from the Data Type value.

Quantiles Descriptive statistics (decile, percentile, quartiles)

Approximate

Approximate The value of the calculated statistic of the data.
Categorical Data

Exact or approximate depending on column cardinality

Exact or approximate depending on column cardinality

The value is True if the data is categorical and False if it is not.

A column is considered categorical if it meets both of the following conditions:

  • Fewer than 10% of the rows have unique values, meaning most values are repeated.
  • There are at least 2 frequent values, and the second most frequent value appears at least twice.

This ensures the column has a pattern of repeated values, which is typical for categorical data.

Category

Exact or approximate depending on column cardinality

Exact or approximate depending on column cardinality The list of detected categories. This attribute has values only if the data is categorical.
Char octet Length

N/A

  The maximum number of bytes in a character-type column.
Column Position

N/A

  The index of the column in the source table.
Is Auto Incremented

N/A

  Indicates whether the data in the column is auto-incremented.
Is Generated

N/A

  Indicates whether the data in the column is generated.
Is Nullable

N/A

  Indicates whether the column can store NULL values.
Is Primary Key

N/A

  Indicates whether the column is a primary key.
Maximum Text Length

Exact

Exact within the profiled rows The length of the longest text value in the column, including white spaces.
Maximum Value

Exact

Exact within the profiled rows The maximum value in the column.
Mean

Exact

Exact within the profiled rows The mean of all the values in the column, excluding empty rows.
Median

Exact

Exact within the profiled rows The median value of the column.
Minimum Text Length

Exact

Exact within the profiled rows The length of the shortest text value in the column.
Minimum Value

Exact

Exact within the profiled rows The minimum value in the column.
Mode

Exact

Exact or approximate depending on the number of profiled rows The value with the highest frequency for categorical data.
Number Of Fractional Digits

N/A

  The number of fractional digits.
Primary Key Name

N/A

  The name of the primary key that the column is part of.
Size

N/A

  The size of the column in the table.
Standard Deviation

Exact

Exact within the profiled rows The statistical standard deviation of numeric values.
Variance

Exact

Exact within the profiled rows The statistical variance of numeric values.
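
To make the definitions of the numeric and text statistics above concrete, the following minimal Python sketch computes them exactly for a small in-memory column. It is an illustration only and not how Collibra calculates the values: the function names `describe_numeric` and `text_length_range`, the use of `None` to mark an empty value, and the choice of population (rather than sample) variance and standard deviation are all assumptions.

```python
import statistics


def describe_numeric(values):
    """Summarize a numeric column. None marks an empty value (assumption)."""
    present = [v for v in values if v is not None]
    return {
        "row_count": len(values),
        "empty_values_count": len(values) - len(present),
        "distinct_values": len(set(present)),
        "minimum": min(present),
        "maximum": max(present),
        # Empty rows are excluded from the mean, as the table describes.
        "mean": statistics.fmean(present),
        "median": statistics.median(present),
        # Three cut points that split the data into four equal-sized groups.
        "quartiles": statistics.quantiles(present, n=4),
        # Population formulas are an assumption; the source does not say which is used.
        "variance": statistics.pvariance(present),
        "standard_deviation": statistics.pstdev(present),
    }


def text_length_range(values):
    """Return (minimum, maximum) text length, counting white space."""
    lengths = [len(v) for v in values if v is not None]
    return min(lengths), max(lengths)


# Hypothetical sample columns.
summary = describe_numeric([4, 8, None, 6, 2, None])
shortest, longest = text_length_range(["New York", "Rome", None, "Rio "])
```

With the sample column, two of the six rows are empty, so the mean, median, and variance are calculated over the remaining four values only.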

Note For very large tables, regardless of whether all rows or a subset of rows are profiled, profiling creates compact, approximate summaries that enable efficient analysis and calculation of statistics. Some of the generated statistics are approximate and not exact. However, they are sufficient for analysis and decision-making, and reduce computational costs.
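
The two conditions for Categorical Data listed in the table above can be sketched as follows. This is a hedged illustration, not Collibra's implementation. It reads "fewer than 10% of the rows have unique values" as "the ratio of distinct values to non-empty rows is below 0.10", which is one possible interpretation. The function name `is_categorical` and its parameter are hypothetical, and the sketch counts exactly, whereas the product may use approximate counts.

```python
from collections import Counter


def is_categorical(values, max_distinct_ratio=0.10):
    """Apply both categorical conditions to a column. None marks an empty value."""
    present = [v for v in values if v is not None]
    if not present:
        return False
    # (value, count) pairs, most frequent first.
    counts = Counter(present).most_common()
    # Condition 1 (assumed reading): few distinct values relative to the row count.
    few_distinct = len(counts) / len(present) < max_distinct_ratio
    # Condition 2: at least two values, and the second most frequent appears twice or more.
    two_frequent = len(counts) >= 2 and counts[1][1] >= 2
    return few_distinct and two_frequent


# Hypothetical sample columns.
statuses = ["open"] * 30 + ["closed"] * 20   # 2 distinct values in 50 rows
ids = list(range(50))                        # every value is unique

status_is_categorical = is_categorical(statuses)
id_is_categorical = is_categorical(ids)

# For a categorical column, Category and Mode follow from the same counts.
status_counts = Counter(statuses).most_common()
categories = [value for value, _ in status_counts]
mode = status_counts[0][0]
```

In this sketch the status column qualifies as categorical and the identifier column does not, because every identifier occurs only once.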

Tip In Edge, viewing sample data isn't linked to the profiling feature. For more information, go to About sample data.
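
As an illustration of the Data Type row above, the following sketch shows how a detected data type can differ from the technical data type when a text column holds only integers. It is a simplified assumption of how such detection might work, not Collibra's algorithm: the function `detect_data_type` is hypothetical, it handles only the Whole Number, Decimal Number, and Text types named on this page, and falling back to Text for a column with no values is an arbitrary choice.

```python
def _parses_as(converter, value):
    """Return True if converter accepts the text form of value."""
    try:
        converter(str(value))
        return True
    except ValueError:
        return False


def detect_data_type(values, max_rows=10_000):
    """Guess a data type from at most max_rows leading values."""
    inspected = [v for v in values[:max_rows] if v not in (None, "")]
    if not inspected:
        return "Text"  # assumption: no evidence, so keep the most general type
    if all(_parses_as(int, v) for v in inspected):
        return "Whole Number"
    if all(_parses_as(float, v) for v in inspected):
        return "Decimal Number"
    return "Text"


# A column stored as text in the source that contains only integers.
detected = detect_data_type(["10", "25", "", "7"])
```

Because only the leading rows are inspected in this sketch, a non-numeric value that appears later in the column would go unnoticed, which is one reason a detected data type may need to be corrected manually.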

Column attribute | Profiling option | Statistics | Description | Retrieved from JDBC property
Column Name

No option selected

N/A

The column name in the table. COLUMN_NAME
Data Type

Store Data Profile

If you want to have Advanced Data Type detected, select Detect advanced data types.

N/A

The data type of the column, as detected by the profiling process. This data type can differ from the Technical Data Type value.
For example, if a column's technical data type is text but the column contains only integer values, the profiling process sets the data type to Whole Number instead of Text.

By default, Collibra anonymizes profiling results in Column assets that have data with the Text or Geo data type. However, it is possible to anonymize the profiling results for all columns. For more information, go to Anonymization via Edge.

Note  The Data Type is available in the Metadata tab of the descriptive statistics. If the profiling process has detected a wrong data type, you can update the data type there.

If you enable the Anonymize data option in Collibra Console, Collibra anonymizes data in Column assets that have the Text or Geo data type.

If the profiling process has detected a wrong data type, you can update it afterwards.

 
Description from Source No option selected N/A The description of the column in the data source. REMARKS
Row Count Store Data Profile

Exact

The total number of rows in the table.  
Empty Values Count Store Data Profile

Exact

The number of rows in which the column value is empty.  
Number of distinct values Store Data Profile

Exact or approximate depending on column cardinality

The number of unique values in the column.  

Descriptive Statistics

Chart

Store Data Profile

Depending on chart type

This column shows a chart icon if charts were generated for the column. If no charts were generated, no icon appears.

Click the icon to open the chart in a dialog box. There you can zoom in, hover over a data point, and so on.

If you hover over the icon, a preview of the chart appears. If you hover over a data point in the preview, extra data appears for the data point.

The chart type varies per data type. The following charts can be shown:

  • Frequency chart
  • Distribution chart (histogram) that shows distribution and a probability distribution curve
 
Frequency

Store Data Profile

Exact

A frequency chart, showing how many times each distinct value appears.

Note A frequency chart is available only if Categorical Data = true.

 
Distribution - Histogram

Store Data Profile

Approximate

A histogram showing the representation of the distribution of numerical data.

Note A distribution chart is available only if the data type is Whole Number or Decimal Number.

 
Distribution - Probability distribution curve

Store Data Profile

Approximate

A curve showing the representation of the probability distribution of numerical data.

Note A distribution chart is available only if the data type is Whole Number or Decimal Number.

 
Technical Data Type No option selected

N/A

Data type of the column as defined in the source. This value can differ from the Data Type value.

TYPE_NAME
Quantiles Descriptive statistics (decile, percentile, quartiles) Store Data Profile

Approximate

The value of the calculated statistic of the data.  
Categorical Data Store Data Profile

Exact or approximate depending on column cardinality

The value is True if the data is categorical and False if it is not.

A column is considered categorical if it meets both of the following conditions:

  • Fewer than 10% of the rows have unique values, meaning most values are repeated.
  • There are at least 2 frequent values, and the second most frequent value appears at least twice.

This ensures the column has a pattern of repeated values, which is typical for categorical data.

 
Category Store Data Profile

Exact or approximate depending on column cardinality

The list of detected categories. This attribute has values only if the data is categorical.  
Char octet Length No option selected

N/A

The maximum number of bytes in a character-type column. CHAR_OCTET_LENGTH
Column Position No option selected

N/A

The index of the column in the source table. ORDINAL_POSITION
Is Auto Incremented No option selected

N/A

Indicates whether the data in the column is auto-incremented. IS_AUTOINCREMENT
Is Generated No option selected

N/A

Indicates whether the data in the column is generated. IS_GENERATEDCOLUMN
Is Nullable No option selected

N/A

Indicates whether the column can store NULL values. IS_NULLABLE
Is Primary Key No option selected

N/A

Indicates whether the column is a primary key. True if the primary keys result set contains the column's COLUMN_NAME
Maximum Text Length Store Data Profile

Exact

The length of the longest text value in the column, including white spaces.  
Maximum Value Store Data Profile

Exact

The maximum value in the column.  
Mean Store Data Profile

Exact

The mean of all the values in the column, excluding empty rows.  
Median Store Data Profile

Exact

The median value of the column.  
Minimum Text Length Store Data Profile

Exact

The length of the shortest text value in the column.  
Minimum Value Store Data Profile

Exact

The minimum value in the column.  
Mode Store Data Profile

Exact

The value with the highest frequency for categorical data.  
Number Of Fractional Digits No option selected

N/A

The number of fractional digits. DECIMAL_DIGITS
Primary Key Name No option selected

N/A

The name of the primary key that the column is part of. PK_NAME
Size No option selected

N/A

The size of the column in the table. COLUMN_SIZE
Standard Deviation Store Data Profile

Exact

The statistical standard deviation of numeric values.  
Variance Store Data Profile

Exact

The statistical variance of numeric values.  
Sample Store Sample Data

N/A

A random sample of the data set that represents the entire data set.
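
The JDBC property names in the table above match the column labels of the result sets returned by the standard JDBC `DatabaseMetaData.getColumns` and `DatabaseMetaData.getPrimaryKeys` methods. The following sketch shows how the attribute values, including the Is Primary Key rule, could be derived from such rows. It is an assumption-laden illustration: plain dictionaries stand in for JDBC result set rows, the function `column_attributes` and the sample rows are hypothetical, and whether Collibra calls exactly these JDBC methods is not stated in this document.

```python
# Column attribute -> JDBC column label, as listed in the table above.
COLUMN_ATTRIBUTE_TO_JDBC = {
    "Column Name": "COLUMN_NAME",
    "Description from Source": "REMARKS",
    "Technical Data Type": "TYPE_NAME",
    "Char octet Length": "CHAR_OCTET_LENGTH",
    "Column Position": "ORDINAL_POSITION",
    "Is Auto Incremented": "IS_AUTOINCREMENT",
    "Is Generated": "IS_GENERATEDCOLUMN",
    "Is Nullable": "IS_NULLABLE",
    "Number Of Fractional Digits": "DECIMAL_DIGITS",
    "Size": "COLUMN_SIZE",
}


def column_attributes(column_row, primary_key_rows):
    """Build the attribute values for one column from metadata rows (dicts)."""
    attributes = {
        attribute: column_row.get(label)
        for attribute, label in COLUMN_ATTRIBUTE_TO_JDBC.items()
    }
    # Is Primary Key: true if the primary keys rows contain this COLUMN_NAME.
    match = next(
        (row for row in primary_key_rows
         if row["COLUMN_NAME"] == column_row["COLUMN_NAME"]),
        None,
    )
    attributes["Is Primary Key"] = match is not None
    attributes["Primary Key Name"] = match["PK_NAME"] if match else None
    return attributes


# Hypothetical metadata rows for a "customer" table.
primary_keys = [{"COLUMN_NAME": "id", "PK_NAME": "customer_pkey"}]
id_column = {"COLUMN_NAME": "id", "TYPE_NAME": "int4", "ORDINAL_POSITION": 1,
             "IS_NULLABLE": "NO", "IS_AUTOINCREMENT": "YES"}
name_column = {"COLUMN_NAME": "name", "TYPE_NAME": "varchar",
               "ORDINAL_POSITION": 2, "IS_NULLABLE": "YES", "COLUMN_SIZE": 255}

id_attributes = column_attributes(id_column, primary_keys)
name_attributes = column_attributes(name_column, primary_keys)
```

Attributes whose label is missing from a row are left as `None` here, which mirrors the fact that these values depend entirely on what the data source's driver reports.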

 
Table attribute | Profiling option | Statistics | Description | Retrieved from JDBC property
Table Name

No option selected

N/A

The table name in the data source. TABLE_NAME
Table Type No option selected

N/A

The table type in the data source, such as TABLE or VIEW. TABLE_TYPE
Description from Source No option selected N/A The description of the table in the data source. REMARKS