About sample data

Note If you're using a Collibra Cloud site, go to the Collibra Cloud site documentation to check whether your data source is supported.

Sample data is a set of randomly collected data from a data source. It provides examples of the data, helping you understand what to expect when using the asset. Depending on your environment, sample data can be shown for Column, Table, Data Set, Data Product, and Data Product Port assets.

If sample data is available, it is shown in:

Asset type          Catalog experience is active                Catalog experience is not active
Table               Summary tab pane; Sample data tab pane      Details tab pane; Sample data tab pane
Column              Summary tab pane; Data profiling tab pane   Details tab pane; Sample data tab pane
Data Set            Summary tab pane; Sample data tab pane      Details tab pane; Sample data tab pane
Data Product        Sample data tab pane                        Sample data tab pane
Data Product Port   Sample data tab pane                        Sample data tab pane

Note Currently, you can't request sample data via Edge for Data Set assets.

  • In Column assets, if sample data is available, it is shown on the Summary tab in the Descriptive Statistics section.
  • In Table, Data Set, Data Product, and Data Product Port assets, if sample data is available, it is shown on the Sample Data tab.
    Tip 
    • In Table, Data Set, Data Product, and Data Product Port assets, you see sample data only for columns where you have the required permissions. If you don't have access, the column shows the text <sensitive> instead of sample data.
    • Columns that show sample data in the asset include a chart icon next to the column name. Clicking the icon opens the Descriptive Statistics dialog box, which includes metadata and profiling information for the column, if available.

    Note Currently, you can't request sample data via Edge for Data Set assets.
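The permission behavior described in the Tip can be pictured with a small sketch. This is only an illustration of the visible effect, not Collibra's implementation: the function name `render_sample_rows`, the `permitted_columns` parameter, and the example rows are all hypothetical.

```python
# Illustrative sketch only: names and data below are hypothetical,
# not part of any Collibra API.
SENSITIVE_PLACEHOLDER = "<sensitive>"


def render_sample_rows(rows, permitted_columns):
    """Return a copy of the rows in which every value from a column the
    user isn't permitted to see is replaced by the placeholder text."""
    return [
        {
            column: (value if column in permitted_columns else SENSITIVE_PLACEHOLDER)
            for column, value in row.items()
        }
        for row in rows
    ]


rows = [
    {"customer_id": 101, "email": "ana@example.com"},
    {"customer_id": 102, "email": "raj@example.com"},
]

# The user may see customer_id but not email.
visible = render_sample_rows(rows, permitted_columns={"customer_id"})
for row in visible:
    print(row)
```

The column itself stays in the output, so the user can still see that it exists; only its values are hidden.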

Conditions to show sample data

You can view sample data for an asset only if the following conditions are met:

  • The sample data feature is active.
  • You have the required permissions.
  • Sample data is available for the asset.
  • Depending on the method used to get the sample data, the asset must be a Table, Column, Data Set, Data Product, or Data Product Port asset.

    Important Currently, you can't request sample data via Edge for Data Set assets.
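The conditions above all have to hold at the same time. As a rough sketch, they can be expressed as a single check. The class, field, and function names here are hypothetical and chosen for illustration; they don't correspond to a Collibra API.

```python
from dataclasses import dataclass

# Asset types named in the conditions above.
SUPPORTED_ASSET_TYPES = {
    "Table", "Column", "Data Set", "Data Product", "Data Product Port",
}


@dataclass
class SampleDataContext:
    """Hypothetical container for the facts the conditions depend on."""
    feature_active: bool
    user_has_permissions: bool
    sample_data_available: bool
    asset_type: str
    via_edge: bool = False  # True if the sample data is requested via Edge


def can_view_sample_data(ctx: SampleDataContext) -> bool:
    """Return True only if every condition is met."""
    supported = set(SUPPORTED_ASSET_TYPES)
    if ctx.via_edge:
        # Sample data can't currently be requested via Edge for Data Set assets.
        supported.discard("Data Set")
    return (
        ctx.feature_active
        and ctx.user_has_permissions
        and ctx.sample_data_available
        and ctx.asset_type in supported
    )


table_ctx = SampleDataContext(True, True, True, "Table", via_edge=True)
data_set_ctx = SampleDataContext(True, True, True, "Data Set", via_edge=True)
print(can_view_sample_data(table_ctx))
print(can_view_sample_data(data_set_ctx))
```

Failing any one condition is enough to hide the sample data, which is why the check is a plain conjunction.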

Sample data in Collibra

The way Collibra handles sample data depends on how the assets are added to Collibra and how the sample data is collected.

Sample data for assets that are added via Edge

  • If sample data for an asset is uploaded via the Catalog REST API - Profiling, the sample data is stored in the Collibra cloud repository and is shown to all users with the required permissions.
  • Sample data can be manually requested for an asset if the asset is connected to an Edge or Collibra Cloud site via the related Database asset. This applies when the asset has been registered via the Edge Catalog data source registration process. In the latest UI, this is also available for integrated Databricks Unity Catalog and Dataplex Catalog assets.

    If sample data is requested, the randomly collected sample data is cached on the Edge site. It isn't stored in the Collibra cloud repository. The sample data is shown only after it has been requested, and only to users with the required permissions. For more information, go to Configure the use of sample data via Edge: Steps.

    Tip Sample data available through Edge isn't automatically anonymized or masked.
    Anonymization hides data by replacing the original values with values that can't be traced back to the original data.
    Masking hides data by replacing specific parts of the original values while retaining the format.
    If you have permission to view sample data through Edge, the full value is visible. The sample data is collected and shown for a limited time and isn't stored in Collibra.

    You can apply masking by defining policies for the data source and the user assigned to the Edge data source connection, using Protect. When masking is applied at the data source, sample data is masked for all Collibra users, and profiling or classification is no longer possible for the masked data.

    Note 
    • Currently, you can't request sample data via Edge for Data Set assets.
    • Random rows are collected from the data source, but the values aren't shuffled between rows. Each randomly collected row is shown with all of its original values.
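The Tip and Note above separate three ideas: sampling whole rows at random, masking, and anonymization. The following sketch shows how they differ. It rests on assumptions throughout: it isn't how Edge collects samples or how Protect applies masking, and the function names, the `visible_suffix` parameter, and the example data are hypothetical.

```python
import random
import uuid


def sample_rows(rows, k, seed=None):
    """Pick up to k whole rows at random. Each row is returned as-is,
    so its values are never mixed with values from other rows."""
    rng = random.Random(seed)
    return rng.sample(rows, min(k, len(rows)))


def mask_value(value, visible_suffix=4, mask_char="*"):
    """Masking: replace part of the value but keep its format.
    Separators such as '-' stay in place; only letters and digits
    outside the visible suffix are replaced."""
    chars = list(value)
    alnum_positions = [i for i, c in enumerate(chars) if c.isalnum()]
    if visible_suffix:
        to_mask = alnum_positions[:-visible_suffix]
    else:
        to_mask = alnum_positions
    for i in to_mask:
        chars[i] = mask_char
    return "".join(chars)


def anonymize_value(_value):
    """Anonymization: replace the value with a random token that
    can't be traced back to the original."""
    return uuid.uuid4().hex


rows = [
    {"name": "Ana", "card": "4111-2222-3333-4444"},
    {"name": "Raj", "card": "5500-6666-7777-8888"},
    {"name": "Li", "card": "3400-1234-5678-9012"},
]

sampled = sample_rows(rows, k=2, seed=1)
masked = mask_value("4111-2222-3333-4444")
token = anonymize_value("4111-2222-3333-4444")
print(masked)
```

A masked value still looks like a card number, whereas an anonymized value keeps neither the content nor the format of the original.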

Sample data for assets that are manually added or imported

Sample data must be uploaded via the Catalog REST API - Profiling. In this case, the sample data is stored in the Collibra cloud repository and is shown to all users with the required permissions.

Sample data for assets that are added via Jobserver

Related topics

Understanding the process to show sample data
Sample data limitations and guidelines
Configure the use of sample data via Edge: Steps
Configure the use of sample data via Jobserver

Helpful resources

Sample data training on Collibra University