Sample data limitations and guidelines

  • Requesting sample data via Edge may require additional memory, CPU and disk space on the Edge site.
  • Currently, you can request sample data via Edge only for Table and Column assets.
  • For performance reasons, the number of samples to display cannot exceed 1,000. You can configure this number in the Maximum number of samples setting, in the Data Profiling section. The default value is 100. The maximum value is 1,000.
    For more information, go to Configure the use of sample data via Edge or Configure the use of sample data via Jobserver.
  • For performance reasons, avoid sampling tables with more than 1,500 columns.
    This limit is not configurable at the moment.
  • The sampling feature always uses push-down sampling if push-down sampling is available for the data source. Push-down sampling increases the sample data extraction speed.
    We recommend that you allow sampling only on data sources that support push-down sampling. To find out whether your data source supports push-down sampling (called partial scan in Edge), go to Data sources supported by Edge or Overview of Collibra-certified JDBC drivers (Jobserver).

    Note: If you sample a data source that does not support push-down sampling, the sample data extraction time is proportional to the size of the database table. The bigger the table, the longer it takes to retrieve the samples.
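
The limits above can be expressed as a simple pre-flight check. The following Python sketch is only an illustration and is not part of any Collibra API: the names SamplingRequest and check_sampling_request, and all field names, are hypothetical. Only the numeric limits and the supported asset types come from the guidelines above.

```python
from dataclasses import dataclass

# Values taken from the guidelines above.
DEFAULT_MAX_SAMPLES = 100        # default of the "Maximum number of samples" setting
HARD_MAX_SAMPLES = 1_000         # highest value the setting accepts
MAX_RECOMMENDED_COLUMNS = 1_500  # not configurable
EDGE_ASSET_TYPES = {"Table", "Column"}  # asset types supported via Edge


@dataclass
class SamplingRequest:
    """Hypothetical description of a sample data request via Edge."""
    asset_type: str
    column_count: int
    requested_samples: int = DEFAULT_MAX_SAMPLES
    supports_push_down: bool = True


def check_sampling_request(req: SamplingRequest) -> list[str]:
    """Return guideline violations; an empty list means none were found."""
    problems = []
    if req.asset_type not in EDGE_ASSET_TYPES:
        problems.append(f"asset type {req.asset_type!r} is not supported via Edge")
    if not 1 <= req.requested_samples <= HARD_MAX_SAMPLES:
        problems.append(f"number of samples must be between 1 and {HARD_MAX_SAMPLES}")
    if req.column_count > MAX_RECOMMENDED_COLUMNS:
        problems.append(f"table has more than {MAX_RECOMMENDED_COLUMNS} columns")
    if not req.supports_push_down:
        problems.append("no push-down sampling: extraction time grows with table size")
    return problems


if __name__ == "__main__":
    request = SamplingRequest(asset_type="Table", column_count=2_000)
    for problem in check_sampling_request(request):
        print("warning:", problem)
```

In this sketch, a request for a Table asset with 2,000 columns produces a single warning about the column count, because the other values fall within the documented limits.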