Sample data beta feature limitations and guidelines
A public beta is an upcoming feature or product that is made available to all customers before it is fully ready for general availability so it can be tested and evaluated early.
1 Sample data limitations
- Currently, you can request sample data via Edge only for Table and Column assets.
- Sample data in the Edge cache is not encrypted. This means that the data is stored in clear text on the Edge site for 24 to 48 hours. Only the key that identifies the origin of the sample data is encrypted.
- For performance reasons, the number of samples to display should be less than 1,000. This limit is configurable in the Maximum number of samples setting, in the Profiling section. The default value is 100. See Configure the use of sample data via Edge and Configure the use of sample data via Jobserver. The limit of 1,000 will be enforced in a later release.
- For performance reasons, avoid sampling tables with more than 1,500 Column assets. This limit is not configurable and will be enforced in a later release.
- The sampling feature always uses push-down sampling if push-down sampling is available for the data source. Push-down sampling increases the speed of sample data extraction. We advise that you allow sampling only on data sources that support push-down sampling. To find out whether your data source supports push-down sampling (called partial scan in Edge), see Data sources supported by Edge and Overview of Collibra-provided JDBC drivers (Jobserver).
2 Sample data guidelines
During the Beta testing phase, we advise that you:
- Keep the maximum number of samples at the default of 100, or at least do not exceed 1,000 samples.
- Do not request sample data for tables with more than 1,500 columns.
- Use sampling only on data sources that support push-down sampling.
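The guidelines above can be read as a simple pre-check on a sampling request. The following Python sketch is only an illustration of that reading: the names `SamplingRequest` and `check_sampling_request` are hypothetical and are not part of any Collibra API. The thresholds are the values stated on this page.

```python
from dataclasses import dataclass

# Thresholds taken from this page; the constant names are hypothetical.
DEFAULT_MAX_SAMPLES = 100
RECOMMENDED_MAX_SAMPLES = 1_000
RECOMMENDED_MAX_COLUMNS = 1_500
SAMPLEABLE_ASSET_TYPES = {"Table", "Column"}


@dataclass
class SamplingRequest:
    """Hypothetical description of a sample data request (not a Collibra type)."""
    asset_type: str
    column_count: int                      # columns in the table being sampled
    max_samples: int = DEFAULT_MAX_SAMPLES
    supports_push_down: bool = True


def check_sampling_request(request: SamplingRequest) -> list[str]:
    """Return the guidelines that the request does not follow (empty if none)."""
    problems = []
    if request.asset_type not in SAMPLEABLE_ASSET_TYPES:
        problems.append("Sample data is only available for Table and Column assets.")
    if request.max_samples > RECOMMENDED_MAX_SAMPLES:
        problems.append("Maximum number of samples exceeds 1,000.")
    if request.column_count > RECOMMENDED_MAX_COLUMNS:
        problems.append("Table has more than 1,500 columns.")
    if not request.supports_push_down:
        problems.append("Data source does not support push-down sampling.")
    return problems


if __name__ == "__main__":
    request = SamplingRequest(asset_type="Table", column_count=2_000, max_samples=500)
    for problem in check_sampling_request(request):
        print(problem)
```

A request that uses the defaults on a Table asset with 1,500 columns or fewer returns an empty list.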
3 Your feedback is welcome
- During the Beta testing phase, we advise that you use sampling only on data sources that support push-down sampling.
However, we are looking for feedback on data sources that do not allow push-down sampling.
Note: If you try sampling on a data source that does not allow push-down sampling, the sample data extraction time is proportional to the size of the database table. The bigger the table, the longer it takes to retrieve the samples.
- We are looking for feedback on large numbers of parallel sample data requests. This happens when many users want to see sample data at the same time.
Tip: If you experience issues in this situation, you can decrease the number of Edge data sources for which the sampling capability is enabled.
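To illustrate what "proportional to the database table size" means in practice, the following Python sketch scales one measured extraction time linearly to a larger table. It is a rough estimate under assumptions: the function name is hypothetical, table size is expressed in rows for simplicity, and the numbers are invented for the example rather than measured.

```python
def estimate_extraction_seconds(
    measured_seconds: float, measured_rows: int, target_rows: int
) -> float:
    """Scale a measured extraction time linearly with table size.

    Assumes extraction time is proportional to the number of rows, which
    is the behavior described for data sources without push-down sampling.
    """
    if measured_rows <= 0:
        raise ValueError("measured_rows must be positive")
    return measured_seconds * target_rows / measured_rows


if __name__ == "__main__":
    # Invented example: 2 seconds observed on a table with 1 million rows.
    estimate = estimate_extraction_seconds(2.0, 1_000_000, 50_000_000)
    print(f"Estimated extraction time: {estimate:.0f} seconds")
```

Under these assumptions, a table 50 times larger takes about 50 times longer to sample.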