Using only part of the data to create profiling results
With source-driven random sampling, you can use only part of the data in a data source to generate profiling results. This means that the data source itself creates a random set of data to profile.
Important: Source-driven random sampling can be done using a dynamic SQL query, if the data source supports it. To verify whether source-driven random sampling is available for your data source, go to Collibra-provided JDBC drivers.
The data source randomly selects data to profile and transfers it to one of the following:
- Jobserver
- Edge
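
The idea of sampling at the source can be illustrated with a minimal Python sketch. It is not the query that Collibra generates. It assumes an in-memory SQLite database, and the `customers` table, the `sample_at_source` function, and the `ORDER BY RANDOM() LIMIT` technique are hypothetical choices made for illustration only. Other databases typically use their own sampling syntax.

```python
import sqlite3


def sample_at_source(conn, table, sample_size):
    """Ask the database itself to pick `sample_size` random rows.

    Only the sampled rows are returned to the caller, which mirrors the
    idea of the data source selecting the data before transferring it.
    """
    # The table name cannot be bound as a parameter, so check it first.
    if not table.isidentifier():
        raise ValueError(f"Unexpected table name: {table!r}")
    # SQLite-specific random sampling (an assumption for this sketch).
    query = f"SELECT * FROM {table} ORDER BY RANDOM() LIMIT ?"
    return conn.execute(query, (sample_size,)).fetchall()


# Hypothetical demo data: 1000 rows in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, country TEXT)")
conn.executemany(
    "INSERT INTO customers (country) VALUES (?)",
    [("BE",), ("US",), ("PL",), ("NL",)] * 250,
)

sample = sample_at_source(conn, "customers", 100)
print(len(sample))  # 100 rows transferred instead of 1000
```

In this sketch the random selection happens inside the database, so only 100 of the 1000 rows ever leave it.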
Source-driven random sampling is not used by default on Jobserver. To use source-driven random sampling, do the following:
Step | When | Description |
---|---|---|
1 | Manage the driver | Add the pushDownSampling connection property. |
2 | Register your data source | Follow the usual steps to register a data source, but include the following options: |
Select the Random Rows option when you configure the profiling options to apply source-driven random sampling. For details about the options, go to Profiling options.