Push down sampling

Push down sampling delegates the creation of the data sample to the data source itself.

Tip   In Edge, push down sampling is called partial scan.

Push down sampling can drastically improve sampling performance.

Enable push down sampling

Push down sampling is disabled by default. To enable it, complete the following steps:

Step 1: Manage the driver

Add the pushDownSampling connection property.

Step 2: Register your data source

Follow the usual steps to register a data source, but include the following options:

  1. Enter a value for the pushDownSampling connection property.

    Note   
    • The value must be between 100 and 1,000,000. Your data source creates a sample containing that number of rows.
    • If the sample would exceed the cache storage limit (Collibra recommends 10 to 20 GB), the number of rows is reduced.
    • If the value is greater than the number of rows in the data source, the entire data source is used as the sample.
  2. Select the following Profiling options:
    • To profile via Jobserver: Store Data Profile and, optionally, Store Sample Data.
    • To profile and classify via Edge: Profile and classify data.
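The rules for the pushDownSampling value can be summarized in a small check. The following is a minimal sketch, not Collibra code: the function effective_sample_rows and its parameters are hypothetical names used only for illustration. It models just two documented rules, the 100 to 1,000,000 range and the fallback to the entire data source when the value exceeds its row count. The reduction applied when the sample would exceed the cache storage limit is not described in detail here, so the sketch does not model it.

```python
# Hypothetical illustration of the documented pushDownSampling rules.
MIN_SAMPLE_ROWS = 100
MAX_SAMPLE_ROWS = 1_000_000


def effective_sample_rows(requested: int, total_rows: int) -> int:
    """Return the number of rows the sample is expected to contain.

    Assumption: only the documented range check and the table-size cap
    are modeled; the cache-size reduction is intentionally left out.
    """
    # The connection property value must lie within the documented range.
    if not MIN_SAMPLE_ROWS <= requested <= MAX_SAMPLE_ROWS:
        raise ValueError(
            f"pushDownSampling must be between {MIN_SAMPLE_ROWS} and "
            f"{MAX_SAMPLE_ROWS}, got {requested}"
        )
    # A value larger than the data source means the whole source is the sample.
    return min(requested, total_rows)


if __name__ == "__main__":
    print(effective_sample_rows(5_000, 1_200))   # 1200: whole data source used
    print(effective_sample_rows(5_000, 80_000))  # 5000: requested sample size
```

With a value of 5,000 and a data source of 1,200 rows, the sketch returns 1,200, which matches the rule that the entire data source becomes the sample.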