Calculating the Edge hardware requirements to show sample data

Before you configure sample data via Edge, it is important to validate that your Edge site has enough memory, CPU and available cache disk space.

Memory and CPU requirements

The Edge capability Catalog JDBC Sampling consists of two possible operations, which require Edge resources:

  • Extracting the sample data, which collects the data from a data source and caches the data on the Edge site.
  • Reading the sample data, which reads sample data from the Edge cache and returns it as a result of an API call or displays it in an asset page.

The following table shows the resources required for one request.

                     Extracting the sample data   Reading the sample data
Memory per request   4 GB                         900 MB
CPU per request      1 CPU                        0.9 CPU
Note 
  • To support multiple requests running at the same time, multiply these numbers by the number of parallel requests.

  • These resources must be added to the other resource requirements for Edge, the operating system and any other software you would like to run on the same machine.

    You also need to add the requirements of other Edge capabilities if you want to run them in parallel with the sampling capability. If you accept that operations run one after the other and that some have to wait in a queue, only the highest requirement needs to be met.
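
The multiplication described in the note can be sketched as follows. This is only an illustration: the function and constant names are hypothetical, 900 MB is treated as 0.9 GB, and it assumes that extract and read requests running at the same time simply add up. The result covers the sampling capability only, so the requirements of Edge, the operating system and other software still have to be added.

```python
from dataclasses import dataclass

# Per-request figures taken from the table above.
EXTRACT_MEMORY_GB = 4.0
EXTRACT_CPU = 1.0
READ_MEMORY_GB = 0.9  # 900 MB, expressed in GB (assumption: 1 GB = 1000 MB)
READ_CPU = 0.9


@dataclass
class SamplingResources:
    memory_gb: float
    cpu: float


def sampling_resources(parallel_extracts: int, parallel_reads: int) -> SamplingResources:
    """Resources for the sampling capability alone at the given concurrency.

    Assumption: extract and read requests that run in parallel add up.
    """
    memory = parallel_extracts * EXTRACT_MEMORY_GB + parallel_reads * READ_MEMORY_GB
    cpu = parallel_extracts * EXTRACT_CPU + parallel_reads * READ_CPU
    return SamplingResources(memory_gb=memory, cpu=cpu)


# Example: 2 extractions and 3 reads running at the same time.
needed = sampling_resources(parallel_extracts=2, parallel_reads=3)
print(f"{needed.memory_gb:.1f} GB memory, {needed.cpu:.1f} CPU")
```

With 2 parallel extractions and 3 parallel reads, this gives 10.7 GB of memory and 4.7 CPU for sampling alone.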

Hard disk requirement

Extracted sample data remains in the cache of an Edge site. This means that enough disk space must be available to cache this data. If the Edge cache is full when a new sample data request arrives, the request fails. For more information, go to Edge cache.

The required disk space largely depends on the expected number of tables for which sample data will be requested, per day. You can estimate the disk space in bytes as follows:

(Number of tables per day * 2) * Number of columns * (Number of characters for one column name + (Number of samples * Number of characters for one sample)) * 2.05

  • You need to multiply the number of tables per day by 2 because the sample data can stay in the Edge cache for about two days.
  • You need to multiply the result by 2.05 because each character is counted as 2 bytes, plus some margin for the data serialization format.

Also consider that the Edge cache may hold data other than sample data, such as a copy of the JDBC drivers used to connect to the data sources. It is therefore best to round up the required space.

Example 

In this example:

  • You expect to receive requests for sample data for 100 tables per day.
  • Each table has about 20 columns with an average column name of 30 characters.
  • For each column, you want to collect 100 samples.
    This number is set in the Maximum number of samples DGC Service setting.
  • Each sample has an average of 100 characters.
    You can define the maximum number of characters to collect via the Maximum value length DGC Service setting.

The numbers for the calculation are:

  • Number of tables per day: 100.
  • Number of columns: 20.
  • Number of characters for one column name: 30.
  • Number of samples: 100.
  • Number of characters for one sample: 100.

The estimated disk space in bytes is:

(100 * 2) * 20 * (30 + (100 * 100)) * 2.05 = 82,246,000 bytes ≈ 82,246 KB ≈ 82 MB

In this example, having around 100 MB of disk space available for sample data on the Edge site cache should be sufficient.
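
To redo the estimate with your own numbers, the formula can be wrapped in a small helper. The following is a minimal sketch, not part of Edge: the function and parameter names are hypothetical, and the two-day retention and the 2.05 factor are taken directly from the formula above.

```python
def estimate_cache_bytes(
    tables_per_day: int,
    columns: int,
    column_name_chars: int,
    samples: int,
    sample_chars: int,
    retention_days: int = 2,       # sample data stays in the cache about two days
    bytes_per_char: float = 2.05,  # 2 bytes per character plus serialization margin
) -> float:
    """Estimate the disk space, in bytes, needed to cache sample data."""
    chars_per_column = column_name_chars + samples * sample_chars
    return tables_per_day * retention_days * columns * chars_per_column * bytes_per_char


# Values from the example: 100 tables per day, 20 columns, 30-character
# column names, 100 samples per column, 100 characters per sample.
estimate = estimate_cache_bytes(
    tables_per_day=100,
    columns=20,
    column_name_chars=30,
    samples=100,
    sample_chars=100,
)
print(f"{estimate:,.0f} bytes (about {estimate / 1_000_000:.0f} MB)")
```

The result is an estimate for the sample data only. Round it up to leave room for other data in the Edge cache, such as JDBC drivers.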