Configure job limits

Limits allow admins to control data thresholds globally across Collibra to ensure job runs are consistent, relevant, and efficient.

Prerequisites

You have the Data Quality Admin global role with the Data Quality > Manage Settings global permission.

Steps

  1. On the main toolbar, click the Products icon, and then click Settings.
  2. Click the Data Quality tab.
  3. Click Limits.
  4. Update the limits, as needed.
    The following limits are available:

    Lookback limit

    The lookback limit is the number of previous job runs that Data Quality & Observability examines to train the adaptive rules model.

    The default value is 21. You can adjust it by entering a value greater than 0.
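    To illustrate why the lookback limit matters, the sketch below (purely illustrative Python, not Collibra's actual adaptive rules model) derives an expected range from only the most recent runs, so a larger lookback means more history influences the learned band:

      import statistics

      def adaptive_band(run_values, lookback=21, tolerance=3.0):
          """Derive an expected range from the last `lookback` run values.

          Illustrative only: the real adaptive rules model is more
          sophisticated; this just shows how the lookback limit bounds
          the training window.
          """
          window = run_values[-lookback:]            # only the most recent runs count
          mean = statistics.mean(window)
          stdev = statistics.pstdev(window) or 1.0   # avoid a zero-width band
          return mean - tolerance * stdev, mean + tolerance * stdev

      # Example: row counts observed in previous job runs
      history = [10_000, 10_250, 9_980, 10_100, 10_300]
      low, high = adaptive_band(history)
      print(f"Expected row count between {low:.0f} and {high:.0f}")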

    Score thresholds

    Score thresholds are specific data quality scores that you can configure to determine whether a job is passing or failing. Scores between the passing and failing thresholds are marked as warnings to alert users to possible or emerging data quality issues.

    The scoring scale uses two sliders for the passing and failing thresholds and is visually represented in color-coded segments:

    • Red for failing scores (default 0-75)
    • Orange for warning scores (default 76-89)
    • Green for passing scores (default 90-100)

    To set custom scoring thresholds, click and drag the passing or failing slider until the associated number matches your desired score threshold.
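    For clarity, the following sketch (hypothetical Python, not a Collibra API) shows how a job score would be classified against the default thresholds above:

      def classify_score(score, failing_max=75, passing_min=90):
          """Classify a data quality score against the configured thresholds."""
          if score <= failing_max:
              return "FAILING"          # red segment
          if score >= passing_min:
              return "PASSING"          # green segment
          return "WARNING"              # orange segment in between

      print(classify_score(72))   # FAILING
      print(classify_score(83))   # WARNING
      print(classify_score(95))   # PASSING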

    Maximum concurrent pushdown jobs

    The number of Pushdown jobs that can run at the same time.

    The default is 5. You can adjust it by entering a value between 1 and 25.

    Maximum concurrent pullup jobs

    The number of Pullup jobs that can run at the same time.

    The default is 5. You can adjust it by entering a value between 1 and 25.
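    Conceptually, a concurrency limit acts like a fixed-size pool of job slots: once the limit is reached, additional submissions wait for a slot to free up. The Python sketch below is purely illustrative of that behavior and is not how the scheduler is implemented:

      import threading
      import time

      MAX_CONCURRENT_PULLUP_JOBS = 5                   # the configured limit
      slots = threading.Semaphore(MAX_CONCURRENT_PULLUP_JOBS)

      def run_job(job_id):
          with slots:                                  # blocks while 5 jobs are running
              print(f"job {job_id} started")
              time.sleep(1)                            # stand-in for the actual work
              print(f"job {job_id} finished")

      threads = [threading.Thread(target=run_job, args=(i,)) for i in range(8)]
      for t in threads:
          t.start()
      for t in threads:
          t.join()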

    Pullup job sizing limits

    The maximum Spark resources each Pullup job can request.

    For recommendations on configuring these settings, go to Pullup settings recommendations.

    Maximum number of executors

    The maximum number of executors that the Spark agent can allocate to a job to process it efficiently.

    The default is 4.

    Maximum executor memory

    The maximum amount of RAM per executor that the Spark agent can allocate to a job to process it efficiently.

    The default is 8 GB.

    Maximum number of driver cores

    The maximum number of driver cores that the Spark agent can allocate to a job to process it efficiently.

    The default is 1.

    Maximum driver memory

    The maximum amount of driver memory that the Spark agent can allocate to a job to process it efficiently.

    The default is 2 GB.

    Maximum executor cores

    The maximum number of executor cores that the Spark agent can allocate to a job to process it efficiently.

    The default is 4.

    Maximum worker memory

    The maximum amount of memory per Spark worker that the Spark agent can allocate to a job to process it efficiently. This value is proportionate to the total number of workers in your environment.

    The default is 8 GB.

    Maximum worker cores

    The maximum number of CPU cores per worker that the Spark agent can allocate to a job to process it efficiently. This value is proportionate to the total number of workers in your environment.

    The default is 4.
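    For orientation, these sizing limits cap what a job can request in terms of standard Spark resources. The snippet below is only a rough mapping to the corresponding Spark configuration properties, using the default values above; it is not how Data Quality & Observability actually submits jobs:

      from pyspark.sql import SparkSession

      # Illustrative mapping of the sizing limits to standard Spark properties.
      # In practice, driver settings must be supplied before the driver JVM
      # starts (for example at submit time), and the worker-level limits map to
      # the standalone cluster settings SPARK_WORKER_MEMORY and SPARK_WORKER_CORES.
      spark = (
          SparkSession.builder
          .appName("dq-sizing-example")
          .config("spark.executor.instances", "4")   # maximum number of executors
          .config("spark.executor.memory", "8g")     # maximum executor memory
          .config("spark.executor.cores", "4")       # maximum executor cores
          .config("spark.driver.cores", "1")         # maximum number of driver cores
          .config("spark.driver.memory", "2g")       # maximum driver memory
          .getOrCreate()
      )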

    Maximum partitions

    The maximum number of partitions into which Spark splits your data, which helps Spark process large jobs more efficiently.

    A partition is an even split of the total number of rows in your data. For large jobs, increasing the number of partitions can improve performance and processing efficiency. If your data is unevenly distributed, you may need to increase the number of partitions to avoid data skew.

    The default is 100.
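    As a rough illustration of what a partition means in practice (a generic PySpark example, not something you run inside the product), repartitioning splits the rows into evenly sized chunks that Spark can process in parallel:

      from pyspark.sql import SparkSession

      spark = SparkSession.builder.appName("partition-example").getOrCreate()

      df = spark.range(0, 10_000_000)              # hypothetical source data
      print(df.rdd.getNumPartitions())             # partitions Spark chose by default

      # Spread the rows evenly across 100 partitions (the default limit above),
      # which lets more tasks run in parallel and can reduce skew on large jobs.
      df = df.repartition(100)
      print(df.rdd.getNumPartitions())             # 100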

  5. Click Save changes.

What's next