Setting up Livy

Livy is required to preview data and use Estimate Job on remote file connections. It is also required to use the Result Preview feature on the Rule Workbench. This section shows you how to configure Livy for a Kubernetes deployment of Collibra DQ.

Steps

  1. To create a Livy pod, enable Livy by performing a Helm upgrade with the following option, or by setting the same value in the values.yaml file:

    --set global.livy.enabled=true

    Tip: global.livy.enabled is set to false by default. You must set it to true to use Livy.

  2. When the Livy pod is running in Ready status, restart the web pod.

    Note: The Livy pod updates with the web pod.

  3. Sign in to Collibra DQ.
  4. Click Explorer in the sidebar navigation menu, then click Session Management next to Remote File Connections. The Session Management dialog opens.

    Note: When Livy is unavailable, Session Management appears greyed out.

  5. Enter the required information in the following fields:

    Cores: The number of executor cores for Spark processing. The default value is 1.

    Memory: The amount of RAM allocated as storage during Spark processing. Allocate at least 1 GB of memory per Livy pod.

    Workers: The number of Livy sessions available in a Spark cluster for Livy to distribute processing tasks. The minimum value is 1. Each additional worker you specify creates an additional Livy session. For example, if you increase your workers from 1 to 3, then 2 additional Livy sessions are created, for a total of 3.

    Tip: Set the memory and worker values based on your file size. Larger files may require more memory and workers. We do not recommend setting the number of cores to a value greater than 1.

  6. Click Update Session. A confirmation message appears and asks you to acknowledge that all cached preview data will be evicted.
  7. Create a DQ Job on a Remote File Connection.
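Enabling Livy and restarting the web pod can also be scripted. The following is a minimal Bash sketch, assuming Helm 3 and kubectl are already configured for the cluster that runs Collibra DQ. The release name, chart reference, namespace, Livy label selector, and web workload name are hypothetical placeholders, not values defined by Collibra DQ, so override them with the values from your own deployment. Setting DRY_RUN=1 prints each command instead of running it.

```shell
#!/usr/bin/env bash
# Minimal sketch: enable Livy, wait for its pod to be Ready, then restart the web pod.
# All defaults below are HYPOTHETICAL placeholders; override them for your deployment.
RELEASE="${RELEASE:-collibra-dq}"                  # hypothetical Helm release name
CHART="${CHART:-./collibra-dq-chart}"              # hypothetical chart reference
NAMESPACE="${NAMESPACE:-default}"                  # namespace where Collibra DQ runs
LIVY_SELECTOR="${LIVY_SELECTOR:-app=livy}"         # hypothetical label on the Livy pod
WEB_WORKLOAD="${WEB_WORKLOAD:-deployment/dq-web}"  # hypothetical controller of the web pod

# Run a command, or only print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Upgrade the release with Livy enabled, reusing the values already set on it.
enable_livy() {
  run helm upgrade "$RELEASE" "$CHART" --namespace "$NAMESPACE" \
    --reuse-values --set global.livy.enabled=true
}

# Wait until the Livy pod reports Ready, then restart the web pod's workload.
# Note: kubectl wait fails if no pod matches the selector yet; rerun it in that case.
restart_web_after_livy() {
  run kubectl wait --for=condition=Ready pod -l "$LIVY_SELECTOR" \
    --namespace "$NAMESPACE" --timeout=300s &&
    run kubectl rollout restart "$WEB_WORKLOAD" --namespace "$NAMESPACE"
}
```

Source the file, then run enable_livy && restart_web_after_livy. The --reuse-values flag keeps the values from the previous release, so only global.livy.enabled changes. If you manage configuration through values.yaml instead, set enabled: true under the global.livy section and pass the file to helm upgrade with -f.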
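The Workers arithmetic and the sizing guidance for the session options can be expressed as a small check. The following Bash function is illustrative only and is not part of Collibra DQ. The name check_livy_session and its arguments (current workers, new workers, cores, and memory in whole gigabytes) are hypothetical. It treats workers and Livy sessions as one-to-one, as described for the Workers option, and flags values outside the documented guidance.

```shell
#!/usr/bin/env bash
# Illustrative only: check_livy_session is a HYPOTHETICAL helper, not a Collibra DQ tool.
# Usage: check_livy_session CURRENT_WORKERS NEW_WORKERS CORES MEMORY_GB
check_livy_session() {
  local current_workers=$1 new_workers=$2 cores=$3 memory_gb=$4
  local status=0

  # Documented minimum: at least 1 worker.
  if (( new_workers < 1 )); then
    echo "error: workers must be at least 1"
    status=1
  fi
  # Documented guidance: at least 1 GB of memory per Livy pod.
  if (( memory_gb < 1 )); then
    echo "error: allocate at least 1 GB of memory per Livy pod"
    status=1
  fi
  # Documented recommendation: do not go above 1 core (warning only).
  if (( cores > 1 )); then
    echo "warning: more than 1 core is not recommended"
  fi

  # One Livy session per worker, so the session count changes by the worker delta.
  local delta=$(( new_workers - current_workers ))
  printf 'sessions: %d total (change: %+d)\n' "$new_workers" "$delta"
  return "$status"
}
```

For example, check_livy_session 1 3 1 2 prints sessions: 3 total (change: +2), which matches the example of increasing workers from 1 to 3.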

Troubleshooting Livy

  • If Livy does not show RUNNING status in Session Management, validate that the pods are available with the following command:
$ kubectl get pods
  • If errors occur or data does not load on the Load File step, there may be an issue with Livy. Try updating the session. If the issue persists, terminate your session by clicking Terminate Session in Session Management.
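Beyond kubectl get pods, the Livy pod's events and logs often show why it is not running. The following minimal Bash sketch groups these checks. NAMESPACE and LIVY_SELECTOR are hypothetical placeholders for the namespace and the label on the Livy pod in your deployment, and DRY_RUN=1 prints the commands instead of running them.

```shell
#!/usr/bin/env bash
# Minimal troubleshooting sketch. NAMESPACE and LIVY_SELECTOR are HYPOTHETICAL
# placeholders; override them with the values used by your deployment.
NAMESPACE="${NAMESPACE:-default}"
LIVY_SELECTOR="${LIVY_SELECTOR:-app=livy}"

# Run a command, or only print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

check_livy_pods() {
  # List every pod in the namespace with its READY and STATUS columns.
  run kubectl get pods --namespace "$NAMESPACE"
  # Show details and recent events for the Livy pod (scheduling, image pulls, restarts).
  run kubectl describe pod -l "$LIVY_SELECTOR" --namespace "$NAMESPACE"
  # Print the last 100 log lines from the Livy pod.
  run kubectl logs -l "$LIVY_SELECTOR" --namespace "$NAMESPACE" --tail=100
}
```

Source the file and run check_livy_pods. If no pod matches the selector, kubectl typically reports that no resources were found, which suggests the Livy pod was never created and global.livy.enabled may still be false.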