Setting up Livy
Livy is required to preview data and use Estimate Job on remote file connections. It is also required to use the Result Preview feature on the Rule Workbench. This section shows you how to configure Livy for a Kubernetes deployment of Collibra DQ.
Steps
- To create a Livy pod, perform a Helm upgrade with the following flag, or set the equivalent value in the values.yaml file:
  --set global.livy.enabled=true
  Tip global.livy.enabled is set to false by default. To use Livy, it must be set to true.
- When the Livy pod is running in Ready status, restart the web pod.
  Note The Livy pod will update with the web pod.
- Sign in to Collibra DQ.
- Click in the sidebar navigation menu, then click Session Management next to Remote File Connections. The Session Management dialog opens.
  Note When Livy is unavailable, Session Management appears greyed out.
- Enter the required information.
- Click Update Session. A confirmation message appears and asks you to acknowledge that all cached preview data will be evicted.
- Create a DQ Job on a Remote File Connection.
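As a rough illustration of the first step, the snippet below assembles a complete Helm command around the --set global.livy.enabled=true flag. The release name, chart path, and namespace are hypothetical placeholders, not values from the Collibra DQ documentation. The snippet only prints the command so that you can review it before running it.

```shell
#!/bin/sh
# Hypothetical placeholders: replace with your own release, chart, and namespace.
RELEASE="dq"
CHART="./collibra-dq-chart"
NAMESPACE="dq"

# Print the Helm command that enables the Livy pod.
# --reuse-values keeps the values of the previous release; drop it if you
# pass your complete values.yaml with -f instead.
# The values.yaml equivalent of the --set flag is:
#   global:
#     livy:
#       enabled: true
livy_helm_cmd() {
  printf 'helm upgrade %s %s --namespace %s --reuse-values --set global.livy.enabled=true\n' \
    "$RELEASE" "$CHART" "$NAMESPACE"
}

livy_helm_cmd
```

Printing the command rather than executing it is a deliberate choice for this sketch: it lets you confirm the release and namespace against your own deployment first.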
Option | Description |
---|---|
Cores | The number of executor cores for Spark processing. The default value is 1. |
Memory | The amount of RAM allocated as storage during Spark processing. Allocate at least 1 GB of memory per Livy pod. |
Workers | The number of Livy sessions available in a Spark cluster for Livy to distribute processing tasks. When you increase the number of workers, an additional Livy session is created for each additional worker you specify. For example, if you increase your workers from 1 to 3, then 2 additional Livy sessions are created for a total of 3. The minimum value is 1. |
Tip Set the memory and worker values based on your file size. For larger files, you may need to increase the memory and workers. We do not recommend increasing the number of cores to a value greater than 1.
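The constraints in the table and tip above can be expressed as a small pre-check. This is only a sketch: the function name check_session is hypothetical, it is not part of Collibra DQ, and it assumes memory is given in whole gigabytes.

```shell
#!/bin/sh
# check_session CORES MEMORY_GB WORKERS
# Hypothetical helper that checks planned session settings against the
# documented guidance. Returns 1 if a minimum is violated, 0 otherwise.
check_session() {
  cores="$1"; memory_gb="$2"; workers="$3"
  status=0
  if [ "$workers" -lt 1 ]; then
    echo "error: workers must be at least 1"
    status=1
  fi
  if [ "$memory_gb" -lt 1 ]; then
    echo "error: allocate at least 1 GB of memory"
    status=1
  fi
  # More than 1 core is allowed but not recommended, so this is only a warning.
  if [ "$cores" -gt 1 ]; then
    echo "warning: more than 1 core is not recommended"
  fi
  return "$status"
}

# Example: 1 core, 2 GB of memory, 3 workers (3 Livy sessions in total).
check_session 1 2 3
```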
Troubleshooting Livy
- If Livy does not show RUNNING status in the session manager, you can validate that pods are available with the following command:
$ kubectl get pods
- If errors occur or data does not load on the Load File step, there may be an issue with Livy. Try updating the session. If the issues persist, terminate your session from Session Management > Terminate Session.
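To narrow the kubectl get pods output down to Livy, a filter along the following lines may help. It is a minimal sketch that assumes the Livy pod name contains the string livy, which may not match your deployment. The function name livy_not_ready is hypothetical.

```shell
#!/bin/sh
# Reads `kubectl get pods` output on stdin and prints the NAME, READY, and
# STATUS columns of every pod whose name contains "livy" (an assumption) and
# that is not Running with all of its containers ready.
# Prints nothing when all matching pods look healthy.
livy_not_ready() {
  awk 'NR > 1 && $1 ~ /livy/ {
    split($2, ready, "/")          # READY column, for example "0/1"
    if ($3 != "Running" || ready[1] != ready[2]) print $1, $2, $3
  }'
}

# Usage (add -n <namespace> if the pods are not in your current namespace):
#   kubectl get pods | livy_not_ready
```

If the filter prints a pod, kubectl describe pod with that pod name is a reasonable next step to see why it is not ready.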