Setting up Livy

Livy is required to preview data and use Estimate Job on remote file connections. It is also required to use the Result Preview feature on the Rule Workbench. This section shows you how to configure Livy for a Kubernetes deployment of Collibra DQ.

Steps

  1. To create a Livy pod, perform a Helm upgrade with the following option, or set the corresponding value in your values.yaml file (a typical command sequence for this and the next step is shown after these steps):

    --set global.livy.enabled=true

    Tip global.livy.enabled is set to false by default. To use Livy, set it to true.

  2. When the Livy pod is in Ready status, restart the web pod.

    Note The Livy pod will update with the web pod.

  3. Sign in to Collibra DQ.
  4. Click Explorer in the sidebar navigation menu, then click Session Management next to Remote File Connections.

    The Session Management dialog opens.

    Note When Livy is unavailable, Session Management appears greyed out.

  5. Enter the required information.

    Option  | Description
    Cores   | The number of executor cores for Spark processing. The default value is 1.
    Memory  | The amount of RAM allocated as storage during Spark processing. Allocate at least 1 GB of memory per Livy pod.
    Workers | The number of Livy sessions available in a Spark cluster for Livy to distribute processing tasks. When you increase the number of workers, an additional Livy session is created for each additional worker you specify. For example, if you increase your workers from 1 to 3, then 2 additional Livy sessions are created for a total of 3. The minimum value is 1.

    Tip Set the memory and worker values based on your file size. For larger files, you may need to increase the memory and workers. We do not recommend increasing the number of cores to a value greater than 1.

  6. Click Update Session.

    A confirmation message appears and asks you to acknowledge that all cached preview data will be evicted.

  7. Create a DQ Job on a Remote File Connection.
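
For reference, steps 1 and 2 might be run as follows. This is a minimal sketch, not the exact commands for your environment: the release name collibra-dq, the namespace dq, the chart reference, and the dq-web deployment name are placeholders, so substitute the values from your own deployment.

    # Enable Livy as part of a Helm upgrade (or set global.livy.enabled: true in values.yaml).
    helm upgrade collibra-dq <chart> --namespace dq --reuse-values --set global.livy.enabled=true

    # Wait until the Livy pod reports Ready status.
    kubectl get pods --namespace dq | grep livy

    # Restart the web pod, assuming the web component runs as a Deployment named dq-web.
    kubectl rollout restart deployment dq-web --namespace dq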

Using Livy with Snowflake datasets that use private-key authentication

Additional configuration is required to use the Rules Preview feature with Livy for Snowflake datasets that use private-key authentication.

  1. Configure the Snowflake connection string to include the private key and its password in the URL, instead of in the driver properties of the connection (see the example URL after these steps).
  2. Enable Kerberos and Keytab during Helm deployment.
  3. Set additional Spark configurations for the Livy session in values.yaml:

    global:
      agent:
        extraJvmOptions: "-Dnet.snowflake.jdbc.enableBouncyCastle=TRUE"
      web:
        extraJvmOptions: "-Dnet.snowflake.jdbc.enableBouncyCastle=TRUE"
      livy:
        extraSparkConf:
          spark.kubernetes.driver.secrets.dq-keytab-<helm-release-name>: /tmp/keytab
          spark.kubernetes.executor.secrets.dq-keytab-<helm-release-name>: /tmp/keytab
          spark.kubernetes.driverEnv.JAVA_TOOL_OPTIONS: "-Dnet.snowflake.jdbc.enableBouncyCastle=TRUE"
          spark.executorEnv.JAVA_TOOL_OPTIONS: "-Dnet.snowflake.jdbc.enableBouncyCastle=TRUE"
  4. Update the Helm chart for the additional Spark configurations:

    kerberos:
      enabled: true
      kdc: ""
      realm: ""
      krb5:
        path: /etc
        name: krb5.conf
        create: true
        extraOptions: |-
      keytab:
        path: /tmp/keytab
        data: "IA=="
        create: true
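
As an illustration of step 1, a Snowflake JDBC URL that carries the key material directly might look like the following. The account, warehouse, database, and passphrase are placeholders, the key path assumes the key is provided through the secret mounted at /tmp/keytab in the Spark configuration above, and private_key_file and private_key_file_pwd are the commonly used Snowflake JDBC parameters for key-pair authentication; confirm the exact parameter names against your driver version.

    jdbc:snowflake://<account>.snowflakecomputing.com/?warehouse=<warehouse>&db=<database>&private_key_file=/tmp/keytab/rsa_key.p8&private_key_file_pwd=<key-passphrase>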

Troubleshooting Livy

  • If Livy does not show RUNNING status in the session manager, you can validate that pods are available with the following command:

    $ kubectl get pods
  • If you find that there are errors or data does not load on the Load File step, this may indicate an issue with Livy. You can try to update the session. If there are ongoing issues, you can terminate your session with the Terminate Session option in Session Management.
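
If the pods exist but Livy still does not reach RUNNING status, inspecting the Livy pod directly can help. This is a sketch only; the namespace and pod name below are placeholders taken from your own deployment.

    $ kubectl get pods --namespace <namespace> | grep livy
    $ kubectl describe pod <livy-pod-name> --namespace <namespace>
    $ kubectl logs <livy-pod-name> --namespace <namespace>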