Deploy Apache Kyuubi

Charmed Apache Spark can work with Apache Kyuubi, which is a distributed and multi-tenant gateway to provide serverless SQL on lakehouses. The following guide is to set up Apache Kyuubi for running SQL queries with Charmed Apache Spark.

Prerequisites

To continue, make sure that you have a properly set up Charmed Apache Spark K8s environment with Juju.

See the How to setup K8s environment guide for instructions on how to set up with different cloud providers and follow the Environment setup page in our Tutorial to set up a service account.

Initial setup

Add a new Juju model:

juju add-model kyuubi

Deploy the Apache Kyuubi charm:

juju deploy kyuubi-k8s --trust --channel=latest/edge

Open a new terminal window/tab and run the following for a persistent view of the model status:

juju status --watch=1s

Wait for the kyuubi-k8s app to have the blocked status.

Get back to the previous terminal, then deploy the spark-integration-hub-k8s:

juju deploy spark-integration-hub-k8s --trust --channel=latest/edge

Wait for it to initialise (change its status from waiting to active) and integrate the two charms:

juju integrate kyuubi-k8s spark-integration-hub-k8s

You should see the blocked status with the Missing Object Storage backend message.

Object storage setup

Kyuubi currently supports two mutually-exclusive backends for the object storage:

S3-compatible storage
Azure DataLake storage

Please see the following sections for guidance on how to configure each.

S3

For S3-compliant storage, deploy S3 integrator:

juju deploy s3-integrator

Configure S3 integrator to use your S3 object storage by setting up endpoint address, bucket, path to folder, and credentials:

juju config s3-integrator endpoint=<ADDRESS> bucket=<BUCKET-NAME> path=<FOLDER-NAME>
juju run s3-integrator/leader sync-s3-credentials access-key=<ACCESS-KEY> secret-key=<SECRET-KEY>

As a result, the message ok: Credentials successfully updated. should appear.

Now it’s time to integrate the s3-integrator charm:

juju integrate s3-integrator spark-integration-hub-k8s

After that, kyuubi-k8s app becomes blocked with the Missing authentication database relation message.

Azure DataLake

For Azure DataLake storage, deploy Azure storage integrator:

juju deploy azure-storage-integrator

Create a Juju secret containing the Azure Storage account secret key, and grant the charm access to this secret.

juju add-secret azure-storage-secret secret-key=XXXXXXX
juju grant-secret azure-storage-secret azure-storage-integrator

Configure the charm to use the newly added secret (secret id is displayed in the add-secret command’s output) to access your Azure DataLake storage account and container:

juju config azure-storage-integrator credentials=<secret-id>
juju config azure-storage-integrator storage-account=XXXX container=XXXX

And finally integrate it with the integration hub:

juju integrate spark-integration-hub-k8s azure-storage-integrator

After that, kyuubi-k8s app becomes blocked with the Missing authentication database relation message.

Authentication database setup

For the Apache Kyuubi to work with Charmed Apache Spark, it needs a postgresql-k8s charm deployed and serve as its authentication database.

Deploy and integrate the postgresql-k8s charm:

juju deploy postgresql-k8s --channel=14/stable --trust kyuubi-users
juju integrate kyuubi-k8s:auth-db kyuubi-users

Wait for all statuses to become active.

Connect

Expose the kyuubi-k8s app for external connection:

juju config kyuubi-k8s expose-external=loadbalancer

Get Apache Kyuubi connection details:

juju run kyuubi-k8s/leader get-jdbc-endpoint
juju run kyuubi-k8s/leader get-password

Finally, use the external endpoint address and database password to connect with your JDBC-compliant tool of choice:

beeline -u <ENDPOINT> -n admin -p <PASSWORD>

Connection example

For example, you can use the beeline tool that is a part of the Apache Kyuubi charm. We recommend deploying a new, dedicated Apache Kyuubi app for that:

juju deploy kyuubi-k8s --channel=latest/edge --trust beeline

Wait for the new app to become blocked, but don’t integrate it to anything. Instead, open its SSH terminal, enter bash, and run the beeline tool from inside:

juju ssh --container=kyuubi beeline/0 bash
/opt/kyuubi/bin/beeline -u jdbc:hive2://10.41.189.148:10009/ -n admin -p 06yJY5OkhcxVhQ0C

Wait a few minutes for the Apache Kyuubi to start. You will see log output with Pending phase while Kubernetes pods are starting:

25/05/27 18:29:05 INFO LoggingPodStatusWatcherImpl: Application status for spark-7244f14bac9e4e3f9c505143f4dce374 (phase: Pending)

After the initialisation is complete, you will get the Beeline version and prompt:

Connected to: Spark SQL (version 3.4.2)
Driver: Kyuubi Project Hive JDBC Client (version 1.10.1)
Beeline version 1.10.1 by Apache Kyuubi
0: jdbc:hive2://10.41.189.148:10009/>

You can run your SQL commands now, for example:

create database abc;
use abc;
create table users (id int);
insert into users values (1);
select * from users;

Metastore

To make Apache Kyuubi units stateless, we need to set up external storage for metadata, or metastore.

Deploy a new instance of postgresql-k8s charm for the metastore:

juju deploy postgresql-k8s --trust --channel=14/stable metastore

Now integrate Apache Kyuubi app with the metastore:

juju integrate kyuubi-k8s:metastore-db metastore

High availability

For high availability, add more units for the Apache Kyuubi app:

juju scale-application kyuubi-k8s 3

Now the application is blocked, as it requires ZooKeeper to work in a multi-node mode. Deploy and relate the ZooKeeper charm:

juju deploy zookeeper-k8s --channel=3/edge --trust -n 1
juju relate kyuubi-k8s zookeeper-k8s

Wait for all the units of the kyuubi-k8s application to become active.

Previous Deploy Charmed Apache Spark Next Manage Service Accounts using the snap

Last updated a day ago. Help improve this document in the forum.

Quick links

Quick links

Quick links

Quick links

Quick links

Quick links

Quick links

Quick links

Quick links

Categories

Industries

Case studies ›

Partner programs

Quick links

Roles by department

Working here

Explore Canonical

Latest updates

Company highlights ›

Deploy Apache Kyuubi

Prerequisites

Initial setup

Object storage setup

S3

Azure DataLake

Authentication database setup

Connect

Connection example

Metastore

High availability