Self-Managed. Not available for BYOC.

Deploying Fluss on Kubernetes


This document guides developers through deploying and hardening a Fluss cluster on Kubernetes using Helm. It covers installation workflows, multi-cluster management, client connectivity, and production optimization strategies.

Overview

The fluss-bundle Helm chart deploys a complete Fluss cluster on Kubernetes. The chart bundles Fluss with ZooKeeper (Bitnami, with the image mirrored to the Ververica registry) and exposes the Fluss client endpoint on port 9124.

The default deployment topology includes the following components:

| Component | Count |
|---|---|
| Coordinator server | 1 |
| Tablet servers | 3 |
| ZooKeeper nodes | 3 |

The Ververica Platform registry at registry.ververica.cloud hosts all images for both Fluss and ZooKeeper. A single image pull secret covers both of these components.

This manual covers a minimal deployment and common production hardening options. You can find information about remote storage, SASL authentication, and Prometheus monitoring in separate manuals.

Prerequisites

  • Kubernetes 1.24+ cluster with kubectl configured and pointing to the target cluster
  • Helm 3.8+ (OCI chart support required)
  • Access to the Ververica registry (registry.ververica.cloud)
  • Sufficient cluster capacity for the default pod count (7 pods) plus your configured resource requests
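
To check the Helm prerequisite before you continue, you can compare the installed version against the minimum. The following is a minimal sketch, not part of the chart: the version_ge helper is hypothetical and assumes GNU sort with version-sort support (-V).

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed (exit status 0) when version $1 >= minimum $2.
# Assumes GNU sort with -V (version sort). A leading "v" (as in "v3.14.0") is ignored.
version_ge() {
  local actual="${1#v}" minimum="${2#v}"
  # Version-sort both values; if the minimum sorts first (or they are equal), actual >= minimum.
  [ "$(printf '%s\n%s\n' "$minimum" "$actual" | sort -V | head -n 1)" = "$minimum" ]
}

# Example against the local Helm client (requires helm on PATH):
#   version_ge "$(helm version --template '{{.Version}}')" 3.8 || echo "Helm 3.8+ is required" >&2
```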

Registry Credentials

To obtain credentials, view available artifact endpoints, and find registry login commands for both Helm OCI and Docker, see Obtaining Registry Access. The remainder of this manual assumes you have exported REGISTRY_USERNAME and REGISTRY_PASSWORD in your shell.
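
Because the remaining commands read these variables, it can help to fail fast when they are missing or padded with whitespace, which is a common cause of 401 errors (see Troubleshooting). This is a minimal sketch; the check_registry_env function is hypothetical and requires bash for indirect variable expansion.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check: both variables must be set, non-empty,
# and free of leading or trailing whitespace.
check_registry_env() {
  local var value
  for var in REGISTRY_USERNAME REGISTRY_PASSWORD; do
    value="${!var:-}"   # bash indirect expansion: the value of the variable named by $var
    if [ -z "$value" ]; then
      echo "$var is not set or is empty" >&2
      return 1
    fi
    case "$value" in
      [[:space:]]*|*[[:space:]])
        echo "$var has leading or trailing whitespace" >&2
        return 1
        ;;
    esac
  done
}

# Example:
#   check_registry_env && echo "registry credentials look usable"
```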

Create the Image Pull Secret

Create a Kubernetes image pull secret in your target namespace before installing the chart. Both the Fluss and ZooKeeper pods use this secret:

BASH
kubectl create namespace fluss
kubectl create secret docker-registry ververica-registry \
  --docker-server=registry.ververica.cloud \
  --docker-username="$REGISTRY_USERNAME" \
  --docker-password="$REGISTRY_PASSWORD" \
  --namespace fluss

Verify Registry Access

Before installing the chart, confirm that both the Helm OCI registry and the Kubernetes image pull path work end-to-end. See Obtaining Registry Access for the canonical artifact paths.

Verify Helm OCI access

Pull the chart locally without installing it by running the following command:

BASH
helm pull oci://registry.ververica.cloud/platform-charts/fluss-bundle \
  --version 0.9.1-vv-2 \
  --destination /tmp

A fluss-bundle-0.9.1-vv-2.tgz file should appear under /tmp. Authentication failures surface as 401 Unauthorized or 403 Forbidden. If you encounter these errors, see Troubleshooting.

Verify Docker image pull from the cluster

Run a one-shot pod in your Fluss namespace that uses the pull secret to fetch the Fluss image:

BASH
kubectl -n fluss run registry-check \
  --rm -it --restart=Never \
  --image=registry.ververica.cloud/platform-images/fluss:0.9.1-vv-2 \
  --overrides='{"spec":{"imagePullSecrets":[{"name":"ververica-registry"}]}}' \
  -- /bin/sh -c "echo image pulled successfully"

If the pod prints "image pulled successfully" and exits cleanly, the secret is wired up correctly. A pod stuck in ImagePullBackOff indicates an authentication or naming problem; see Troubleshooting.

Install the Chart

Choose a Version

Create a values.yaml file that references the pull secret. You must include the zookeeper.image.pullSecrets entry because the Ververica Platform registry also serves the bundled ZooKeeper image. Do not override fluss.image.tag: it defaults to the image version shipped with this chart release, so the chart version you install determines the Fluss image version. A minimal values.yaml looks like this:

YAML
fluss:
  image:
    pullSecrets:
      - ververica-registry
zookeeper:
  image:
    pullSecrets:
      - ververica-registry

For the full set of configurable fields under fluss:, refer to the Fluss Helm Chart documentation.

Install

BASH
helm install fluss oci://registry.ververica.cloud/platform-charts/fluss-bundle \
  --version 0.9.1-vv-2 \
  --namespace fluss \
  -f values.yaml

Upgrade

When you change a value in values.yaml and want to apply it without bumping the chart version, run the following command:

BASH
helm upgrade fluss oci://registry.ververica.cloud/platform-charts/fluss-bundle \
  --version 0.9.1-vv-2 \
  --namespace fluss \
  -f values.yaml

To upgrade to a new chart release, point to the new version by running the following command:

BASH
helm upgrade fluss oci://registry.ververica.cloud/platform-charts/fluss-bundle \
  --version <NEW_VERSION> \
  --namespace fluss \
  -f values.yaml

Verify the Deployment

Verify that all pods reach the Running state by running the following command:

BASH
kubectl get pods -n fluss

Expected pods include the following:

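| Pod | Count |
|---|---|
| coordinator-server-0 | 1 |
| tablet-server-0, tablet-server-1, tablet-server-2 | 3 |
| fluss-zookeeper-0, fluss-zookeeper-1, fluss-zookeeper-2 | 3 |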
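
To script this check, you can count the pods that are Running with all containers ready and compare the result with the seven pods listed above. This sketch assumes the default column layout of kubectl get pods --no-headers (NAME, READY, STATUS, RESTARTS, AGE); the count_ready_pods function is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `kubectl get pods --no-headers` output on stdin and
# print the number of pods whose STATUS is Running and whose READY column
# shows all containers ready (for example "1/1").
count_ready_pods() {
  awk '{
    split($2, ready, "/")                          # READY column, e.g. "1/1"
    if ($3 == "Running" && ready[1] == ready[2]) n++
  } END { print n + 0 }'
}

# Example (requires cluster access); expect 7 for the default topology:
#   kubectl get pods -n fluss --no-headers | count_ready_pods
```

Alternatively, kubectl can block until the pods report Ready: kubectl wait --for=condition=Ready pod --all -n fluss --timeout=300s.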

List the services to identify the client endpoint by running the following command:

BASH
kubectl get services -n fluss

The coordinator server service exposes the Fluss client port (9124).

Managing Multiple Fluss Clusters

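Each Helm release deployed into its own Kubernetes namespace runs as an independent Fluss cluster; no state is shared between namespaces. Because the chart uses fixed service names without a release-name prefix, deploy only one Fluss release per namespace.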

To run a second cluster alongside the first, create a separate namespace for it:

BASH
kubectl create namespace fluss-staging

Create the image pull secret in the new namespace by following the steps in Create the Image Pull Secret, substituting --namespace fluss-staging. Then install a second release into that namespace with its own values file:

BASH
helm install fluss-staging oci://registry.ververica.cloud/platform-charts/fluss-bundle \
  --version 0.9.1-vv-2 \
  --namespace fluss-staging \
  -f values-staging.yaml

Manage each cluster independently with helm upgrade or helm uninstall commands scoped to its specific namespace.
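
Because every additional cluster needs the same namespace, secret, and install steps, you can generate them per namespace. This is a sketch: the hypothetical print_fluss_cluster_setup function only prints the commands already shown in this manual, with the namespace, release name, chart version, and values file as parameters. Review the output before piping it to bash.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print (not run) the commands that provision one
# independent Fluss cluster. Credentials stay as literal $VARS so they are
# expanded only when the printed script is executed.
print_fluss_cluster_setup() {
  if [ "$#" -ne 4 ]; then
    echo "usage: print_fluss_cluster_setup NAMESPACE RELEASE CHART_VERSION VALUES_FILE" >&2
    return 1
  fi
  local ns="$1" release="$2" version="$3" values="$4"
  cat <<EOF
kubectl create namespace $ns
kubectl create secret docker-registry ververica-registry \\
  --docker-server=registry.ververica.cloud \\
  --docker-username="\$REGISTRY_USERNAME" \\
  --docker-password="\$REGISTRY_PASSWORD" \\
  --namespace $ns
helm install $release oci://registry.ververica.cloud/platform-charts/fluss-bundle \\
  --version $version \\
  --namespace $ns \\
  -f $values
EOF
}

# Example:
#   print_fluss_cluster_setup fluss-staging fluss-staging 0.9.1-vv-2 values-staging.yaml | bash
```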

Connecting to Fluss

Fluss clients connect using bootstrap.servers pointing to the coordinator pod through its headless service on port 9124. The Fluss chart uses fixed service names without a Helm release-name prefix, so the bootstrap address is namespace-scoped:

TEXT
coordinator-server-0.coordinator-server-hs.<namespace>.svc.cluster.local:9124

For a cluster deployed in the fluss namespace, use the following address:

TEXT
coordinator-server-0.coordinator-server-hs.fluss.svc.cluster.local:9124
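
Since only the namespace varies, the bootstrap address can be derived in scripts. This is a minimal sketch; fluss_bootstrap_address is a hypothetical helper, and its optional second argument assumes a cluster whose DNS domain differs from the default cluster.local.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build the bootstrap.servers value for the Fluss
# coordinator in a given namespace. The service names are fixed by the chart.
fluss_bootstrap_address() {
  if [ -z "${1:-}" ]; then
    echo "usage: fluss_bootstrap_address NAMESPACE [CLUSTER_DOMAIN]" >&2
    return 1
  fi
  local namespace="$1" domain="${2:-cluster.local}"
  printf 'coordinator-server-0.coordinator-server-hs.%s.svc.%s:9124\n' "$namespace" "$domain"
}

# Example: the address for the staging cluster from Managing Multiple Fluss Clusters
#   fluss_bootstrap_address fluss-staging
```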

The coordinator and tablet pods have readiness and liveness probes configured, so a pod in the Running state with ready containers has already passed the port check. To verify connectivity manually from inside the cluster, run a temporary pod:

BASH
kubectl run netcat-check --rm -it --restart=Never \
  --image=busybox:latest -n fluss -- \
  nc -zv coordinator-server-0.coordinator-server-hs.fluss.svc.cluster.local 9124

Substitute fluss with your namespace if it is different.

For information about configuring Flink SQL and Java SDK clients against this bootstrap address, see Reading and Writing Fluss.

Production Hardening

Persistent Storage

By default, tablet servers use /tmp/fluss/data for the data.dir configuration. This path is an ephemeral, in-pod path that does not survive pod restarts. You must enable persistent volumes for production deployments.

YAML
fluss:
  tablet:
    storage:
      enabled: true
      size: 500Gi
      storageClass: gp3
zookeeper:
  persistence:
    enabled: true
    storageClass: gp3
    accessModes: ["ReadWriteOnce"]
    size: 8Gi
    dataLogDir:
      size: 8Gi

Replace gp3 with your cluster's storage class.

To list the available classes, run the following command:

BASH
kubectl get storageclasses
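
A misspelled storage class leads to a PVC provisioning failure, so you may want to confirm the name before installing. This is a sketch; has_storage_class is hypothetical and reads the output of kubectl get storageclasses -o name, which prints one storageclass.storage.k8s.io/<name> entry per line.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if the storage class named $1 appears in
# `kubectl get storageclasses -o name` output read from stdin.
has_storage_class() {
  local wanted="$1"
  # Drop the "storageclass.storage.k8s.io/" prefix, then require an exact, literal line match.
  sed 's#^.*/##' | grep -qxF -- "$wanted"
}

# Example (requires cluster access):
#   kubectl get storageclasses -o name | has_storage_class gp3 || echo "storage class gp3 not found" >&2
```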

Resource Requests and Limits

The Helm chart ships with no resource requests or limits set. You must configure these settings for your production environments:

YAML
fluss:
  resources:
    coordinatorServer:
      requests:
        cpu: "2"
        memory: 4Gi
      limits:
        cpu: "2"
        memory: 4Gi
    tabletServer:
      requests:
        cpu: "4"
        memory: 8Gi
      limits:
        cpu: "4"
        memory: 8Gi
zookeeper:
  resources:
    requests:
      memory: 2Gi
      cpu: "1"
    limits:
      memory: 2Gi
      cpu: "1"
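
With these example values and the default topology (1 coordinator, 3 tablet servers, 3 ZooKeeper nodes), the requests add up to 1×2 + 3×4 + 3×1 = 17 CPUs and 1×4 + 3×8 + 3×2 = 34 GiB of memory across the cluster. The sketch below repeats that arithmetic so you can rerun it with your own values; total_requests is a hypothetical helper that assumes whole CPUs and whole GiB. The total is only a lower bound on capacity: each pod must also fit on a single node, so every tablet server needs a node with at least 4 allocatable CPUs and 8 GiB.

```shell
#!/usr/bin/env bash
# Hypothetical helper: total CPU and memory requests for the cluster.
# Per component, pass: replica count, CPUs per pod, GiB per pod.
total_requests() {
  if [ "$#" -ne 9 ]; then
    echo "usage: total_requests COORD_N COORD_CPU COORD_GIB TABLET_N TABLET_CPU TABLET_GIB ZK_N ZK_CPU ZK_GIB" >&2
    return 1
  fi
  local cpu=$(( $1 * $2 + $4 * $5 + $7 * $8 ))   # sum of count x CPUs per component
  local mem=$(( $1 * $3 + $4 * $6 + $7 * $9 ))   # sum of count x GiB per component
  echo "cpu=${cpu} memory=${mem}Gi"
}

# Values from the example above with the default topology:
#   total_requests 1 2 4  3 4 8  3 1 2   # prints: cpu=17 memory=34Gi
```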

Replication and Bucketing

The Helm chart sets the following default values:

YAML
fluss:
  configurationOverrides:
    default.bucket.number: 3
    default.replication.factor: 3

  • default.replication.factor: Specifies how many tablet servers hold a replica of each bucket. This value must not exceed fluss.tablet.numberOfReplicas.
  • default.bucket.number: Specifies the default number of buckets (shards) per table. Individual tables can override this setting at creation time with the 'bucket.num' table property in the Flink SQL WITH clause. A bucket count equal to the tablet server count, or a multiple of it, distributes load evenly.
  • fluss.tablet.numberOfReplicas: Controls the number of tablet server pods. The default value is 3. You might lower this value in resource-constrained environments, but you must adjust default.replication.factor accordingly.
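
The constraints above can be checked before you install or upgrade. This is a sketch; check_fluss_layout is a hypothetical helper that takes the three values as plain integers. It fails when the replication factor exceeds the tablet server count and only warns when the bucket count is not a multiple of it.

```shell
#!/usr/bin/env bash
# Hypothetical helper:
#   check_fluss_layout REPLICATION_FACTOR BUCKET_NUMBER TABLET_REPLICAS
# Returns 1 if default.replication.factor > fluss.tablet.numberOfReplicas;
# prints a warning (but succeeds) if default.bucket.number is not a multiple
# of the tablet server count.
check_fluss_layout() {
  local rf="$1" buckets="$2" tablets="$3"
  if (( rf < 1 || buckets < 1 || tablets < 1 )); then
    echo "error: all values must be positive integers" >&2
    return 1
  fi
  if (( rf > tablets )); then
    echo "error: default.replication.factor ($rf) exceeds fluss.tablet.numberOfReplicas ($tablets)" >&2
    return 1
  fi
  if (( buckets % tablets != 0 )); then
    echo "warning: default.bucket.number ($buckets) is not a multiple of $tablets tablet servers" >&2
  fi
  return 0
}
```

For example, lowering fluss.tablet.numberOfReplicas to 1 while keeping the default replication factor of 3 fails the check: check_fluss_layout 3 3 1.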

Service Account

The Helm chart creates no service account by default (fluss.serviceAccount.create: false). You can either link an existing service account or have the chart create one for you.

YAML
fluss:
  serviceAccount:
    create: true   # set to false to use an existing account
    name: fluss-sa

You need a service account when you bind Fluss pods to a workload identity, such as AWS IRSA or GKE Workload Identity. This binding grants the pods access to cloud resources, such as remote storage.

Uninstall

BASH
helm uninstall fluss --namespace fluss

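helm uninstall does not delete persistent volume claims (PVCs). To remove them, delete the claims manually. The following command removes every PVC in the namespace, including any not created by this chart; depending on the storage class reclaim policy, this may also delete the underlying volumes and their data: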

BASH
kubectl delete pvc -n fluss --all

Troubleshooting

Pods not starting

To troubleshoot issues, you can inspect the pod events and logs by running the following commands:

BASH
kubectl describe pod <POD_NAME> -n fluss
kubectl logs <POD_NAME> -n fluss

Common causes for this issue include a missing image pull secret, insufficient cluster resources, or a PVC provisioning failure.

Pod stuck in ImagePullBackOff or ErrImagePull

To find the precise registry error, inspect the pod events by running the following command:

BASH
kubectl -n fluss describe pod <POD_NAME>

Common causes for this issue include the following items:

| Symptom in events | Cause | Fix |
|---|---|---|
| pull access denied, unauthorized | The pull secret is missing, has the wrong name, or is in the wrong namespace. | Recreate the secret in the pod's namespace, and then confirm that the pullSecrets entries in your values.yaml file reference its exact name. |
| manifest unknown | The tag does not exist. | Confirm that <TAG> matches the value that Ververica communicated to you. |
| dial tcp ... i/o timeout | Cluster nodes cannot reach registry.ververica.cloud. | Check egress firewalls, NAT gateways, and any registry mirror configuration on the nodes. |
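
If you check many pods, you can match the event text against the patterns in the table above. This is a sketch; classify_pull_error is a hypothetical helper that recognizes only the three messages listed in the table.

```shell
#!/usr/bin/env bash
# Hypothetical helper: map an image pull event message to the likely cause
# from the troubleshooting table. Returns 1 for unrecognized messages.
classify_pull_error() {
  case "$1" in
    *"pull access denied"*|*[Uu]nauthorized*)
      echo "credentials: pull secret missing, misnamed, or in the wrong namespace" ;;
    *"manifest unknown"*)
      echo "tag: the requested image tag does not exist" ;;
    *"i/o timeout"*)
      echo "network: cluster nodes cannot reach registry.ververica.cloud" ;;
    *)
      echo "unknown: inspect the full event message"
      return 1 ;;
  esac
}

# Example (requires cluster access):
#   msg=$(kubectl -n fluss get events --field-selector involvedObject.name=<POD_NAME> \
#     -o jsonpath='{.items[*].message}')
#   classify_pull_error "$msg"
```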

Image pull secret not picked up by the pod

You must set imagePullSecrets on the pod template, not on the namespace itself. Confirm the following details:

  • The Helm release was installed with both fluss.image.pullSecrets and zookeeper.image.pullSecrets set to the secret name.
  • The secret exists in the same namespace as the Helm release.

To verify the secret exists, run the following command:

BASH
kubectl -n fluss get secret ververica-registry

A namespace mismatch between the secret and the release is the most common cause of this issue. In this scenario, the chart installs successfully, but pods fail to pull images.

Bundled ZooKeeper pods fail to pull

If the Fluss pods start but the ZooKeeper pods do not, you likely configured the pull secret under fluss.image.pullSecrets but omitted it under zookeeper.image.pullSecrets. Both configuration blocks must reference the secret because the bundled ZooKeeper image is also served from the Ververica Platform registry.

401 Unauthorized or 403 Forbidden on helm registry login or helm pull

  • Verify that your username and password are correct and do not contain leading or trailing whitespace.
  • Confirm that your credentials were issued specifically for the Fluss projects on the registry, rather than for an unrelated Ververica Platform product.
  • Run helm registry logout registry.ververica.cloud to clear stale cached credentials, and then log in again.
  • Run docker logout registry.ververica.cloud to clear stale cached Docker credentials, and then log in again.

Complete values.yaml example

The following example values.yaml file combines the settings described in this manual, except the optional service account. Adjust storage classes, sizes, and resources to your environment:

YAML
fluss:
  image:
    pullSecrets:
      - ververica-registry
  configurationOverrides:
    default.bucket.number: 3
    default.replication.factor: 3
  coordinator:
    numberOfReplicas: 1
  tablet:
    numberOfReplicas: 3
    storage:
      enabled: true
      size: 500Gi
      storageClass: gp3
  resources:
    coordinatorServer:
      requests:
        cpu: "2"
        memory: 4Gi
      limits:
        cpu: "2"
        memory: 4Gi
    tabletServer:
      requests:
        cpu: "4"
        memory: 8Gi
      limits:
        cpu: "4"
        memory: 8Gi
zookeeper:
  image:
    pullSecrets:
      - ververica-registry
  persistence:
    enabled: true
    storageClass: gp3
    accessModes: ["ReadWriteOnce"]
    size: 8Gi
    dataLogDir:
      size: 8Gi
  resources:
    requests:
      cpu: "1"
      memory: 2Gi
    limits:
      cpu: "1"
      memory: 2Gi
