Self-Managed (not available for BYOC)

Configuring Remote Storage

Overview

Remote storage lets Fluss offload log segments and KV snapshots to object storage, providing durability and capacity beyond ephemeral pod storage. Without remote storage, Fluss keeps all data in data.dir inside the pod and loses it on pod restart.

The supported backends are:

| Backend | URI Scheme | Plugin |
| --- | --- | --- |
| Amazon S3 | s3:// | fs-s3 |
| Azure Blob Storage (ADLS Gen2) | abfs:// | fs-azure |
| OpenShift Data Foundation (ODF) and other S3-compatible stores | s3:// | fs-s3 |

Enabling remote storage requires two changes:

  • Server-side: Install a filesystem plugin into the Fluss pods using an init container, then set the relevant configurationOverrides.
  • Flink client-side: Add the matching filesystem JAR to Flink's lib/ directory so Flink jobs that read from Fluss can resolve remote segments. See Reading and Writing Fluss.

Prerequisites

  • A running Fluss cluster deployed using the fluss-bundle chart. See Deploying Fluss on Kubernetes.
  • A storage location accessible from the Kubernetes cluster: an S3 bucket, an Azure ADLS Gen2 container, or an S3-compatible bucket (for example, one provisioned through an ODF ObjectBucketClaim).
  • Credentials for the chosen backend: an access key pair for S3 and S3-compatible stores, or an account key and service principal for Azure.

Plugin Installation

Each remote storage backend requires a filesystem plugin installed into the Fluss pods at startup. The fluss-setup tool, a Ververica-specific binary bundled in the Fluss Docker image, installs these plugins. It runs as an init container before the Fluss server starts, writing plugin JARs into a shared volume that the main container mounts at the correct path.

For the full reference, including available plugins, command syntax, private Maven repository setup, pre-baking plugins into a custom image, and troubleshooting, see Installing Fluss Setups.

The sections below show the complete Helm values pattern for each backend, including the init container and volume configuration.

Keeping Credentials Out of values.yaml

The per-backend sections below place credentials directly in configurationOverrides for clarity. In practice you should source them from a Kubernetes Secret. Two paths are viable:

  1. Default credential chain (preferred where it applies). For S3, S3-compatible stores, and ODF, the underlying Hadoop S3A layer reads AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY from the pod environment using the SDK's default credential-provider chain. Alternatively, you can set s3.aws.credentials.provider to an env-var provider such as com.amazonaws.auth.EnvironmentVariableCredentialsProvider. With this approach, you can omit s3.access.key and s3.secret.key from configurationOverrides entirely. This is the documented path; see the upstream Fluss S3 filesystem documentation for the IRSA / instance-profile / env-var variants.
  2. Environment-variable substitution in server.yaml. The Fluss Docker image's entrypoint runs envsubst over server.yaml before startup, replacing ${VAR} placeholders inside configurationOverrides with pod environment values. This approach works for any configuration key, including Azure's fs.azure.* keys and other backends that lack a documented env-var fallback. However, it relies on undocumented entrypoint behavior. See Fluss Helm Chart: Additional Notes for the mechanism, caveats, and full examples.

Choose path 1 when the backend supports a documented env-var credential chain. Fall back to path 2, or to a Helm-render-time secret injector such as External Secrets Operator, when it does not.
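
For path 2, it can be useful to confirm that substitution actually happened before trusting the configuration. The following is a minimal sketch, not part of Fluss: the helper name find_unresolved_placeholders is hypothetical, and it only assumes that an unresolved placeholder would still appear literally as ${VAR} in the rendered file.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print (with line numbers) every line on stdin that
# still contains a literal ${VAR} placeholder.
# Returns 0 when no placeholder is found, 1 when at least one remains.
find_unresolved_placeholders() {
  if grep -nE '\$\{[A-Za-z_][A-Za-z0-9_]*\}'; then
    return 1
  fi
  return 0
}
```

One possible use is to pipe the rendered file through it, for example `kubectl exec -n fluss tablet-server-0 -- cat /opt/fluss/conf/server.yaml | find_unresolved_placeholders`. Only lines that still hold a placeholder are printed, so resolved secret values are not echoed to the terminal.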

Amazon S3

Cloud Infrastructure Requirements

You need:

  • An S3 bucket with public access blocked and versioning disabled.
  • An IAM user with an access key.

Attach the following policy to the IAM user. You will later set the user's access key ID and secret access key as s3.access.key and s3.secret.key in the Helm values:

JSON
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:ListBucket", "s3:GetBucketLocation"],
      "Resource": "arn:aws:s3:::<BUCKET_NAME>"
    },
    {
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
      "Resource": "arn:aws:s3:::<BUCKET_NAME>/*"
    }
  ]
}

For setup instructions, see the AWS S3 documentation and the IAM documentation.

Helm Values

Add the following configurationOverrides to your values.yaml:

YAML
fluss:
  configurationOverrides:
    remote.data.dir: s3://<BUCKET_NAME>/<PREFIX>/
    s3.access.key: <AWS_ACCESS_KEY_ID>
    s3.secret.key: <AWS_SECRET_ACCESS_KEY>
    s3.region: <AWS_REGION>

Then install the fs-s3 plugin on both coordinator and tablet. See Installing Fluss Setups for the complete init container and volume configuration.

For the full set of configurable fields under fluss:, see the Fluss Helm chart documentation.

Storing S3 Credentials in a Kubernetes Secret

Avoid placing credentials directly in values.yaml. Create a Kubernetes Secret with AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY keys, inject them into the coordinator and tablet pods using extraEnv or envFrom, and let the AWS default credential chain pick them up. You can then omit s3.access.key and s3.secret.key from configurationOverrides. See Keeping Credentials Out of values.yaml above and the upstream Fluss S3 filesystem documentation for the IRSA, instance-profile, and env-var variants.
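
As an illustration, the Secret could be created with kubectl. This is a sketch under assumptions: the function name create_s3_secret and the Secret name fluss-s3-credentials are hypothetical, and how the Secret is wired into the pods (extraEnv or envFrom) depends on the chart values, which are not shown here.

```shell
#!/usr/bin/env bash
# Hypothetical helper: create a Secret whose keys match the environment
# variable names that the AWS default credential chain reads.
# The Secret name "fluss-s3-credentials" is an illustrative choice.
create_s3_secret() {
  local namespace="$1" access_key="$2" secret_key="$3"
  kubectl create secret generic fluss-s3-credentials \
    --namespace "$namespace" \
    --from-literal=AWS_ACCESS_KEY_ID="$access_key" \
    --from-literal=AWS_SECRET_ACCESS_KEY="$secret_key"
}
```

For example, `create_s3_secret fluss "$AWS_ACCESS_KEY_ID" "$AWS_SECRET_ACCESS_KEY"` reads the values from the current shell environment, which keeps them out of files under source control.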

Azure Blob Storage (ADLS Gen2)

Cloud Infrastructure Requirements

You need:

  • An Azure Storage Account with:
    • Hierarchical Namespace (HNS) enabled, which is required for the abfs:// URI scheme used by ADLS Gen2.
    • A storage tier of Standard LRS or higher.
  • An ADLS Gen2 filesystem container inside the storage account (for example, named remote-storage).
  • A Service Principal assigned the Storage Blob Data Contributor role on the storage account.
  • The client ID, client secret, and tenant ID for the service principal.
  • The storage account access key, used alongside OAuth for the DynamicTemporaryAzureCredentialsProvider.

The OAuth token endpoint follows the format:

https://login.microsoftonline.com/<TENANT_ID>/oauth2/token

For setup instructions, see the Azure ADLS Gen2 documentation and the Service Principal documentation.

Helm Values

The abfs:// URI format is abfs://<CONTAINER>@<STORAGE_ACCOUNT>.dfs.core.windows.net/<PATH>.
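
The URI and the OAuth token endpoint are both simple string templates, so they can be composed from their parts. The sketch below is illustrative only; the function names abfs_uri and oauth_endpoint are hypothetical and not part of Fluss or the Azure tooling.

```shell
#!/usr/bin/env bash
# Hypothetical helper: compose the ADLS Gen2 URI
# abfs://<CONTAINER>@<STORAGE_ACCOUNT>.dfs.core.windows.net/<PATH>
abfs_uri() {
  local container="$1" account="$2" path="${3#/}"  # drop one leading slash from the path
  printf 'abfs://%s@%s.dfs.core.windows.net/%s\n' "$container" "$account" "$path"
}

# Hypothetical helper: compose the OAuth token endpoint for a tenant ID.
oauth_endpoint() {
  printf 'https://login.microsoftonline.com/%s/oauth2/token\n' "$1"
}
```

For example, `abfs_uri remote-storage <STORAGE_ACCOUNT> fluss/data` yields a value suitable for remote.data.dir, and `oauth_endpoint <TENANT_ID>` yields the endpoint URL.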

Add the following configurationOverrides to your values.yaml:

YAML
fluss:
  configurationOverrides:
    remote.data.dir: abfs://<CONTAINER>@<STORAGE_ACCOUNT>.dfs.core.windows.net/<PATH>
    fs.azure.account.key: <STORAGE_ACCOUNT_KEY>
    fs.azure.account.oauth.provider.type: org.apache.fluss.fs.azure.token.DynamicTemporaryAzureCredentialsProvider
    fs.azure.account.oauth2.client.id: <SP_CLIENT_ID>
    fs.azure.account.oauth2.client.secret: <SP_CLIENT_SECRET>
    fs.azure.account.oauth2.client.endpoint: https://login.microsoftonline.com/<TENANT_ID>/oauth2/token

Then install the fs-azure plugin on both coordinator and tablet. See Installing Fluss Setups for the complete init container and volume configuration.

For the full set of configurable fields under fluss:, see the Fluss Helm chart documentation.

Storing Azure Credentials in a Kubernetes Secret

The Azure fs.azure.* keys do not have a documented env-var fallback equivalent to AWS's credential chain, so the path 1 approach in Keeping Credentials Out of values.yaml does not apply here. You have two practical options:

  1. Helm-render-time injection. Use a tool such as External Secrets Operator, Vault Agent, or sealed-secrets to materialize the storage account key, service principal client secret, and tenant-specific OAuth endpoint into the values that Helm renders. The credentials never appear in source-controlled files.
  2. ${VAR} substitution in server.yaml. Inject the values into the pods using extraEnv from a Kubernetes Secret, then reference them with ${VAR} placeholders inside configurationOverrides (for example, fs.azure.account.key: ${AZURE_ACCOUNT_KEY}). This relies on the Fluss Docker image's entrypoint envsubst pass. See Fluss Helm Chart: Additional Notes for the full mechanism and the caveat that this is not part of Fluss's public configuration contract.

OpenShift Data Foundation (ODF) and Other S3-Compatible Stores

On OpenShift clusters with OpenShift Data Foundation installed, Fluss can use ODF's S3-compatible Multicloud Object Gateway (MCG, backed by NooBaa) as its remote storage. The same configuration pattern works for any S3-compatible store reachable from the cluster, such as MinIO, Ceph RGW, or Wasabi. Only the endpoint and credentials differ. This section uses ODF as the concrete example.

Cloud Infrastructure Requirements

NooBaa-on-ODF uses a three-resource stack:

  • A BackingStore defines where the bytes physically live.
  • A BucketClass is a placement policy that points at one or more BackingStores.
  • An ObjectBucketClaim (OBC) is the namespace-scoped claim that creates a bucket and emits the credentials and endpoint that Fluss consumes.

You need:

  • OpenShift Data Foundation installed in the cluster (the operator and a StorageCluster). See the Red Hat ODF documentation for installation.
  • A NooBaa BackingStore that determines where NooBaa writes the object bytes. ODF ships with a default BackingStore backed by the cluster's Ceph storage. This default is sufficient for most installations and requires no extra configuration. Define a custom BackingStore (and a matching BucketClass) only when you need NooBaa to target a specific external system, such as an AWS S3 bucket in a particular account or region, an on-prem Ceph RGW, Azure Blob, GCS, or an IBM COS account.

The following example defines an AWS-S3-backed BackingStore with credentials supplied through a Secret, plus a BucketClass that points at it. For EKS or IRSA, use awsSTSRoleARN in place of the secret reference.

YAML
apiVersion: noobaa.io/v1alpha1
kind: BackingStore
metadata:
  name: fluss-aws-backingstore
  namespace: openshift-storage
spec:
  type: aws-s3
  awsS3:
    targetBucket: <BACKING_BUCKET_NAME>
    region: <AWS_REGION>
    secret:
      name: fluss-aws-backingstore-creds
      namespace: openshift-storage
---
apiVersion: noobaa.io/v1alpha1
kind: BucketClass
metadata:
  name: fluss-bucketclass
  namespace: openshift-storage
spec:
  placementPolicy:
    tiers:
      - backingStores:
          - fluss-aws-backingstore

See the NooBaa BackingStore CRD reference for the full list of supported types (aws-s3, s3-compatible, azure-blob, google-cloud-storage, ibm-cos, pv-pool) and the credential schema for each.

  • An ObjectBucketClaim (OBC) in the namespace where Fluss will run, requesting a NooBaa bucket. The OBC controller automatically provisions two resources alongside it:
    • A ConfigMap with the bucket name and S3 endpoint (BUCKET_NAME, BUCKET_HOST, BUCKET_PORT).
    • A Secret with the access key and secret key (AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY).

Set storageClassName to ODF's built-in default openshift-storage.noobaa.io when using the default BackingStore. When using a custom BackingStore, set it to the storage class generated for your custom BucketClass.

Example OBC:

YAML
apiVersion: objectbucket.io/v1alpha1
kind: ObjectBucketClaim
metadata:
  name: fluss-remote-storage
  namespace: fluss
spec:
  generateBucketName: fluss-remote
  storageClassName: openshift-storage.noobaa.io   # or the BucketClass storage class for a custom BackingStore

The S3 endpoint inside the cluster is typically s3.openshift-storage.svc:443, or whatever BUCKET_HOST:BUCKET_PORT resolves to in the OBC's ConfigMap. NooBaa's S3 API requires path-style addressing.

Helm Values

Add the following configurationOverrides to your values.yaml:

YAML
fluss:
  configurationOverrides:
    remote.data.dir: s3://<BUCKET_NAME>/<PREFIX>/
    s3.endpoint: <S3_ENDPOINT>          # e.g. https://s3.openshift-storage.svc:443
    s3.access.key: <ACCESS_KEY_ID>
    s3.secret.key: <SECRET_ACCESS_KEY>
    s3.region: us-east-1                # NooBaa accepts any region; us-east-1 is the default
    s3.path-style-access: "true"
    s3.aws.credentials.provider: org.apache.hadoop.fs.s3a.SimpleAWSCredentialsProvider

Then install the fs-s3 plugin on both coordinator and tablet. See Installing Fluss Setups for the complete init container and volume configuration. The same fs-s3 plugin handles both AWS S3 and S3-compatible endpoints.

For the full set of configurable fields under fluss:, see the Fluss Helm chart documentation.

Storing ODF Credentials in a Kubernetes Secret

The OBC produces a Secret named after the OBC (in this example, fluss-remote-storage) with AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY keys, plus a ConfigMap of the same name with BUCKET_NAME, BUCKET_HOST, and BUCKET_PORT. Inject the Secret into the coordinator and tablet pods using extraEnv or envFrom, then rely on the AWS default credential chain. See Keeping Credentials Out of values.yaml above. For the non-credential values from the ConfigMap (bucket name and endpoint), either inline them at Helm-render time or use ${VAR} substitution as described in Fluss Helm Chart: Additional Notes.
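
To inline the ConfigMap values at Helm-render time, they first have to be read from the cluster. The sketch below shows one way to do that with kubectl and jsonpath. The function names obc_config_value and obc_fluss_values are hypothetical, and the sketch assumes that the endpoint is served over HTTPS.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read one key from the ConfigMap that the OBC
# controller creates (it has the same name as the OBC).
obc_config_value() {
  local namespace="$1" obc_name="$2" key="$3"
  kubectl get configmap "$obc_name" --namespace "$namespace" \
    -o jsonpath="{.data.${key}}"
}

# Hypothetical helper: print candidate values for remote.data.dir and
# s3.endpoint. Assumes an HTTPS endpoint at BUCKET_HOST:BUCKET_PORT.
obc_fluss_values() {
  local namespace="$1" obc_name="$2" prefix="$3"
  local bucket host port
  bucket="$(obc_config_value "$namespace" "$obc_name" BUCKET_NAME)"
  host="$(obc_config_value "$namespace" "$obc_name" BUCKET_HOST)"
  port="$(obc_config_value "$namespace" "$obc_name" BUCKET_PORT)"
  printf 'remote.data.dir: s3://%s/%s/\n' "$bucket" "$prefix"
  printf 's3.endpoint: https://%s:%s\n' "$host" "$port"
}
```

For the OBC in this section, `obc_fluss_values fluss fluss-remote-storage <PREFIX>` prints two lines that can be copied into configurationOverrides.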

Other S3-Compatible Stores

For MinIO, Ceph RGW, Wasabi, or any other S3-compatible service, use the same Helm values pattern. Substitute the endpoint, region, and credentials for the service in question. Keep s3.path-style-access: "true" and SimpleAWSCredentialsProvider unless the service supports virtual-hosted-style addressing or a different credentials provider.

Verify Remote Storage

After applying the updated values, check that the init container completed successfully:

kubectl logs -n fluss <POD_NAME> -c install-plugins

Expected output contains lines like:

+ /opt/fluss/bin/setup/install.sh fluss fs-s3 --force -- -q

[INFO] Plugin fs-s3 installed successfully.
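
This check can be scripted. The following is a minimal sketch that assumes the success line has exactly the form shown above. The helper name plugin_installed is hypothetical, and the log format may differ between image versions.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if the log on stdin contains the success
# line for the given plugin, in the format shown in the expected output.
plugin_installed() {
  grep -qF "Plugin $1 installed successfully."
}
```

For example, `kubectl logs -n fluss <POD_NAME> -c install-plugins | plugin_installed fs-s3` exits with status 0 when the line is present and non-zero otherwise.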

After the pods are running, confirm that remote storage is active by checking the server configuration:

BASH
# Tablet server
kubectl exec -n fluss tablet-server-0 -- \
  cat /opt/fluss/conf/server.yaml | grep -F remote.data.dir
# Coordinator
kubectl exec -n fluss coordinator-server-0 -- \
  cat /opt/fluss/conf/server.yaml | grep -F remote.data.dir

The value should match the remote.data.dir you configured.

To verify the plugin is loaded, check that the plugin directory is populated:

kubectl exec -n fluss tablet-server-0 -- ls /opt/fluss/plugins/s3/

# or for Azure:

kubectl exec -n fluss tablet-server-0 -- ls /opt/fluss/plugins/azure/

Troubleshooting

Init Container Fails to Start

Inspect the init container logs:

kubectl logs -n fluss <POD_NAME> -c install-plugins

Common causes:

  • A network policy blocking outbound Maven repository access.
  • A missing or misconfigured Maven settings file.
  • An incorrect image tag.

Pods Stuck in Init:0/1

Check init container status:

kubectl describe pod -n fluss <POD_NAME>

Look for events under the install-plugins init container. Image pull failures indicate a missing or expired image pull secret. Confirm that the pull secret exists with:

kubectl get secret ververica-registry -n fluss

Remote Segments Not Visible After Writing

Confirm the remote.data.dir URI scheme (s3:// or abfs://) matches the installed plugin. A mismatch causes writes to fall back to local storage silently. Check coordinator logs:

kubectl logs -n fluss coordinator-server-0 | grep -i remote
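
The scheme-to-plugin mapping from the Overview can also be checked mechanically. The sketch below encodes only the two schemes listed in this document. The function name plugin_for_uri is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: map a remote.data.dir URI to the plugin listed for
# its scheme in the Overview. Returns 1 for any other scheme.
plugin_for_uri() {
  case "$1" in
    s3://*)   printf 'fs-s3\n' ;;
    abfs://*) printf 'fs-azure\n' ;;
    *)        return 1 ;;
  esac
}
```

Compare the printed plugin name with the one reported in the install-plugins init container log. A non-zero exit status means that the URI uses a scheme that this document does not cover.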

Azure: Authentication Errors

The DynamicTemporaryAzureCredentialsProvider requires both the account key (fs.azure.account.key) and a valid OAuth endpoint. Verify that the service principal has the Storage Blob Data Contributor role on the storage account itself, not on a container. Check the tenant ID in the endpoint URL.
