Configuring Prometheus Monitoring

Applies to Self-Managed v3

This guide explains how to install and configure the Prometheus metrics reporter plugin to expose server-side metrics on Ververica Platform Fluss pods. It also details how to set up target scraping and deploy the fluss-grafana chart to visualize cluster data using curated dashboards.

Overview

Fluss exposes server-side metrics through a pluggable reporter framework. The Prometheus reporter publishes a text-format /metrics endpoint on every coordinator and tablet pod. A Prometheus instance then scrapes this endpoint, and you can visualize the data in Grafana.

Configuring monitoring involves the following 3 layers:

Plugin: Install the metrics-prometheus reporter into the Fluss pods so the reporter classes are on the classpath.
Reporter: Enable the reporter in the Fluss Helm values. The upstream chart then renders the metrics block in server.yaml and creates a headless metrics Service per role (coordinator and tablet).
Scrape and Visualize: Point a Prometheus instance at the metrics services, either through annotation-based scraping or using a ServiceMonitor managed by the Prometheus Operator. You can then install the fluss-grafana chart to deploy a curated set of dashboards.

Note

This documentation assumes that you already have a Prometheus stack running in your Kubernetes cluster. Setting up the Prometheus stack is out of scope for this guide.

Prerequisites

Before you configure monitoring, ensure that your environment meets the following requirements:

A Running Fluss Cluster: Deploy a Fluss cluster using the fluss-bundle chart. For details, see Deploying Fluss on Kubernetes.
A Prometheus Instance: Set up a Prometheus instance that scrapes the cluster. You can either configure the instance for annotation-based service discovery, or run it with the Prometheus Operator to discover ServiceMonitor resources. Both methods are supported, so choose the option that matches your existing setup.
Grafana Access: Ensure that Grafana is reachable from the cluster, and connect the Prometheus instance as a data source.
Registry Access: Verify that you have access to the Ververica registry to pull the fluss-grafana chart. For details, see Obtaining Registry Access.

Installing the Prometheus Reporter Plugin

The Prometheus metrics reporter is shipped as a plugin and you must install it into the Fluss pods at startup. The fluss-setup component runs as an init container before the Fluss server starts. This container writes the plugin JAR files to a shared volume that the main container mounts at /opt/fluss/plugins/prometheus/.

The init container and volumes must be applied to both fluss.coordinator and fluss.tablet. The Prometheus reporter runs inside both the coordinator server and every tablet server.

Note

The pattern below installs only metrics-prometheus. If you also need filesystem or lake plugins, such as fs-s3, fs-azure, or lake-iceberg-*, add them to the same init container. For details on the multi-plugin pattern, see Installing Fluss.

For a complete reference, including available plugins, command syntax, private Maven repository setup, pre-baking plugins into a custom image, and troubleshooting, see Installing Fluss.

YAML

fluss:
  coordinator:
    extraVolumes:
      - name: fluss-plugins
        emptyDir: {}
    extraVolumeMounts:
      - name: fluss-plugins
        mountPath: /opt/fluss/plugins/prometheus
        subPath: prometheus
    initContainers:
      - name: install-plugins
        image: registry.ververica.cloud/platform-images/fluss:0.9.1-vv-2
        command:
          - /bin/sh
          - -c
          - |
            set -e
            /opt/fluss/bin/setup/install.sh fluss \
              metrics-prometheus \
              --force \
              -- \
              -s /path/to/settings.xml -q
        volumeMounts:
          - name: fluss-plugins
            mountPath: /opt/fluss/plugins
  tablet:
    extraVolumes:
      - name: fluss-plugins
        emptyDir: {}
    extraVolumeMounts:
      - name: fluss-plugins
        mountPath: /opt/fluss/plugins/prometheus
        subPath: prometheus
    initContainers:
      - name: install-plugins
        image: registry.ververica.cloud/fluss/fluss:<TAG>
        command:
          - /bin/sh
          - -c
          - |
            set -e
            /opt/fluss/bin/setup/install.sh fluss \
              metrics-prometheus \
              --force \
              -- \
              -s /path/to/settings.xml -q
        volumeMounts:
          - name: fluss-plugins
            mountPath: /opt/fluss/plugins

The -s /path/to/settings.xml flag is optional. You only need this flag when you pull the plugin JAR files from a private Maven repository. If you use the default public Maven Central resolution, you can drop this flag.

To learn how to mount a settings.xml file from a ConfigMap or a Secret, see Installing Fluss.
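As a rough illustration of that approach, the sketch below mounts a settings.xml file from a ConfigMap into the init container, reusing the extraVolumes and initContainers fields from the example above. The ConfigMap name maven-settings, its settings.xml key, and the mount path /etc/maven/settings.xml are assumptions made for this example, and the Installing Fluss guide remains the authoritative reference. Only the coordinator is shown; the tablet block would mirror it.

```yaml
fluss:
  coordinator:
    extraVolumes:
      - name: fluss-plugins
        emptyDir: {}
      # Assumption: a ConfigMap named maven-settings with a settings.xml key
      # already exists in the Fluss namespace.
      - name: maven-settings
        configMap:
          name: maven-settings
    extraVolumeMounts:
      - name: fluss-plugins
        mountPath: /opt/fluss/plugins/prometheus
        subPath: prometheus
    initContainers:
      - name: install-plugins
        image: registry.ververica.cloud/platform-images/fluss:0.9.1-vv-2
        command:
          - /bin/sh
          - -c
          - |
            set -e
            /opt/fluss/bin/setup/install.sh fluss \
              metrics-prometheus \
              --force \
              -- \
              -s /etc/maven/settings.xml -q
        volumeMounts:
          - name: fluss-plugins
            mountPath: /opt/fluss/plugins
          # Hypothetical mount path; it must match the -s argument above.
          - name: maven-settings
            mountPath: /etc/maven/settings.xml
            subPath: settings.xml
            readOnly: true
```

To read the file from a Secret instead, the volume source would be secret with a secretName field rather than configMap with a name field.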

For the full set of configurable fields under fluss:, refer to the Fluss Helm Chart documentation.

Enabling the Metrics Reporter in Fluss

Enable the Prometheus reporter under the fluss.metrics block of your Helm values instead of fluss.configurationOverrides. The upstream chart explicitly rejects setting metrics.reporters or metrics.reporter.<name>.port through configurationOverrides and fails the Helm install with a validation error.

Add the following to your values.yaml file:

YAML

fluss:
  metrics:
    reporters: prometheus
    prometheus:
      port: 9249

Field	Purpose
metrics.reporters	Comma-separated list of reporters to enable. Use prometheus to expose the /metrics endpoint over HTTP.
metrics.prometheus.port	TCP port the reporter listens on (default 9249). The chart also wires this into server.yaml automatically.

Behind the scenes the chart:

Renders metrics.reporters: prometheus and metrics.reporter.prometheus.port: 9249 into /opt/fluss/conf/server.yaml on the pod.
Creates two headless ClusterIP Service resources (<FLUSS_RELEASE>-coordinator-server-metrics-hs and <FLUSS_RELEASE>-tablet-server-metrics-hs), each exposing the configured port.

The next section explains how to add either annotations or labels to those metrics services so a Prometheus instance can discover them.

For the full set of configurable fields under fluss:, refer to the Fluss Helm chart documentation.

Exposing the Metrics Endpoint to Prometheus

Two scraping mechanisms work with the metrics services that the chart creates: annotation-based scraping (where you configure a Prometheus instance to discover services by annotation) and ServiceMonitor-based scraping (which uses the Prometheus Operator). Choose the option that matches your existing Prometheus setup.

The upstream Fluss reference for both methods is Metrics and Monitoring.

Annotation-based scraping

Add Prometheus scrape annotations to the metrics services through metrics.prometheus.service.annotations. The underlying HTTP server for the reporter accepts any path, including / and /metrics. However, the standard Prometheus convention is /metrics, which the upstream fluss-bundle chart annotation test asserts.

YAML

fluss:
  metrics:
    reporters: prometheus
    prometheus:
      port: 9249
      service:
        annotations:
          prometheus.io/scrape: "true"
          prometheus.io/path: "/metrics"
          prometheus.io/port: "9249"

Important

Your Prometheus configuration must include a scrape job that targets Kubernetes services using these annotations.
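If your Prometheus configuration does not yet contain such a job, the following sketch shows one common shape for it, modeled on the widely used annotation-driven relabeling pattern. The job name fluss-metrics is an assumption, and the final rule that copies the Kubernetes namespace into a namespace label is added here because the Fluss dashboards filter on that label. Adapt the job to your existing scrape configuration rather than treating it as the required setup.

```yaml
scrape_configs:
  - job_name: fluss-metrics   # hypothetical job name
    kubernetes_sd_configs:
      # The endpoints role yields one target per pod behind each Service.
      - role: endpoints
    relabel_configs:
      # Keep only targets whose Service carries prometheus.io/scrape: "true".
      - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape]
        action: keep
        regex: "true"
      # Use the path from the prometheus.io/path annotation, if present.
      - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_path]
        action: replace
        regex: '(.+)'
        target_label: __metrics_path__
      # Rewrite the target address to use the prometheus.io/port annotation.
      - source_labels: [__address__, __meta_kubernetes_service_annotation_prometheus_io_port]
        action: replace
        regex: '([^:]+)(?::\d+)?;(\d+)'
        replacement: '$1:$2'
        target_label: __address__
      # Expose the Kubernetes namespace as a "namespace" label on every series.
      - source_labels: [__meta_kubernetes_namespace]
        action: replace
        target_label: namespace
```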

ServiceMonitor-based scraping (Prometheus Operator)

Tag the metrics services with a label and create a ServiceMonitor resource that matches it. Because the Helm chart does not ship a ServiceMonitor, you must create one yourself.

YAML

fluss:
  metrics:
    reporters: prometheus
    prometheus:
      port: 9249
      service:
        portName: metrics
        labels:
          monitoring: enabled

Field	Purpose
metrics.prometheus.service.portName	Name of the metrics port on the headless Service. Must be metrics to match a ServiceMonitor configured with port: metrics.
metrics.prometheus.service.labels	Extra labels applied to the metrics Service. Choose labels that your ServiceMonitor selector matches.

YAML

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: fluss-metrics
  namespace: <FLUSS_NAMESPACE>
spec:
  selector:
    matchLabels:
      monitoring: enabled
  endpoints:
    - port: metrics
      path: /

The Prometheus Operator picks up this ServiceMonitor only if the serviceMonitorSelector and serviceMonitorNamespaceSelector fields of the Prometheus resource allow it. Set both to {} to enable cluster-wide discovery.
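For orientation, the two selectors live in the spec of the Prometheus custom resource. The fragment below is a minimal sketch that shows only those fields; the resource name and namespace are placeholders, and a real Prometheus resource carries many more settings that are omitted here.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: <PROMETHEUS_NAME>            # placeholder for your existing resource
  namespace: <PROMETHEUS_NAMESPACE>
spec:
  # An empty selector matches every ServiceMonitor ...
  serviceMonitorSelector: {}
  # ... in every namespace.
  serviceMonitorNamespaceSelector: {}
```

If the Prometheus resource is managed by a Helm chart such as kube-prometheus-stack, set the equivalent chart values instead of editing the resource directly, because the chart overwrites manual changes on the next upgrade.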

Deploying the fluss-grafana Chart

The fluss-grafana chart bundles 2 curated Grafana dashboards (fluss-overview and fluss-detail) for Fluss server metrics. Ververica Platform packages these dashboards as a single ConfigMap that the Grafana dashboard sidecar picks up automatically.

The PromQL expressions in both dashboards use a namespace variable as a parameter. This allows a single Grafana instance to serve multiple Fluss deployments.

To customize the dashboards (such as adding panels, changing thresholds, or dropping charts), execute the following steps:

Pull the chart locally using helm pull.
Edit the dashboard JSON files located under the templates/ directory.
Install your modified copy instead of the stock release.

Prerequisites

Grafana running in the cluster, configured with the Grafana sidecar for dashboards (kube-prometheus-stack enables this by default).
A Prometheus datasource registered in Grafana that scrapes the Fluss metrics services.

Configurable values

Value	Default	Purpose
datasource	prometheus	UID of the Grafana datasource the dashboard panels query against.
labels	{ grafana_dashboard: "1" }	Labels applied to the dashboards ConfigMap. The default matches the standard Grafana-sidecar discovery contract. Override (or extend) if your sidecar is configured with a different label key/value.
annotations	{}	Annotations applied to the dashboards ConfigMap. Useful for the sidecar's folder hint, for example k8s-sidecar-target-directory: /tmp/dashboards/Fluss to load the Fluss dashboards under a dedicated folder rather than the sidecar's default.

Install

Log in to the Ververica registry following Obtaining Registry Access, then install the chart, pointing it at your Prometheus datasource:

BASH

helm install fluss-grafana \
  oci://registry.ververica.cloud/platform-charts/fluss-grafana \
  --version 0.9.1-vv-2 \
  --namespace <GRAFANA_NAMESPACE> \
  --set datasource=<PROMETHEUS_DATASOURCE_UID>

The datasource value is the UID of a Grafana data source, not its display name. For kube-prometheus-stack, the default Prometheus data source UID is prometheus. To list your data source UIDs, go to Connections → Data sources in the Grafana UI, or query the Grafana API.
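If you provision Grafana data sources from files, the UID can be pinned explicitly so that it stays predictable across reinstalls. The following provisioning fragment is a sketch under that assumption: the display name Prometheus and the URL placeholders are illustrative, and only the uid field is what the fluss-grafana datasource value must match.

```yaml
apiVersion: 1
datasources:
  - name: Prometheus          # display name, not used by the fluss-grafana chart
    type: prometheus
    uid: prometheus           # pass this value as the chart's datasource setting
    access: proxy
    url: http://<PROMETHEUS_SERVICE>.<PROMETHEUS_NAMESPACE>.svc:9090
```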

To discover the correct chart version, see Obtaining Registry Access.

To override the discovery label for a non-default sidecar setup and pin the dashboards to a dedicated Grafana folder, use the following configuration:

BASH

helm install fluss-grafana \
  oci://registry.ververica.cloud/platform-charts/fluss-grafana \
  --version 0.9.1-vv-2 \
  --namespace <GRAFANA_NAMESPACE> \
  --set datasource=<PROMETHEUS_DATASOURCE_UID> \
  --set-string labels.grafana_dashboard=1 \
  --set-string annotations.'k8s-sidecar-target-directory'=/tmp/dashboards/Fluss
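The same overrides can also be kept in a values file, which avoids shell quoting around the annotation key. The sketch below mirrors the flags above using the three values from the Configurable values table; the file name fluss-grafana-values.yaml is an assumption.

```yaml
# fluss-grafana-values.yaml (hypothetical file name)
datasource: <PROMETHEUS_DATASOURCE_UID>
labels:
  # Quoted so that the label value stays a string.
  grafana_dashboard: "1"
annotations:
  k8s-sidecar-target-directory: /tmp/dashboards/Fluss
```

Pass the file to helm install with the -f flag in place of the --set and --set-string options.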

After installation, the Grafana dashboard sidecar imports the 2 dashboards into the configured folder, or into the default folder if you did not set a k8s-sidecar-target-directory annotation. Each dashboard exposes a namespace template variable. You must set this variable to the namespace where your Fluss release runs.

Verifying End-to-End

1. Confirm Fluss Is Exposing Metrics

Hit the Prometheus reporter directly on a pod:

BASH

kubectl exec -n <FLUSS_NAMESPACE> <FLUSS_RELEASE>-tablet-server-0 -- \
  wget -qO- http://localhost:9249/
kubectl exec -n <FLUSS_NAMESPACE> <FLUSS_RELEASE>-coordinator-server-0 -- \
  wget -qO- http://localhost:9249/

Expected output is raw Prometheus text:

TEXT

# HELP fluss_tabletserver_messagesInPerSecond ...
# TYPE fluss_tabletserver_messagesInPerSecond gauge
fluss_tabletserver_messagesInPerSecond 0.0

2. Confirm the Metrics Services Route to the Pods

BASH

kubectl run curl --image=curlimages/curl -it --rm --restart=Never -- \
  curl http://<FLUSS_RELEASE>-tablet-server-metrics-hs.<FLUSS_NAMESPACE>.svc.cluster.local:9249/
kubectl run curl --image=curlimages/curl -it --rm --restart=Never -- \
  curl http://<FLUSS_RELEASE>-coordinator-server-metrics-hs.<FLUSS_NAMESPACE>.svc.cluster.local:9249/

3. Confirm Prometheus Is Scraping Fluss

Port-forward Prometheus and check Status → Targets for entries matching the Fluss ServiceMonitor:

BASH

kubectl port-forward -n <PROMETHEUS_NAMESPACE> \
  svc/<PROMETHEUS_SERVICE> 9090:9090

Then open http://localhost:9090/targets?search=fluss. Every Fluss pod should appear with the state UP.

A quick instant query also works:

BASH

curl -s "http://localhost:9090/api/v1/query?query=fluss_coordinator_activeTabletServerCount" | jq .
curl -s "http://localhost:9090/api/v1/label/__name__/values" | \
  jq '.data[] | select(startswith("fluss"))'

4. Confirm Dashboards Are Loaded in Grafana

Port-forward Grafana, log in, and search for fluss under Dashboards. The fluss-overview and fluss-detail dashboards should be present. Set the namespace template variable to your Fluss namespace.

Troubleshooting

Reporter returns connection refused

The reporter port is closed. Verify that the reporter is configured by inspecting the rendered server.yaml file on a pod:

BASH

kubectl exec -n <FLUSS_NAMESPACE> <FLUSS_RELEASE>-tablet-server-0 -- \
  cat /opt/fluss/conf/server.yaml | grep metrics

Expected output:

YAML

metrics.reporters: prometheus
metrics.reporter.prometheus.port: 9249

If those lines are missing, the fluss.metrics.reporters value did not take effect. Re-check your Helm values to ensure that you set the reporter under fluss.metrics instead of fluss.configurationOverrides.

If the lines are present but the port is still closed, the metrics-prometheus plugin is not on the classpath. Re-check the init container logs and inspect the /opt/fluss/plugins/prometheus directory inside the pod.

Metrics services have no metrics port

Confirm that the chart created the metrics services with a named port:

BASH

kubectl describe svc -n <FLUSS_NAMESPACE> \
  <FLUSS_RELEASE>-coordinator-server-metrics-hs
kubectl describe svc -n <FLUSS_NAMESPACE> \
  <FLUSS_RELEASE>-tablet-server-metrics-hs

Look for the following output:

TEXT

Port:  metrics  9249/TCP

If the port is unnamed, your metrics.prometheus.service.portName value did not propagate. If the services are missing entirely, metrics.reporters might be empty: without the reporter enabled, the chart skips creating the metrics services. A ServiceMonitor configured with port: metrics does not resolve until the services exist and expose a port named metrics.

ServiceMonitor is not discovered by Prometheus

The metrics services resolve, but Prometheus cannot scrape them. This issue might occur due to the following common causes:

Wrong Endpoint Path: The examples in this guide scrape the reporter at / (ServiceMonitor) or /metrics (annotations). If you configured a different endpoints[].path, set it back to / on your ServiceMonitor.
Network Restrictions: A network policy or service mesh might deny traffic from the Prometheus pod namespace to <FLUSS_NAMESPACE> on port 9249.
Mismatched Port: The ServiceMonitor uses port: metrics, but the metrics service exposes a differently named port. For details, see the previous section.
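For the network restriction case, the following NetworkPolicy is a sketch of an explicit allow rule for scrape traffic. It assumes that a default-deny ingress policy is already in place in the Fluss namespace and that the Prometheus namespace carries the kubernetes.io/metadata.name label, which recent Kubernetes versions set automatically. The policy name is hypothetical. Do not apply it to a namespace that has no other policies: because it selects every pod, it would restrict their ingress to this single rule.

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-prometheus-to-fluss-metrics   # hypothetical name
  namespace: <FLUSS_NAMESPACE>
spec:
  # Selects all pods in the namespace; narrow this with Fluss pod labels
  # if the namespace hosts other workloads.
  podSelector: {}
  policyTypes:
    - Ingress
  ingress:
    - from:
        # Allow traffic from any pod in the Prometheus namespace.
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: <PROMETHEUS_NAMESPACE>
      ports:
        # Reporter port configured under fluss.metrics.prometheus.port.
        - protocol: TCP
          port: 9249
```

Service meshes enforce their own authorization rules, so a mesh-specific policy may be needed in addition to this one.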

Grafana shows “No data” on dashboards

Confirm that the datasource value passed to helm install matches a real Grafana data source UID, and verify that the data source queries the Prometheus instance that is scraping Fluss.

Confirm that the namespace template variable for the dashboard matches the namespace where Fluss is running. The dashboards inject {namespace="$namespace"} into every PromQL query, so an unset or incorrect value yields zero series.