Monitoring and Metrics

Out of the box, Flink jobs that run in a Bring-Your-Own-Cloud (BYOC) workspace expose metrics through two reporters:

  • JMX (Java Management Extensions)
  • Prometheus (HTTP endpoint scraping)

This page describes what is already configured in your BYOC deployment so you can plug the data into your own monitoring stack (for example, Prometheus + Grafana).
Setting up or operating Prometheus / Grafana itself is outside the scope of this documentation and remains entirely under your control.

What’s Pre-configured?

1. Pod-level Prometheus Annotations

Every Flink pod (JobManager and TaskManager) includes annotations that instruct a Prometheus scraper to collect metrics automatically:

YAML
annotations:
  prometheus.io/path: /metrics
  prometheus.io/port: "9999"
  prometheus.io/scrape: "true"

  • prometheus.io/path: The HTTP path where metrics are exposed (/metrics).
  • prometheus.io/port: The container port (9999) where the metrics endpoint listens.
  • prometheus.io/scrape: Indicates that the pod should be scraped (true).
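
As an illustration of how a scraper can honor these annotations, the following is a minimal sketch of a Prometheus scrape job that uses Kubernetes pod discovery and relabeling. It is not part of the BYOC deployment. The job name and the namespace `my-flink-namespace` are hypothetical placeholders, and the sketch assumes a self-managed Prometheus with permission to list pods in that namespace.

```yaml
scrape_configs:
  - job_name: flink-pods                  # hypothetical job name
    kubernetes_sd_configs:
      - role: pod                         # discover every pod as a potential target
        namespaces:
          names:
            - my-flink-namespace          # hypothetical: the namespace of your Flink pods
    relabel_configs:
      # Keep only pods annotated with prometheus.io/scrape: "true".
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
        action: keep
        regex: "true"
      # Use the prometheus.io/path annotation (/metrics) as the scrape path.
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
        action: replace
        regex: (.+)
        target_label: __metrics_path__
      # Rewrite the target address to use the prometheus.io/port annotation (9999).
      - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
        action: replace
        regex: '([^:]+)(?::\d+)?;(\d+)'
        replacement: $1:$2
        target_label: __address__
      # Carry the namespace and pod name over as regular labels.
      - source_labels: [__meta_kubernetes_namespace]
        target_label: namespace
      - source_labels: [__meta_kubernetes_pod_name]
        target_label: pod
```

The `keep` rule is what makes the `prometheus.io/scrape` annotation effective: pods without it are dropped before any scrape is attempted. The address rewrite as written assumes IPv4 pod addresses.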

2. Metric Reporters

The following metric reporters are enabled by default in the Flink cluster configuration shipped with BYOC:

YAML
metrics.reporters: jmx:promappmgr

# JMX Reporter
metrics.reporter.jmx.factory.class: org.apache.flink.metrics.jmx.JMXReporterFactory
metrics.reporter.jmx.port: 10000-10240  # Port range for JMX

# Prometheus Reporter
metrics.reporter.promappmgr.factory.class: org.apache.flink.metrics.prometheus.PrometheusReporterFactory

| Reporter   | Purpose                                                                              | Where It Listens               |
|------------|--------------------------------------------------------------------------------------|--------------------------------|
| JMX        | For JVM-based monitoring tools or exporters.                                         | Ports 10000–10240 on each pod. |
| Prometheus | Exposes human-readable metrics on the HTTP endpoint defined by the pod annotations. | Port 9999 (/metrics).          |

Next Steps

  1. Scrape the Metrics
    • Point your in-cluster Prometheus deployment at the Kubernetes namespace so it discovers pods with the prometheus.io/scrape: "true" annotation, or use ServiceMonitor objects if you run the Prometheus Operator. For more details, see the official Prometheus documentation.
  2. Visualize in Grafana
    • Build your own dashboards using the Prometheus data source.
  3. Define Alerts
    • Define alert rules in Prometheus or Grafana Alerting to monitor job health (e.g., restart count, checkpoint failures, backpressure).
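
The alerting step above can be sketched as a Prometheus rule file. This is an illustrative example, not something shipped with BYOC. The metric names `flink_jobmanager_job_numRestarts` and `flink_jobmanager_job_numberOfFailedCheckpoints` and the `job_name` label assume Flink's default metric scope formats as exported by the Prometheus reporter. The group name, time windows, and severity labels are hypothetical, so verify the names against your own /metrics output before relying on them.

```yaml
groups:
  - name: flink-job-health                # hypothetical group name
    rules:
      # Fires when the restart gauge has grown over the last 15 minutes.
      # Assumes the default scope format for job-level JobManager metrics.
      - alert: FlinkJobRestarted
        expr: delta(flink_jobmanager_job_numRestarts[15m]) > 0
        labels:
          severity: warning               # hypothetical severity scheme
        annotations:
          summary: "Flink job {{ $labels.job_name }} restarted within the last 15 minutes"

      # Fires when the failed-checkpoint gauge has grown over the last 15 minutes.
      - alert: FlinkCheckpointFailures
        expr: delta(flink_jobmanager_job_numberOfFailedCheckpoints[15m]) > 0
        for: 5m                           # require the condition to persist before firing
        labels:
          severity: warning
        annotations:
          summary: "Flink job {{ $labels.job_name }} has failing checkpoints"
```

Both metrics are reported by Flink as gauges, which is why the sketch uses `delta()` rather than `rate()` or `increase()`. A backpressure rule could follow the same shape once you have confirmed which task-level metric your Flink version exposes.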

No additional configuration inside Ververica Cloud: Bring-Your-Own-Cloud is required. All metrics are emitted automatically once the Flink cluster starts.
