Universal Blob Storage

Applies to: Self-Managed v2


Ververica Platform provides centralized configuration of blob storage for its services.

Storage Providers

Storage Provider	Scheme	Artifact Management	Flink 1.19	Flink 1.18	Flink 1.17	Flink 1.16	Flink 1.15	Flink 1.14	Flink 1.13	Flink 1.12
AWS S3	s3://	✓	✓	✓	✓	✓	✓	✓	✓	✓
Microsoft ABS	wasbs://	✓	✓	✓	✓	✓	✓	✓	✓	✓
Apache Hadoop® HDFS	hdfs://	✓	✓	✓	✓	✓	✓	✓	✓	✓
Google GCS	gs://	✓	(✓)	(✓)	(✓)	(✓)	(✓)	(✓)	✓	✓
Alibaba OSS	oss://	✓	x	x	x	x	x	x	x	x
Microsoft ABS Workload Identity	wiaz://	✓	✓*	✓*	✓*	✓*	✓*	x	x	x

Additional Provider Configuration

Microsoft ABS Workload Identity

To run Flink jobs in a namespace other than the one Ververica Platform (VVP) itself runs in (the recommended setup), you must create a Kubernetes service account in that namespace and a federated identity for your Azure principal yourself.

Before you can run a Deployment, you must assign the service account name to its pods:

YAML

spec:
  template:
    spec:
      kubernetes:
        pods:
          labels:
            azure.workload.identity/use: 'true'
          serviceAccountName: ververica-platform-ververica-platform

Alternatively, you can configure the taskManager and jobManager independently, for example:

YAML

spec:
  template:
    spec:
      kubernetes:
        jobManagerPodTemplate:
          metadata:
            labels:
              azure.workload.identity/use: 'true'
          spec:
            serviceAccountName: ververica-platform-ververica-platform
        taskManagerPodTemplate:
          metadata:
            labels:
              azure.workload.identity/use: 'true'
          spec:
            serviceAccountName: ververica-platform-ververica-platform

Note

You cannot mix configuration methods. Specify either the pods attribute or the jobManagerPodTemplate and taskManagerPodTemplate attributes, not both.
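The rule above can be checked before submitting a Deployment. The following is a minimal sketch, not a Ververica Platform tool: the function name `uses_single_method`, the sample file, and the service account name `my-service-account` are illustrative assumptions, and the check matches key names as text rather than parsing YAML.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Succeed when the file uses at most one of the two configuration methods.
# Heuristic only: it matches key names by text and does not parse YAML.
uses_single_method() {
  local file="$1" has_pods=0 has_templates=0
  if grep -qE '^[[:space:]]*pods:' "$file"; then has_pods=1; fi
  if grep -qE '^[[:space:]]*(jobManagerPodTemplate|taskManagerPodTemplate):' "$file"; then has_templates=1; fi
  [ $((has_pods + has_templates)) -le 1 ]
}

spec="$(mktemp)"   # stand-in for a Deployment specification file
cat > "$spec" <<'EOF'
spec:
  template:
    spec:
      kubernetes:
        pods:
          serviceAccountName: my-service-account
        jobManagerPodTemplate:
          spec:
            serviceAccountName: my-service-account
EOF

if uses_single_method "$spec"; then
  echo "ok: one configuration method"
else
  echo "error: pods and pod templates are both set"
fi
```

Because the sample deliberately sets both pods and jobManagerPodTemplate, the script reports the error branch.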

Important

If you have created a dedicated namespace and service account for Deployments, replace serviceAccountName: ververica-platform-ververica-platform with the name of your own service account: serviceAccountName: <deployment-related-service-account-name>.

Credentials

Ververica Platform supports using a single set of credentials to access your configured blob storage, and will automatically distribute these credentials to Flink jobs that require them.

These credentials can either be specified directly in values.yaml or added to a Kubernetes secret out-of-band and referenced in values.yaml by name.

Option 1: `values.yaml`

The following options are configurable; example values are shown:

YAML

blobStorageCredentials:
  azure:
    connectionString: DefaultEndpointsProtocol=https;EndpointSuffix=core.windows.net;AccountName=vvpArtifacts;AccountKey=VGhpcyBpcyBub3QgYSB2YWxpZCBBQlMga2V5LiAgVGhhbmtzIGZvciB0aG9yb3VnaGx5IHJlYWRpbmcgdGhlIGRvY3MgOikgIA==;
  s3:
    accessKeyId: AKIAEXAMPLEACCESSKEY
    secretAccessKey: qyRRoU+/4d5yYzOGZVz7P9ay9fAAMrexamplesecretkey
  hdfs:
    # Apache Hadoop® configuration files (core-site.xml, hdfs-site.xml)
    # and optional Kerberos configuration files. Note that the keytab
    # has to be base64 encoded.
    core-site.xml: |
      <?xml version="1.0" ?>
      <configuration>
        ...
      </configuration>
    hdfs-site.xml: |
      <?xml version="1.0" ?>
      <configuration>
        ...
      </configuration>
    krb5.conf: |
      [libdefaults]
        ticket_lifetime = 10h
      ...
    keytab: BQIAA...AAAC
    keytab-principal: flink

Option 2: Pre-create Kubernetes Secret

The keys of a pre-created Kubernetes secret must match the pattern <provider>.<key>, for example s3.accessKeyId and s3.secretAccessKey. To configure Ververica Platform to use this secret, add the following snippet to your Helm values.yaml file:

YAML

blobStorageCredentials:
  existingSecret: my-blob-storage-credentials

Important

The values in a Kubernetes secret must be base64-encoded.
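To illustrate both the <provider>.<key> naming and the base64 requirement, the following sketch renders a Secret manifest for the S3 provider by hand. The helper names `b64` and `render_s3_secret` and the placeholder credential values are assumptions made for this example, not part of Ververica Platform. The secret name is the one referenced in the values.yaml snippet above.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Base64-encode a string, with no trailing newline in the input or the output.
b64() {
  printf '%s' "$1" | base64 | tr -d '\n'
}

# Print a Secret manifest whose data keys follow the <provider>.<key> pattern.
# Arguments: secret name, access key ID, secret access key (placeholders here).
render_s3_secret() {
  local name="$1" access_key_id="$2" secret_access_key="$3"
  cat <<EOF
apiVersion: v1
kind: Secret
metadata:
  name: ${name}
type: Opaque
data:
  s3.accessKeyId: $(b64 "$access_key_id")
  s3.secretAccessKey: $(b64 "$secret_access_key")
EOF
}

render_s3_secret my-blob-storage-credentials AKIAEXAMPLEACCESSKEY examplesecretkey
```

Manual encoding is only needed when the manifest is written by hand. Creating the secret with kubectl create secret generic and --from-literal or --from-file performs the encoding for you.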

Example: Apache Hadoop® HDFS

For universal blob storage (UBS) with Apache Hadoop® HDFS, we recommend pre-creating a Kubernetes secret with the required configuration files to avoid duplicating those files in the Ververica Platform values.yaml file.

BASH

kubectl create secret generic my-blob-storage-credentials \
  --from-file hdfs.core-site.xml=core-site.xml \
  --from-file hdfs.hdfs-site.xml=hdfs-site.xml \
  --from-file hdfs.krb5.conf=krb5.conf \
  --from-file hdfs.keytab=keytab \
  --from-file hdfs.keytab-principal=keytab-principal

After you have created the Kubernetes secret, you can reference it in values.yaml as an existing secret. Note that the Kerberos configuration is optional.

Option 3: Loading Credentials from Mounted Files

Alternatively, you can provide credentials to VVP securely as mounted files.

To do so, store each credential key in a separate file, and name each file following the pattern <provider>.<key>.

Example: Blob Storage Credentials for https

BASH

$ cat ./http.basicAuthUser
admin

$ cat ./http.basicAuthPassword
password

The directory that contains the credentials files must then be mounted. Assuming the files are under the path /conf/blob-creds, you can point VVP at that directory using either an environment variable or a VVP property. In both cases, the setting is made in values.yaml:

Using environment variables:

YAML

env:
  - name: "vvp.blob-storage.credentials-dir"
    value: "/conf/blob-creds"

Using VVP properties:

YAML

vvp:
  blobStorage:
    credentialsDir: /conf/blob-creds

Important

You can choose any appropriate name for the mounted directory, but the credentials filenames must exactly follow the pattern <provider>.<key>, for example http.basicAuthUser, http.basicAuthPassword.
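The file layout described above can be prepared and checked with a short script. This is a sketch under assumptions: the helper names `write_credential` and `list_misnamed` are hypothetical, a temporary directory stands in for the directory that is later mounted, and the values are the placeholder credentials from the example. Writing the files without a trailing newline is a defensive choice of this sketch; the source does not say whether surrounding whitespace is trimmed.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Write one credential to <dir>/<provider>.<key>, without a trailing newline.
write_credential() {
  local dir="$1" provider="$2" key="$3" value="$4"
  mkdir -p "$dir"
  printf '%s' "$value" > "${dir}/${provider}.${key}"
  chmod 600 "${dir}/${provider}.${key}"
}

# Print the names of files in <dir> that do not look like <provider>.<key>.
list_misnamed() {
  local dir="$1" f
  for f in "$dir"/*; do
    [ -e "$f" ] || continue
    case "$(basename "$f")" in
      ?*.?*) ;;                 # non-empty provider part, dot, non-empty key part
      *) basename "$f" ;;
    esac
  done
}

creds_dir="$(mktemp -d)"   # stand-in for the directory that gets mounted
write_credential "$creds_dir" http basicAuthUser admin
write_credential "$creds_dir" http basicAuthPassword password

ls "$creds_dir"
list_misnamed "$creds_dir"
```

The final call prints nothing when every file name matches the pattern, so any output from it points at a file that needs renaming.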

Advanced Configuration

AWS EKS

When running on AWS EKS or AWS ECS, your Kubernetes Pods inherit the roles attached to the underlying EC2 instances. If these roles already grant access to the required S3 resources, you only need to configure vvp.blobStorage.baseUri and can omit blobStorageCredentials entirely.

Apache Hadoop® Versions

UBS with Apache Hadoop® HDFS uses a Hadoop 2 client for communication with the HDFS cluster. Hadoop 3 preserves wire compatibility with Hadoop 2 clients, so you can use HDFS blob storage with both Hadoop 2 and Hadoop 3 HDFS clusters.

However, note that there may be incompatibilities between Hadoop 2 and 3 with respect to the configuration files core-site.xml and hdfs-site.xml. For example, Hadoop 3 allows durations to be configured with a unit suffix such as 30s, which results in a configuration parsing error with Hadoop 2 clients. It is generally possible to work around these issues by limiting the configuration to Hadoop 2 compatible keys and values.
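One way to spot such values before handing a configuration file to Ververica Platform is a simple text search. The sketch below is a heuristic, not a complete compatibility check: the function name `find_duration_suffixes` and the property names in the sample file are hypothetical, the list of unit suffixes is an assumption, and the pattern only catches <value> elements that sit on a single line.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Print "line:content" for every <value> made of digits plus a time-unit suffix.
# Heuristic only: it assumes each <value> element sits on a single line.
find_duration_suffixes() {
  grep -nE '<value>[[:space:]]*[0-9]+(ns|us|ms|s|m|h|d)[[:space:]]*</value>' "$1" || true
}

sample="$(mktemp)"   # stand-in for core-site.xml or hdfs-site.xml
cat > "$sample" <<'EOF'
<?xml version="1.0" ?>
<configuration>
  <property>
    <name>example.timeout</name>
    <value>30s</value>
  </property>
  <property>
    <name>example.retries</name>
    <value>30</value>
  </property>
</configuration>
EOF

find_duration_suffixes "$sample"
```

Each reported line is a candidate for rewriting as a plain number in the unit the Hadoop 2 client expects for that key, which has to be looked up per key.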

Apache Flink® Hadoop Dependency

When using HDFS UBS, Ververica Platform dynamically adds the Hadoop dependency flink-shaded-hadoop-2-uber to the classpath. You can use the following annotation to skip this step:

YAML

kind: Deployment
spec:
  template:
    metadata:
      annotations:
        ubs.hdfs.hadoop-jar-provided: true

This is useful if your Docker image already provides a Hadoop dependency. If you use this annotation without a Hadoop dependency on the classpath, your Flink application will fail.

Services

The following services make use of the universal blob storage configuration.

Apache Flink® Jobs

Flink jobs are configured to store blobs at the following locations:

Blob	Storage Location
Checkpoints	${baseUri}/flink-jobs/namespaces/${ns}/jobs/${jobId}/checkpoints
Savepoints	${baseUri}/flink-savepoints/namespaces/${ns}/deployments/${deploymentId}
High Availability	${baseUri}/flink-savepoints/namespaces/${ns}/deployments/${deploymentId}

User-provided configuration has precedence over universal blob storage.
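To make the checkpoint and savepoint templates concrete, the following sketch expands them for example values. The function names and the sample base URI, namespace, and IDs are illustrative assumptions; `${baseUri}` is taken here to be the configured blob storage base URI.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Expand the checkpoint location template. Arguments: baseUri, namespace, jobId.
checkpoint_path() {
  printf '%s/flink-jobs/namespaces/%s/jobs/%s/checkpoints\n' "$1" "$2" "$3"
}

# Expand the savepoint location template. Arguments: baseUri, namespace, deploymentId.
savepoint_path() {
  printf '%s/flink-savepoints/namespaces/%s/deployments/%s\n' "$1" "$2" "$3"
}

# Placeholder values, not taken from a real installation.
checkpoint_path 's3://my-bucket/vvp' default job-1234
savepoint_path 's3://my-bucket/vvp' default deployment-5678
```

Checkpoints are keyed by job ID while savepoints are keyed by Deployment ID, so savepoints from successive jobs of the same Deployment share one location.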

Artifact Management

Artifacts are stored in the following location:

BASH

${baseUri}/artifacts/namespaces/${ns}

SQL Service

The SQL Service depends on blob storage for storing deployment information and JAR files of user-defined functions.

SQL Deployments

Before a SQL query can be deployed, it must be optimized and translated to a Flink job. SQL Service stores the Flink job, together with all JAR files that implement a user-defined function used by the query, at the following locations:

Blob	Storage Location
Job	${baseUri}/flink-jobs/namespaces/${ns}/jobs/${jobId}/jobgraph
UDF JAR Files	${baseUri}/flink-jobs/namespaces/${ns}/jobs/${jobId}/udfs

After a query has been deployed, Application Manager maintains the same blobs as for regular Flink jobs, i.e., checkpoints, savepoints, and high-availability files.

UDF Artifacts

The JAR files of UDF Artifacts that are uploaded via the UI are stored in the following location:

BASH

${baseUri}/sql-artifacts/namespaces/${ns}/udfs/${udfArtifact}

Connectors, Formats, and Catalogs

The JAR files of Custom Connectors and Formats and Custom Catalogs that are uploaded via the UI are stored in the following location:

BASH

${baseUri}/sql-artifacts/namespaces/${ns}/custom-connectors/
