Delta Lake
Applies to: BYOC
The Delta Lake connector allows you to read and write Delta Lake tables in Flink. Delta Lake is an open-source storage layer that brings reliability to data lakes with ACID transactions, schema enforcement, and time travel.
Supported Version: VERA Engine 4.3
Background Information
A Delta table is a directory containing Parquet data files and a transaction log (the _delta_log folder) that tracks all operations. Engines like VERA Engine read this log to maintain a consistent view of the table.
Features
The Delta Lake connector provides the following capabilities:
- Source and Sink Support: Use Delta tables as both input sources and output sinks in your SQL jobs.
- Streaming Mode: Submit queries in streaming mode for continuous data processing.
- Parquet Format: Uses the embedded Parquet format for efficient storage and retrieval.
- Schema Enforcement: Automatically reflects schema evolution from Delta Lake.
Prerequisites
- A Delta Catalog must be configured. Using the Delta Lake connector without a pre-created Delta Catalog will fail.
- Storage for Delta tables (such as an S3 bucket) that is accessible from Ververica Cloud.
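Once the Delta Catalog exists, a session can switch to it so that unqualified table names resolve against it. The following is a minimal sketch, assuming your SQL editor accepts standard Flink SQL statements. The catalog name c_delta and database db_new are borrowed from the examples on this page and are placeholders for your own names.

```sql
-- Placeholder names: replace c_delta and db_new with your own catalog and database.
USE CATALOG c_delta;
USE db_new;

-- List the tables registered in the current database.
SHOW TABLES;
```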
Syntax
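```sql
CREATE TABLE delta_table (
  c1 VARCHAR,
  c2 INT
) WITH (
  'connector' = 'delta',
  'table-path' = 's3a://your-bucket/path/to/table'
);
```
Parameters in the WITH Clause
- connector: The connector to use. Set to 'delta'.
- table-path: The path to the root directory of the Delta table, for example s3a://your-bucket/path/to/table.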
SQL Usage Examples
Create a Delta Table and Insert Data
```sql
-- Create a Delta table backed by an S3 location
CREATE TABLE c_delta.db_new.t_foo (
  c1 VARCHAR,
  c2 INT
) WITH (
  'connector' = 'delta',
  'table-path' = 's3a://warehouse/'
);

-- Write data
INSERT INTO c_delta.db_new.t_foo
VALUES ('a', 42);
```
Create a Partitioned Delta Table
```sql
CREATE TABLE testTable (
  id BIGINT,
  data STRING,
  part_a STRING,
  part_b STRING
)
PARTITIONED BY (part_a, part_b)
WITH (
  'connector' = 'delta',
  'table-path' = 's3a://your-bucket/testTable',
  'delta.appendOnly' = 'true'
);
```
Move Data Between Delta Tables
Both the source and sink tables must use the Delta connector, and their schemas must match.
```sql
INSERT INTO sinkTable
SELECT *
FROM sourceTable;
```
Limits
- Streaming Mode Only: Queries must be submitted in streaming mode. Batch queries may fail to commit output correctly.
- Append-only Writes: Only append writes are currently supported for result tables.
- Physical Columns Only: The connector currently supports only physical columns. Metadata and computed columns are not supported.
- Mode Property: For all source table queries, add the /*+ OPTIONS('mode' = 'streaming') */ hint after the source table name to ensure that output is committed correctly to the Delta log.
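For example, the query from Move Data Between Delta Tables could be written with the hint applied to its source table. This is a sketch, assuming sourceTable and sinkTable are Delta tables with matching schemas. The SET statement uses Flink's standard execution.runtime-mode option and may be unnecessary if your deployment is already configured to run in streaming mode.

```sql
-- Run the job in streaming mode (standard Flink configuration option).
SET 'execution.runtime-mode' = 'streaming';

-- Read the source table in streaming mode so that commits to the Delta log complete correctly.
INSERT INTO sinkTable
SELECT *
FROM sourceTable /*+ OPTIONS('mode' = 'streaming') */;
```

In Flink SQL, a dynamic table options hint such as this one is placed directly after the table name it applies to, so each source table in a query gets its own hint.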