Data Ingestion

Applies to: BYOC, Self-Managed v3

Engine support: VERA 4.1 (Flink 1.20).

Why YAML for CDC?

Simplified job management – Declarative, human‑readable configuration.
Reusability & consistency – Template values, reuse across envs.
CI/CD‑friendly – Store in Git, review, promote, rollback.
Environment separation – Swap credentials/topics/URIs per env.
Faster onboarding – No Flink internals required.
Tooling compatibility – Validate/lint/test YAML.
Separation of concerns – Data flow vs. runtime/platform config.

Quick start (UI)

Go to Data Ingestion → New Draft and select Blank Draft.
Name the draft and pick your Engine Version (choose the version the job was tested with).
Paste a YAML CDC config in the Preview panel and click OK.

You can also create drafts from the SQL Editor or import files from your repo.

YAML schema overview

At minimum, a CDC YAML config contains a source section and a sink section. Each file defines one source and one sink; to run multiple flows, create one file per flow.

YAML

source:
  type: <connector>        # e.g., mysql, postgres, oracle, sqlserver, kafka
  name: <human name>
  hostname: <host or service>
  port: <int>
  username: ${secret_values.mysqlusername}
  password: ${secret_values.mysqlpassword}
  database: <db-name>      # optional depending on connector
  tables: <regex or list>  # e.g., 'mysql\.\.*' or [ db.schema.table1, db.schema.table2 ]
  server-id: <range>       # connector-specific; example for MySQL
  snapshot.mode: initial   # connector-specific snapshot policy

sink:
  type: <connector>        # e.g., mysql, postgres, kafka, iceberg, hudi, jdbc
  name: <human name>
  hostname: <host-or-broker>
  port: <int>
  username: <user>
  password: <pass>
  database: <db>
  table: <table or pattern>
  upsert-key: <col or list>  # for upsert sinks
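Because each file carries one source and one sink, two flows mean two files. The pair below is only a sketch: the file names, hostnames, and table names are hypothetical, sink credentials are left out for brevity, and only keys from the schema above are used.

```yaml
# users-flow.yaml (hypothetical file name)
source:
  type: mysql
  hostname: mysql-src
  port: 3306
  username: ${secret_values.mysqlusername}
  password: ${secret_values.mysqlpassword}
  tables: mydb.public.users
sink:
  type: mysql
  hostname: mysql-dst
  port: 3306
  table: dw.users
```

```yaml
# orders-flow.yaml (hypothetical file name)
source:
  type: mysql
  hostname: mysql-src
  port: 3306
  username: ${secret_values.mysqlusername}
  password: ${secret_values.mysqlpassword}
  tables: mydb.public.orders
sink:
  type: mysql
  hostname: mysql-dst
  port: 3306
  table: dw.orders
```

Keeping the flows in separate files lets each one be reviewed, deployed, and rolled back on its own.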

Secrets & variables

Use ${…} expressions (e.g., ${secret_values.mysqlpassword}) to reference values injected at deploy time. Treat credentials as secrets and never hardcode them in the YAML.
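As a sketch of the same pattern on the sink side, the fragment below references credentials through ${…} expressions instead of literals. The secret names dwusername and dwpassword are hypothetical; substitute whatever names your deployment defines.

```yaml
sink:
  type: mysql
  hostname: mysql-dst
  port: 3306
  # dwusername and dwpassword are assumed secret names, not defined on this page
  username: ${secret_values.dwusername}
  password: ${secret_values.dwpassword}
```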

Minimal example – MySQL → MySQL

YAML

source:
  type: mysql
  name: Database A to Data warehouse
  hostname: mysql-src
  port: 3306
  username: ${secret_values.mysqlusername}
  password: ${secret_values.mysqlpassword}
  tables: mysql\.\.*

sink:
  type: mysql
  name: Database B to Data warehouse
  hostname: mysql-dst
  port: 3306
  username: root   # placeholder for local testing; use a ${secret_values.*} reference in real deployments
  password: pass   # placeholder for local testing; never commit real credentials

Common fields and patterns

Table selection

Single table: tables: mydb.public.users
Multiple tables:

YAML

tables:
  - mydb.public.users
  - mydb.public.orders

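Regex pattern: tables: mydb\.public\..* (Escape literal dots in the regex. Leave the value unquoted or single-quoted; inside a double-quoted YAML string, \. is not a valid escape.)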
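The fragment below sketches how the pattern can be written safely in YAML. Backslashes are literal in plain (unquoted) and single-quoted scalars, while a double-quoted scalar treats the backslash as an escape character, so each one must be doubled. The last variant assumes the connector accepts Java-style regular expressions with alternation, which this page does not confirm.

```yaml
source:
  type: mysql
  # Single-quoted: backslashes reach the connector unchanged.
  tables: 'mydb\.public\..*'
  # Equivalent plain (unquoted) scalar:
  # tables: mydb\.public\..*
  # Double-quoted form needs doubled backslashes:
  # tables: "mydb\\.public\\..*"
  # Assumed regex alternation, matching only users and orders:
  # tables: 'mydb\.public\.(users|orders)'
```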

Primary/Upsert keys

For sinks that support upserts, specify a key:

YAML

sink:
  type: mysql
  table: dw.users
  upsert-key: id
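The schema overview describes upsert-key as a column or a list. A composite key could therefore be written as a YAML list, as in this sketch; the table and column names are hypothetical, and whether a given sink connector accepts multi-column keys should be checked in its reference.

```yaml
sink:
  type: mysql
  table: dw.order_items      # hypothetical table
  upsert-key:                # list form for a composite key
    - order_id               # hypothetical column
    - line_no                # hypothetical column
```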

Parallelism & checkpoints (runtime)

Runtime parameters are set on the deployment and can be overridden per job if supported:

YAML

runtime:
  parallelism: 4
  checkpoint-interval: 60s
  restart-strategy: fixed-delay

Error handling

YAML

on-error:
  drop: false      # default; fail the job on deserialization errors
  dead-letter:     # optional DLQ
    type: kafka
    topic: cdc-dlq
Exact runtime and error-handling keys vary by connector; check the connector’s reference before relying on the keys shown here.
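Putting the pieces from this page together, a complete file might look like the sketch below. It reuses only keys shown above. Two things are assumptions: that runtime and on-error are accepted at the top level of the same file as source and sink, and the sink secret names dwusername and dwpassword, which are hypothetical.

```yaml
source:
  type: mysql
  name: Database A to Data warehouse
  hostname: mysql-src
  port: 3306
  username: ${secret_values.mysqlusername}
  password: ${secret_values.mysqlpassword}
  tables: mydb\.public\..*

sink:
  type: mysql
  name: Database B to Data warehouse
  hostname: mysql-dst
  port: 3306
  username: ${secret_values.dwusername}   # assumed secret name
  password: ${secret_values.dwpassword}   # assumed secret name
  upsert-key: id

runtime:                 # assumed to be valid alongside source and sink
  parallelism: 4
  checkpoint-interval: 60s
  restart-strategy: fixed-delay

on-error:                # assumed to be valid alongside source and sink
  drop: false            # fail the job on deserialization errors
  dead-letter:
    type: kafka
    topic: cdc-dlq
```

If your connector rejects any of these sections, move the corresponding settings to the deployment configuration instead.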
