
Understanding Apache Paimon Concurrency Control

Designing and testing Apache Paimon concurrency control to reveal common conflict scenarios

Previously, we tried Apache Paimon in a playground. In the conclusion, I mentioned that I wanted to know which scenarios its concurrency control is designed to handle and what happens when a conflict occurs.

This article sets up an experiment environment that shows in practice what a snapshot conflict and a files conflict are.

First, use this more complicated playground.

If you already have the main branch from the previous article, you can simply check out the jdbc branch.

Experiment Design

We will start two Flink tasks that continuously receive data from Kafka and write to the same Paimon table.

The two Flink tasks use different consumer groups, so each one reads the whole topic independently and both should write the same amount of data.
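Once both tasks are running, a per-writer row count is one way to check that expectation. The query below is only a sketch: it assumes the orders table and the buyer column defined in the script that follows, and it assumes you run it from a separate Flink SQL session in batch mode so that the result is a one-off snapshot rather than a continuously updating stream.

```sql
-- Assumption: executed in a separate Flink SQL session after both INSERT jobs have committed data.
-- Batch mode returns a single result computed from the latest snapshot of the table.
SET 'execution.runtime-mode' = 'batch';

-- Count the rows written by each task; the two counts should converge to the same value.
SELECT
    buyer,
    COUNT(*) AS row_count
FROM orders
GROUP BY buyer;
```

If the two counts stay noticeably different after the topic has been fully consumed, that is a hint that one of the writers lost commits or failed.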

To set this up, we run the following script in Flink SQL.

CREATE TABLE orders (
    order_number BIGINT,
    price DECIMAL(32,2),
    buyer STRING
);

CREATE TEMPORARY TABLE kafka_source1 (
    order_number BIGINT,
    price DECIMAL(32,2)
) WITH (
    'connector' = 'kafka',
    'topic' = 'test_topic',
    'properties.bootstrap.servers' = 'broker:29092',
    'properties.group.id' = 'session1',
    'scan.startup.mode' = 'earliest-offset',
    'format' = 'json'
);

INSERT INTO orders
SELECT
    *,
    'session1' AS buyer
FROM kafka_source1;

CREATE TEMPORARY TABLE kafka_source2 (
    order_number BIGINT,
    price DECIMAL(32,2)
) WITH (
    'connector' = 'kafka',
    'topic' = 'test_topic',
    'properties.bootstrap.servers' = 'broker:29092',
    'properties.group.id' = 'session2'…

Written by Chunting Wu

Architect at SHOPLINE. Experienced in system design, backend development, and data engineering.