Member-only story

Flink SQL Performance Tuning, Part 2

Explaining the concept of Split Distinct aggregation and JOIN optimizations

8 min readJul 3, 2023

In the previous article, we introduced three kinds of optimization mechanisms for Flink SQL as follows.

Reduce sub plan
Mini batch
Local-Global aggregation

These mechanisms correspond to some use cases individually. Among them, mini batch and Local-Global aggregation are both optimized for GROUP BY operations. However, in the previous article, we mentioned that even though both mechanisms can improve the performance of GROUP BY, they are not applicable to DISTINCT.

Therefore, in these articles, we will start by explaining the reason for this problem, and then we will introduce more optimization mechanisms.

Split Distinct Aggregation

Before explaining the problems encountered by DISTINCT, let’s review Local-Global aggregation with an example.

SELECT color, COUNT(DISTINCT id)
FROM T
GROUP BY color

This SQL command is a little different from the previous one, i.e., it uses DISTINCT, but it is…

Flink SQL Performance Tuning, Part 2

Explaining the concept of Split Distinct aggregation and JOIN optimizations

Split Distinct Aggregation

Written by Chunting Wu

No responses yet