Member-only story
Flink SQL Performance Tuning, Part 2
Explaining the concept of Split Distinct aggregation and JOIN optimizations
In the previous article, we introduced three kinds of optimization mechanisms for Flink SQL as follows.
- Reduce sub plan
- Mini batch
- Local-Global aggregation
These mechanisms correspond to some use cases individually. Among them, mini batch and Local-Global aggregation are both optimized for GROUP BY
operations. However, in the previous article, we mentioned that even though both mechanisms can improve the performance of GROUP BY
, they are not applicable to DISTINCT
.
Therefore, in these articles, we will start by explaining the reason for this problem, and then we will introduce more optimization mechanisms.
Split Distinct Aggregation
Before explaining the problems encountered by DISTINCT
, let’s review Local-Global aggregation with an example.
SELECT color, COUNT(DISTINCT id)
FROM T
GROUP BY color
This SQL command is a little different from the previous one, i.e., it uses DISTINCT
, but it is…