Member-only story

Flink SQL Performance Tuning, Part 2

Explaining the concept of Split Distinct aggregation and JOIN optimizations

Chunting Wu
8 min readJul 3, 2023
Photo by Jason Song on Unsplash

In the previous article, we introduced three kinds of optimization mechanisms for Flink SQL as follows.

  1. Reduce sub plan
  2. Mini batch
  3. Local-Global aggregation

These mechanisms correspond to some use cases individually. Among them, mini batch and Local-Global aggregation are both optimized for GROUP BY operations. However, in the previous article, we mentioned that even though both mechanisms can improve the performance of GROUP BY, they are not applicable to DISTINCT.

Therefore, in these articles, we will start by explaining the reason for this problem, and then we will introduce more optimization mechanisms.

Split Distinct Aggregation

Before explaining the problems encountered by DISTINCT, let’s review Local-Global aggregation with an example.

SELECT color, COUNT(DISTINCT id)
FROM T
GROUP BY color

This SQL command is a little different from the previous one, i.e., it uses DISTINCT, but it is…

--

--

Chunting Wu
Chunting Wu

Written by Chunting Wu

Architect at SHOPLINE. Experienced in system design, backend development, and data engineering.

No responses yet