Is there an Alternative to Debezium + Kafka?
Evaluating open-source options to improve performance and scalability in CDC pipelines
I asked this question on Reddit a while back and received lots of valuable answers.
I looked into each of them and documented the results in this article.
TL;DR
No. Debezium dominates the market at the moment, despite some drawbacks.
Background Explanation
Why would we want to find an alternative to Debezium? The main reason is we encountered a challenging scenario.
This is a typical scenario for Debezium, where any modifications to the data source are captured and fed into Kafka for downstream processing.
The advantage of this architecture is that it is simple and efficient, keeping all downstream processes as close to real-time as possible.
If the source produces a large number of updates, Debezium can scale horizontally, but only until those updates are concentrated in a single table. This is where Debezium hits its limits.
Horizontal scaling in Debezium means that updates originally handled by one process are distributed across multiple processes. Once each table already has a dedicated process, scaling out further is no longer possible.
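To make that ceiling concrete, here is a minimal sketch of the reasoning. It is a toy model, not Debezium's actual assignment logic: the function name `busiest_worker_load`, the table names, and the change rates are all hypothetical. The only rule it encodes is the one described above: a table's change stream is never split across processes.

```python
import heapq


def busiest_worker_load(table_rates_mb_s, n_workers):
    """Load (MB/s) on the busiest worker when whole tables are assigned to workers.

    Toy model: tables are indivisible, so one table's changes always land on
    exactly one worker. Assignment is greedy: hottest table first, each one
    given to the currently least-loaded worker.
    """
    if n_workers < 1:
        raise ValueError("need at least one worker")
    if not table_rates_mb_s:
        return 0.0
    # Workers beyond the number of tables would have nothing to do.
    n_workers = min(n_workers, len(table_rates_mb_s))
    loads = [0.0] * n_workers
    heapq.heapify(loads)
    for rate in sorted(table_rates_mb_s.values(), reverse=True):
        lightest = heapq.heappop(loads)
        heapq.heappush(loads, lightest + rate)
    return max(loads)


# Hypothetical per-table change rates in MB/s.
rates = {"orders": 40.0, "users": 5.0, "payments": 10.0, "events": 5.0}
for workers in (1, 2, 4, 8):
    print(workers, busiest_worker_load(rates, workers))
# prints:
# 1 60.0
# 2 40.0
# 4 40.0
# 8 40.0
```

Going from one worker to two helps, but after that the busiest worker is stuck with the hottest table, no matter how many workers are added.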
We are in exactly this situation. In our environment, even with the machine specification stretched to its limit, the CDC throughput of a single table is capped at 25 MB/s.
This is certainly not a regular case. After all, 25 MB/s of changes to a single table is quite significant. However, if a data source is running a large-scale data migration, this limit can easily be exceeded.
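A quick back-of-the-envelope calculation shows why exceeding the cap matters for real-time consumers. The sketch below uses the 25 MB/s ceiling from our environment. The burst rate, burst duration, and steady-state rate are made-up numbers for illustration, and the helper names are hypothetical.

```python
CAP_MB_S = 25.0  # single-table CDC ceiling observed in our environment


def backlog_after_burst(burst_mb_s, burst_seconds, cap_mb_s=CAP_MB_S):
    """MB of changes still waiting to be captured when the burst ends."""
    return max(0.0, burst_mb_s - cap_mb_s) * burst_seconds


def catch_up_seconds(backlog_mb, steady_mb_s, cap_mb_s=CAP_MB_S):
    """Seconds needed to drain the backlog once writes drop to steady_mb_s."""
    if backlog_mb <= 0:
        return 0.0
    spare = cap_mb_s - steady_mb_s  # capacity left over for old changes
    if spare <= 0:
        return float("inf")  # the pipeline never catches up
    return backlog_mb / spare


# Hypothetical migration: 40 MB/s into one table for one hour,
# then back to a steady 5 MB/s.
backlog = backlog_after_burst(burst_mb_s=40.0, burst_seconds=3600)
print(backlog)                                     # 54000.0
print(catch_up_seconds(backlog, steady_mb_s=5.0))  # 2700.0
```

Under these assumed numbers, a one-hour migration leaves about 54 GB of unprocessed changes, and downstream data stays stale for another 45 minutes after the migration finishes.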
To keep our downstream data pipeline real-time, all we can do is ask the upstream teams to be merciful when they run migrations of this size, and to do a good job of rate limiting.
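As an illustration of what such upstream rate limiting could look like, here is a minimal token-bucket sketch. It is not what any particular upstream team runs: the `TokenBucket` class is hypothetical, and the 20 MB/s budget is an assumed value chosen only to stay under the 25 MB/s cap. The clock is injectable so the behavior can be checked without sleeping.

```python
import time


class TokenBucket:
    """Byte-based token bucket: a write may proceed only if enough tokens remain."""

    def __init__(self, rate_bytes_per_s, capacity_bytes, clock=time.monotonic):
        self.rate = rate_bytes_per_s
        self.capacity = capacity_bytes
        self.tokens = float(capacity_bytes)  # start with a full bucket
        self.clock = clock
        self.last = clock()

    def _refill(self):
        # Add tokens for the time elapsed since the last call, up to capacity.
        now = self.clock()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now

    def try_acquire(self, n_bytes):
        """Spend n_bytes of budget and return True, or return False if over budget."""
        self._refill()
        if n_bytes <= self.tokens:
            self.tokens -= n_bytes
            return True
        return False


MB = 1024 * 1024
fake_now = [0.0]  # fake clock so the demo is deterministic
bucket = TokenBucket(20 * MB, 20 * MB, clock=lambda: fake_now[0])

print(bucket.try_acquire(20 * MB))  # True: the initial budget covers it
print(bucket.try_acquire(1))        # False: bucket is empty
fake_now[0] = 0.5                   # half a second later, 10 MB has refilled
print(bucket.try_acquire(10 * MB))  # True
```

A migration job would call `try_acquire` with the size of each batch before writing it, and wait and retry when it returns `False`.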
However, this limitation greatly reduces the productivity of upstream developers. On the one hand, they have to add an auditing process to their…