Evolutionary Data Infrastructure

From Monolith to Self-service Platform

Published in Better Programming

9 min read · Sep 26, 2022

Photo by Mitchell Luo on Unsplash

All systems start as a small monolith. In the beginning, when resources and manpower are limited, a monolith is the only practical choice, and data infrastructure is no exception.

But as requirements grow, more and more scenarios cannot be supported by the current architecture, so the system must evolve. Each evolution exists to solve the problems encountered so far, which means we need to understand the different aspects involved and use the most efficient engineering methods to achieve the goal.

In this article, we will again start with a monolith, as we have done before.

Shift from Monolith to CQRS

But this time, our goal is not to serve a production environment, but to provide the data infrastructure behind all production environments.

A data infrastructure is a “place” where all kinds of data are stored: structured data, time series data, or even raw data. The purpose of all this data (and it really is big) is to provide material for data analysis, business intelligence, or machine learning.

In addition to internal uses, there may also be user-facing functions, for example, a list of…


Written by Chunting Wu

Architect at SHOPLINE. Experienced in system design, backend development, and data engineering.
