Redis helps avoiding racing conditions

Chunting Wu
6 min readAug 15, 2021

--

In the previous article, we had introduced how to ensure the integrity of transferring money. However, it’s not good enough to prevent going wrong. That approach will probably take time to retry many times to make things be done, so that it consumes client’s time as well as occupying the database resource.

User Scenario

Suppose there is a table for maintaining the lottery tickets, and each ticket has its own owner, serial number and a flag, remain, of whether it is used or not. In addition, there is another table to keep winning records.

tickets
rewards

For instance, A had 3 tickets but used one. And then, B got one ticket. Besides, the used ticket wins 100 dollars for A.

Simple Design

According the our previous introduction, we will design the remain flag in an unsigned integer with the default value: 1. When consuming this ticket, we uses the atomic update to increase the remain by -1. If a user has only one ticket but wants to draw twice, the second attempt will fail due to the negative value of remain.

Represent in the psudo code is like:

`START TRANSACTION`
A_ticket = `select id from tickets where \
remain = 1 and user = 'A' order by id limit 1`
`update tickets set remain = remain - 1 where id = A_ticket`
reward = bingo()
`insert into rewards (id, reward) values (${A_ticket}, ${reward})`
`COMMIT`

First, A’s first ticket will be picked out, and the lottery will start after deduction. Finally, the result will be updated into the database.

But the problem occurs in the assumption that A has two tickets like the example at the beginning of this article, which means that A can draw twice at the same time. When A draws twice simultaneously, it should be successful without fail.

Run the process once with the implementation we mentioned. The first one who comes in will get a ticket number 3 and the second one who comes in will also get the same ticket. As a result, the first draw will succeed but the second attempt will be failed. If it fails, then it must try again, and then it will get the ticket number 4 and succeed.

Such a procedure would be correct after all, but it is not good enough. When you have n tickets, you obviously can draw n times at the same time within O(1). However, such an implementation leads to O(n). Among them, O(n-1) is a retry. And, the higher the n, the worse the user experience.

Random select

Here comes the second design, we don’t order by the id; instead, we pick the ticket randomly to mitigate the failure. Therefore, we modify the original select statement to become:

A_ticket = `select id from tickets where remain = 1 and user = 'A' order by RAND() limit 1`

In this example, we don’t care about the drawback of order by RAND(); nevertheless, the random means it will sometimes go wrong. Worst of all, it may choose the ticket with a long validity period instead of the ticket is about to expire.

Requirement

Looking back at our needs, we need to draw a lottery “simultaneously” several times when the user has a few tickets, and use the tickets in order.

Final Solution

To achieve our goal, we can leverage Redis’ help. The purpose of Redis is to have a congestion control. In order to make all clients can succeed once, we have to dispatch the tickets to them correctly while highly concurrency occurs.

Bad approach

Usually, Redis is used to be a distributed optimistic lock to make sure the permission of a critical section, e.g.,

ret = `setnx lock 1`
if not ret:
raise Locked("already locked by others")
else:
do_something()
`del lock`

If an application can set a lock in Redis, which means he can dominate this time period until he finishes his job; on the other hand, everyone must wait for him. Solving the problem in this way is still not very helpful, at most it can only speed up the client’s retry, because the client doesn’t need to go through the database operation and the lottery procedures.

Good approach

To solve this problem, we want to support multi-entrance by using Redis. The process is to push all the current ticket numbers through LPUSH, and use RPOP to take out the first one.

Because Redis’ single command is atomic, you can ensure that the tickets are correctly allocated to each call that comes in at the same time.

Let’s take a simple psudο code as an example:

do:
A_owned_tickets = `select id from tickets where \
remain = 1 and user = 'A' order by id`
`LPUSH A_owned_tickets ${owned_tickets}`
A_ticket = `RPOP A_owned_tickets`
while A_ticket != nil
`START TRANSACTION`
`update tickets set remain = remain - 1 where id = A_ticket`
reward = bingo()
`insert into rewards (id, reward) values (${A_ticket}, ${reward})`
`COMMIT`
`DEL A_owned_tickets`

In the above example, we use uppercase to indicate Redis’ instructions. Let’s use the sequential diagram to get a deeper look at what’s going on.

The above is an ideal situation. Although A1 and A2 occur simultaneously, A1 is slightly earlier than A2, so there will be two complete sequences in Redis, [4, 3, 4, 3]. If A1 and A2 occur almost at the same time, it will be the following situation:

From the above figure, even if A1 and A2 occur almost simultaneously, it will not affect the accuracy. So what happens if A1 finishes its job faster, COMMIT the result and deletes the list?

We can see A2 got a nil at the first time, so he therefore had to retrieve the tickets from MySQL and push again to proceed the process. Even so, it is more efficient than the original commit failure after all the processes are run.

The worst case will happen when there is high concurrency, some clients are very fast and some clients are very slow, which will cause several retries and still fail. The flow as shown below.

If this is the case, you can adopt an alternative approach. Do not actively DEL the list, but use EXPIRE to delete. Choosing a reasonable timeout will relieve symptoms. However, it comes another problem, the validity of expired data. This is a trade-off, you can take the approach that suits your user scenario.

At last, if the amount of coming requests is greater than the owned tickets, the original schema design still works to reject the invalid operations(remain cannot be negative).

--

--

Chunting Wu
Chunting Wu

Written by Chunting Wu

Architect at SHOPLINE. Experienced in system design, backend development, and data engineering.

No responses yet