Announcements

Hello. It's been a long time. I have translated many of my latest projects and documents. I have hidden the all Turkish content except the document about the A* algorithm. I will translate it but I decided to keep it for a while because of the time issue. Also, there are a lot of old projects of mine are waiting to be translated.

Thanks.

In this article, I will talk about thread-safety and a specific but common case of it. I won’t cover the thread-safety concepts in detail and I will briefly explain the related ones only. This article considers that the reader has some knowledge about thread-safety, operating system concepts and computer architecture.

Protecting the data from race condition is a complex job and there are various solutions. Generic thread-safety mechanisms (mutex, etc) work great but their performance impact on the system may be huge. In some cases, it is possible to develop lock-free algorithms or disable the locks, temporarily. Developing a customized thread-safety solution according to your needs, doesn’t only protect your data from race condition, it saves your system resources, speeding up your operations as well.

For example, we will design a simple firewall that checks if the IP address of the client is allowed. Multiple threads will be executed and they will use a shared whitelist for this operation. We will keep the whitelist in a database and we won’t query it for every request. It is quite expensive. So, we will cache the data and update it (timely or depending on a trigger) ; a trigger would be better.

CustomContainer is our container for storing the allowed IP addresses. Consider it in a global or class scope and not thread-safe. Threads will share this container and IP address checker functions will only read it.

IsAllowedIP is our thread function. I won’t add exception handling to make the code simple as much as possible in the examples.

1
2
3
4
5
6
7
8
9
CustomContainer<string> httpServerAccessList;
 
// Thread function without synchronization
bool IsAllowedIP(const string& ipAddress) {
  if(httpServerAccessList.find(ipAddress) != httpServerAccessList.end())
    return true;
 
  return false;
}

and UpdateData is another thread function that is executed by another thread.

1
2
3
void UpdateData() {
  httpServerAccessList = GetUpdatedList();
}

In short, UpdateData function updates the whitelist and IsAllowedIP tries to read it at the same time and the find function of CustomContainer will show undefined behavior because of modification.

Now, we will put a mutex and see what happens.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
CustomContainer<string> httpServerAccessList;
mutex mtx;
 
bool IsAllowedIP(const string& ipAddress) {
  // Lock_guard automatically releases the lock on destruction
  lock_guard<mutex> lck(mtx);
 
  if(httpServerAccessList.find(ipAddress) != httpServerAccessList.end())
    return true;
 
  return false;
}
 
void UpdateData() {
  lock_guard<mutex> lck(mtx);
  httpServerAccessList = GetUpdatedList();
}

We solved the race condition problem but the performance will be pretty bad. Only one IsAllowedIP thread can read httpServerAccessList. If we solve this problem in this way, using multi-threading is useless.

mtx is our mutual exclusion object in global scope and lock_guard is a mutex wrapper. We create it in a function's local scope. So, its destructor releases the lock automatically when the function ends. Every thread has its own stack. So, it is not shared. mtx is the same but lck is different for all threads.

We need an improvement and I would like to start with mutex. Because mutex is not performance efficient. If a thread tries to acquire a mutex and can’t, goes to sleep immediately. So, overhead is huge and this impacts the performance. Also, its performance differs significantly according to the low-level implementations. For example, boost::mutex is better than std::mutex on Windows but std::mutex is better on Linux.

So, we could use spinlock. Spinlock is a good alternative for avoiding context switch overhead. It is a loop to keep the thread alive. The operation must be atomic. The thread will run until the context switch occurs on time interrupt. It won’t go to the sleep, immediately.

A spinlock in C++.

1
2
3
4
5
6
7
8
9
10
// Defined in global scope
atomic_flag lock = ATOMIC_FLAG_INIT;
 
// ...
while(lock.test_and_set(memory_order_acquire));
// ...
// Critical code
// ...
lock.clear(memory_order_release);
// ...

As it seems, there is a while loop and an atomic operation. It atomically tests and sets the state of lock. The compiler will generate atomic CPU instructions. All CPU instructions are not atomic and today’s modern CPUs have microcodes. Even the INC instruction is not atomic. Because CPU reads the value, increases it and stores. XCHG instruction is atomic and used for implementing spinlock in low level. Thanks to atomic instructions, we are also able to implement lock-free algorithms in software level. But there is actually some locking mechanism on the hardware. LOCK prefix is used for atomic operations and it is redundant for XCHG. LOCK puts a memory barrier for synchronization. But it is not like putting a mutex and the overhead differs according to the architecture. It is pretty efficient hardware operation but, of course, you shouldn’t use it in everywhere.

Now let’s continue,

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
CustomContainer<string> httpServerAccessList;
atomic_flag lock = ATOMIC_FLAG_INIT;
 
bool IsAllowedIP(const string& ipAddress) {
  // There is no need to protect a local variable
  bool result = false;
 
  // Lock
  while(lock.test_and_set(memory_order_acquire));
 
  if(httpServerAccessList.find(ipAddress) != httpServerAccessList.end())
    result = true;
 
  // Unlock
  lock.clear(memory_order_release);
 
  return result;
}
 
void UpdateData() {
  while(lock.test_and_set(memory_order_acquire));
  
  httpServerAccessList = GetUpdatedList();     
  
  lock.clear(memory_order_release);
}

We replaced the mutex with a spinlock but it is like the other code. Only one IsAllowedIP thread will read httpServerAccessList. It is almost the same thing in a different color.

In addition to these, mostly, there is no need to use a complex thread-safety locking mechanism in this case. IsAllowedIP threads only read the data. They don’t modify it and UpdateData function is executed, very rarely. If UpdateData doesn’t run, race condition is not a problem. So, we could disable the spinlock, temporarily and we could enable it only when necessary.

So, we are changing the code,

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
CustomContainer<string> httpServerAccessList;
atomic_flag lock = ATOMIC_FLAG_INIT;
bool isEnabled = false;
 
bool IsAllowedIP(const string& ipAddress) {
  bool result = false;
 
  // Lock
  if(isEnabled)
    while(lock.test_and_set(memory_order_acquire));
 
  if (httpServerAccessList.find(ipAddress) != httpServerAccessList.end())
    result = true;
 
  // Unlock
  if(isEnabled)
    lock.clear(memory_order_release);
 
  return result;
}
 
void UpdateData() {
  isEnabled = true;
  while(lock.test_and_set(memory_order_acquire));
      
  httpServerAccessList = GetUpdatedList();
  
isEnabled = false;   lock.clear(memory_order_release); }

And now, we enable and disable the spinlock mechanism depending on the UpdateData function. But, there is a critical problem in this code. Before mentioning it, I would like to talk about branch penalty. Because branch penalty affects the pipeline performance. So, using conditions (if, while, etc) may decrease the performance. But, all modern CPUs have branch prediction mechanisms and they can eliminate the branch penalties. Our conditions are extremely simple to be predicted.

I just want to talk about the critical problem now. We enable and disable the spinlock but context switch may occur when IsAllowedIP is not completed yet.

I mean,

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
bool IsAllowedIP(const string& ipAddress) {
  bool result = false;
 
  // Lock 
  if(isEnabled)
    while(lock.test_and_set(memory_order_acquire));
  
  //WE ARE HERE =>
  if(httpServerAccessList.find(ipAddress) != httpServerAccessList.end())
    result = true;
 
  if(isEnabled)
    lock.clear(memory_order_release);
 
  return result;
}

UpdateData function will take the control, modify httpServerAccessList and when IsAllowedIP continues, a problem will occur. So, we must be sure about all IsAllowedIP threads are done.

Therefore, we must add a counter to track thread activity. 

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
CustomContainer<string> httpServerAccessList;
atomic_flag lock = ATOMIC_FLAG_INIT;
bool isEnabled = false;
atomic<int> activeThreads {0};
 
bool IsAllowedIP(const string& ipAddress) {
  bool result = false;
 
  // Lock
  if(isEnabled)
    while(lock.test_and_set(memory_order_acquire));
 
  activeThreads++;
 
  if(httpServerAccessList.find(ipAddress) != httpServerAccessList.end())
    result = true;
 
  // Unlock
  if(isEnabled)
    lock.clear(memory_order_release);
 
activeThreads--;
  return result; }   void UpdateData() {   isEnabled = true;   while(activeThreads > 0);     while(lock.test_and_set(memory_order_acquire));          httpServerAccessList = GetUpdatedList();     
isEnabled = false;   lock.clear(memory_order_release); }

activeThreads is an atomic variable and it works in the same manner on the hardware. It is lock-free but all atomic variables may not be lock-free. You can use is_lock_free function to check this. IsAllowedIP threads increase, decrease it and a loop in UpdateData function checks it before acquiring the lock. UpdateData runs, very rarely (maybe in an hour, day, month, etc.). So, we don’t need to do excessive optimizations on it.

Additionally, you might think that what if lock was already unlocked and isEnabled is true. It won’t be an issue. Because it just sets the flag. It is safe to call.

After all of these, I want to create a custom spinlock class to make the code neater and finish this text.

Regards.

CustomSpinlock.hpp

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
#pragma once
 
#include <atomic>
 
class CustomSpinlock final {
 public:
  CustomSpinlock() = default;  
  ~CustomSpinlock() = default;
 
  void Lock();
  void Release();
 
  void TakeControl();
  void GiveControl();
 
 private:
  bool _isEnabled = false;
  std::atomic_flag _lock = ATOMIC_FLAG_INIT;
  std::atomic<int> _activeThreads {0};

CustomSpinlock.cpp

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
#include "CustomSpinlock.hpp"
 
void CustomSpinlock::Lock() {
  if(_isEnabled)
    while(_lock.test_and_set(std::memory_order_acquire));
  
  _activeThreads++;	
}
 
void CustomSpinlock::Release() { 
  if(_isEnabled)
    _lock.clear(std::memory_order_release);

_activeThreads--; }   void CustomSpinlock::TakeControl() {   _isEnabled = true;   while(_activeThreads > 0);     while(_lock.test_and_set(std::memory_order_acquire)); }   void CustomSpinlock::GiveControl() {
_isEnabled = false;   _lock.clear(std::memory_order_release); }

Usage:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
CustomContainer<string> httpServerAccessList;
CustomSpinlock spinlock;
 
bool IsAllowedIP(const string& ipAddress) {
  bool result = false;
 
  spinlock.Lock();
 
  if(httpServerAccessList.find(ipAddress) != httpServerAccessList.end())
    result = true;
 
  spinlock.Release();
 
  return result;
}
 
void UpdateData() {
  spinlock.TakeControl();
      
  httpServerAccessList = GetUpdatedList();
      
  spinlock.GiveControl();
}