Project Nayuki


Concurrent programming with monitors

Introduction

A program is concurrent if it has multiple threads, each having its own state and executing its own sequence of instructions. When threads access shared data, the programmer needs to explicitly specify and constrain how the threads should interact so that they make progress (avoiding deadlock and livelock) and not corrupt data. A monitor is a conceptual tool that makes it relatively manageable to write correct, performant concurrent programs. The two key parts of a monitor are the lock (mutex) and the condition variable(s). This page is a tutorial that begins with motivations for concurrent programming, naive approaches that are wrong or slow, correct implementations using monitors, and finishes with a general framework for designing monitors.

It is natural to learn programming with a single thread, which follows one execution path. The vast majority of effort – in computer science education as well as in real-world applications – focuses on single-threaded programming. But certain advanced problems point to the need for multiple threads. If a computer has multiple CPUs (known as parallel processing), then each is executing its own thread by definition. Even if a computer has just one CPU, it is more natural to implement I/O management using threads instead of other approaches (event loop, continuation passing, explicit state machines, etc.).

Threads are a natural way to keep track of the progress of many work items. For example, a web server could start a new thread for each request it needs to process. The workflow of such a thread is to read the incoming request from the external party, read and write internal resources such as files and databases, and write the outgoing response to the external party. Each of the aforementioned steps could have arbitrary delays. Ideally, if one thread is stalled waiting for some action (e.g. waiting to read the entire request from the network, waiting for a disk to acknowledge a write), then the computer can switch to executing another thread that is ready.

A big challenge with using threads is that by default, the progression of each thread is independent of others. For example, the computer could choose to run thread A for 100 instructions and thread B for 1 instruction and return to thread A. As another example, any thread could be suspended at any point in time, while other threads continue running normally. Almost every data structure requires multiple instructions to update properly, and what we want is a way for a thread C to access the data while excluding every other thread from this data until thread C is done with it.

The example code on this page is primarily given in Java and secondarily in Python. If the Java code has a very simple translation to Python, then the Python code will not be explained. The code in the page text is actually pseudocode and may contain abbreviations that prevent it from being directly interpretable/compilable, but the code in linked files is fully runnable.


Contents


Thread management

This section covers the details on how to launch threads. After this section, thread management code will be omitted and it will be implied that threads are declared and launched appropriately in context.

Java

There are two low-level ways to launch a thread, and both involve java.lang.Thread somehow. One way is to define a zero-argument method and pass it as a Runnable:

class Example {	
	static void main() {
		print("Hello from main");
		Thread th = new Thread(Example::worker);
		th.start();
		print("Goodbye from main");
	}
	
	static void worker() {
		print("Hello from worker");
	}
}

Another way is to define your own subclass of Thread and override run():

class Example {
	static void main() {
		print("Hello from main");
		Thread th = new Worker();
		th.start();
		print("Goodbye from main");
	}
}

class Worker extends Thread {
	public void run() {
		print("Hello from worker");
	}
}

Among the two approaches, the former is more concise, but the latter makes it easier to convey unique values (e.g. thread ID) to each thread – by having the subclass declare fields and constructor parameters.

Higher level ways of managing threads are provided by the executor framework in the standard library.

Python

There is one low-level way to launch a thread, which can have any number of arguments:

import threading

def main():
	print("Hello from main")
	th = threading.Thread(target=worker, args=[42, "abc"])
	th.start()
	print("Goodbye from main")

def worker(arg0, arg1):
	print("Hello from worker")

Higher level ways of managing threads are provided by the executor framework in the standard library.

Locks/mutexes

As the simplest example, suppose that two threads want to increment a shared count variable. Perhaps we are rendering a big image, each thread works on one arbitrary pixel at a time, and we want to keep track of the total number of pixels processed among all threads.

Unsynchronized (wrong)

Other than managing the threads, we treat this as a sequential programming problem and see where it gets us.

Java pseudocode: (or full code)

int count = 0;  // Shared global variable

void main() {
	startThreads();
	waitAllThreadsFinish();
	print(count);
}

void worker() {
	for (int i = 0; i < ITERATIONS; i++)
		count++;
}

Python pseudocode: (or full code)

count = 0  # Shared global variable

def main():
	start_threads()
	wait_all_threads_finish()
	print(count)

def worker():
	global count
	for _ in range(ITERATIONS):
		count += 1

When we run the program, we might find the final count to be significantly less than the correct value. This is because usually, count++ is decomposed as a sequence of approximately three instructions:

temp = count
temp = temp + 1
count = temp

One undesirable execution that can happen is that thread A reads count, gets paused, thread B runs to completion, then thread A resumes and overwrites count – thus discarding all of the increments made by B.

Using a lock (correct)

What we want is that whenever one thread wants to update the count, it first acquires a lock for exclusive access, performs the read-modify-write operations, and then releases the lock. Note that if thread A currently holds a lock and thread B tries to take the same lock, then thread B waits (i.e. its execution is suspended) until thread A releases the lock. It is never possible for two threads to hold the same lock at the same time. A lock is also known as a mutex, short for mutual exclusion. The piece of code between acquiring and releasing a lock is known as a critical section.

Java pseudocode: (or full code)

// Shared global variables
var lock = new Lock();
int count = 0;

void main() {
	startThreads();
	waitAllThreadsFinish();
	print(count);
}

void worker() {
	for (int i = 0; i < ITERATIONS; i++) {
		lock.lock();
		count++;
		lock.unlock();
	}
}

Python pseudocode: (or full code)

# Shared global variables
lock = Lock()
count = 0

def main():
	start_threads()
	wait_all_threads_finish()
	print(count)

def worker():
	global count
	for _ in range(ITERATIONS):
		lock.acquire()
		count += 1
		lock.release()

Now the program behaves correctly, no matter how many threads you start, no matter how many CPUs you have.

Atomic (correct)

Some common problems have off-the-shelf solutions in the language’s standard library or easy-to-obtain third-party libraries. In this case, Java has an atomic integer class that is designed to have a thread-safe increment operation. Java pseudocode (or full code):

var count = new AtomicInteger(0);  // Shared global variable

void main() {
	startThreads();
	waitAllThreadsFinish();
	print(count.get());
}

void worker() {
	for (int i = 0; i < ITERATIONS; i++)
		count.incrementAndGet();
}

General rule on locks

If a particular variable is written to by one thread and read from by a different thread, then there needs to be a lock such that all threads go through that lock to access that variable. Note that a variable can be a compound entity comprising any number of primitive data units.

Ensuring unlocks

If a thread throws an exception while holding a lock, it is important to release the lock so that other threads can enter the critical section.

In Java, this is accomplished with try-finally:

lock.lock();
try {
	... your logic ...
} finally {
	lock.unlock();
}

In Python, try-finally is correct, but with-statements are more concise:

# Verbose way
lock.acquire()
try:
	... your logic ...
finally:
	lock.release()

# Concise way
with lock:
	... your logic ...

Condition variables

Some problems can be solved with just locks, but advanced problems will require a new concept. Suppose we have a flag variable that many workers are watching and waiting for it to change. Perhaps the workers performed processing for varying lengths of time, started watching the flag at different times, and are waiting to be released simultaneously.

Check and sleep (slow)

In the workers, we have to make a tradeoff. If a worker checks the flag frequently, then it reduces the time between the flag getting set and the worker observing that the state changed. But the frequent checking is useless work, wasting energy and potentially reducing the performance of other threads that have real work to do. This approach is known as busy-waiting or spin-looping.

Java pseudocode: (or full code)

var lock = new Lock();
boolean flag = false;

void main() {
	startThreads();
	delay();
	lock.lock();
	print("Set true");
	flag = true;
	lock.unlock();
}

void worker() {
	while (true) {
		lock.lock();
		try {
			if (flag)
				break;
		} finally {
			lock.unlock();
		}
		print("Get false");
		delay();
	}
	print("Get true");
}

Python pseudocode: (or full code)

lock = Lock()
flag = False

def main():
	start_threads()
	delay()
	with lock:
		print("Set true")
		global flag
		flag = True

def worker():
	while True:
		with lock:
			if flag:
				break
		print("Get false")
		delay()
	print("Get true")

Using a condition variable (ideal)

Suppose a thread enters a critical section (holding a lock) and reads a variable. If the thread considers the value to be unsuitable, then the thread wants to wait until the value changes before rereading the value. A condition variable is a mechanism that lets writers notify waiting readers. A CV is always tied to a lock, and a thread wishing to wait on a CV must currently hold the associated lock. When a thread starts to wait/await on a CV, it atomically releases the lock and waits until some other thread calls notify/signal/pulse or notify-all/signal-all/pulse-all/broadcast on the CV.

Java pseudocode: (or full code)

var lock = new Lock();
Condition cond = lock.newCondition();
boolean flag = false;

void main() {
	startThreads();
	delay();
	lock.lock();
	print("Set true");
	flag = true;
	cond.signalAll();
	lock.unlock();
}

void worker() {
	lock.lock();
	try {
		while (!flag) {
			print("Get false");
			cond.await();
		}
	} finally {
		lock.unlock();
	}
	print("Get true");
}

Python pseudocode: (or full code)

lock = Lock()
cond = Condition(lock)
flag = False

def main():
	start_threads()
	delay()
	with lock:
		print("Set true")
		global flag
		flag = True
		cond.notify_all()

def worker():
	with lock:
		while not flag:
			print("Get false")
			cond.wait()
	print("Get true")

Why does a thread need to hold the lock before waiting on a condition variable? Suppose it is not required, and we write code like this:

int globalVal = (...);

void worker() {
	while (true) {
		lock.lock();
		int localVal = globalVal;
		lock.unlock();
		if (localVal == desired)
			break;
		cond.await();
	}
}

What can happen is that the worker thread could get suspended just before cond.await(), some other thread could alter globalVal and call cond.signal(), but the worker is not yet waiting so the signal() does nothing, and then the worker resumes and sits at await() forever.

Loops and spurious wake-ups

When using a condition variable, it is tempting to perform the check-and-wait as an if-statement (wrong):

lock.lock();
if (value != desired)
	cond.await();
lock.unlock();

But the above code is incorrect. Most language specifications allow for spurious wake-ups, where await() returns even if no other thread called signal() or signalAll(). Also, it is permissible for more than one thread to return from await() even if another thread called signal() once. Furthermore, even without external factors, the application itself could be designed to signal more threads than necessary as a way to simplify the overall logic.

For these reasons, we must always check the predicate in a loop, like this (correct):

lock.lock();
while (value != desired)
	cond.await();
lock.unlock();

Visual model of a monitor

Monitors are much easier to understand as a visual diagram instead of just being explained in words. Let each thread be represented by a circle in the diagram, and view each thread as a person that moves from place to place and holds internal state variables. Now, each monitor is a set of rooms with specific doors between rooms, with each room and door having certain rules:

  • The default state of a thread is to be outside all monitors.

  • When a thread calls lock.lock(), it immediately goes through the “enter” door and arrives in the entrance waiting room, and the call waits until certain criteria are met.

  • The call to lock.lock() returns successfully as soon as the main room is empty and a particular thread is chosen among all the threads in the entrance waiting room. That thread goes through the “e” door and arrives inside the main room.

  • The monitor’s main room is a mutual exclusion (mutex) area, which can host either zero threads or one thread, not more. A thread in the mutex can freely read and write any variables that are placed in this room; these variables are considered to be “protected” by the lock/monitor and no outsider can touch them.

  • When a thread inside the mutex calls cond.await(), it goes through the “wait” door and arrives in the condition waiting room. Not all problems need this feature, so not all threads perform this operation.

  • When a thread in the mutex calls cond.signal(), then one thread in the condition waiting room is chosen to go through the “notified” door into the entrance waiting room.

  • Similarly, when a thread in the mutex calls cond.signalAll(), then all threads in the condition waiting room are made to go through the “notified” door into the entrance waiting room.

  • Note that calls to notify/signal are not saved for future threads that have not yet entered the condition waiting room. For example, calling cond.signal() when the condition waiting room is empty has no effect, and calling cond.signal() twice when the condition waiting room has a single thread does not exempt the next thread entering the condition waiting room from having to wait.

  • When a thread calls lock.unlock(), it goes through the “leave” door and is no longer in the monitor. If there are any threads in the entrance waiting room, then one is immediately chosen to go through the “e” door into the mutex.

Note that this visualization is for threads interacting with a single monitor. There isn’t a good visual analogy when threads need to hold multiple locks at the same time.

Class encapsulation

A monitor is composed of a lock, condition variable(s), and data variables protected by the lock/mutex. It is natural to put all of these items together in a class, hide the items in private fields, and implement methods appropriate to the problem being solved.

As an exercise, let’s implement a concurrency primitive known as a barrier. A barrier makes threads who join it wait until a certain number of threads is accumulated. In other words, a barrier with an initial count of n will make the first n−1 threads joining the barrier wait, and then the next thread to join will release all the threads (including itself).

Here is how we will implement it: A barrier is initialized with a count, which is stored in a variable that is protected by a monitor. Whenever a thread joins the barrier, it enters the monitor and looks at the count. If the count is zero or negative, it is considered an error. Otherwise, the thread decrements the count. If the new count is zero, then the thread notifies all waiting threads and returns immediately; otherwise, it waits on the condition variable.

Extrinsic monitors

Java pseudocode: (or full code)

class Barrier {
	private Lock lock = new Lock();
	private Condition cond = lock.newCondition();
	private int count;
	
	public Barrier(int initCount) {
		count = initCount;
	}
	
	public void join() {
		lock.lock();
		try {
			if (count <= 0)
				throw new IllegalStateException();
			count--;
			if (count == 0)
				cond.signalAll();
			else {
				while (count > 0)
					cond.await();
			}
		} finally {
			lock.unlock();
		}
	}
}

Python pseudocode: (or full code)

class Barrier:
	def __init__(self, initcount):
		self.lock = Lock()
		self.cond = Condition(self.lock)
		self.count = initcount
	
	def join(self):
		with self.lock:
			if self.count <= 0:
				raise RuntimeError()
			self.count -= 1
			if self.count == 0:
				self.cond.notify_all()
			else:
				while self.count > 0:
					self.cond.wait()

Intrinsic monitors (Java-only)

The Java programming language has an unusual feature where every object has its own intrinsic monitor, consisting of a lock and one condition variable. This makes it easier and more concise to design classes that perform custom concurrency control, like the examples on this page. The barrier example can be rewritten to use an intrinsic monitor like this – Java pseudocode (or full code):

class Barrier {
	private int count;
	
	public Barrier(int initCount) {
		count = initCount;
	}
	
	public synchronized void join() {
		if (count <= 0)
			throw new IllegalStateException();
		count--;
		if (count == 0)
			notifyAll();
		else {
			while (count > 0)
				wait();
		}
	}
}

Comparing intrinsic monitors (defined in the Java language as part of every value that descends from java.lang.Object – excluding primitive types) versus extrinsic monitors provided by classes in the standard library package java.lang.concurrent.locks:

  • To lock an intrinsic monitor, write synchronized (obj) {, which is a language-level syntax. To lock an extrinsic monitor, write lock.lock(), which is a method call.

  • To unlock an intrinsic monitor, write }, which is the matching closing brace, and cannot be omitted without causing a compile-time syntax error. To unlock an extrinsic monitor, call lock.unlock(), which needs to be done carefully in all execution paths, taking care when there are early returns, multiple returns, and potential thrown exceptions; the easiest way to ensure consistent unlocking is to use try-finally blocks.

  • If a method declaration contains the keyword synchronized, then it behaves as if the entire body is wrapped in synchronized (obj) { ... }, where obj is this if the method is an instance method; otherwise for static methods obj is the immediate enclosing class’s object (e.g. MyThing.class). For example, public synchronized void foo() { ... } is a shorter form of public void foo() { synchronized(this) { ... } }.

  • An intrinsic monitor’s condition variable methods are named wait(), notify(), and notifyAll(), and these are final and not overridable. The CV methods of an extrinsic monitor are named await(), signal(), and signalAll(). Note that external monitors are themselves objects, so they still possess intrinsic monitor CV methods, thus forcing the true external CV methods to be named differently. Moreover, the intrinsic monitor lock and methods of extrinsic monitor objects are technically usable, but it is confusing and counterproductive to use them, so this should never be done.

  • Each intrinsic monitor has exactly one condition variable, whether it is used or not. Each extrinsic monitor can have zero or more condition variables, and this can be very useful for both clarifying logic and improving performance.

  • When synchronized on the intrinsic monitor of this object, the this. prefix can be omitted when calling wait()/notify()/notifyAll(), as per the usual language rules.

The fact that Java has a monitor associated with each object is a rare feature that very few languages share (e.g. C#). Java was designed in the early 1990s when few programming languages included concurrency at the language level instead of solely relying on libraries, thus the design does not have the benefit of decades of accumulated wisdom on what works well and poorly in practice. Intrinsic monitors in Java are a questionable feature because the overwhelming majority of classes and objects do not participate in concurrent operations or management, and having a monitor available for each object introduces a mandatory memory overhead. To make matters worse, some classes such as java.lang.StringBuffer and java.util.Vector are designed with every method as synchronized, so even in a single-threaded context, every call to a method of an object of one of these types would incur a run-time overhead to lock and unlock the monitor. Fortunately, standard library classes designed later are essentially all non-synchronized, and such versions of the aforementioned two classes exist as java.lang.StringBuilder and java.util.ArrayList respectively.

Notify vs. notify all

The choice of whether to use notify() versus notifyAll() requires careful consideration, and here is a motivating set of examples. A semaphore holds a non-negative count, which is intended to represent how much of some resource is currently available. The user of a semaphore can increment the count by some amount, which releases waiting threads and always succeeds immediately. The user of a semaphore can also decrement the count by some amount, waiting as long as the old count is less than the desired amount.

Single increment/decrement

This is straightforward and similar to the barrier example. It is sufficient for increment() to use notify() because if the count was zero, then the increment would make exactly one waiting thread (if any threads are waiting at all) eligible to continue. If instead notifyAll() was used, then all waiting threads would wake up, but it is possible for all but one of them to fail the check and go back to waiting, which wastes CPU time.

Java pseudocode: (or full code)

class Semaphore {
	private int count = 0;
	
	public synchronized void increment() {
		count++;
		notify();
	}
	
	public synchronized void decrement() {
		while (count == 0)
			wait();
		count--;
	}
}

Python pseudocode: (or full code)

class Semaphore:
	def __init__(self):
		self.lock = Lock()
		self.cond = Condition(self.lock)
		self.count = 0
	
	def increment(self):
		with self.lock:
			self.count += 1
			self.cond.notify()
	
	def decrement(self):
		with self.lock:
			while self.count == 0:
				self.cond.wait()
			self.count -= 1

Multiple increment/decrement – notify (wrong)

If we generalize increment and decrement to change the count by more than one, then keeping the notify can cause problems. For example, if the count was 0, and two threads are waiting to decrement by 1, and some thread increments by 2, then 2 threads should be released, but only 1 thread actually gets released.

Java pseudocode: (or full code)

class Semaphore {
	private int count = 0;
	
	public synchronized void increment(int amount) {
		count += amount;
		notify();
	}
	
	public synchronized void decrement(int amount) {
		while (count < amount)
			wait();
		count -= amount;
	}
}

Python pseudocode: (or full code)

class Semaphore:
	def __init__(self):
		self.lock = Lock()
		self.cond = Condition(self.lock)
		self.count = 0
	
	def increment(self, amount):
		with self.lock:
			self.count += amount
			self.cond.notify()
	
	def decrement(self, amount):
		with self.lock:
			while self.count < amount:
				self.cond.wait()
			self.count -= amount

Multiple increment/decrement – notify all (correct)

To fix the problem, it suffices to change notify to notify all. This way, after an increment operation, each waiting decrementer will wake up and recheck if the count is high enough for that decrementer to continue.

Java pseudocode: (or full code)

class Semaphore {
	private int count = 0;
	
	public synchronized void increment(int amount) {
		count += amount;
		notifyAll();
	}
	
	public synchronized void decrement(int amount) {
		while (count < amount)
			wait();
		count -= amount;
	}
}

Python pseudocode: (or full code)

class Semaphore:
	def __init__(self):
		self.lock = Lock()
		self.cond = Condition(self.lock)
		self.count = 0
	
	def increment(self, amount):
		with self.lock:
			self.count += amount
			self.cond.notify_all()
	
	def decrement(self, amount):
		with self.lock:
			while self.count < amount:
				self.cond.wait()
			self.count -= amount

Multiple increment/decrement – all notify (correct)

Another way to fix the problem is to put notify at the end of both increment and decrement. This has the added benefit that as each decrementer wakes up, if the count is too low then it will go back to waiting and not wake up the next decrementer.

Java pseudocode: (or full code)

class Semaphore {
	private int count = 0;
	
	public synchronized void increment(int amount) {
		count += amount;
		notify();
	}
	
	public synchronized void decrement(int amount) {
		while (count < amount)
			wait();
		count -= amount;
		notify();
	}
}

Python pseudocode: (or full code)

class Semaphore:
	def __init__(self):
		self.lock = Lock()
		self.cond = Condition(self.lock)
		self.count = 0
	
	def increment(self, amount):
		with self.lock:
			self.count += amount
			self.cond.notify()
	
	def decrement(self, amount):
		with self.lock:
			while self.count < amount:
				self.cond.wait()
			self.count -= amount
			self.cond.notify()

General template

Here is the most general template for implementing a monitor to protect a group of data variables and to allow waiting on a single condition variable for that data to meet certain conditions. Java pseudocode:

class ConcurrentData {
	// One or more data fields, which can have
	// substructures and graphs of child objects
	private int x;
	private String y;
	private List<Long> z;
	
	// Simple getter method (still must hold lock)
	public synchronized int getX() {
		return x;
	}
	
	// Simple setter method
	public synchronized void setX(int val) {
		x = val;
		// Other threads might be waiting for data to change
		notifyAll();
	}
	
	// The most general method
	public synchronized Object foo(Object param) {
		while (... some Boolean condition involving data fields ...)
			wait();
		... read/write data fields, do calculations ...
		notifyAll();  // Wake up threads so they recheck fields
		return (... some value ...);
	}
}

Python pseudocode:

class ConcurrentData:
	def __init__(self):
		self.lock = Lock()
		self.cond = Condition(self.lock)
		# One or more data fields, which can have
		# substructures and graphs of child objects
		self.x = (...)
		self.y = (...)
		self.z = (...)
	
	# Simple getter method
	def get_x(self):
		with self.lock:  # Still must hold lock
			return self.x
	
	# Simple setter method
	def set_x(self, val):
		with self.lock:
			self.x = val
			# Other threads might be waiting for data to change
			self.cond.notify_all()
	
	# The most general method
	def foo(self, param):
		with self.lock:
			while (... some Boolean condition involving data fields ...):
				self.cond.wait()
			... read/write data fields, do calculations ...
			self.cond.notify_all()  # Wake up threads so they recheck fields
			return (... some value ...)

The general template is always logically correct, so it is a good starting point for solving any concurrency problem. But the code is not necessarily the most efficient. For example, some methods have no condition to check and wait for at the beginning, so that code can be deleted outright; some methods don’t cause a change that requires calling notify; some methods need a notify but not notify all; some problems can be expressed better with multiple condition variables to avoid waking up (notifying) irrelevant threads.

Multiple condition variables

So far, the examples above needed at most one condition variable to solve the problem. Here is an example to motivate the use of more than one CV. A bounded queue can store only a finite number of data values, and allows any number of producers and consumers to interact with it. When the queue is empty, consumers need to wait. When the queue is full, producers need to wait.

One condition variable (slow)

With a single condition variable, there is no way to distinguish why a thread is waiting – it might be a producer or a consumer. So whenever a thread changes the queue’s state, it needs to notify and awaken all waiters, which potentially wastes CPU time.

Java pseudocode: (or full code)

class Queue<E> {
	private int limit;
	private Lock lock = new ReentrantLock();
	private Condition cond = lock.newCondition();
	private LinkedList<E> data = new LinkedList<>();
	
	public Queue(int lim) {
		limit = lim;
	}
	
	public void add(E val) {
		lock.lock();
		while (data.size() == limit)
			cond.await();
		data.add(val);
		cond.signalAll();
		lock.unlock();
	}
	
	public E remove() throws InterruptedException {
		lock.lock();
		while (data.isEmpty())
			cond.await();
		E result = data.remove();
		cond.signalAll();
		lock.unlock();
		return result;
	}
}

Python pseudocode: (or full code)

class Queue:
	def __init__(self, limit):
		self.lock = Lock()
		self.cond = Condition(self.lock)
		self.limit = limit
		self.data = deque()
	
	def add(self, val)
		with self.lock:
			while len(self.data) == self.limit:
				self.cond.wait()
			self.data.append(val)
			self.cond.notify_all()
	
	def remove(self):
		with self.lock:
			while len(self.data) == 0:
				self.cond.wait()
			result = self.data.popleft()
			self.cond.notify_all()
			return result

Two condition variables (ideal)

The ideal way to solve this problem is with two condition variables: One that gets notified when a producer can proceed, and one that gets notified when a consumer can proceed. After a thread adds a value, it notifies the can-remove CV; after a thread removes a value, it notifies the can-add CV.

Java pseudocode: (or full code)

class Queue<E> {
	private int limit;
	private Lock lock = new ReentrantLock();
	private Condition canAdd = lock.newCondition();
	private Condition canRemove = lock.newCondition();
	private LinkedList<E> data = new LinkedList<>();
	
	public Queue(int lim) {
		limit = lim;
	}
	
	public void add(E val) {
		lock.lock();
		while (data.size() == limit)
			canAdd.await();
		data.add(val);
		canRemove.signal();
		lock.unlock();
	}
	
	public E remove() throws InterruptedException {
		lock.lock();
		while (data.isEmpty())
			canRemove.await();
		E result = data.remove();
		canAdd.signal();
		lock.unlock();
		return result;
	}
}

Python pseudocode: (or full code)

class Queue:
	def __init__(self, limit):
		self.lock = Lock()
		self.can_add = Condition(self.lock)
		self.can_remove = Condition(self.lock)
		self.limit = limit
		self.data = deque()
	
	def add(self, val)
		with self.lock:
			while len(self.data) == self.limit:
				self.can_add.wait()
			self.data.append(val)
			self.can_remove.notify()
	
	def remove(self):
		with self.lock:
			while len(self.data) == 0:
				self.can_remove.wait()
			result = self.data.popleft()
			self.can_add.notify()
			return result

Miscellaneous

Happens-before relationships

If there is a happens-before relationship from action X to action Y, then in every valid program execution, action Y will see all the effects of action X. A simple example is that within any individual thread, an action happens-before any action that comes later in program order. For multi-threaded programs and any single lock, actions performed in a thread just prior to unlocking happen-before actions performed in a thread just after a subsequent locking. Monitors are a tool – but not the only tool – to create, manage, and reason about happens-before relationships in concurrent programs.

Monitor variations

This article is based on Mesa monitors, the type used in the majority of programming languages, presumably due to lower implementation costs. A major variation is the Hoare monitor, where a thread calling notify will immediately and temporarily exit the mutex and let a notified thread (if any) immediately enter the mutex; this ensures that the predicate holds true when the notified thread is in the mutex. Other variations can arise from the policy of how to select one thread from multiple threads that are eligible to enter the mutex.

Low-level implementation concerns

Generally speaking, locks and condition variables cannot be implemented by using just memory read/write, arithmetic, and conditional control flow instructions. This is largely because both the programming language and the CPU assume that programs are single-threaded by default, and can perform optimizations like reordering memory reads/writes, hoisting redundant reads out of loops, etc. In practice, locks are implemented with the help of low-level CPU instructions like memory fences, atomic compare-and-swap, and load-linked/store-conditional.

Non-blocking algorithms

Some data structures can be designed to be used concurrently without locks, thus avoiding problems like priority inversion and deadlock. However, this paradigm might not have a solution to every problem and requires more care when ordering operations – whereas monitors appear to be universally applicable and designing the logic inside a monitor is straightforward because other threads are excluded from seeing any broken intermediate states.

Message passing

The monitor paradigm fundamentally involves mutual exclusion around shared variables in memory. A different major paradigm is based on message passing, where each thread conceptually communicates with other threads by sending and receiving messages, without using locks or shared memory. Variants of message passing include communicating sequential processes (CSP) and channels.

Global interpreter lock

The primary implementation of the Python programming language, CPython, has several features that make it a poor language for learning about concurrent programming. Each Python interpreter instance has a single global interpreter lock (GIL) and hosts one or more threads. The GIL allows at most one thread to be executing Python code at any given time (though functions implemented in C can optionally temporarily release the lock). This also means that one thread executing a sequence of writes (e.g. x = 1; y = 2) can never have those writes be observed out of order in another thread (e.g. x != 1 and y == 2).

Furthermore, the vast majority of basic Python operations are implemented in C and cannot be interrupted – for example, list.append() will never be observed in a half-broken state in another thread. By contrast, java.util.ArrayList.add() is implemented in ordinary Java code that is executed by the virtual machine (not being implemented in C/C++ in the virtual machine itself), and it is possible for other threads to observe the arraylist object in a half-broken state when one thread is executing add() (regardless whether this thread is suspended or not, as the JVM has no GIL and allows true parallel execution).

Because of these considerations, it is possible to write subtly incorrect concurrent Python code that appears to always execute correctly in CPython, but causes rare and disastrous data corruption / bugs when ported to a more powerful and less forgiving language like Java, C, etc. Other languages possessing a GIL such as Ruby suffer from the same problems involving pedagogy, correctness, and performance.

More info