I enjoy asking an interview question about threading and queues. Candidates usually attempt to solve the problem using one of two Windows synchronization primitives – either the Event or the Semaphore. Sadly, I see a lot more attempts using Events than Semaphores. Sometimes, I’ll ask, “which is faster on Windows, a semaphore or an event?” The answer is “it depends”, but most people don’t say that. Anyway, it is my opinion that Events are awful from a programming model as well as from a performance standpoint. Let’s find out.
To find out, I had to create a little benchmark. Of course, any benchmark is contrived, and this one is no different. Lets create a set of N producers and N consumers putting and getting from a queue. Then let’s implement basically the same queue using each of 3 types: a semaphore based queue, an event based queue (manual reset events) and an event based queue (auto reset events).
I ran each of these 3 algorithms with 8 concurrent consumers and producers through 20,000 iterations (I did 5 samples, threw out the hi/lo, etc etc):
|
Latency |
Max Queue Depth |
Semaphore |
284.6ms |
~10K |
Event (auto-reset) |
456.2ms |
~150K |
Event (manual reset) |
297.2ms |
~10K |
So the conclusion is that for this test, events and semaphores are pretty close. The semaphore does provide a slight (<10%) performance improvement over the event.
But it also shows that there is clearly a right and wrong way to use the event; auto-reset being bad, and manual-reset being good. I suspect this is basically due to the benchmark itself; the benchmark is greedy – the producers all add items to the queue as quickly as possible. Since usually there is something on the queue, the manual reset event doesn’t need to clear the event nearly as often. I weakly attempted to add a delay to mix things up, but even a modest delay (1ms) basically makes all of these algorithms behave identically unless you crank the thread count through the roof.
Next, I decided to increase the thread count of producers and consumers to 32 instead of 8. Now we get this:
|
Latency |
Max Queue Depth |
Semaphore |
949.6ms |
~20K |
Event (auto-reset) |
1615.6ms |
~600K |
Event (manual reset) |
621.8ms |
~165K |
Now all you Event fans will tell me to eat crow, clearly the Event is faster, right? Not quite! In this case, the manual-reset event code is cheating. Each time you add a single element to the queue, many consumers awaken because they all see the event as set; since its a greedy benchmark, this is like predicting the future – that an element will be ready by the time the consumer gets through the lock. It’s coincidental to the benchmark that it works. But it is interesting to see. Also note the max queue length now shows a different problem. The semaphore is reasonably “fair” compared to the Event based implementations.
I still dislike events, however. And if you don’t yet agree, take a look at the code. The Semaphore code is incredibly simple compared to the Event based code. (Is there a simplification I could have made?) There is no doubt in my mind that most experienced programmers will botch the Event based algorithm. I’m reasonably confident these solutions are correct, I wouldn’t be completely surprised if some smart reader posts that my implementation is flawed too.
There is one place where Events are useful; if you have a need for synchronization which changes only once (e.g. from set to unset), events are pretty cool. In the driver for this test program, I used an event to start the test after warming up all the threads. Events are great for that. Just don’t use them in loops ever!
Lastly, I’m sure I left out something important. If you’ve got another idea about this, please post a comment! But don’t bark about my lack of error checking or huge queue lengths. I did that on purpose.
Here is the source code to the Semaphore-based Queue:
#pragma once
#include <assert.h>
class QueueUsingSemaphore {
public:
QueueUsingSemaphore(int max_items) :
head_(0),
tail_(0),
count_items_(0),
max_items_(0),
queue_max_(max_items) {
items_ = new int[queue_max_];
InitializeCriticalSection(&lock_);
semaphore_ = CreateSemaphore(NULL, 0, queue_max_, NULL);
}
~QueueUsingSemaphore() {
delete items_;
assert(head_ == tail_);
printf("Max was %dn", max_items_);
}
int Get() {
int rv;
WaitForSingleObject(semaphore_, INFINITE);
EnterCriticalSection(&lock_);
assert(head_ != tail_);
rv = items_[head_];
head_ = (++head_) % queue_max_;
LeaveCriticalSection(&lock_);
return rv;
}
void Put(int val) {
EnterCriticalSection(&lock_);
items_[tail_] = val;
tail_ = (++tail_) % queue_max_;
assert(head_ != tail_); // Queue is overflowing.
int count = tail_ - head_;
if (count<0) count += queue_max_;
if (count > max_items_) { max_items_ = count; }
LeaveCriticalSection(&lock_);
ReleaseSemaphore(semaphore_, 1, NULL);
}
private:
HANDLE semaphore_;
CRITICAL_SECTION lock_;
int *items_;
int head_;
int tail_;
int count_items_;
int max_items_;
int queue_max_;
};
Here is the Event based code (manual reset):
#pragma once
#include <assert.h>
class QueueUsingManualEvent {
public:
QueueUsingManualEvent(const int max_items) :
head_(0),
tail_(0),
count_items_(0),
max_items_(0),
queue_max_(max_items) {
items_ = new int[queue_max_];
InitializeCriticalSection(&lock_);
event_ = CreateEvent(NULL, TRUE, FALSE, NULL); // Manual-reset event.
}
~QueueUsingManualEvent() {
delete items_;
assert(head_ == tail_);
printf("Max was %dn", max_items_);
}
int Get() {
bool got_one = false;
int rv;
while (!got_one) {
WaitForSingleObject(event_, INFINITE);
// Note: it is possible to get here and have the queue be empty.
EnterCriticalSection(&lock_);
if (head_ == tail_) {
LeaveCriticalSection(&lock_);
continue;
}
got_one = true;
rv = items_[head_];
head_ = (++head_) % queue_max_;
if (head_ == tail_)
ResetEvent(event_); // The queue is now empty.
LeaveCriticalSection(&lock_);
}
return rv;
}
void Put(int val) {
EnterCriticalSection(&lock_);
items_[tail_] = val;
tail_ = (++tail_) % queue_max_;
assert(head_ != tail_); // Queue is overflowing.
int count = tail_ - head_;
if (count<0) count += queue_max_;
if (count > max_items_) { max_items_ = count; }
// It's tempting to move this outside the lock. However,
// unlike the semaphore, this cannot be moved outside the
// lock because the threads are intentionally racing
// on the event notification.
SetEvent(event_);
LeaveCriticalSection(&lock_);
}
private:
HANDLE event_;
CRITICAL_SECTION lock_;
int *items_;
int head_;
int tail_;
int count_items_;
int max_items_;
int queue_max_;
};
Here is the source to the driver code:
#pragma once
#include "queue_semaphore.h"
#include "queue_event.h"
#include "queue_event_manual.h"
class Driver {
public:
Driver() :
queue_(kMaxItems) {
start_event_ = CreateEvent(NULL, TRUE, FALSE, NULL);
}
void StartThreads() {
for (int i=0; i<kMaxProducers; i++)
threads_[i] = CreateThread(NULL, 0, ProducerMain, this, 0, 0);
for (int i=0; kMaxConsumersi++)
threads_[kMaxProducers + i] = CreateThread(NULL, 0, ConsumerMain, this, 0, 0);
Sleep(500); // Let the threads get ready.
start_ticks_ = GetTickCount();
SetEvent(start_event_);
}
void WaitForFinish() {
for (int i=0; i<kMaxProducers + kMaxConsumers; i++)
WaitForSingleObject(threads_[i], INFINITE);
stop_ticks_ = GetTickCount();
}
int TimeTaken() {
return stop_ticks_ - start_ticks_;
}
void Producer() {
WaitForSingleObject(start_event_, INFINITE);
for(int i=0; i<kMaxIterations; i++)
queue_.Put(i);
}
void Consumer() {
WaitForSingleObject(start_event_, INFINITE);
for (int i=0; i<kMaxIterations; i++)
int x = queue_.Get();
}
private:
static DWORD WINAPI ProducerMain(void *self) {
Driver* driver = static_cast(self);
driver->Producer();
return 0;
}
static DWORD WINAPI ConsumerMain(void *self) {
Driver* driver = static_cast(self);
driver->Consumer();
return 0;
}
static const int kMaxProducers = 32;
static const int kMaxConsumers = 32;
static const int kMaxIterations = 20000;
static const int kMaxItems = (kMaxProducers + kMaxConsumers)*kMaxIterations;
HANDLE start_event_;
//QueueUsingSemaphore queue_;
//QueueUsingEvent queue_;
QueueUsingManualEvent queue_;
HANDLE threads_[kMaxProducers + kMaxConsumers];
int start_ticks_;
int stop_ticks_;
};