lockless producer/consumer queue

Khanmots · Apr 23, 2010

My initial thoughts were that it'd be a simple thing to implement (designed for a single provider and single consumer). And it seems like it is... with the little assumption that your memory read/writes execute as coded.

However, you can't make that assumption. Seems like compilers and CPUs both like to reorder read/writes, hell, even invent them.

At this point I'm thinking that I'm probably best off going ahead and using a locking data-transfer implementation for simplicity until it's proven that it's a bottleneck. But while that may be what I put into source control, this seems like a really interesting problem that I'd still like to toy with on my own time.

So, my question comes down to are volatile pointers and careful use of mb() enough to solve this problem? i.e., am I attempting an unsolvable task?

Tawnos · Apr 24, 2010

with a circular buffer & volatile pointers, you should be fine. Remember, a compiler will never optimize a volatile read in fron of a write.

Khanmots · Apr 24, 2010

Tawnos said:
with a circular buffer & volatile pointers, you should be fine. Remember, a compiler will never optimize a volatile read in fron of a write.

That places restrictions on the compiler. I'm not convinced it places them on the CPU. I'm also not convinced that it prevents the compiler from inventing instructions either. If it's as simple as volatile pointers I'd be quite happy... but I've got a lot of reading to do before I convince myself of that

pedant · Apr 24, 2010

Java concurrency in practice has a section on this sort of thing.

Example is http://www.cs.rochester.edu/research/synchronization/pseudocode/queues.html

pedant · Apr 24, 2010

Indeed, what language are you doing this in? It has already been written in java
http://java.sun.com/j2se/1.5.0/docs/api/java/util/concurrent/ConcurrentLinkedQueue.html

jimmyb · Apr 24, 2010

I had thought most modern x86 CPUs used re-order buffers to execute write backs in program order.

For out-of-order issue, I thought there was register renaming to avoid read hazards.

Out of curiosity, can you give an example of the CPU reordering you believe is causing problems?

Khanmots · Apr 26, 2010

pedant, I'm using C++
I would note that the pseudocode linked to from the article makes no mention if it was tested on an in-order or out-of-order processor. A little research shows that the SGI Challenge mentioned could be configured either way. However, I'd be tempted to say that it was configured with the R4000 in-order chips due to the fact that out-of-order execution was not addressed. Mayhaps that's not an issue in the JVM, but I know next to nothing about it.

jimmyb, my background in this area is rather limited. I know just enough to know that I need to be careful and learn more

However, just as an example, I could foresee something along the lines of:

Code:

queue[nextWrite]->store(data); //not atomic
publicQueue[nextWrite] = queue[nextWrite]; //atomic

being problematic if the update to the publicQueue pointer was made prior to the storing of the data. My understanding is that if the pointers are volatile that the compiler won't reorder them, but I don't believe that prevents the CPU from doing so. But now... I'll read up on re-order buffers.

Khanmots · Apr 29, 2010

Well, if anyone cares...

It looks like x86 is a really nice architecure with how things are shared between caches. Most of the time the only thing that you have to worry about being externally visible from another thread is it reordering a read prior to a write. Although in some cases some writes can be reordered ahead of others. I'm theorizing that a volatile variable is implemented with CLFLUSH and thus results in reorder issues, so I avoided volatile. On the x86 careful design let me avoid needing CPU-level memory barriers.

Compiler reordering on the other hand... that takes a bit of inline assembly. Namely:

Code:

asm volatile("" ::: "memory");

Nicely though, the same bit of inline assembly that creates the effect of a barrier for the compiler does what we need volatile to do. Not flush the cache line, but ensure that the compiler hasn't optimized away a read or write (or delayed too far) by keeping a value in a register assuming that it isn't changing.

On a side note regarding volatile variables. The only guarantee you have from the compiler is that it won't reorder volatiles past volatiles. It can still reorder non-volatile variable read/writes past volatile ones. Which makes for all sorts of messes. Turns out to be much better not to use them for thread synchronization, it's not what they're intended for, and they impose an unecessary performance hit.

jimmyb · Apr 30, 2010

Update appreciated. What were your resources for the information on x86 cache micro-architectures?

Khanmots · Apr 30, 2010

Mostly Intel documentation. Namely Intel® 64 and IA-32 Architectures Software Developer's Manual Volume 3A: System Programming Guide

Section 8.1.1 details what's guaranteed atomic.
Section 8.2 details memory ordering; i.e., reordering of reads/writes and such.

lockless producer/consumer queue

Khanmots

Gawd

Tawnos

2[H]4U

Khanmots

Gawd

pedant

2[H]4U

pedant

2[H]4U

jimmyb

2[H]4U

Khanmots

Gawd

Khanmots

Gawd

jimmyb

2[H]4U

Khanmots

Gawd