• Some users have recently had their accounts hijacked. It seems that the now defunct EVGA forums might have compromised your password there and seems many are using the same PW here. We would suggest you UPDATE YOUR PASSWORD and TURN ON 2FA for your account here to further secure it. None of the compromised accounts had 2FA turned on.
    Once you have enabled 2FA, your account will be updated soon to show a badge, letting other members know that you use 2FA to protect your account. This should be beneficial for everyone that uses FSFT.

lockless producer/consumer queue

Khanmots

Gawd
Joined
May 12, 2007
Messages
905
My initial thoughts were that it'd be a simple thing to implement (designed for a single provider and single consumer). And it seems like it is... with the little assumption that your memory read/writes execute as coded.

However, you can't make that assumption. Seems like compilers and CPUs both like to reorder read/writes, hell, even invent them.

At this point I'm thinking that I'm probably best off going ahead and using a locking data-transfer implementation for simplicity until it's proven that it's a bottleneck. But while that may be what I put into source control, this seems like a really interesting problem that I'd still like to toy with on my own time.

So, my question comes down to are volatile pointers and careful use of mb() enough to solve this problem? i.e., am I attempting an unsolvable task?
 
with a circular buffer & volatile pointers, you should be fine. Remember, a compiler will never optimize a volatile read in fron of a write.
 
with a circular buffer & volatile pointers, you should be fine. Remember, a compiler will never optimize a volatile read in fron of a write.

That places restrictions on the compiler. I'm not convinced it places them on the CPU. I'm also not convinced that it prevents the compiler from inventing instructions either. If it's as simple as volatile pointers I'd be quite happy... but I've got a lot of reading to do before I convince myself of that :)
 
I had thought most modern x86 CPUs used re-order buffers to execute write backs in program order.

For out-of-order issue, I thought there was register renaming to avoid read hazards.

Out of curiosity, can you give an example of the CPU reordering you believe is causing problems?
 
pedant, I'm using C++
I would note that the pseudocode linked to from the article makes no mention if it was tested on an in-order or out-of-order processor. A little research shows that the SGI Challenge mentioned could be configured either way. However, I'd be tempted to say that it was configured with the R4000 in-order chips due to the fact that out-of-order execution was not addressed. Mayhaps that's not an issue in the JVM, but I know next to nothing about it.

jimmyb, my background in this area is rather limited. I know just enough to know that I need to be careful and learn more :)

However, just as an example, I could foresee something along the lines of:
Code:
queue[nextWrite]->store(data); //not atomic
publicQueue[nextWrite] = queue[nextWrite]; //atomic
being problematic if the update to the publicQueue pointer was made prior to the storing of the data. My understanding is that if the pointers are volatile that the compiler won't reorder them, but I don't believe that prevents the CPU from doing so. But now... I'll read up on re-order buffers.
 
Well, if anyone cares...

It looks like x86 is a really nice architecure with how things are shared between caches. Most of the time the only thing that you have to worry about being externally visible from another thread is it reordering a read prior to a write. Although in some cases some writes can be reordered ahead of others. I'm theorizing that a volatile variable is implemented with CLFLUSH and thus results in reorder issues, so I avoided volatile. On the x86 careful design let me avoid needing CPU-level memory barriers.

Compiler reordering on the other hand... that takes a bit of inline assembly. Namely:
Code:
asm volatile("" ::: "memory");
Nicely though, the same bit of inline assembly that creates the effect of a barrier for the compiler does what we need volatile to do. Not flush the cache line, but ensure that the compiler hasn't optimized away a read or write (or delayed too far) by keeping a value in a register assuming that it isn't changing.

On a side note regarding volatile variables. The only guarantee you have from the compiler is that it won't reorder volatiles past volatiles. It can still reorder non-volatile variable read/writes past volatile ones. Which makes for all sorts of messes. Turns out to be much better not to use them for thread synchronization, it's not what they're intended for, and they impose an unecessary performance hit.
 
Update appreciated. What were your resources for the information on x86 cache micro-architectures?
 
Back
Top