Distributed processing (?) with Infiniband

fastgeek

[H]ard|DCOTM x4 aka "That Company"
Joined
Jun 6, 2000
Messages
6,520
OK, so I've asked something similar before, but that was more about what used to be possible "back in the day" with older versions of FAH, meaning just having a bunch of computers share the load over ethernet. So much for that idea.

Now I've learned that one of the other groups is building a pretty nice rack and it has QDR Infiniband to boot. In their case it's just being used as an ultra high speed data network; but whenever I hear Infiniband it makes me think of taking all those lovely nodes and making them work together as one. The guy in charge of this little project is a geek too and is quite intrigued at the idea of making this happen (as in let it run over the weekend for shits and giggles); however he's swamped and I'm seriously green when it comes to this stuff.

So my question is - just WTF needs to happen to make this work? More importantly, what's the path of least resistance? Rack has a few of these @ 2.1GHz and a bunch of these @ 2.6GHz (basically a controller / node setup). We have pretty much every major OS you can shake a stick at, but tend to use OpenSUSE, Windows and VMware the most. Would be amusing as hell if somehow the usual Ubuntu install could work. :p

Can this easily happen? Or is it just another one of my pipe dreams?
 
You need something like vSMP/ScaleMP to work as a layer between (probably) the bare metal and the OS so that the OS can see the cluster as a single machine. Once that happens, FAH may or may not work. The problem is, there are no open source options like ScaleMP, and it is expensive - I want to say $800 per G34 node, but I don't remember exactly. We also have no idea if even ScaleMP will work for what we would want it to do.
 

$800 per node for dual socket and lower end G34 CPUs. For 62xx CPUs, it is $800 per socket :eek:
 
Why can nothing ever be easy? :( Pity some kind of VMware cluster wouldn't work.

OTOH, I wonder if there's a trial of vSMP for potential customers? God knows we get interesting things for trials all the time.
 

No! I was actually laughed at when I asked them directly over the phone. I was about to pull the trigger on ScaleMP a while ago, but no, there is not a chance in hell.
 
Huh. How big of a company were you representing? The company I work for, while not huge, is very well known in the semiconductor industry. Plus we're local to boot.
 
Just myself. Initially I was thinking about shelling out $800 to cluster 2 dual 771 Xeon systems together over 10Gb ethernet. They said there is no trial version, their software just works.....
 
Does vSMP support FAH?
As far as I can see, Mosix is software similar to vSMP but not as expensive; however, it doesn't seem to support current FAH SMP clients because shared memory is unsupported.
There is an explanation in Mosix's FAQ (http://www.mosix.org/faq/output/faq_q0061.html):
Question:
Why shared-memory is not supported
Answer:
Because it is not scalable, i.e., it is impossible to change the contents of a memory in one node and expect that the same change will be reflected instantly in the memory of the remaining nodes (with which memory is shared), e.g., as in a multi-core.
I'm wondering whether, and how, vSMP is able to solve this problem?
 

Yes and no. In theory it does support shared memory, and a client is completely unaware that it is running across multiple nodes. The big issue is that when memory needs to be updated across nodes, there is a HUGE latency hit (even across Infiniband). Once interconnect technology approaches something even close to half the speed of HT, running something like vSMP might be an option for apps like the FAH one.
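
To put a rough shape on that latency hit, here is a back-of-envelope sketch. It is not a model of vSMP itself; it just averages access time when some fraction of memory accesses has to cross the interconnect. The function names and the default figures (about 100 ns for local memory, about 2000 ns for a remote access over the fabric) are illustrative assumptions, not measurements from any of the hardware in this thread.

```python
# Back-of-envelope model: average memory access time when a fraction of
# accesses must go to another node. All default numbers are illustrative
# assumptions, not measurements.

def avg_access_ns(remote_fraction, local_ns=100.0, remote_ns=2000.0):
    """Weighted average latency for a mix of local and remote accesses."""
    if not 0.0 <= remote_fraction <= 1.0:
        raise ValueError("remote_fraction must be between 0 and 1")
    return (1.0 - remote_fraction) * local_ns + remote_fraction * remote_ns


def slowdown(remote_fraction, local_ns=100.0, remote_ns=2000.0):
    """How many times slower the mix is than all-local access."""
    return avg_access_ns(remote_fraction, local_ns, remote_ns) / local_ns


if __name__ == "__main__":
    for f in (0.0, 0.01, 0.05, 0.25):
        print(f"{f:>5.0%} remote -> {slowdown(f):.2f}x memory latency")
```

Under those assumed figures, even a small share of remote accesses dominates the average, which is why a tightly coupled client like FAH SMP would likely suffer far more than a job that mostly stays in local memory. Plug in your own latencies if you have real numbers for your fabric.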
 
Been playing with 40Gbps QDR IB also, as you may have seen from the Mellanox ConnectX-2 QDR Infiniband post today:
[Image: Mellanox ConnectX-2 MHQH19B-XTR]


Noted in the post that they were about $100 each and work with SMB 3.0 also :)
 