Clarification needed on cluster computing

Bluelude1

I understand that the basic idea of cluster computing is combining resources from multiple machines to put more computing power behind a particular task than one individual machine may be capable of. My question is in regards to software: most applications I see used are intended specifically for this type of work, but how seamless is the structure?

For instance, I have a multithreaded Windows application that is CPU bound, and I love the idea of utilizing additional resources to chip away at the workload, but I don't fully understand how complex it is to spread the task across additional resources. Once the computing cluster is created, can I essentially run the program as I normally would on a typical Windows 8 PC and seamlessly take advantage of the additional resources, or would my program have to be written specifically to spread the task across a cluster?

BTW - I am referring to the use of Windows Server 2012 with the HPC Pack.
 
Thanks scubadiver59,

As I understand it, there are various systems for creating a cluster, but what I am still a bit fuzzy on is whether, after a cluster has been created, it runs in the background. If I create a cluster of 4 Windows servers with 8 cores each, does the system link them together in the background and let me continue with my project just as if I were running a 32 core machine? Or does my software have to be designed specifically with cluster computing in mind?

Sorry if this is a no brainer question, it's just that most jobs I have seen run on Linux, which I have no familiarity with, and the process always seems to come off as more programmatic.
 
The application needs to be designed specifically to make use of the resources, just like with multi-core systems today. When working with things like Hadoop, you need some kind of framework that your application runs on and that it submits MapReduce jobs to. If you read up on Hadoop, it should give you a fairly good idea of what is required.
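To give a feel for what "designed specifically" means, here is a minimal sketch of the map/reduce pattern in plain Python. This is not Hadoop's API; the function names (map_phase, shuffle, reduce_phase, word_count_job) are made up for illustration, and everything runs on one machine. The point is only that the work has to be written as independent pieces that a framework could ship to different nodes.

```python
from collections import defaultdict


def map_phase(chunk):
    """Emit (word, 1) pairs for one chunk of input; needs no other chunk."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(pairs):
    """Group emitted values by key, as a framework would between the phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Combine all values for one key into a single result."""
    return key, sum(values)


def word_count_job(chunks):
    # On a real cluster each map_phase call could run on a different node;
    # here they simply run one after another (assumption for illustration).
    emitted = [pair for chunk in chunks for pair in map_phase(chunk)]
    return dict(reduce_phase(k, v) for k, v in shuffle(emitted).items())


if __name__ == "__main__":
    print(word_count_job(["cluster node cluster", "node master cluster"]))
```

An ordinary multithreaded program is not structured this way, which is why it cannot simply be dropped onto a cluster.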
 
In the setups I have worked with in the past, you have a master server that delegates the workload to the other servers/workstations in the group. It divides the task up based on the performance of the nodes handling the workload. The application has to be written specifically for clustered computing, and since FAH, BOINC, etc. are already a cluster setup, you can't cluster compute a task that is already distributed that way. Their queue manager/master server decides which WU you get based on your hardware, and your computer/server is nothing more than a node in the compute line. It even has redundancy, so if a node doesn't finish its task there is another node out there somewhere doing the same task.
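A rough sketch of that master-side logic in Python, under loud assumptions: the node names and "scores" are hypothetical benchmark numbers, and real schedulers (FAH, BOINC, HPC job managers) are far more involved. It only shows the two ideas above: shares proportional to node performance, and a backup node for every unit.

```python
def split_by_performance(total_units, node_scores):
    """Give each node a share of the work proportional to its benchmark score."""
    total_score = sum(node_scores.values())
    shares = {name: (total_units * score) // total_score
              for name, score in node_scores.items()}
    # Integer division can leave a few units unassigned; hand them to the
    # fastest nodes first (there are always fewer leftovers than nodes).
    leftover = total_units - sum(shares.values())
    fastest_first = sorted(node_scores, key=node_scores.get, reverse=True)
    for name in fastest_first[:leftover]:
        shares[name] += 1
    return shares


def assign_with_backup(units, nodes):
    """Pair every unit with a primary and a backup node (round-robin).

    Assumes at least two nodes, otherwise primary and backup coincide.
    """
    plan = {}
    for i, unit in enumerate(units):
        primary = nodes[i % len(nodes)]
        backup = nodes[(i + 1) % len(nodes)]
        plan[unit] = (primary, backup)
    return plan


if __name__ == "__main__":
    print(split_by_performance(100, {"node-a": 8, "node-b": 4, "node-c": 4}))
    print(assign_with_backup(["wu1", "wu2", "wu3"], ["node-a", "node-b"]))
```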

Example of Cluster computing:
http://en.wikipedia.org/wiki/Render_farm

Cluster Computing Benchmarks:
http://en.wikipedia.org/wiki/LINPACK_benchmarks

Basically it is all designed for simulation, prediction, and computation.
 
Now something that I would like would be a master server that pulled down the WUs, identified which one it was, and then assigned that WU to the computer that would handle it the most efficiently. Like take the 8101s and shove that off to a 6 or 8 CPU folding machine and put the others on the 4Ps.

Now, I have yet to get a 6 or 8 CPU system...but it's only a matter of time! ;-)
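Something like that dispatcher could be sketched as below. Everything here is hypothetical: the machine records, the "sockets" and "queued" fields, and the threshold of 6 CPUs are assumptions for illustration, and no real folding client exposes this interface as far as I know.

```python
def pick_machine(project_id, machines, big_projects=(8101,), big_sockets=6):
    """Route 'big' projects to 6+ CPU boxes and everything else to the rest.

    Among the preferred machines, the one with the shortest queue wins.
    If no machine of the preferred kind exists, fall back to any machine.
    """
    wants_big = project_id in big_projects
    preferred = [m for m in machines
                 if (m["sockets"] >= big_sockets) == wants_big]
    candidates = preferred or machines
    return min(candidates, key=lambda m: m["queued"])["name"]


if __name__ == "__main__":
    farm = [
        {"name": "4p-a", "sockets": 4, "queued": 2},
        {"name": "8p", "sockets": 8, "queued": 5},
        {"name": "4p-b", "sockets": 4, "queued": 1},
    ]
    print(pick_machine(8101, farm))
    print(pick_machine(7000, farm))
```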
 
So basically what you're saying is that if I created a cluster and then tried to fire up AutoCAD or any other 3rd party multithreaded software, I am not necessarily going to get the benefit of the additional computing resources unless the program was designed for cluster computing?
 
What you mean by "seamless" is known as an SSI (single system image) cluster (http://en.wikipedia.org/wiki/Single_system_image),
that is, an environment that allows you to run your multithreaded application unmodified.

From brief reading, it doesn't seem that Microsoft's HPC products support that, so you'll most likely
need to split your work into multiple jobs and then deploy them on your computing nodes (each node
will be completing its job(s) in an independent fashion), then gather/aggregate the results.
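The split / deploy / gather steps above can be sketched in Python as follows. Note the assumptions: a local thread pool merely stands in for the compute nodes (on a real cluster each job would go through the job scheduler instead), the sum-of-squares workload is a placeholder, and the function names are invented for the example.

```python
from concurrent.futures import ThreadPoolExecutor


def make_jobs(start, stop, job_count):
    """Split the range [start, stop) into job_count contiguous, independent jobs."""
    step, extra = divmod(stop - start, job_count)
    jobs, lo = [], start
    for i in range(job_count):
        hi = lo + step + (1 if i < extra else 0)
        jobs.append((lo, hi))
        lo = hi
    return jobs


def run_job(bounds):
    """One independent job: it touches only its own sub-range."""
    lo, hi = bounds
    return sum(n * n for n in range(lo, hi))


def run_on_cluster(start, stop, job_count):
    jobs = make_jobs(start, stop, job_count)              # split
    with ThreadPoolExecutor(max_workers=job_count) as pool:
        partials = list(pool.map(run_job, jobs))          # deploy (stand-in)
    return sum(partials)                                  # gather/aggregate


if __name__ == "__main__":
    print(make_jobs(0, 10, 3))
    print(run_on_cluster(0, 1000, 4))
```

This only works because the jobs never need each other's intermediate results; a workload that cannot be cut up like this is the hard case.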
 
So basically what you're saying is that if I created a cluster and then tried to fire up AutoCAD or any other 3rd party multithreaded software, I am not necessarily going to get the benefit of the additional computing resources unless the program was designed for cluster computing?

That pretty much sums it up. AutoCAD does have a rendering module for offloading huge renderings to a render farm, but I never hear much good about it, only that it's a high maintenance product.
 
An SSI cluster sounds more like what I am looking for... unfortunately I can't seem to find anything similar that would work on a Windows platform.
 
Your biggest issue is what the HPC field calls problem decomposition: mapping the compute problem onto the different resources (CPU, memory, communication) for maximum efficiency.
For decent performance, all approaches need your software to be aware of the memory/CPU/communication affinity. Welcome to the world of explicit parallelism. The free ride is over :)

The largest SSI systems for Windows are from SGI (>256 cores).
You can run a Windows Server cluster with 64 nodes (clustered like mainframes).
You can run a Windows Server cluster like your current HPC system.

You can pick one approach according to your cost, manageability, reliability, scale-up, or scale-out requirements.

A good paper on which archetype of scientific problem fits which particular HPC cluster and system architecture is the "13 dwarfs" paper from Berkeley's ParLab. Recommended reading.

To gain high efficiency at scale, you need to touch your source code, independent of the OS or system architecture underneath.

rgds,
Andy
 
So basically what you're saying is that if I created a cluster and then tried to fire up AutoCAD or any other 3rd party multithreaded software, I am not necessarily going to get the benefit of the additional computing resources unless the program was designed for cluster computing?
Exactly. There are at least two types of scalability: scale out (a cluster of several PCs) and scale up (a single huge server). I discuss it here, in this thread about the upcoming IBM POWER8 CPU:
http://hardforum.com/showpost.php?p=1040393845&postcount=41

AndyE, what do you say about my link? Do you agree with it? Maybe you should read the whole thread?
 
@brutalizer,
I read through that thread and your extensive posts. Will chime in over there and add some corrections.

Andy
 
I don't know what your application is. Without that info, any recommendation lacks any reasonable base.

The links and strategy you are pointing to are leading you deep into heterogeneous computing (one application dealing with multiple CPU architectures).

Andy
 
^^ +1

Why not just ask your app vendor? They, of all, should be most informed :)
 