Bigadv/Beta Folding Farm still a Good idea?

A 2600K, a mobo, and 8 GB of RAM can be had for $450. A couple of creatively wired OCZ ZX1250W units ($150 each) could power 10 builds, right? Is this viable yet?

$180 for six Hyper 212+ coolers.

You could probably power 3 or 4 rigs with a decent 1250 W PSU, but I can't speak to the quality of that particular model.

So you're looking at $3,060 for six sets of CPU, mobo, RAM, and cooling. Then say two PSUs at $300.

That probably puts you around 450,000 PPD (75k * 6) for $3,360
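
For anyone checking the math, here's a quick back-of-the-envelope sketch using the numbers above; the ~75k PPD per 2600K rig and all prices are this thread's estimates, not measurements:

```python
# Rough PPD-per-dollar for the proposed 6-rig 2600K farm, using the
# figures quoted above; all of these are estimates from this thread.
rigs = 6
ppd_per_rig = 75_000            # assumed bigadv/beta PPD per 2600K rig
total_cost = 3_060 + 300        # six CPU/mobo/RAM/cooler sets + two shared PSUs

total_ppd = rigs * ppd_per_rig
print(f"Total PPD : {total_ppd:,}")                  # 450,000
print(f"Total cost: ${total_cost:,}")                # $3,360
print(f"PPD per $ : {total_ppd / total_cost:.0f}")   # ~134
```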
 
That's not right. Possibly a software issue with scaling properly to 80 threads.

But some time ago I read that F@H could scale all the way to 128 cores/threads. At the time I thought that would never happen, but the way AMD is heating up this game, I bet we will hit that number in just a few years (64 cores is only a few weeks away). :eek:
 
I Resent that...:mad:

No need to get mad. AMD has delayed the launch of Bulldozer-based chips so many times I have lost count. This year alone it has been July, August, Q2, then September, and currently Q3 with no specific month. At the rate things are going it will slip into next year. :eek:
 

What I resented was the idea of AMD delaying it even more; it's like they are waiting for Intel to catch up... :confused:
 
One of my 4P boxen is Intel... an E7-4870, and it's considerably slower than your 4P... It's rather sad, actually...

Oh... :eek: But but... how? :(

AMD did their job......

If you understand the pic below, Patriot, it will make your mouth water... ;)
IBM3755.jpg
 

The first one to guess what that picture is about gets a cookie...

First clue: it's not about the IBM x3755 server.
Second clue: check the sockets... ;)
 
You are almost there... not quite an 8P, but a 6P. But how? Why? Take a closer look... ;)

I assume you have two identical systems. The ribbon is connected to the card on the back to do the interconnect between the systems. So you have 3 CPUs in each system, with 1 socket used to link the systems...
 
You, sir, get to eat the cookie... :D

It leverages HyperTransport's native processor-interconnect speed and latency to connect them as one huge 72-core system (96 cores with Interlagos). And that is just 2 nodes; it can grow to thousands of nodes without losing speed or latency. As Patriot has tested, the HyperTransport interconnect is a mature technology for Folding@Home because it scales linearly, unlike Intel's QPI, which does not seem to scale as well. Patriot tested a 40-core Intel 4P system and it lags behind an AMD 48-core 4P system. Why, when Intel's processors are at least a generation ahead right now? Blame it on the scalability of QPI...
 
Does FAH support 72-core systems?

If so, I want to try this option on my rigs, and someone needs to tell me how to do it.
 

It should, but there have been reports that something doesn't scale well beyond 64 threads. The only way we will find out is for people to experiment and feed the results back to PG.
 
Not the way shown here. Musky did send me a couple of links before, but I found it very difficult to do; nobody has ever tried the method he suggested.

I thought we worked with you on this before.
 
Can you post details on the HT-link card and how this is supported (BIOS support needed, software, etc.)?

I've tested FAH up to -smp 160 and scaling is linear relative to the number of cores. The currently shipped Windows binary has a hard limit of 64 threads due to it being a 32-bit binary (this has been discussed a number of times, so I won't repeat the details here).

I just did the math assuming 6180 processors running at 2.7 GHz, and a baseline P6904 TPF of 16:00.

4x 6180, 1 machine: 678,228 PPD
8x 6180, 2 separate machines: 1,356,456 PPD
12x 6180, 3 separate machines: 2,034,684 PPD
16x 6180, 4 separate machines: 2,712,912 PPD

Assuming each linked machine has 3 CPUs, and scaling is perfectly linear:

6x 6180, 2 machines "linked": 10:40 TPF, 1,245,986 PPD
9x 6180, 3 machines "linked": 7:06 TPF, 2,294,398 PPD
12x 6180, 4 machines "linked": 5:20 TPF, 3,524,180 PPD

So, from a PPD perspective, you'd be better off running two separate machines with 4 processors each. However, at 3 or more machines, linking them makes sense, assuming scaling is actually linear. Do you have the ability to run some tests on linked systems? (this ignores cost factors).

(I'll check later whether -smp 72 is an optimal/supported domain decomposition grid size.)
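
For anyone who wants to play with these numbers, here's a minimal sketch of the model behind them, assuming the quick-return bonus makes PPD scale roughly as (baseline TPF / TPF)^1.5; the 1.5 exponent is my assumption, chosen because it reproduces the figures above to within rounding:

```python
# Linked-machine PPD estimates, assuming PPD ~ (baseline TPF / TPF)^1.5.
# Baseline (4x Opteron 6180 @ 2.7 GHz, P6904, 16:00 TPF, 678,228 PPD) is
# taken from the post above; the exponent is an assumption, not PG's formula.
BASE_CPUS = 4
BASE_TPF = 16 * 60       # baseline seconds per frame on one 4P box
BASE_PPD = 678_228

def linked_estimate(total_cpus):
    """TPF and PPD if scaling were perfectly linear in CPU count."""
    tpf = BASE_TPF * BASE_CPUS // total_cpus   # truncate to whole seconds
    ppd = BASE_PPD * (BASE_TPF / tpf) ** 1.5
    return tpf, ppd

for machines in (2, 3, 4):    # 3 folding CPUs per machine, 1 socket bridges
    cpus = 3 * machines
    tpf, ppd = linked_estimate(cpus)
    print(f"{machines} linked machines ({cpus}x 6180): "
          f"TPF {tpf // 60}:{tpf % 60:02d}, ~{ppd:,.0f} PPD")
```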

 
Well, most other ways of linking computer nodes and making them look like one huge SMP single-system image are through proprietary software hypervisors and InfiniBand.
But this has many drawbacks for Folding@Home. There are latency penalties in using InfiniBand for inter-processor communication: InfiniBand uses RDMA, so algorithms are needed to mask this from the OS, all of which is done by the hypervisor and adds overhead, meaning it will not scale linearly. Basically, there is too much overhead using software SMP and InfiniBand, which at best reaches about 1 microsecond of latency.

But if you want to try it, just let me know; I already have the contact and the pricing to set up 4 nodes of 2-socket Xeon motherboards (just a hint: a license for a single 2-socket board with hex-core Xeons is $2,500, and so far only Intel is supported).

What's the difference between using InfiniBand and HyperTransport as the processor interconnect? HyperTransport gives you native processor-to-processor latencies in the nanosecond range, you do not need a dedicated router or switch (just cables), you don't need hypervisors to mask its presence, and native Windows is supported... :D


Don't know if you guys remember the old HyperTransport HTX slot that some Opteron motherboards had about 7 years ago? Well, the idea is not dead yet. Since no board like that is produced anymore, a few smart people who saw the potential of the mature HyperTransport interconnect came up with a few good ideas: since HTX can't connect through PCIx without losing its edge, they use a direct socket-to-card connection.
The pics below will give you the idea...
pick-up-card-300px-201106.jpg

pick-up-cable-300px-201106.jpg

N323-300px-201106.jpg



Why would you run separate instances? Just run -smp 72 and watch how many beta points this beast will produce...

So far I haven't asked for any F@H tests, but if enough of us ask at the same time I am sure they would hear us. I mean, in a few weeks Magny-Cours chips will be on FleeBay by the hundreds as the big guys upgrade to Interlagos; one could have a second-hand 12-core for cheap...
 
Separate instances are what we are running today -- e.g., 4 separate 4P machines, each running its own work unit. Numbers for that were included above, compared to a hypothetical perfectly scaling set of linked machines that would all be processing the same work unit.

 
It is interesting that losing the 4th proc to the bridge makes this a 3-system minimum before there's a benefit.

Not sure we will see it anytime soon, but depending on the price of that cable... it might be possible with the second-hand hardware we will see in a couple of months.
 
Knowing how steep the bonus-point curve is, I would not be surprised if a 72-core system averaged more points than two separate 48-core systems... :p

I am in contact with the company that makes this possible. If you look at the pics on the previous page, it's the socket interconnect, the ribbon, and the network card; the total price for that is $1,750 per node, and one $150 cable connects two systems, so a 2-node setup is $3,650. No routers, nothing extra is needed.

Now let's compare that price to ScaleMP (software SMP): you need a base license, which goes for $1,000, and for each extra node of an identical system (say you want to connect two 2P hex-core Xeon systems) the extra-node license is $2,500. You also need a DDR InfiniBand card in each node and a switch; that alone will take you past $4,000, and it's software SMP, not hardware SMP like the guys at NumaScale do. Search for them (rough cost sketch below).

So far I am saving money to get one of these systems to crush those BigBeta WUs...
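
If it helps, here's a rough sketch of that 2-node cost comparison; the NumaScale numbers are the ones quoted above, while the InfiniBand card and switch prices on the ScaleMP side are placeholder guesses (the post only says it ends up past $4,000):

```python
# Rough 2-node cost comparison from the figures quoted above.
# NumaScale: $1,750 per node for the socket pick-up, ribbon, and adapter
# card, plus a $150 cable per link.  ScaleMP: $1,000 base license plus
# $2,500 per extra node; the InfiniBand HCA and switch prices below are
# placeholder guesses, not quotes.
def numascale_cost(nodes):
    return nodes * 1_750 + (nodes - 1) * 150

def scalemp_cost(nodes, ib_card=250, ib_switch=700):
    return 1_000 + (nodes - 1) * 2_500 + nodes * ib_card + ib_switch

print("NumaScale, 2 nodes:", numascale_cost(2))   # 3650, as quoted above
print("ScaleMP,   2 nodes:", scalemp_cost(2))     # 4700 with these guesses
```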
 
Yeah, I figured it was going to be crazy costly.

I will spend my pennies on better procs. That is just too much for such specialized hardware.
 

Well, if you research big-iron SMP from the likes of Cray and SGI, you will find out how expensive they can get; NumaScale's HyperTransport interconnect is really not that expensive...
 

I would spend the money on a nice switch... rather than $1,700 on a card... if that's possible...
 

Well, keep in mind that if you are going the software way (ScaleMP), you would need more than a switch... ;)

You'll pay about $3,500 for the hypervisor license, and it's Intel only... and it's virtualized SMP, not a real hardware one.
 
Just wondering if we should lean towards clusters instead? Not sure how much bandwidth is needed, but 10Gbit is included on Macs, and I'd guess some cheap equivalent should be coming out for PCs.

If we could somehow make folding run on a Beowulf cluster with 10Gbit (multiple?) links, that would be way cheaper.
 

10GbE is not cheap...
A "cheap" cluster is a blade system...

I recommend 4P as the sweet spot for a while... as I don't see scaling on the 6P being the same as 2x 3P... given just one HT link between them...
 
It's not just a question of bandwidth, but of latency. Lots of folds in a protein, lots of inter-core communication... all that latency adds up!
 

Right, which is why I don't see even this multi-node setup working as well... I guess it depends how it's done...


So, as crazy as it sounds... from an HT bandwidth perspective... I would think scaling would be harsh after 8P or >2 nodes... unless... you put one 4P board with no CPUs in the center and four 3Ps around it... that would give you 12 CPUs...
 

Now that's thinking outside the boxen.
 
It's all in the latencies. The lowest you can get with Ethernet is RoCE (RDMA over Converged Ethernet), which gets you into the 2 µs ~ 5 µs range, but NumaScale provides native nanosecond latencies.


Here, have a read on the NumaScale SMP adapter card:

http://www.numascale.com/numa_smp_adapter.html

Edit:
Or we could see if vNUMA can be ported to x86_64. vNUMA stands for Virtual NUMA, an Itanium hypervisor that links clusters using InfiniBand (low latencies) and makes them look like one SMP single system image.
Nice reading:
http://www.usenix.org/events/usenix09/tech/full_papers/chapman/chapman_html/
 
Everything you are talking about is far beyond the average folder's budget. Even as a larger folder, it is still well outside the scope of what I'm willing to put into any single generation of hardware.

I'm more apt to try to get as much as I can out of hardware I can scrounge up on the FS section and eBay.
 
True... and while big-iron SMP-class performance and hardware is expensive, NumaScale is the least expensive of them.

I would love to see how an SGI Altix UV 1000 would get through those BigBetas... :)
 
Who are you, nicfolder?

Are you a time traveler from the future?

You seem to have secret knowledge of the future. You have only been here at [H] for nine days. Where and when do you hail from?
 

Just an old-time folder keeping a low profile ;)
 