Spring '05 Build (supercomputer inside)

That's some kind of massive list of [h]ardware. I'm jealous. :D

Though I can't believe that you are building a massive computing environment and you are using SATA for storage.


What is the application that you plan on running? With RAID-5 SATA arrays, it looks like this is going to be primarily storage, right?
 
Why raid0 for the boot/root disks on the mid/large ones? What are you doing with these that you need it that fast but don't care about the stability?

Jesus H Christ! By my count you have 4.3 TB of RAM in these!!!
 
If you don't have all those monsters folding for the [H], we'll all have to track you down and beat you. :D
 
agrikk said:
That's some kind of massive list of [h]ardware. I'm jealous. :D

Though I can't believe that you are building a massive computing environment and you are using SATA for storage.


What is the application that you plan on running? With RAID-5 SATA arrays, it looks like this is going to be primarily storage, right?
Thank you.

We used the same RAID units in our last build for the same customer (currently 183 and 184 on the list) last year, and they have had no problems with them. This was the right price/performance ratio for the application.

If I ever find out what the customer will be doing with this cluster, they will kill me :D
All jokes aside, I can say that they are using this system primarily for computations.
 
unhappy_mage said:
Why raid0 for the boot/root disks on the mid/large ones? What are you doing with these that you need it that fast but don't care about the stability?

Jesus H Christ! By my count you have 4.3 TB of RAM in these!!!
The storage space on all CMUs will be used as swap space. If the array fails, the failed drive can be replaced, the array rebuilt, and the node reimaged in a matter of minutes. All data is stored on the RAID5 units.
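For the curious, a striped swap setup like that might be built roughly like this. This is only a sketch of the general technique, not the build's actual provisioning; the device names are hypothetical and the commands need root:

```shell
# Sketch only: stripe two disks into a RAID0 md device and use it as swap.
# /dev/sda2 and /dev/sdb2 are hypothetical partition names.
mdadm --create /dev/md0 --level=0 --raid-devices=2 /dev/sda2 /dev/sdb2
mkswap /dev/md0
swapon /dev/md0
# If either member disk dies, the whole RAID0 array is gone -- but since
# it only holds swap, the drive is swapped out, the array recreated, and
# the node reimaged; no data is lost.
```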
 
Speechless. That's awesome. BTW, how much did that cost them? Must be a fortune.
 
If you ever need to be rid of them for space or heat reasons, my house could use a logical heat source... :) Don't tell my wife though!
 
The Donut said:
If you ever need to be rid of them for space or heat reasons, my house could use a logical heat source... :) Don't tell my wife though!
175 kilowatts.
Not terribly sure about the amount of heat it puts off - I'll find out tomorrow when I head back to the office. To give you an idea, it's 42°F outside and we have a very large air conditioner running non-stop. The air conditioner cools the 18 racks (on the floor) and a large exhaust fan blows the heated air out of the top of the warehouse. The ambient temperature in the room is around 74°F; I'm not quite sure what it is directly behind the racks.
 
slowbiznatch said:
175 kilowatts.
Not terribly sure about the amount of heat it puts off
~175 kW ;)
Every single bit of electricity used ends up directly as either heat or kinetic energy (fans and motors), and if you want to get more technical, the kinetic energy from the fans is lost as heat through friction with the air anyway.
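A back-of-the-envelope shell sketch of that point, using the ~175 kW figure from the thread (the conversion constants are standard; the sizing is only a rough estimate, not a spec for this build):

```shell
# Essentially all of the ~175 kW electrical draw comes back out as heat,
# so you can size the cooling straight from the power figure.
kw=175
btu_per_hr=$((kw * 3412))      # 1 kW is roughly 3412 BTU/hr
tons=$((btu_per_hr / 12000))   # 1 "ton" of cooling = 12,000 BTU/hr
echo "${btu_per_hr} BTU/hr, about ${tons} tons of cooling"
# prints: 597100 BTU/hr, about 49 tons of cooling
```

Which is why the warehouse needs a very large air conditioner even at 42°F outside.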
 
So, after the usual posts...


Why aren't you using any Raptors or SCSI? (Sure, they're more expensive, but better than replacing dozens of crashed IDE disks... with 1010 IDE drives... ouch. Keep us updated on how many of those bite the dust...)

Are the small ones going to be RAID1, not 0?


(RAID5 setups are not included in the above calculation)
 
I'm not talking about the Raptors' performance or warranty - it's the reliability. Raptors are designed to run 24/7 under heavy load; the IDE disks are not...

All are in RAID0 :eek:

So don't forget to count the dead drives... every time that happens, you'll have to replace the drive and install the OS or load an image... sounds like fun ;)
 
slowbiznatch said:
They are all in RAID 0 because data integrity is not an issue with compute nodes. All data is stored in redundant RAID5 arrays. To rebuild the OS on a compute node, the node is PXE booted to the control node and a 15 digit (type of node_node name) command is typed at the console to begin the auto-install.
This all makes me wonder why your compute nodes even have OS drives. Why not just PXE boot your OS completely? I mean, I'm sure you aren't running Windows on the damn thing, so you should be able to boot your minimal OS from PXE easily enough. With the clusters I've built here, we were able to get our entire boot image in a 7MB squashfs image plus a 1MB kernel. There's no need for disks, at all.
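For reference, a minimal sketch of what such a diskless PXE entry might look like with pxelinux. All filenames and parameters here are hypothetical illustrations, not taken from either cluster:

```
# /tftpboot/pxelinux.cfg/default -- hypothetical diskless boot entry
DEFAULT compute
LABEL compute
    KERNEL vmlinuz-compute
    # the small kernel plus a squashfs root image served over TFTP,
    # loaded as the initrd; the node never touches a local disk
    APPEND initrd=compute.squashfs ip=dhcp
```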
 
Wolf31o2 said:
This all makes me wonder why your compute nodes even have OS drives. Why not just PXE boot your OS completely? I mean, I'm sure you aren't running Windows on the damn thing, so you should be able to boot your minimal OS from PXE easily enough. With the clusters I've built here, we were able to get our entire boot image in a 7MB squashfs image plus a 1MB kernel. There's no need for disks, at all.
Swap space.
 
Nice to see that AMD is making some money too ;)

My jaw dropped reading that list...
 
Come over to the Distributed Computing forum, we'll get you started right away on putting all that power to some good use!

 
Nice setup. I'm curious what made you pick Myrinet over something like InfiniBand... From what I understand, InfiniBand is more cost-effective and performs better.
 