Spring '05 Build (supercomputer inside)

agrikk

Gawd
Joined
Apr 16, 2002
Messages
933
That's some kind of massive list of [h]ardware. I'm jealous. :D

Though I can't believe that you are building a massive computing environment and you are using SATA for storage.


What is the application that you plan on running? With RAID-5 SATA arrays, it looks like you are going to be doing primarily storage, right?
 

unhappy_mage

[H]ard|DCer of the Month - October 2005
Joined
Jun 29, 2004
Messages
11,455
Why raid0 for the boot/root disks on the mid/large ones? What are you doing with these that you need it that fast but don't care about the stability?

Jesus H Christ! By my count you have 4.3 TB of RAM in these!!!
 
Joined
Dec 24, 2004
Messages
588
If you don't have all those monsters folding for the [H], we'll all have to track you down and beat you. :D
 
Joined
Oct 24, 2001
Messages
856
agrikk said:
That's some kind of massive list of [h]ardware. I'm jealous. :D

Though I can't believe that you are building a massive computing environment and you are using SATA for storage.


What is the application that you plan on running? With RAID-5 SATA arrays, it looks like you are going to be doing primarily storage, right?
Thank you.

We used the same RAID units in our last build for the same customer (currently 183 and 184 on the list) last year, and they have had no problems with them. This was the correct price/performance ratio for the application.

If I ever find out what the customer will be doing with this cluster, they will kill me :D
All jokes aside, I can say that they are using this system primarily for computations.
 
Joined
Oct 24, 2001
Messages
856
unhappy_mage said:
Why raid0 for the boot/root disks on the mid/large ones? What are you doing with these that you need it that fast but don't care about the stability?

Jesus H Christ! By my count you have 4.3 TB of RAM in these!!!
The storage space on all CMUs will be used as swap space. If the array fails, the failed drive can be replaced, the array recreated, and the node reimaged in a matter of minutes. All data is stored on the RAID5 units.
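The recovery flow described above might look something like this (a hypothetical sketch; the device names and the two-disk layout are assumptions, not the build's actual commands, and a RAID0 set must be recreated from scratch after a member dies since it has no redundancy):

```
# 1. After swapping the dead drive, recreate the two-disk RAID0 set:
mdadm --create /dev/md0 --level=0 --raid-devices=2 /dev/sda1 /dev/sdb1

# 2. Re-enable it as swap:
mkswap /dev/md0
swapon /dev/md0

# 3. PXE-boot the node against the control node and kick off the
#    auto-install; since no unique data lives on the node, the whole
#    cycle takes only minutes.
```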
 

EnderXC

Limp Gawd
Joined
Oct 20, 2004
Messages
449
Speechless. That's awesome. BTW, how much did that cost them? Must be a fortune.
 

The Donut

2[H]4U
Joined
Jan 28, 2003
Messages
3,114
If you ever need to be rid of them for space or heat reasons, my house could use a logical heatsource.. :) Don't tell my wife though!
 
Joined
Oct 24, 2001
Messages
856
The Donut said:
If you ever need to be rid of them for space or heat reasons, my house could use a logical heatsource.. :) Don't tell my wife though!
175 Kilowatts per hour.
Not terribly sure about the amount of heat it puts off - I'll find out tomorrow when I head back to the office. To give you an idea, it's 42°F outside and we have a very large air conditioner running non-stop. The air conditioner cools the 18 racks (on the floor) and a large exhaust fan blows the heated air out of the top of the warehouse. The ambient temperature in the room is around 74°F; I'm not quite sure what it is directly behind the racks.
 

Vertigo Acid

2(-log[H+])4u
Joined
May 31, 2003
Messages
12,416
slowbiznatch said:
175 Kilowatts per hour.
Not terribly sure about the amount of heat it puts off
~175 KW/H ;)
Every single bit of electricity used ends up directly as heat or kinetic energy (fans and motors), and if you want to get more technical, the kinetic energy from the fans is itself lost as heat through friction with the air.
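Putting that ~175 kW draw in cooling terms (a back-of-the-envelope sketch; the 3.412 BTU/hr-per-watt and 12,000 BTU/hr-per-ton conversions are standard, and the draw figure comes from the post above):

```shell
# All 175 kW of input power ultimately becomes heat the AC must remove.
watts=175000
btu_per_hr=$(awk -v w="$watts" 'BEGIN { printf "%.0f", w * 3.412 }')
tons=$(awk -v b="$btu_per_hr" 'BEGIN { printf "%.1f", b / 12000 }')
echo "${btu_per_hr} BTU/hr ~= ${tons} tons of cooling"
```

Roughly 50 tons of cooling - no wonder the air conditioner runs non-stop.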
 

Anarchy

2[H]4U
Joined
Aug 25, 2000
Messages
3,074
so after the usual posts ...


Why aren't you using any Raptors or SCSI? (Sure, more expensive, but better than replacing dozens of crashed IDE disks ... with 1010 IDE drives ... ouch ... keep us updated on how many of those bite the dust ...)

The small ones are going to be RAID1, not 0?


(RAID5 setups are not included in the above calculation)
 

Anarchy

2[H]4U
Joined
Aug 25, 2000
Messages
3,074
I'm not talking about the Raptors' performance or warranty - it's the reliability. Raptors are designed to run 24/7 under heavy load; the IDE disks are not ...

All are in RAID0 :eek:

So don't forget to count the dead drives ... every time that happens you'll have to replace the drive and install the OS or load an image ... sounds like fun ;)
 

Wolf31o2

Supreme [H]ardness
Joined
Nov 7, 2000
Messages
4,612
slowbiznatch said:
They are all in RAID 0 because data integrity is not an issue with compute nodes. All data is stored in redundant RAID5 arrays. To rebuild the OS on a compute node, the node is PXE booted to the control node and a 15 digit (type of node_node name) command is typed at the console to begin the auto-install.
This all makes me wonder why your compute nodes even have OS drives. Why not just PXE boot your OS completely? I mean, I'm sure you aren't running Windows on the damn thing, so you should be able to boot your minimal OS from PXE easily enough. With the clusters I've built here, we were able to get our entire boot image in a 7MB squashfs image plus a 1MB kernel. There's no need for disks, at all.
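A fully diskless setup like Wolf31o2 describes could be sketched as a pxelinux menu entry (a hypothetical config fragment; the kernel and image filenames are made up for illustration and are not from either build):

```
# /tftpboot/pxelinux.cfg/default -- serve every compute node the same
# minimal in-memory image; no local disk required.
DEFAULT compute
LABEL compute
    KERNEL vmlinuz-cluster
    APPEND initrd=compute.squashfs root=/dev/ram0 ip=dhcp
```

With the roughly 8 MB kernel-plus-squashfs image he mentions, the whole OS fits comfortably in RAM on nodes this size.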
 
Joined
Oct 24, 2001
Messages
856
Wolf31o2 said:
This all makes me wonder why your compute nodes even have OS drives. Why not just PXE boot your OS completely? I mean, I'm sure you aren't running Windows on the damn thing, so you should be able to boot your minimal OS from PXE easily enough. With the clusters I've built here, we were able to get our entire boot image in a 7MB squashfs image plus a 1MB kernel. There's no need for disks, at all.
Swap space.
 

drizzt81

[H]F Junkie
Joined
Jan 21, 2004
Messages
12,361
nice to see that AMD is making some money too ;)

My jaw dropped reading that list...
 

wake6830

Gawd
Joined
Jan 3, 2005
Messages
843
Come over to the Distributed Computing forum, we'll get you started right away on putting all that power to some good use!

 

[H]EMI_426

2[H]4U
Joined
Feb 19, 2001
Messages
3,965
Nice setup. I'm curious what made you pick Myrinet over something like InfiniBand... From what I understand, InfiniBand is more cost-effective and performs better.
 