IBM Roadrunner Smashes the Petaflop Barrier

HardOCP News

From the front page:

In 2006, the Department of Energy’s National Nuclear Security Administration selected Los Alamos National Laboratory as the development site for Roadrunner and IBM as the computer’s designer and builder. Roadrunner, named after the New Mexico state bird, cost about $100 million, and was a three-phase project to deliver the world’s first “hybrid” supercomputer – one powerful enough to operate at one petaflop (one thousand trillion calculations per second). That’s twice as fast as the current No.1 rated IBM Blue Gene system at Lawrence Livermore National Lab – itself nearly three times faster than the leading contenders on the current TOP 500 list of worldwide supercomputers.
 
I have to say, this is a proud day for me professionally. Yes, I do work for IBM.

Now if they would only lend me that beast to run my 2560x1600 monitor with some decent FPS ;)

According to the chief software architect - btw, yes, this is a software achievement (think drivers) - this is the equivalent of 15-30 thousand PlayStations put to work on the world's most complex problems.
 
And now build one with a quad core AMD and 4 of the new NVidia G200 chips slightly overclocked, easily doable on two boards, that would weigh in at 5 TERAFLOPs... more than 5 TIMES faster than the Roadrunner!! Amazing how fast computers are getting faster.

And the moment one becomes sentient we are toast, because according to IBM it can outthink the entire world's population at lightning speed. :eek::rolleyes::p

We're all gonna die. :cool:
 
I hope your math was that far off on purpose...
 
AMD cell procs....seems AMD is not as bad off as the doom sayers say....

No. Cell is an IBM-designed chip (as the article says, it's used in the PS3) and they have many Cells 'talking' to a much smaller number of AMD x86 chips. The Cells are where the machine gets its power, the AMD chips are just sort of 'herding' them.
 
Give it to the NSA...


Hopefully, let's get this over with. We know where it's heading ;)

OK! It's brilliant, but I would hate to troubleshoot those systems :D
 
No, I was guesstimating based on the hype. Allegedly the GTX280 chip runs 940 GFLOPs at stock speed, so a good typical NVidia overclock should yield about 1.2-1.3 TERAFLOPs each. Put 4 of them on one board (should take about the same space as a blade board) and sister it to a Phenom-based CPU to make the equivalent of one of IBM's "Tri-Blade" gizmos. Well, that should land the combo in the 5 TERAFLOP range, times 4,000 "tri-blades" or "dual blades" cough.... and you easily blow the IBM out of the water... each IBM "tri-blade" weighs in at 400+400+AMD = 800+ Gigaflops, or 0.8 TERAFLOPs. So the GTX280-based units could rack up... 4,000 x 5 TERAFLOPs or a whopping 20 PETAFLOPs for the equivalent setup.

If you built a "blade" with the CPU etc. on the CPU board facing right, and the GPU board etc. facing left, with a water cooling plate sandwiched between them, and the "blade center" supplying interconnect, power, and water cooling, you'd easily be able to pack that level of power into the same physical volume.

I mean, if you are going to spend $100 MILLION on this thing, then you have a lot of room for serious performance tweakage :eek::rolleyes::p
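
For anyone who wants to check the back-of-the-envelope math in this post, here is a rough Python sketch. It only reuses the figures quoted above (a 1.2-1.3 TFLOPs overclock per GTX280, 4 chips per blade, 4,000 blades, about 0.8 TFLOPs per IBM tri-blade). These are theoretical peak claims from the thread, not measured results, and the names are made up for illustration.

```python
# Back-of-the-envelope peak throughput using only figures quoted in the post.
# Everything here is a theoretical peak claim, not a benchmark result.

GPUS_PER_BLADE = 4
OVERCLOCKED_TFLOPS_PER_GPU = (1.2, 1.3)  # claimed range after overclocking
ROUNDED_BLADE_TFLOPS = 5.0               # the post rounds each blade to ~5 TFLOPs
BLADE_COUNT = 4_000                      # "times 4,000 tri-blades"
IBM_TRIBLADE_TFLOPS = 0.8                # "400+400+AMD = 800+ Gigaflops"


def blade_tflops(per_gpu_tflops: float, gpus: int = GPUS_PER_BLADE) -> float:
    """Peak TFLOPs of one hypothetical GPU blade (CPU contribution ignored)."""
    return per_gpu_tflops * gpus


low, high = (blade_tflops(t) for t in OVERCLOCKED_TFLOPS_PER_GPU)

# 1 PFLOPs = 1,000 TFLOPs
gpu_system_pflops = ROUNDED_BLADE_TFLOPS * BLADE_COUNT / 1_000
per_blade_ratio = ROUNDED_BLADE_TFLOPS / IBM_TRIBLADE_TFLOPS

print(f"GPU blade peak: {low:.1f}-{high:.1f} TFLOPs")
print(f"4,000 GPU blades: {gpu_system_pflops:.0f} PFLOPs")
print(f"GPU blade vs. IBM tri-blade: {per_blade_ratio:.2f}x")
```

Note that the comparison is blade against blade, not one blade against the whole machine, and it ignores interconnect and memory limits entirely.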
 
As an employee of a company that assisted IBM on the actual build-out of Roadrunner, this stuff is awesome. I know the guys that work on this thing. 3 months ago, at Stage 1, performance was just over 1/2 of a petaflop. They're in Stage 2 now, and Stage 3 is expected to bring anywhere from 1.2-1.5 petaflops.

IBM takes all the thunder though, which sucks. Our company is small, but we don't make the news. Yay for politics. See you at SC'08 SpeedyVV
 
And pray to god data exchange between cards in different machines is as easy as this proprietary system ;)
 
It would be very interesting to learn how they handle communication.

Each GTX280 is a PCIe 2.0 x16 native chip, so it has serious IO potential, and on the CPU board would be whatever is needed for this function..... x16 by 4 to fiber? Many GB/s potential.
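
To put a rough number on "many GB/s", here is a small Python sketch of the theoretical PCIe 2.0 link bandwidth. It assumes the standard PCIe 2.0 figures (5 GT/s per lane, 8b/10b encoding) and ignores protocol overhead; whether a CPU board could actually feed four x16 links at once is a separate question this does not answer. The function name is hypothetical.

```python
# Theoretical PCIe 2.0 bandwidth per direction, ignoring packet/protocol overhead.
# PCIe 2.0 signals at 5 GT/s per lane and uses 8b/10b encoding,
# so only 8 of every 10 transferred bits carry payload.

GT_PER_S_PER_LANE = 5.0
ENCODING_EFFICIENCY = 8 / 10
BITS_PER_BYTE = 8


def pcie2_gbytes_per_s(lanes: int) -> float:
    """Peak payload bandwidth in GB/s, one direction, for a PCIe 2.0 link."""
    return GT_PER_S_PER_LANE * ENCODING_EFFICIENCY * lanes / BITS_PER_BYTE


per_gpu = pcie2_gbytes_per_s(16)   # one x16 link per GTX280
four_gpus = 4 * per_gpu            # the "x16 by 4" idea from the post

print(f"one x16 link: {per_gpu:.0f} GB/s per direction")
print(f"four x16 links: {four_gpus:.0f} GB/s per direction (aggregate)")
```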

My point behind the comments is to say the same people using the GTX280 instead of Cell could produce an even faster machine.... so I assume they'd still be using their same proprietary communications mechanism, etc.

The GTX280 as simply a calculation engine (not 4 video cards) could easily be laid out on one board, with 4 processors, each surrounded by its 1GB of RAM in an L, with a connector carrying the PCIe 2.0 signals to the CPU board arranged along the top/bottom edge of the board, and the MOSFETs/voltage regs along the back edge.

I'm just brainstorming here. I've designed many high-speed motherboards, array processor systems, machine vision engines, etc. in my day.

I just find it amazing there is so much power to be found in the technology we have this month. It's a lot like the old days of NASA, where developments made for other reasons ended up having major ramifications for our abilities in civilian life.

Here NVidia's (or SONY's) pursuit of better videogames provides the building blocks for orders-of-magnitude greater supercomputers.... accidentally ;-)

Imagine how much processing power is sitting idle in American homes, which folding/SETI/etc. only make a tiny, tiny dent in. I think a government incentive program to get all that idle processing power to work would be worth pursuing. 100+ million computers twiddling their thumbs instead of working out climate models and fusion reactor theory. :cool:
 
Comcast doesn't allow us enough bandwidth for that.
 
It's gonna be funny when 5 years from now you'll be able to buy a single CPU with the same power.



...if that happens of course :D
 
I thought you were joking about this at first, but you kept on posting afterwards about 5 Teraflops.

The article says the Roadrunner smashed the petaflop barrier. Teraflop = 10^12, Petaflop = 10^15. In other words, your 5 teraflop machine you speak of is pretty far off.
 
Totally ignore my previous post. I have clearly demonstrated that I can't read posts correctly enough to say anything.
 
Why would they go with GTX280s when they can just use RV770s that are cheaper and have more FLOPs to play with??

The 4870 will have more than 1 TFLOPs at stock speeds, use 40% less power, and cost 1/2 of a GTX280. While those FLOPs may not translate directly into game scores (we don't know yet), in a custom machine like this you can make much better use of that theoretical power, so going with power-hungry GTX280s with fewer FLOPs would seem stupid. In other words, you can get around 2.5x the GFLOPs at the same price, and roughly the same power usage, with RV770s.

Only a fanboy would go on and on about GTX280 this and GTX280 that when it is obviously a less than optimal solution for this kind of thing compared to what ATi will be offering.

Also, GTX280 apparently only has ~78 GFLOPs of double precision power, much less than the ~240 GFLOPs in RV770. Double precision would probably be more useful in the scientific programs being run, and while I'm somewhat ignorant of exactly what that means for this kind of machine, it doesn't sound too great for a GTX280. Especially when you are getting more than 6x the double precision GFLOPs for the same price and power by going with ATi.
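
The "2.5x" and "6x" figures above can be reproduced with a quick Python sketch. The double precision numbers and the 940 GFLOPs GTX280 single precision figure come from this thread; the 1,200 GFLOPs single precision figure for the 4870 is an assumption on my part (the post only says "more than 1 TFLOPs"), and prices are expressed relative to a GTX280 since no dollar amounts were given.

```python
# Peak GFLOPs per unit of (relative) price, GTX280 vs. RV770.
# Price is relative: GTX280 = 1.0, RV770 = 0.5 ("cost 1/2 of a GTX280").
# RV770 single precision (1200 GFLOPs) is an ASSUMED value; the post only
# says "more than 1 TFLOPs". All figures are theoretical peaks.

CARDS = {
    "GTX280": {"sp_gflops": 940.0, "dp_gflops": 78.0, "rel_price": 1.0},
    "RV770": {"sp_gflops": 1200.0, "dp_gflops": 240.0, "rel_price": 0.5},
}


def gflops_per_price(card: str, precision: str) -> float:
    """Peak GFLOPs bought per unit of relative price."""
    spec = CARDS[card]
    return spec[f"{precision}_gflops"] / spec["rel_price"]


def advantage(precision: str) -> float:
    """How many times more GFLOPs per price the RV770 gives over the GTX280."""
    return gflops_per_price("RV770", precision) / gflops_per_price("GTX280", precision)


print(f"single precision advantage: {advantage('sp'):.2f}x")
print(f"double precision advantage: {advantage('dp'):.2f}x")
```

With these inputs the single precision ratio lands at roughly 2.5x and the double precision ratio at a bit over 6x, which matches the claims in the post.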


Some quotes about the double precision numbers from beyond3d:
Jawed said:
Something I've been told: each SM has a dedicated double-precision MAD unit, so there's 30 in total. That's a surprise, 1/12th of single-precision, way less than I was expecting, 78 GFLOPs :shock:

A HD3650 has about 44GFLOPs DP.
V3 said:
78 GFLOPS is pretty poor peak for the reported size and power consumption of the board. It couldn't even beat the Cell board.
Jawed said:
Sigh, and I said 249GFLOPs for RV770 based on 777MHz, not 750, so 240GFLOPs :oops:

And on top of all of that, the R700 is supposedly being specifically designed to be much more multichip friendly, so it should scale even better.
 
I would love to see just the cooling setup for this thing. I can only imagine the heat generated. They talk about how efficient the computer is as far as power consumption, but imagine the air conditioning bill. Ouch. :p
 
If you pick any number from 1 to 1,000,000,000,000, and you tell the computer whether each guess is right or wrong, it will be able to find your number within 1 second.
 
Uhmm, I think it is supposed to be 15 zeros, not 12.

Also I think your example is more related to MIPS not FLOPs.
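
Here is a quick Python sketch of the "15 zeros, not 12" point. It assumes, very loosely, that each guess costs exactly one operation and that the machine sustains 10^15 operations per second; as noted above, a guess is not really a floating-point operation, so treat this purely as an illustration of the scale. The function name is hypothetical.

```python
# How long would it take to try every candidate number at a petaflop-class rate?
# ASSUMPTION: one guess == one operation, sustained at 10**15 operations/second.

PETA = 10**15


def seconds_to_try_all(candidates: int, ops_per_second: int = PETA) -> float:
    """Worst-case time to test every candidate once, one operation per guess."""
    return candidates / ops_per_second


twelve_zeros = seconds_to_try_all(10**12)   # the range in the original example
fifteen_zeros = seconds_to_try_all(10**15)  # the corrected range

print(f"1 to 10^12: {twelve_zeros} s")
print(f"1 to 10^15: {fifteen_zeros} s")
```

Under that assumption the 12-zero range takes about a millisecond, and it is the 15-zero range that takes a full second.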
 
Also, that example isn't correct. Programmers don't use that type of algorithm, where it has to compare every single number to the one you input. It just checks digits.
 
Fanboy? cough.

But by all means, if R700 is better for the job, great. My point was that, using up-to-the-minute tech for the same application, there is already a good 5X+ performance boost available, and it's only been a matter of months since the machine started going online.

I find that pretty amazing.

On the double precision, they didn't mention in the article what the current machine or the Cell CPUs were capable of, so it's a data point in the ether. If the Cell's is 26 GFLOPs and the GTX280's is 78 GFLOPs, then it would still be a 3X boost apples-to-apples, just as the GTX280's claimed 1.2 TERAFLOPs is 3x the 400 GIGAFLOPs mentioned in the article.

Yes, the cooling would be interesting to see. I'd venture a guess it is just air cooling... 1U HSF on the AMD and each Cell, with a big ass fan in the blade rack moving plenty of air through all the blades. And the whole shebang in a nice cool AC environment.

Recognizing the higher heat output of the video GPU, I suggested water cooling and how to do it.... a big aluminum plate sandwiched between the processor and GPU boards in each blade, and a nice smart quick-connect no-leak interconnect to allow the dual-tri-blade-thingy to be plucked and unplucked from the rack hassle free. Nice cold chilled water pumped throughout all the blade racks, keeping everything nice and happy at a healthy overclock :eek::rolleyes::p:cool::D
 