New IBM POWER8 CPU

Status
Not open for further replies.

Red Falcon

Looks pretty powerful, and has quite a performance increase over POWER7+ CPUs.

http://www.theregister.co.uk/2013/08/27/ibm_power8_server_chip/?page=1

ibm_power_chip_comparison.jpg


From the article:
With the Power8 chip, IBM has a few goals. First, the company is shifting from the 32-nanometer processes used for the relatively recent Power7+ chips to a 22-nanometer process. The shrinking of the transistor gates allows IBM to add more features to a die, crank the clocks, or do a little of both.
 
I have 3 Power 520's at work and I despise these systems. One died right at the end of the warranty and the other 2 are just barely hanging on. I wish I could dump them into the ocean. Unfortunately, they weigh like 80lbs and I can't easily drag them to my car.
 
So mainframes get faster and more expensive. Meh
 
Unfortunately, they weigh like 80lbs and I can't easily drag them to my car.
You could always delegate the immensely challenging task of lifting eighty pounds to a small-framed woman in her late 60's :)
 
Oh.

What would a machine running Power 3 or 4 chips have anything to do with an uptime of 1500 days?
 
You could always delegate the immensely challenging task of lifting eighty pounds to a small-framed woman in her late 60's :)

Oh you think you're a strong man now? Wait until you reach my age! (shake ma cane at ya!):)
 
4ghz and "...IBM can deliver a box with 32TB of memory across 384 cores and 3,072 processor threads."

But will it play Crysis?
 
I'm hoping we get our hands on some. The P4/P5 were solid for us. P6 was overpriced. P7s have been good. Moved a lot of stuff to 740s and 750s.

We've still got some ancient stuff like one server running AIX 4.3.2 that just won't die and people are still using it, sadly. Uptime on that thing is 2027 days as of right now.
 
how are system updates on AIX done? no reboot needed?
 
^ That's some impressive uptime!

The problem is, it's running AIX 4.3.2 so we're unsupported. It's on ancient hardware so we have to hope that nothing goes wrong and that IBM can get replacement parts if something does go wrong.

We can't wait to get off that thing. The difficulty is in getting the app teams to move their jobs to/from the mainframe off this server. Most have but there's at least two apps still using it.
 
Or these systems haven't been updated or rebooted in the last 1800+ days. The hardware of the Power3/4 has been quite stable; Power6 had its issues, and Power7 and 7+ have been good.
 
A lot of those old systems were pretty solid. The one I listed is a B50 (PPC 603e) now at 2034 days.

And no, no one's updated it. No one wants to touch it if at all possible.
 
4ghz and "...IBM can deliver a box with 32TB of memory across 384 cores and 3,072 processor threads."

But will it play Crysis?

Probably not. Doubt it would even play Minecraft. Not being sarcastic. These things are supercomputers and, like workstations, they can't game. Sure, if you could get it to run with a standard PCIe Nvidia GTX or AMD HD series card, it might play those games if they supported this architecture.


WHAT? 96MB L3 cache??? And 230GB per second of sustained memory bandwidth!!!
Woow
5 years from now, we're gonna laugh at POWER8's performance at this rate of advancing in development and technology

Doubt that. Maybe in new supercomputers, but I don't think we will laugh at it for another 10 years or so. This is what, 100 times as powerful as the average i7? We laugh at Pentium 4s, but if a CPU is even close to 100 times as powerful as it, it's pretty damn good.
 
WHAT? 96MB L3 cache??? And 230GB per second of sustained memory bandwidth!!!
This is weird. I remember the old POWER6: IBM claimed it had 240GB/sec of bandwidth or so (which was unrealistic back then). Digging around a bit, it turned out that IBM had added up all the bandwidths in the cpu: L1 cache bandwidth + L2 cache bandwidth, etc. So why is IBM now writing that POWER6 only has 30GB/sec of bandwidth? Have IBM's marketing people changed their minds?

BTW, you cannot add up all the bandwidths in a chip. If there is a bottleneck of 1GB/sec somewhere, then the achievable bandwidth cannot exceed 1GB/sec. So it is really wrong to add up all the bandwidths in a chip. If you do that, you have not really understood anything about bandwidth.
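
To make the bottleneck point concrete, here is a minimal Python sketch. The stage names and figures are invented for illustration and are not IBM numbers; the only thing it shows is that data which has to pass through every stage moves at the rate of the slowest stage, not at the sum of all stages.

```python
# Hypothetical per-stage bandwidths in GB/s (illustrative values, not IBM figures).
stages = {"L1": 100.0, "L2": 80.0, "L3": 40.0, "memory": 1.0}

def summed_bandwidth(stages):
    """The misleading figure: add every stage's bandwidth together."""
    return sum(stages.values())

def effective_bandwidth(stages):
    """Data that must cross every stage is limited by the slowest stage."""
    return min(stages.values())

print(summed_bandwidth(stages))     # 221.0
print(effective_bandwidth(stages))  # 1.0
```

With a 1GB/sec stage anywhere in the path, the summed figure says 221GB/sec while the path can only sustain 1GB/sec.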


And 12 threads per core!
Yes, just like the SPARC cpus. Back in the POWER6 days (dual core, 5GHz), IBM claimed that the only way forward was to have 1-2 strong cores at very high clock speeds, because databases (which are the heart of a company's business) run best on strong cores. Therefore, IBM said, the Sun SPARC Niagara, with many lower-clocked cores running lots of threads, was a foolish thing, and IBM mocked the Niagara design. Back in the day, everybody had 1 or 2 cores running 1 or 2 threads each. At that time, Niagara SPARC was released with 8 cores running 4 threads each = 32 threads per cpu. That was radical and unheard of a decade ago. IBM said that future POWER cpus would run at even higher clock speeds, 7-8GHz or so, so they could power databases even better. By using many cpus, IBM would get many threads too. So IBM would run many cpus, each having 1-2 strong cores. That was the future, IBM said.

Today, IBM has no more 5GHz or 7-8GHz cpus. Today, the IBM POWER7 and POWER8 are remarkably similar to SPARC cpus, which have 16 cores running 8-12 threads each. IBM finally realised that if you clock a cpu too high, there will be diminishing returns, and you break the wattage budget. The only way forward is the SPARC way: many lower-clocked cores, so you keep the watt budget down.

A cpu's wattage scales roughly as:
watt = Hz * Volt * Volt
This means that to keep wattage down, you need to keep Hz down. 7-8GHz cpus are out of the question. Instead, it is better to go with lower-clocked cpus and compensate for the lost performance with many more cores. That is just what SPARC did a decade ago, and what IBM, Intel and the rest have realised recently and are playing catch-up on now: from a single high-clocked core with 1-2 threads to massive parallelism with many cores and many threads. The transition that SPARC initiated a decade ago, IBM follows now.
POWER6 had 2 cores
POWER7 had 8 cores
POWER8 has 12 cores.

SPARC Niagara started at 8 cores a decade ago, and has had 16 cores for years.
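
As a rough illustration of the watt = Hz * Volt * Volt relation quoted above, here is a small Python sketch. The clock speeds and voltages are assumptions picked for the example, not measured values for any real chip, and the idea that the lower-clocked cores can run at a lower voltage is also an assumption of the sketch.

```python
def relative_power(freq_ghz, volts, cores=1):
    """Relative dynamic power from the relation in the post: watt ~ Hz * Volt * Volt.
    The result is a unitless comparison figure, not real watts."""
    return cores * freq_ghz * volts * volts

# Hypothetical operating points (assumed for illustration only).
one_fast_core = relative_power(freq_ghz=5.0, volts=1.2)            # 1 core at 5 GHz
two_slow_cores = relative_power(freq_ghz=2.5, volts=0.9, cores=2)  # 2 cores at 2.5 GHz

print(one_fast_core, two_slow_cores)
```

Both configurations add up to the same 5GHz of aggregate clock, but under these assumed voltages the two slower cores come out at a bit more than half the relative power of the single fast core, because voltage enters squared.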

I am waiting for IBM's marketing department to claim that the POWER8 is a unique and revolutionary design, and that having many lower-clocked cores is the only way forward. :)


This is what, 100 times as powerful as the average i7? We laugh at Pentium 4s, but if a CPU is even close to 100 times as powerful as it, it's pretty damn good.
Well, this POWER8 is up to 3x faster in theory than the POWER7, according to IBM. And as one high-end Intel Xeon cpu is faster than the POWER7 today, it means that the POWER8 will be only 2-3x faster than a high-end Intel Xeon cpu when it is released in 1-2 years or so. Meanwhile, Intel is not resting, and Intel's new Haswell Xeon E7 cpus will rock the boat.

BTW, today's SPARC T5 is ~2.4x faster than the POWER7 in some benchmarks, in practice, not in theory. And next year the SPARC T6 will arrive, which is 2x faster than the SPARC T5. Every SPARC iteration aims to be at least twice as fast as the previous one. And in three years, a SPARC server will arrive with 16,384 threads and 64TB of RAM. 16,384 threads sounds sick today, just as 8-core cpus were sick a decade ago, but a decade from now, every Unix vendor will sell servers with 1000s of threads, after they have caught up with SPARC. Next year, the Oracle M6 will be able to use Bixby to create 96-socket SPARC servers. IBM's largest Unix server has 32 sockets. If you want to run fast databases, then it is the SPARC and Oracle highway.
 
Instead, we just run RAC clusters on Linux along with some data warehouse appliances that are being put in to replace our AIX data warehouses. We're all mildly curious to see how that ends up going.

Solaris and AIX are now being phased slowly out. Existing applications can stay but new ones will have to justify why they can't run on Linux. Much as I'd like to see some new SPARC systems, the T4-1 servers we have may be the last of the SPARC chips. We may see some P8s in the future but it will probably be relatively few.
 
What always puzzles me is the absolute lack of any benchmarks for these exotic processors. I can't even find a single Dhrystone result for POWER7 on Google :eek: If these processors run Linux, then there should be zero problem compiling and running open-source benchmark suites, yet no one, absolutely no one, does that: not for POWER, not for SPARC, not for Itanium, etc.

So what are normal, ordinary folk like myself supposed to think about the power of such expensive platforms?
Database benchmarks may be all that matters for server platforms, but that doesn't tell us anything about the normal performance of these CPUs in various programs, so all in all we don't even know if they are better than Intel x86 offerings or not...

I wouldn't be that surprised if a normal 6-core i7 beat the hell out of these POWER7/8/SPARC things on normal programs...
 
I wouldn't be that surprised if a normal 6-core i7 beat the hell out of these POWER7/8/SPARC things on normal programs...

why would anyone be running "normal programs" on a power series processor?
 
What always puzzles me is the absolute lack of any benchmarks for these exotic processors. I can't even find a single Dhrystone result for POWER7 on Google :eek: If these processors run Linux, then there should be zero problem compiling and running open-source benchmark suites, yet no one, absolutely no one, does that: not for POWER, not for SPARC, not for Itanium, etc.

So what are normal, ordinary folk like myself supposed to think about the power of such expensive platforms?
Database benchmarks may be all that matters for server platforms, but that doesn't tell us anything about the normal performance of these CPUs in various programs, so all in all we don't even know if they are better than Intel x86 offerings or not...

I wouldn't be that surprised if a normal 6-core i7 beat the hell out of these POWER7/8/SPARC things on normal programs...

IBM lists their systems on SPEC's benchmarks. Here's a POWER7 CPU. Oracle and Fujitsu list their SPARC-based MX00 servers. Here's the list.

Outside that, you'll normally see application-related benchmarks like TPC (TPC-C comes to mind immediately) being touted more than anything. These become more important than the benchmarks you're looking at.
 
why would anyone be running "normal programs" on a power series processor?

Seriously. I wouldn't even know where to begin using one of those beasts. Just the density in a rack makes my mind spin.
 
What always puzzles me is the absolute lack of any benchmarks for these exotic processors. I can't even find a single Dhrystone result for POWER7 on Google :eek: If these processors run Linux, then there should be zero problem compiling and running open-source benchmark suites, yet no one, absolutely no one, does that: not for POWER, not for SPARC, not for Itanium, etc.

So what are normal, ordinary folk like myself supposed to think about the power of such expensive platforms?
Database benchmarks may be all that matters for server platforms, but that doesn't tell us anything about the normal performance of these CPUs in various programs, so all in all we don't even know if they are better than Intel x86 offerings or not...

I wouldn't be that surprised if a normal 6-core i7 beat the hell out of these POWER7/8/SPARC things on normal programs...
The SPARC Niagara cpus are server cpus. That means the servers serve 1000s of clients, handling huge amounts of data and I/O. The cpu cache can never fit 1000s of clients' worth of data plus the kernel, the database binaries, etc. The cache would need to be several GB big to hold all the client data. Therefore, SPARC cpus are targeted at throughput: handling large amounts of data at the same time. Some companies saw a 50x performance increase when they tried the first Niagara T1 iteration at 1.4GHz compared to the x86 servers they had back then, because they had high-throughput workloads. 1-2 core x86 had very bad throughput back then, but the x86 cores were individually stronger.

So there is a big difference between desktop cpus and server cpus. A server cpu should have many cores and many threads so it can handle many clients. The cache size is not that important for a server cpu, because no matter how big the cache is, a server cpu will never be able to fit all the clients' data into it. So basically, you could have a cpu cache of 1MB or so, which is what the Niagara T1 has.

A desktop cpu serves only one user, who runs only a few programs at the same time. So it needs only a few high-clocked cores, so that the few programs running can complete fast. The working data set will fit into the cpu cache, so the cache size is relatively important.

If we study the IBM POWER6, it has very few cores, is very highly clocked, and is heavily cache dependent. These are all characteristics of a desktop cpu. This means it would have bad throughput and serve many clients badly. So, if you look at the SIEBEL v8 benchmarks, you need 14 (fourteen) POWER6 cpus running at 5GHz to match one single SPARC T5440 server, which has four Niagara cpus at 1.6GHz. So you need 14 x 5GHz = 70GHz worth of aggregate clock speed to match four SPARC Niagara 1.6GHz cpus. This shows that a desktop cpu is not really fit for server workloads. Of course, for number crunching that runs a small loop over and over again, which fits into the cpu cache, the POWER6 is excellent. But that is not a server workload.
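
The aggregate-clock arithmetic above can be spelled out in a few lines of Python. The cpu counts and clock speeds are the ones stated in the post; treating "sum of GHz" as a comparison figure is only a crude proxy, as in the post itself, and says nothing about cores or threads per cpu.

```python
# Figures as stated in the post (Siebel v8 comparison).
power6_cpus, power6_ghz = 14, 5.0
niagara_cpus, niagara_ghz = 4, 1.6   # one T5440 server

power6_total = power6_cpus * power6_ghz      # aggregate GHz on the POWER6 side
niagara_total = niagara_cpus * niagara_ghz   # aggregate GHz on the Niagara side
ratio = power6_total / niagara_total

print(f"{power6_total:.1f} GHz vs {niagara_total:.1f} GHz, ratio {ratio:.1f}x")
```

By this measure the POWER6 side needs roughly eleven times the aggregate clock of the Niagara side for the same benchmark result.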

Studies from Intel show that an x86 cpu used as a server, under full load, waits for data 50% of the time because of cache misses. That means the x86 cpu idles for 50% of the time when fully loaded. A server workload tries to serve many different clients at the same time, so the working set will never fit into cache and the cpu will constantly ask for new data from RAM, so there will be lots of cache misses. This is the reason cpus have bigger and bigger caches, complex prefetch logic, etc.: to try to minimize cache misses. But avoiding them entirely is not possible; there will always be cache misses for a server cpu. The only way to avoid cache misses would be if the cpu had ESP and could foresee the future. Hence, a server cpu will always have cache misses, and because CPUs are much faster than RAM, every miss turns into a long wait. RAM is slow, CPU is fast. And the higher the server cpu is clocked, the more it has to wait for data, so a 5GHz POWER6 cpu maybe idles for 60-70% of the time, waiting for data under full load?

A desktop cpu might fit the small working data set into cpu cache, so it will run faster at higher clock speeds. So you would like to clock it higher and higher, which is just what IBM did with the POWER6 cpu.

However, the secret of the SPARC Niagara is that it idles only 5-10% of the time under full server load. No other cpu can do this. This is unique. It does this by having many threads: as soon as a thread stalls, it switches to another thread in one clock cycle and continues to run while waiting for data. It does not try to avoid cache misses, because avoiding them is impossible, and it has only 1MB of cache and runs fine on that. It does other work while waiting for data, so it almost never idles. When an x86 cpu switches threads, it takes 100s or even 1000s of clock cycles. So the x86 runs in bursts: waits, and runs, and waits, etc. The Niagara SPARC rarely waits; it always has work to do. So it is not strange that a low-clocked Niagara SPARC cpu can outclass cpus clocked many times higher. But it has a radical new design.
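
The thread-switching argument above can be sketched with a simple idealized model in Python. This is not how any real cpu is specified: the run and stall lengths are made-up numbers, and the model assumes that switching to another ready thread costs nothing and that all threads behave identically.

```python
def core_utilization(threads, run_cycles, stall_cycles):
    """Fraction of cycles a core does useful work in an idealized model:
    each thread runs for `run_cycles`, then stalls for `stall_cycles` on a
    cache miss, and switching to another ready thread is free."""
    busy = threads * run_cycles          # work available per run+stall period
    period = run_cycles + stall_cycles   # length of one thread's full cycle
    return min(1.0, busy / period)

# Hypothetical workload: 20 cycles of work between 80-cycle memory stalls.
for n in (1, 2, 4, 8):
    print(n, core_utilization(n, run_cycles=20, stall_cycles=80))
```

With one thread the core in this model is busy only 20% of the time; with enough threads to cover the stall, the idle time disappears, which is the effect the post describes.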

Now IBM finally seems to have understood this and tries to mimic the SPARC Niagara cpus, which have many cores and many threads. This POWER8 has 12 cores and several threads per core, just like the Niagara SPARC cpus. I wonder what happened to IBM's mocking of the Niagara design, and "the future is strongly clocked cores at 7-8GHz"? For several years IBM had the world's fastest cpu, because Sun Microsystems did not have the resources to do heavy research. But Oracle has heavy resources and is betting heavily on SPARC. This year, when the SPARC T5 was released as the world's fastest cpu, IBM could not beat it, so IBM's response to it was:
http://www.forbes.com/sites/oracle/2013/04/02/big-data-performance-speed-cost/
"...And while it’s not exactly a big surprise that IBM would try to downplay the independent SPARC T5 benchmarking tests showing Oracle’s outperformance, IBM’s rationale that businesses don’t care about speed was startling. From the WSJ article: “[Performance] was a frozen-in-time discussion,” Parris said in an interview Wednesday. “It was like 2002–not at all in tune with the market today. Companies today, Parris argued, have different priorities than the raw speed of chips.”

Of course, if POWER8 turns out to be the fastest cpu in the world, IBM would boast a lot and claim that performance is the most important factor that customers consider. :)

So, to answer your questions: these cpus are not meant to run on desktops. They will not run Crysis or something like that. They have extreme throughput, but might fail to run a single program very fast. Instead, they run lots of different programs fast at the same time. An x86 cpu chokes when you try to start many programs on it, but a server cpu does not have that difficulty.

So, what are "normal programs"? These are server cpus and they run "normal server workloads", serving many clients at the same time, just fine.

BTW, there are lots of benchmarks. For instance, the raw cpu benchmarks SPECint2006 and SPECfp2006 are available. Here is a comparison of SPECint for an x86, a SPARC T5 and an IBM POWER7, and there are many other benchmarks too, if you look a bit:
https://blogs.oracle.com/BestPerf/entry/20130326_sparc_t5_speccpu2006_rate

EDIT: added "fourteen POWER6 matches four SPARC Niagara cpus in Siebel v8 benchmarks"
 
X86 is not inherently bad for the server space. It's simply that it grew up in the world of desktops, where memory bandwidth and disk I/O were not a big deal. Only with the advent of SSDs is anyone talking about I/O in the x86 world. Look at Sandy E and Ivy E. Intel understands that they need general system throughput to match the CPU if they want to be a part of the server world.
The real benefit of the Power systems is their throughput. The SPARCs have their crypto acceleration and x86 has its cost. Everyone has a different leg up.
 
x86 was inherently bad for the server space, because it only had 1-2 cores. A server typically serves many clients, 1000s of clients. To do lots of things simultaneously, you need many cores and many threads. It is a bad thing to have only 1-2 strong cores, because such a cpu design cannot handle many clients simultaneously. For instance, if you compare the multitasking ability of a 1-core cpu to a quad-core cpu, which one is the best? With a quad core, you can game and download bittorrents and copy big files all at the same time without losing much performance; your fps will stay the same. But if you try to do all of this on a 1-core cpu, it will grind to a halt. The game will occupy the whole cpu and the other tasks will lag a lot.

Thus, to serve many clients, as servers do, you need many cores and many threads: a high-throughput cpu. You don't want a cpu which has strong cores and finishes one task very quickly, because the other clients need to wait until their turn, and they will lose patience. Instead you need a cpu that can serve many clients simultaneously.

Earlier, the x86 was single core or dual core. It is only quite recently that x86 has gotten 10 cores or more, and each core still has only 2 threads. So they are still not really high throughput, and not that perfect for server workloads.

This also applies to POWER cpus. So, no, the real benefit of POWER systems is not their throughput. Instead, they have strong cores, but not good throughput. For instance, if the POWER6 clocked at 5GHz can fit the workload into its cpu cache (typically desktop workloads), it will run quite fast, actually. The POWER6 is a good desktop cpu, but it cannot serve many clients, so no, it is not a high-throughput cpu. It is a low-throughput cpu, strong at a single task but bad at many tasks. Typical desktop.

The POWER7 has eight cores, with only 4 threads each. That is better than the dual-core POWER6 cpu, but still not good. The POWER8, finally, with many cores and many threads, has a design that is very similar to the SPARC Niagara cpus, and thus the POWER8 seems suitable for server workloads.
 
X86 is not inherently bad for the server space. It's simply that it grew up in the world of desktops, where memory bandwidth and disk I/O were not a big deal. Only with the advent of SSDs is anyone talking about I/O in the x86 world. Look at Sandy E and Ivy E. Intel understands that they need general system throughput to match the CPU if they want to be a part of the server world.
The real benefit of the Power systems is their throughput. The SPARCs have their crypto acceleration and x86 has its cost. Everyone has a different leg up.

In all honesty, though, Intel doesn't necessarily need to try that hard to compete. Linux and Windows on Intel are going to dominate the server world, and what can't be done with a handful of Windows or Linux Intel servers will be done with... even more Windows or Linux Intel servers. Companies see the price of pSeries and they shy away. Ditto Sun servers. That's on top of the companies shying away from Sun after Oracle bought them.

IBM and Oracle have to do whatever they can to compete because people will happily buy cheap Intel-based systems running Windows and Linux because it's inexpensive.

And cheap.
 
Agreed that Linux and Windows are cheap. As computers get more powerful, a 4-socket x86 server suffices for many needs. A long time ago, you needed a 32-socket Unix server for the same workload, but not anymore. So the low end is getting more powerful, and most companies only need low-end servers.

But in the high end, there are only Unix and Mainframes. Neither Linux nor Windows exists in the high-end segment. If you need 16-socket or 32-socket servers, you must go to Unix or Mainframes. No 16-socket or 32-socket Linux server suitable for SMP work has ever been sold. Sure, there are big SGI servers with 1000s of cores, but they are clusters and only suitable for number-crunching HPC workloads. No one runs Enterprise SMP workloads on them, because you cannot. Larger Enterprise SMP servers with more than 8 sockets are all Unix or Mainframes. Typically, you run a big database configuration on an SMP server with as many as 16 or even 32 sockets. As no such large Linux server has ever been sold, you cannot use Linux for high-end Enterprise workloads (databases, etc). The largest Linux SMP server has 8 sockets, and is just a normal IBM or Oracle x86 server.

Sometimes you do still need 16 or 32 sockets, and then you need to go high end: Unix or Mainframes. So Unix will never die until Linux can handle more than 8 sockets. Until then, Linux will be for the low end (up to 8 sockets), and Unix for the high end, up to large 32-socket servers.

With that said, if you need the highest performance in a single server (which you do sometimes, when running SMP workloads, typical for the Enterprise sector), you must choose Oracle or IBM. No Linux vendor can offer anything in the high-end segment, because there are no such large Linux servers for sale, and there never have been. So, yes, Linux eats into the low end, but Unix and Mainframes are still untouched in the high-end segment, with 100% market share.

EDIT: Actually, I know of one large Linux server with 16-sockets, but it is new and has not been offered on the market for too long. And the performance is abysmal. So, Linux has nothing to offer in the high end segment.
 
^ Linux is used in the majority of the world's supercomputers, which have far more than 16 sockets. ;)
Also, since when has Linux been limited by 8 sockets? I think you're thinking of Windows.

Replace "Linux" with "Windows" in everything you just said, and you'd be 100% correct.

Even the top 10 supercomputers in the world are all using Linux: http://en.wikipedia.org/wiki/TOP500

Unless I'm wrong, do you have any links to back up what you're saying about Linux and >8 socket systems and mainframes?
Or at least elaborate on why Linux isn't good with >8 socket systems?

I'm not sure why the number of sockets would hold Linux back since it scales with multiple processor cores (and sockets) perfectly fine.
 
^ Linux is used in the majority of the world's supercomputers, which have far more than 16 sockets. ;)
Also, since when has Linux been limited by 8 sockets? I think you're thinking of Windows.

Replace "Linux" with "Windows" in everything you just said, and you'd be 100% correct.

Even the top 10 supercomputers in the world are all using Linux: http://en.wikipedia.org/wiki/TOP500

Unless I'm wrong, do you have any links to back up what you're saying about Linux and >8 socket systems and mainframes?
Or at least elaborate on why Linux isn't good with >8 socket systems?

I'm not sure why the number of sockets would hold Linux back since it scales with multiple processor cores (and sockets) perfectly fine.

Those supercomputers are usually running software designed for that type of computing. The 3D rendering software of old was like this as well, where DEC Alphas at high clock speeds were quite impressive. What was being talked about with the SPARC T-series processors was the handling of much larger numbers of threads.

That said, I too am curious about more information on this.
 
Well brutalizer, whenever you get back, we would appreciate your answer to our questions. :)
 