Sun or Oracle SPARC processors

Define "business workload".
Your descriptions are very very broad.

I've seen x86 blades handle "business workloads" in full production, and on 4-5 year old E7 CPUs.
Really, give us some applications that SPARC absolutely excels at that can't be done with a large cluster of x86 systems.

Yes, modern SPARC CPUs have lots of cores, which individually, are much "weaker" cores than x86 or POWER/PowerPC.
I know that SPARC excels at databases and at queuing up lots of smaller tasks, almost akin to how modern GPUs work, only a bit more general-purpose and with less processing power than GPGPUs.

So other than Oracle databases, please give us some examples.
So you mean that NVIDIA is doing well because the HPC market has saved them? That does not sound reasonable. Do you have links?
I need to give you links to this? Seriously? Have you been hiding under a rock since the G80 GPUs emerged back in 2006??? :confused:
Really, I'm not trying to be insulting, but this is general/common knowledge that is in the news on a daily basis.

Just search for "NVIDIA deep learning" and that should give you enough info to answer your question.
Yes, the HPC market has absolutely saved and propelled NVIDIA, and has for nearly the last decade.
 
They care about the Enterprise business server market, that is where all the money is. That is the sh-t.
Intel's server business gets ~50% margins on a market of approximately $50 billion. Their *profit* in x86 servers is larger than the combined *revenue* of the entire RISC server market. The vast majority of the money is in x86 servers.

IBM and Oracle might not make a whole lot of margin on x86 processors, but Intel certainly does.

All these you mentioned are clusters. And sure, for clusters x86 is fine because you only need up to a few sockets - but lots of these compute nodes. They need to be cheap.
Right. And the majority of server applications, including many enterprise workloads such as data centers, perform better on state of the art x86 processors for less money. They're cheaper and they're faster for these applications, which is why almost everyone uses them.

Quantity before quality. And that is exactly x86.
Intel's x86 architecture dominates in performance in all but the most niche applications. It's quality and quantity. You're equating SAP performance with processor "quality", while ignoring the other 95% of workloads where x86 is far better.

OTOH, the Enterprise business is high margin and that is where the big money is.
Again, there's far more money in server x86. Intel designed their architecture to win in 90% of the market, and they do so with industry leading margins.

IBM and Oracle know they can't compete profitably here against Intel, so they've gone for the remaining 10%.

All the server vendors are trying desperately to get into the high margin business market. And that market is dominated by 16/32-socket Unix servers and mainframes. There are no clusters here.
Server vendors want to move into a less competitive market where they don't have to compete against Intel and dozens of other server vendors. Meanwhile, the end-customers are moving in the opposite direction towards commodity x86 solutions.

Look at the market data I shared earlier. It's becoming increasingly niche.


Nobody would want to do scientific calculations on expensive Unix servers when you can get cheap x86 clusters. The point is that scientific calculations are embarrassingly parallel workloads, so they run fine on clusters.
Agreed. And Xeons have better performance in coarse-grained parallel or few-threaded workloads. Their execution pipeline, out-of-order execution, caching, and branch prediction are the most advanced in the market.

That is why you use clusters. OTOH, business workloads cannot run on clusters; they can only run on large servers.
You are yet again falsely equating business workloads with the workloads that SPARC wins. The majority of businesses do things other than run SAP, and this is why the majority of server sales go to Intel, and why Intel makes the most money in the server market.

Of course you could use large Unix servers to do HPC calculations as well, but that would not be economical. You could probably buy hundreds of compute nodes for the price of one 16-socket Unix server.
Right. SPARC and Power are awful for HPC. They cost more and are slower. Clearly a win for Intel.


It represents the high margin market. That is what counts.
Profit is what counts. Intel makes the most profit by capturing 90% of the server market, and they do so with margin above the industry average.


Not worth the effort. Sun Microsystems chose to exit the HPC market. It was too volatile.
The fact that Sun couldn't compete in HPC is irrelevant to the fact that it is a larger market than RISC and other custom architectures.

Yes, but a cluster can never replace one scale up server when we talk about business workloads.
Cluster interconnect latency is going down, bandwidth is going up, fabric topologies and infrastructures are getting more advanced. The difference between a many-socket system and a closely coupled cluster will continue to shrink. This is where the industry is investing their money.

You don't need to accept it. Technology dictates that business workloads can only be run on single scale-up servers with 16 or 32 sockets - if you need extreme performance.
Right, and the market has decided that they don't need 16 and 32 socket servers except in a very limited number of applications.

What market data do you speak of?
The market share data I shared earlier. The majority of revenue *and* profit is in x86. The majority of businesses are buying x86 servers.

Nobody has succeeded until a few months ago.
So your entire argument is out of date then. Apparently x86 is competitive in the niche 16 socket applications that you're talking about, and presumably if there's a business case for it we'll see 32 socket x86 platforms in the near future.

Unix will scale the sh-t out of any x86 server.
Many UNIXes run on x86. They're two independent concepts.
 
Oracle's SPARC servers are increasing performance >100% every generation. Is that not dedication to tech? That is why Oracle will soon have the fastest servers on the market.

Fastest in a workload representing like 5% of the market. Not nearly as impressive with that caveat.

It's so easy to selectively pick out benchmark statistics. I can do the same for Intel:
  • Encryption performance increased by 600% with the addition of AES instructions
  • Video encoding performance increased 500% over a couple generations of Quick Sync
  • LINPACK scores improved 200-300% with the addition of AVX2 instructions.
  • Some parallel workloads scale almost linearly in core count. 8->16 cores, that's almost 100% improvement.

I could cherry pick lots of other selective improvements.
 
Define "business workload".
Your descriptions are very very broad.
Scale-up workloads that are not embarrassingly parallel. For instance, large database instances. SAP business software. I.e. things that are not suitable to run on a cluster. Clustered workloads are typically scientific computations, i.e. stuff that fits into the cpu cache and runs in a tight for loop, over and over again on the same grid points. OTOH, business source code tends to branch all over the place, so the cpu must go out to main RAM all the time.
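
To make the contrast concrete, here is a toy sketch of my own (the struct and sizes are made up, not taken from any real workload): the first loop is the kind of tight, cache-friendly grid sweep that clusters digest easily, while the second chases pointers and branches per record, so it spends its time waiting on main RAM instead of computing.
Code:
#include <stdio.h>
#include <stdlib.h>

#define N 4096

/* HPC-style kernel: sequential access in one tight loop; data fits in cache. */
static double grid_sweep(const double *grid, int n) {
    double sum = 0.0;
    for (int i = 1; i < n - 1; i++)
        sum += 0.5 * (grid[i - 1] + grid[i + 1]);   /* stencil on neighbouring grid points */
    return sum;
}

/* "Business-style" record: a linked-list node handled differently per branch. */
struct order {
    int type;              /* 0 = payment, 1 = invoice (made-up categories) */
    double amount;
    struct order *next;    /* pointer chasing -> frequent trips to main RAM */
};

static double process_orders(const struct order *o) {
    double balance = 0.0;
    for (; o != NULL; o = o->next) {   /* each hop can miss the cache */
        if (o->type == 0)              /* data-dependent branches */
            balance += o->amount;
        else
            balance -= o->amount;
    }
    return balance;
}

int main(void) {
    double grid[N];
    for (int i = 0; i < N; i++)
        grid[i] = (double)i;

    struct order *head = NULL;
    for (int i = 0; i < N; i++) {      /* build a small random order list */
        struct order *o = malloc(sizeof *o);
        o->type = rand() % 2;
        o->amount = (double)(rand() % 100);
        o->next = head;
        head = o;
    }

    printf("grid sweep:    %f\n", grid_sweep(grid, N));
    printf("order balance: %f\n", process_orders(head));

    while (head != NULL) {             /* free the list */
        struct order *next = head->next;
        free(head);
        head = next;
    }
    return 0;
}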

Really, give us some applications that SPARC absolutely excels at that can't be done with a large cluster of x86 systems.
For instance, large databases. SAP. Enterprise business workloads.

Yes, modern SPARC CPUs have lots of cores, which individually, are much "weaker" cores than x86 or POWER/PowerPC.
The new SPARC M7 does not have weak cores. It has a new generation of cores called "S4"; the earlier cores were called S1, S2 and S3. The S3 has the ability to dedicate all the threads in one core to a single thread, i.e. a core can run as one single strong thread or as many threads, decided on the fly. POWER7 can do something similar, for instance, but you have to reboot and reconfigure the server.

So other than Oracle databases, please give us some examples.
Large instances of databases. SAP.

I need to give you links to this? Seriously? Have you been hiding under a rock since the G80 GPUs emerged back in 2006??? :confused:
Really, I'm not trying to be insulting, but this is general/common knowledge that is in the news on a daily basis.
Well, this is not common knowledge to me. I know that NVIDIA has like 80% of the mass GPU market. So I thought that was why NVIDIA is more profitable than AMD. But you claim it is not the mass market, it is the HPC market?

Just search for "NVIDIA deep learning" and that should give you enough info to answer your question.
And what has deep learning to do with NVIDIA? Deep learning is a very small niche market that I doubt brings in big money for NVIDIA.

Yes, the HPC market has absolutely saved and propelled NVIDIA, and has for nearly the last decade.
So that NVIDIA owns 80% of the GPU market is irrelevant? To me this sounds like:
-Don't you know that Intel is profitable only because of the HPC market? (That Intel owns the x86 market is irrelevant?)
 
Intel's server business gets ~50% margins on a market size of approximately $50 billion. Their *profit* in x86 servers is larger than the entire RISC *revenue* size combined. The vast majority of the money is in x86 servers.
True.

IBM and Oracle might not make a whole lot of margin on x86 processors, but Intel certainly does.
IBM and Oracle are doing high margin business, Intel is not. Intel is making up the profit by sheer numbers, quantity. IBM and Oracle are doing quality. The margin on large Unix servers is sky high. Intel has much, much lower margins. It seems that you believe that Intel's low margin of... 10%(???) is good, whereas IBM and Oracle maybe have something like >50% margin or so. Intel sells many CPUs in the $400 range, whereas IBM and Oracle sell CPUs for $10,000-20,000 or so. There is a reason IBM and Oracle are branded as way more expensive than x86. It's because they do high margin business.

Right. And the majority of server applications, including many enterprise such as data centers, perform better on state of the art x86 processors for less money. They're cheaper and they're faster for these applications, which is why almost everyone uses them.
True that cheap x86 servers have come a long way and are fine for smallish tasks.

Intel's x86 architecture dominates in performance in all but the most niche applications. It's quality and quantity. You're equating SAP performance with processor "quality", while ignoring the other 95% of workloads where x86 is far better.
Where is x86 better? In the business workloads, SPARC/POWER is faster. In computations POWER is better. x86 has barely 400 GFLOPS, whereas POWER8 has above 400 GFLOPS. SPARC XIfx has 1,100 GFLOPS. Have you seen the benchmarks with POWER8? It dominates x86. SPARC dominates on large business workloads such as SAP, databases, etc.

Sure, there are many cheap x86 servers in the datacenters, but they do virtualization, etc. They don't run large business software tackling extreme workloads. And for this highly lucrative niche (which is small - true) there is no x86. x86 does not exist there. All large workloads are run on POWER/SPARC servers.

Again, there's far more money in server x86. Intel designed their architecture to win in 90% of the market, and they do so with industry leading margins.
What is important to IBM and Oracle is margin. Not profit. HP thinks this too, as their CEO Meg Whitman tried to sell off the x86 division because it was too low margin. They fought tremendously for their measly 5% margin or so on x86. Not worth it, HP concluded. Just like IBM and Oracle.

Low margin business is not sound. If you are a mere 5% from extinction, no - that is not good. You need some margin for bad times. If one company has a 5% margin on everything it sells but a large market, and another company has a 30% margin in a smaller market, then the stock market will look more favorably on the smaller company because it is in better shape for the future. If sales tank for a quarter, the large company needs to sell off assets, and if sales do not recover for a prolonged time, it will go bankrupt. Low margin business is too unstable for a company to rely on.
http://www.bloomberg.com/news/artic...ibm-server-unit-for-2-3-billion-amid-pc-slump
"...While [IBM] will continue to sell a range of higher-end servers and mainframes, offloading the x86 division removes a low-margin business from its books....“We were no longer in a position to get the kinds of returns that we wanted,” Steve Mills, senior vice president of software and systems at Armonk, New York-based IBM, said in an interview. “We wouldn’t do this if we didn’t see the obvious taking place in the market...”

What is most preferable and sound, do you think: to have a high salary and a large mortgage on your house so you only barely keep your nose above water, or a lower salary and a much smaller mortgage so you have a very good margin and can cope with problems, like losing your job? Have you heard about the subprime housing crisis that sank the entire USA in 2008 and plunged the world into a financial crisis? The common people could not cope with higher costs so everything tanked, despite the subprime market being very large, worth many billions. With low margins you are just a step away from very bad things. Everybody is trying to offload their low margin x86 division.

http://www.businessinsider.com/what-hp-looks-like-without-its-pc-unit-2014-10?IR=T
"...Margins for PCs are slim, only about 4%...."

That is why HP, which is the planet's largest PC vendor, wanted to sell off its x86 division worth $56 billion.
http://www.businessinsider.com/hp-is-planning-to-split-2014-10?IR=T
"...The PC and computer segment is massive for HP. It accounts for half of the company's revenue. For the first six months this year, it reported $27.8 billion in revenue. That's about three times the size of HP's next biggest unit, the Enterprise Group, which makes servers, storage, and network hardware...."

HP has to work so hard for the measly 4% margin - factories, supply chain, R&D, storage, etc. It is simply not worth it, HP said. But in the end HP changed their mind and kept their low margin, stinking x86 division. Still, it is not the first time HP has talked about getting rid of the x86 division.

Wayne Gretzky said "I skate to where the puck is going to be, not where it has been". x86 margins are shrinking every year. Now it is 4%. In a few years it will be 3%. As IBM said: "We wouldn't [sell our x86 division] if we didn't see the obvious taking place in the market..."

IBM and Oracle know they can't compete profitably here against Intel, so they've gone for the remaining 10%.
IBM and Oracle can go for the lucrative 10%, which Intel cannot. Who wants to do a 4% margin business?

And Xeons have better performance in coarse-grained parallel or few-threaded workloads. Their execution pipeline, out-of-order execution, caching, and branch prediction are the most advanced in the market.
POWER8 is faster in every benchmark, I suspect. POWER8 has higher GFLOPS. So I don't see how Xeons have better performance than POWER/SPARC?

You are yet again falsely equating business workloads with the workloads that SPARC wins. The majority of businesses do things other than run SAP, and this is why the majority of server sales go to Intel, and why Intel makes the most money in the server market.
SPARC is designed to do large scale business workloads; that is why SPARC is faster on that kind of workload. Even if Intel makes more money in total on the server market, that is not that important as a whole. Intel's market cap is $139 billion, IBM's is $142 billion and Oracle's $159 billion, so they are bigger. So, it seems you can do good business if you exclusively go for high margin business.

Right. SPARC and Power are awful for HPC. They cost more and are slower. Clearly a win for Intel.
Seriously, I don't understand why you believe this? POWER8 has higher GFLOPS than any Xeon, and SPARC XIfx has 1,100 GFLOPS. The next generation Xeon will probably increase performance by 10% or so, so maybe Xeon will reach 440 GFLOPS in the next generation?

Profit is what counts. Intel makes the most profit by capturing 90% of the server market, and they do so with margin above the industry average.
Margin is what counts. Compare to Oracle's and IBM's market caps. I hope you don't believe that Intel selling the vast majority of their CPUs for $400 is higher margin than RISC CPUs at $10,000-20,000?

The fact that Sun couldn't compete in HPC is irrelevant to the fact that it is a larger market than RISC and other customer architectures.
Sun focused on the business enterprise market.

Cluster interconnect latency is going down, bandwidth is going up, fabric topologies and infrastructures are getting more advanced. The difference between a many-socket system and a closely coupled cluster will continue to shrink. This is where the industry is investing their money.
No it isn't. The big money is in large business servers. Again, IBM sells a few hundred mainframes per year, and mainframes account for something like 15% of IBM's total, huge revenue. If we compare to SGI, which deals exclusively in the HPC market, SGI has a market cap of $0.15 billion. So, it seems that large HPC clusters such as the SGI UV2000 are a very small market compared to large business servers.

Right, and the market has decided that they don't need 16 and 32 socket servers except for in a very limited number of applications.
True. x86 is getting stronger, so today a 4- or 8-socket x86 gives plenty of power. You don't need POWER/SPARC for most workloads today. But for the largest workloads, you have no other choice than POWER/SPARC. There are no x86 servers that can tackle large business workloads. And these large business workloads bring in a looooot of money.

The market share data I shared earlier. The majority of revenue *and* profit is in x86. The majority of businesses are buying x86 servers.
True. But x86 is low margin at 4%, so nobody wants to touch it and everybody tries to dump it. x86 is a far cry from being a loss business.

So your entire argument is out of date then. Apparently x86 is competitive in the niche 16 socket applications that you're talking about, and presumably if there's a business case for it we'll see 32 socket x86 platforms in the near future.
Wrong. The 16-socket x86 servers do not offer the same performance as POWER/SPARC; they scale badly. And what is really important in the large server arena is RAS. RAS is extremely expensive. x86 does not have that good RAS. For instance, on mainframes and POWER/SPARC you can routinely replace CPUs, RAM, etc. on the fly. SPARC and mainframes can replay instructions if they detect an error in a calculation. Some mainframes have three CPUs and all calculations are run on all of them simultaneously, and if any CPU's output differs, it is shut down. Etc. etc. These tailor-made reliability solutions are very, very expensive, and x86 does not have that kind of technology, which has been perfected for decades. So large Unix servers and mainframes are much, much more reliable than x86 servers, which is a huge selling point in the business Enterprise arena. They prefer a large Unix server, capable of tackling the largest workloads, so reliable that it very, very seldom goes down, to a new, unproven, immature 16-socket x86 server.

Many UNIXs run on x86. They're two independent concepts.
True. But I meant that Unix OSes will scale the sh-t out of primarily x86 OSes such as Linux/Windows, because these desktop OSes have never been run on large servers before and need heavy redesign to cope with large 16/32 socket servers. And SPARC/POWER will also scale the sh-t out of any x86 server.
 
Fastest in a workload representing like 5% of the market. Not nearly as impressive with that caveat.

It's so easy to selectively pick out benchmark statistics. I can do the same for Intel:
  • Encryption performance increased by 600% with the addition of AES instructions
  • Video encoding performance increased 500% over a couple generations of Quick Sync
  • LINPACK scores improved 200-300% with the addition of AVX2 instructions.
  • Some parallel workloads scale almost linearly in core count. 8->16 cores, that's almost 100% improvement.

I could cherry pick lots of other selective improvements.
Can you show us relevant benchmarks against POWER8 and SPARC? Encryption performance is not an important selling point when selling huge servers. GFLOPS are interesting, as are the most common business benchmarks such as the important SAP, plus Linpack, Lapack, etc.

(I suspect that x86 is much slower on encryption than SPARC, as SPARC does it in real time, and the SPARC M7 has built-in hardware accelerators for that)
 
Scale-up workloads that are not embarrassingly parallel. For instance, large database instances. SAP business software. I.e. things that are not suitable to run on a cluster. Clustered workloads are typically scientific computations, i.e. stuff that fits into the cpu cache and runs in a tight for loop, over and over again on the same grid points. OTOH, business source code tends to branch all over the place, so the cpu must go out to main RAM all the time.

"Embarrassingly parallel"... you act like parallel processes are a bad thing, or something to be ashamed of.
I've seen a lot of production applications and workloads running full databases with parallel clusters, so it isn't as uncommon as you might think.

Considering almost all HPC and GPGPU functions are massively parallel, I would say that scale-out, not scale-up, is the future; this is just going off of what has happened in the last ten years in technology.

For instance, large databases. SAP. Enterprise business workloads.
This is very, very niche, and fits with SPARC and POWER/PowerPC systems, so that makes sense.

Well, this is not common knowledge to me. I know that NVIDIA has like 80% of the mass GPU market. So I thought that was why NVIDIA is more profitable than AMD. But you claim it is not the mass market, it is the HPC market?
Actually, it is both.
Not only is it leading in the mass GPU market (Tegra/GeForce/Quadro) but it is also leading in the HPC market (Quadro/Tesla).

GPUs have changed massively from what they were ten years ago.
The G80, or 8800GTX/GTS, and their CUDA cores (NVIDIA stream processors) are what changed everything; even the wiki page talks about this.

And what has deep learning to do with NVIDIA? Deep learning is a very small niche market that I doubt brings in big money for NVIDIA.
Right now it is, but another ten years from now, it will be huge.
Automated vehicles, robotics, AI, etc. will all utilize this technology.

NVIDIA is already prepping for this with their Maxwell (GTX900 series) GPUs, since they have removed all but a single double-precision FPU on their die, leaving nearly all of the FPU functionality as single-precision, which deep learning takes huge advantage of; they also did this so as not to immediately cannibalize their existing Tesla GPU products.

So that NVIDIA owns 80% of the GPU market is irrelevant?
I never said it was irrelevant.
In fact, it is quite the opposite.

Where is x86 better? In the business workloads, SPARC/POWER is faster. In computations POWER is better. x86 has barely 400 GFLOPS, whereas POWER8 has above 400 GFLOPS. SPARC XIfx has 1,100 GFLOPS. Have you seen the benchmarks with POWER8? It dominates x86. SPARC dominates on large business workloads such as SAP, databases, etc.

...and GPGPUs have upwards of 6TFLOPS of SP compute capabilities, and that's just on the consumer GeForce line, not counting Tesla at all.
This is why the world's top supercomputers use x86 processors paired with CUDA GPGPUs.

So what is your point?
If anything, this just shows how niche, specific, proprietary, and in some ways obsolete, that SPARC and POWER CPUs really are for modern tasks.

Wrong. The 16-socket x86 servers do not offer the same performance as POWER/SPARC; they scale badly.
This is why interconnects like infiniband exist, and if one's software is written to take advantage of this, then there is no issue.
Perhaps on a 1:1 scale, SPARC and POWER do have more processing power than x86, but their extreme cost is what kills it.

It is far cheaper and much more cost effective to invest into many x86 systems with interconnects, than it would be to invest in just one SPARC or POWER system, for the same price, and with far less computing power (depending on the software, task, etc.).
Again, SPARC and POWER systems are expensive because they are niche and proprietary.

You are right about the quality of their CPUs compared to x86, but the extreme cost is what kills this argument.

For instance, on mainframes and POWER/SPARC you can routinely replace CPUs, RAM, etc. on the fly. SPARC and mainframes can replay instructions if they detect an error in a calculation. Some mainframes have three CPUs and all calculations are run on all of them simultaneously, and if any CPU's output differs, it is shut down. Etc. etc. These tailor-made reliability solutions are very, very expensive, and x86 does not have that kind of technology, which has been perfected for decades.
Your data and info on x86 is 20 years out of date, at least.
We can do all of this on x86 blade servers, and have been able to for years; I did just what you described here back in 2012, for crying out loud.

Really, your knowledge on the topic of x86 and GPGPUs is from 1995-2005; things have changed massively since that time period.
 
It seems that you believe that Intel's low margin of... 10%(???) is good
No, Intel's server business margin is just over 50%. Where are you getting this 10% number from? Please provide some data for the numbers you're inventing about Intel's server margin.

whereas IBM and Oracle maybe have something like >50% margin or so.
What's your source for that number? It seems like you're just guessing. Sun was losing money on the server business so they were operating with negative margins. I understand Oracle is doing better with their business, but I can't find any SPARC specific numbers. Please feel free to cite some actual sources, or alternatively stop guessing numbers and presenting them as fact.

Even so, your guess is basically the same as Intel actually operates at.

Intel sells many CPUs in the $400 range, whereas IBM and Oracle sell CPUs for $10,000-20,000 or so.
And yet somehow Intel has server business margins >50%.

x86 has barely 400 GFLOPS
You're seriously pulling out a naive GFLOPs comparison? By this metric GPUs are the clear winners.

Even on the x86 front we have the Xeon Phi coming in at greater than 1 Tflops, and the next version out later this year is supposed to exceed 3 Tflops.

So if you're going to fall back on a Gflop comparison x86 still wins.

Anandtech said:
single threaded performance at 3.3 GHz (turbo) is no less than 50% higher than the POWER8 at 3.4 GHz. That means that the Haswell core is a lot more capable when it comes to extracting ILP out of that complex code.
http://www.anandtech.com/show/9193/the-xeon-e78800-v3-review/11
And it does so on less power for less money.


In the business workloads, SPARC/POWER is faster.
You've just agreed that data center, HPC, virtualization is better on Intel. These are business workloads. Google's workloads are way bigger than any SAP installation. You keep on conflating business workloads with your limited subset of workloads where SPARC/POWER might be better.


Have you seen the benchmarks with POWER8?
Yes, see my link above.

Look at CPU2006 results. The POWER8 and Xeon are basically the same per-socket, and the Xeon costs a lot less and consumes less power.

Sure, there are many cheap x86 servers in the datacenters, but they do virtualization, etc. They don't run large business software tackling extreme workloads.
Data centers and virtualization are business workloads. Extreme is a subjective term, but in terms of working data size and computational operations required, they can be just as high if not higher than SAP. Again, look at Google, Amazon, Facebook workloads: they are absolutely massive business workloads, and they use x86.

I guess by extreme you actually mean SAP. It's a very selective definition that you're using, that is out of line with the rest of the computer industry.


All large workloads are run on POWER/SPARC servers.
"All large workloads" include data centers, HPC, etc. These run on x86. Your statement is false.

Low margin business is not sound.
Low revenue business is also not sound. It's not clear to me why you're going on about business strategy. I agree that many companies want to move out of competition from Intel into less competitive markets; it's irrelevant when analyzing processor architecture.

IBM and Oracle can go for the lucrative 10%, which Intel cannot. Who wants to do a 4% margin business?
And Intel goes for the even more lucrative 90%. And, apparently they're releasing competitive products in the remaining 10% as well.


Seriously, I don't understand why you believe this? POWER8 has higher GFLOPS than any Xeon, and SPARC XIfx has 1,100 GFLOPS. The next generation Xeon will probably increase performance by 10% or so, so maybe Xeon will reach 440 GFLOPS in the next generation?
The Gflop comparison is primarily a function of how many cores are on the chip. It's not particularly representative of actual performance. Clusters win on this metric, then GPUs, then the Xeon Phi, followed by the SPARCs, POWER and regular Xeon.

The 10% number you keep quoting is average per-core throughput largely determined by IPC improvements. If for some reason you want to predict the Gflop scaling then you should be looking at core count increases. This is why GPUs do so well.
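
To spell out why the naive number is mostly a core-count and SIMD-width figure, here is a rough sketch of peak FLOPS as cores x clock x FLOPs per cycle. The core counts, clocks and FLOPs-per-cycle values below are illustrative assumptions, not official specs for any particular chip.
Code:
#include <stdio.h>

/* Rough peak GFLOPS: cores * GHz * FLOPs issued per cycle per core.
 * All figures below are illustrative assumptions, not vendor specifications. */
static double peak_gflops(int cores, double ghz, int flops_per_cycle) {
    return cores * ghz * flops_per_cycle;
}

int main(void) {
    /* A hypothetical 18-core x86 part at 2.3 GHz with two 256-bit FMA units:
     * 2 FMA * 4 doubles * 2 ops = 16 DP FLOPs per cycle per core. */
    printf("wide-SIMD CPU     : %7.0f GFLOPS\n", peak_gflops(18, 2.3, 16));

    /* A hypothetical 12-core RISC part at 3.5 GHz with 8 DP FLOPs per cycle. */
    printf("fewer, fatter cores: %7.0f GFLOPS\n", peak_gflops(12, 3.5, 8));

    /* A hypothetical 2000-core GPU at 1.0 GHz with 2 FLOPs per cycle per core. */
    printf("GPU-style part    : %7.0f GFLOPS\n", peak_gflops(2000, 1.0, 2));
    return 0;
}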


But for the largest workloads, you have no other choice than POWER/SPARC.
And yet again you're equating "large workloads" with SAP. Data centers are large workloads, HPC is large workloads. Your definition of large workloads is wrong.


Wrong. The 16-socket x86 servers do not offer the same performance as POWER/SPARC; they scale badly.
Are you just inventing these claims? Please provide some evidence.

I literally spent 2 minutes googling large socket count scaling on Xeons and pulled up these numbers. That's approximately 80% improvement as a function of socket count. That is considered very good scaling.

With this scaling, going from 16 to 32 sockets the Xeons would occupy top spot according to SAP

Literally what evidence do you have to suggest that the Xeons don't scale well? Provide specific sources.


And what is really important in the large server arena is RAS. RAS is extremely expensive. x86 does not have that good RAS.
Another claim without any supporting evidence. What RAS features and metrics specifically does a state-of-the-art Xeon fall short on? Please provide specifics, and examples where this was a factor in a purchasing decision.

True. But I meant that Unix OSes will scale the sh-t out of primarily x86 OSes such as Linux/Windows, because these desktop OSes have never been run on large servers before and need heavy redesign to cope with large 16/32 socket servers.
Great. I never said otherwise. Note that x86 is the most common processor for UNIX operating systems.
 
Can you show us relevant benchmarks against POWER8 and SPARC?
By relevant I suppose you mean the tiny subset of workloads where they are competitive. I've said repeatedly, the majority of business workloads are not those applications. Can you provide any market data that suggests otherwise?

the most common business benchmarks such as the important SAP, plus Linpack, Lapack, etc.
Citation needed.

Regardless of your lack of evidence,

LINPACK:
The top two positions are held by x86 systems.

SAP:
16 socket Xeon benchmark right up there with the 16 socket POWER and SPARC systems. Based on the 8 to 16 socket scaling of a Xeon, a 32 socket system is projected to take the top position. And again, for less money and power.

(I suspect that x86 is much slower on encryption than SPARC, as SPARC does it in real time, and the SPARC M7 has built-in hardware accelerators for that)
What do you think the "AES instructions" are?
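
For reference, here is a minimal sketch of using those instructions through the compiler intrinsics (gcc or clang with -maes on an AES-NI capable CPU). It is not a full AES implementation, just a demonstration that an encryption round executes as a single AESENC instruction; the key and plaintext values are made up.
Code:
#include <stdio.h>
#include <emmintrin.h>
#include <wmmintrin.h>   /* AES-NI intrinsics; compile with -maes */

int main(void) {
    /* Made-up 128-bit plaintext block and round key, for illustration only. */
    __m128i block = _mm_set_epi32(0x00112233, 0x44556677, 0x18192021, 0x22232425);
    __m128i rkey  = _mm_set_epi32(0x0f0e0d0c, 0x0b0a0908, 0x07060504, 0x03020100);

    /* One AES encryption round (SubBytes, ShiftRows, MixColumns, AddRoundKey)
     * executed by a single AESENC instruction. A real AES-128 encryption runs
     * a key schedule and 10 rounds, the last one via _mm_aesenclast_si128. */
    block = _mm_aesenc_si128(block, rkey);

    unsigned char out[16];
    _mm_storeu_si128((__m128i *)out, block);
    for (int i = 0; i < 16; i++)
        printf("%02x", out[i]);
    printf("\n");
    return 0;
}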
 
"Embarrassingly parallel"... you act like parallel processes are a bad thing, or something to be ashamed of.
I've seen a lot of production applications and workloads running full databases with parallel clusters, so it isn't as uncommon as you might think.
It is very difficult to parallelize some workloads, and some workloads are believed to be inherently sequential (the so-called P-complete problems). Typically, business workloads are not parallelizable. As explained by SGI, talking about their large Linux servers with tens of thousands of cores, the UV2000 and Altix:
http://www.realworldtech.com/sgi-interview/6/

"....However, scientific applications have very different operating characteristics from commercial applications. Typically, much of the work in scientific code is done inside loops, whereas commercial applications, such as database or ERP software are far more branch intensive. This makes the memory hierarchy more important, particularly the latency to main memory. Whether Linux can scale well with a workload is an open question. However, there is no doubt that with each passing month, the scalability in such environments will improve. Unfortunately, SGI has no plans to move into this [scale-up] market, at this point in time. However, it would be very interesting to see how the low latency Altix systems would perform with commercial workloads...."

Only recently has SGI finally released the 16-socket UV300H x86 server, which is designed to tackle scale-up business workloads. Scale-out clusters cannot do that. Embarrassingly parallel workloads are very specialized and typically only fit for scientific calculations, as explained by SGI.

The problem is that large business servers serve many clients simultaneously; as one client does something, the code jumps around wildly in the source. Maybe the user is doing some accounting, and next he does something else. The business code branches off wildly, making clusters useless. You need a large scale-up server to tackle such workloads.

Considering almost all HPC and GPGPU functions are massively parallel, I would say that scale-out, not scale-up, is the future; this is just going off of what has happened in the last ten years in technology.
People would really like to go scale-out, but most business workloads are not parallelizable - or are very, very tricky to parallelize. For instance, databases running over many nodes: it is very difficult to guarantee data integrity, synchronization between many nodes, rollback, etc. Clustered databases have many problems, and very few vendors can offer them. I suspect those offerings are crippled in some ways, which makes scale-up databases superior.

If it were easy to do clustered scale-out databases, Oracle would have no market anymore. Just release a 64-node x86 cluster and outperform the Oracle database for a tiny fraction of the cost. But no one has succeeded. And that is why Oracle earns the big bucks.

This is very, very niche, and fits with SPARC and POWER/PowerPC systems, so that makes sense.
These are business workloads. High margin. Sure it is niche, but it pays very, very well. SGI has tried for decades to venture into this highly lucrative arena.

Actually, it is both.
Not only is it leading in the mass GPU market (Tegra/GeForce/Quadro) but it is also leading in the HPC market (Quadro/Tesla).
I do suspect that owning 80% of the consumer GPU market is worth more than delivering GPUs to some supercomputers each year? But you claim that the supercomputer market is why NVIDIA fares better than AMD, not that NVIDIA owns 80% of the consumer market? Just like Intel owns the majority of the consumer CPU market.

Right now it is, but another ten years from now, it will be huge.
Automated vehicles, robotics, AI, etc. will all utilize this technology.
...NVIDIA is already prepping for this with their Maxwell (GTX900 series) GPUs, since they have removed all but a single double-precision FPU on their die, leaving nearly all of the FPU functionality as single-precision, which deep learning takes huge advantage of; they also did this so as not to immediately cannibalize their existing Tesla GPU products.
No one knows what the future will bring. Maybe deep learning will become obsolete. Maybe someone will create a specialized chip that outperforms GPUs for a fraction of the wattage and the cost? I would not be so sure of deep learning's abilities.

I never said it was irrelevant.
In fact, it is quite the opposite.
It seemed that you claimed NVIDIA fares better than AMD because of the supercomputer market?

...and GPGPUs have upwards of 6TFLOPS of SP compute capabilities, and that's just on the consumer GeForce line, not counting Tesla at all.
This is why the world's top supercomputers use x86 processors paired with CUDA GPGPUs.
GPGPUs are not general purpose CPUs; you need to rewrite your code. The SPARC XIfx with 1,100 GFLOPS is general purpose: it runs the same code, just several times faster than x86.

So what is your point?
If anything, this just shows how niche, specific, proprietary, and in some ways obsolete, that SPARC and POWER CPUs really are for modern tasks.
It is true that SPARC and POWER are niched towards business workloads. Because that is where the big money is. And that is why IBM and Oracle are more successful as companies than Intel or large x86 vendors such as SGI. No one wants to touch x86 servers.

This is why interconnects like infiniband exist, and if one's software is written to take advantage of this, then there is no issue.
Perhaps on a 1:1 scale, SPARC and POWER do have more processing power than x86, but their extreme cost is what kills it.
It is far cheaper and much more cost effective to invest into many x86 systems with interconnects, than it would be to invest in just one SPARC or POWER system, for the same price, and with far less computing power (depending on the software, task, etc.).
True. But business workloads can only run on large scale-up servers, and such x86 servers hardly exist. So you have no choice other than SPARC/POWER.

Your data and info on x86 is 20 years out of date, at least.
We can do all of this on x86 blade servers, and have been able to for years; I did just what you described here back in 2012, for crying out loud.
Really, your knowledge on the topic of x86 and GPGPUs is from 1995-2005; things have changed massively since that time period.
I find this very, very, veeeeery doubtful. Scale-up x86 servers simply do not have the same RAS as SPARC/POWER and mainframes. Actually, I don't believe you. The superior RAS is one of the main selling points. The most probable thing is that you falsely believe that x86 RAS is comparable to SPARC/POWER and mainframes.
-Oh, this x86 server has ECC RAM. Well, that makes it as reliable as a SPARC/POWER/Mainframe.

Not quite.
 
No, Intel's server business margin is just over 50%. Where are you getting this 10% number from? Please provide some data for the numbers you're inventing about Intel's server margin.

What's your source for that number? It seems like you're just guessing. Sun was losing money on the server business so they were operating with negative margins. I understand Oracle is doing better with their business, but I can't find any SPARC specific numbers. Please feel free to cite some actual sources, or alternatively stop guessing numbers and presenting them as fact.

Even so, your guess is basically the same as Intel actually operates at.
It is apparent that I was guessing, because of all my question marks and "maybe" wordings. I did not claim those guesses as facts. I based my guesses on the fact that x86 vendors have very low margins, around 4%, which is well known (that is why everybody tries to exit the x86 server market). I thought Intel must be doing better than 4% and guessed 10%. I was wrong on this; Intel apparently does far better as you have proved with 50%.

Anyway, this does not change the fact that the x86 server market stinks and is extremely low margin. That is why everybody tries to exit it.

I suspect that if Intel, with their $400 CPUs, has 50% margin, then the CPU divisions at IBM and Oracle, with their $10,000-20,000 CPUs, have much higher margins. The cost of manufacturing a CPU should be roughly the same, x86 or SPARC/POWER. And still Intel ekes out 50%. So the SPARC/POWER CPU divisions should be able to eke out much more than Intel.

You're seriously pulling out a naive GFLOPs comparison? By this metric GPUs are the clear winners.
Even on the x86 front we have the Xeon Phi coming in at greater than 1 Tflops, and the next version out later this year is supposed to exceed 3 Tflops.
So if you're going to fall back on a Gflop comparison x86 still wins.
I am comparing general purpose CPUs: SPARC/POWER vs x86. You cannot compare a specialized add-on GPU to a general purpose CPU. To take advantage of GPUs, you need to scrap all your code and rewrite everything using CUDA, and some workloads are not even parallelizable and cannot be run on GPUs, such as business workloads. On the SPARC XIfx you run the same code, just faster. So, no, x86 lags behind in sheer computing power.

What is your point? That Haswell is slower than POWER8 on all(?) benchmarks?

You've just agreed that data center, HPC, virtualization is better on Intel. These are business workloads. Google's workloads are way bigger than any SAP installation. You keep on conflating business workloads with your limited subset of workloads where SPARC/POWER might be better.
I did not agree to that. I wrote, "On business workloads SPARC/POWER is faster". That does not imply that Intel is better on other workloads; I have not even mentioned other workloads. Maybe I claim that SPARC/POWER is faster on these workloads as well? Which I do. Your logic is way off. With such logic, it is no wonder you draw wrong conclusions and believe x86 is faster than SPARC/POWER and has better RAS.

Yes, see my link above.

Look at CPU2006 results. The POWER8 and Xeon are basically the same per-socket, and the Xeon costs a lot less and consumes less power.
Hmmm.... This is a bit weird. You are showing links to CPU2006 and claim that x86 is comparable to POWER8?

In your link, an 8-socket POWER8 server reaches 5,130 CFP2006 and the fastest 8-socket x86 reaches 3,980. So how on earth can you claim that x86 is comparable to POWER8? Not only that, you claim that x86 is also FASTER than POWER and SPARC in the rest of your paragraphs. I have not seen a single benchmark posting from you where an x86 is faster than POWER8. So how can you continue to claim that x86 is faster? Oh, now I know. Wrong logic.

And let's look at the highest scores, which come from 16-socket servers: a 16-socket POWER8 server reaches 14,400, whereas the fastest x86 reaches 3,980. I fail to see why you claim that x86 servers are faster than SPARC and POWER? The link you showed only has CFP2006 comparing both POWER8 and x86, which is why I looked at it. Imagine what a 64-socket Fujitsu M10-4S with 3.7 GHz SPARC CPUs would score. :)

Data centers and virtualization are business workloads. Extreme is a subjective term, but in terms of working data size and computational operations required, they can be just as high if not higher than SAP. Again, look at Google, Amazon, Facebook workloads: they are absolutely massive business workloads, and they use x86.
I don't think I am being clear enough. I am talking about business software used for accounting and such business stuff, like SAP, databases, etc., that is used by large companies, banks, etc. to do their everyday business.

Google has a very large cluster with 900,000 servers, I've read. But no one is doing accounting or payroll or other business stuff on that cluster. Neither is Facebook. Amazon does a lot of business, and that is served somewhere by a large database on a single scale-up server, most probably a SPARC or POWER server. So the database backend is not an x86 server - I suspect. The reason is that Amazon generates many transactions, so you need a large database for that. And the largest databases with the best uptime are SPARC/POWER. I suspect a 4-socket or 8-socket x86 does not suffice to power the entire Amazon database backend.

So when I say business software, I do not mean virtualization, HPC, etc. I mean accounting and such business stuff. And such code branches heavily, so you cannot run such business software on clusters, as explained by SGI.

Of course, if Google replaced the 900,000 x86 servers in their cluster with SPARC or POWER servers, it would certainly not be slower, as each server is faster than x86. But the cost would kill it.

I guess by extreme you actually mean SAP. It's a very selective definition that you're using, that is out of line with the rest of the computer industry.

"All large workloads" include data centers, HPC, etc. These run on x86. Your statement is false.
Large Business workloads.

Low revenue business is also not sound. It's not clear to me why you're going on about business strategy. I agree that many companies want to move out of competition from Intel into less competitive markets; it's irrelevant when analyzing processor architecture.
I am talking about business strategy because that is where the big money is, and that market is the most important one. It has the highest margins and that is where the big fight is. SPARC and POWER target that market. x86 tries to get in but fails. x86 does not have the ability: not the performance, not the scalability, not the RAS.

And Intel goes for the even more lucrative 90%.  And, apparently they're releasing competitive products in the remaining 10% as well.
Intel is not able to go into the most important market, where the margins are really high.

The Gflop comparison is primarily a function of how many cores are on the chip. It's not particularly representative of actual performance. Clusters win on this metric, then GPUs, then the Xeon Phi, followed by the SPARCs, POWER and regular Xeon.
If we talk about the strongest general purpose CPUs, then x86 lags behind.

The 10% number you keep quoting is average per-core throughput largely determined by IPC improvements. If for some reason you want to predict the Gflop scaling then you should be looking at core count increases. This is why GPUs do so well.
True, but core count is not as important as you think. The point is, an x86 Xeon has roughly 150 watts to burn. It does not matter if it has 18 cores or 10 cores; the performance output will be roughly the same because they only have 150 watts of juice. If an 18-core Xeon had a 300 watt TDP and a 10-core Xeon had 150 watts, then the 18-core Xeon would be roughly twice as fast. But that is not the case. Using 150 watts you can only extract so much performance, no matter how many cores. If you have many cores, they must be weak to fit into 150 watts. If you have few cores, they can be stronger. But still, 150 watts is 150 watts. Intel cannot surpass that barrier, which hampers performance.

SPARC and POWER probably use 250 watts or even more. They are sometimes water cooled, sitting in huge 1,000 kg 32-socket servers. They can burn a lot more juice.

Have you seen benchmarks where they limit a GPU down to 200 watts and compare it to the 300 watt version of the same GPU? The 300 watt version is easily faster, because it can use more power. The same goes for SPARC/POWER. I do suspect that if Intel also had 250-300 watts to burn, x86 would be as fast as SPARC/POWER (even though SPARC/POWER spend more transistors on RAS and stuff other than performance).
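
A back-of-envelope way to frame the wattage argument (my own toy model, with made-up efficiency numbers): sustained performance is roughly power budget times performance-per-watt, so a bigger power budget only translates into a proportionally faster chip if the efficiency is in the same ballpark.
Code:
#include <stdio.h>

/* Toy model: sustained performance ~= power budget (W) * efficiency (perf per W).
 * All numbers below are illustrative assumptions, not measured figures for any real chip. */
static double perf(double watts, double perf_per_watt) {
    return watts * perf_per_watt;
}

int main(void) {
    printf("150 W part, 1.0 perf/W : %.0f\n", perf(150.0, 1.00));  /* baseline: 150        */
    printf("250 W part, 0.7 perf/W : %.0f\n", perf(250.0, 0.70));  /* more watts, lower eff: 175 */
    printf("250 W part, 1.0 perf/W : %.0f\n", perf(250.0, 1.00));  /* more watts, equal eff: 250 */
    return 0;
}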

And yet again you're equating "large workloads" with SAP.  Data centers are large workloads, HPC is large workloads.  Your definition of large workloads is wrong.
Large business workloads.

Are you just inventing these claims? Please provide some evidence.

I literally spent 2 minutes googling large socket count scaling on Xeons and pulled up these numbers. That's approximately 80% improvement as a function of socket count. That is considered very good scaling.

With this scaling, going from 16 to 32 sockets the Xeons would occupy top spot according to SAP

Literally what evidence do you have to suggest that the Xeons don't scale well? Provide specific sources.
Regarding x86 scaling badly: the SAP link you showed me is the first scale-up 16-socket x86 SAP benchmark ever; the result is brand new, just a month old. I had missed that link. It is an HP Integrity Superdome server, which is basically just a tweaked Unix server. HP has long had 64-socket Integrity servers running PA-RISC and Itanium CPUs with the HP-UX operating system. HP has long tried to replace Itanium with x86 in their "Kraken" server. Now they have succeeded after many years, but they stayed at 16 sockets instead of going up to 64 sockets as Itanium could.

We see that HP reaches 460,000 saps with 16 sockets, which is roughly 40% better than the best 8-socket x86, which reaches 320,000 saps. This is quite good, so I must revise my earlier statement: "x86 is now able to handle up to medium-sized SAP workloads". Thanks for correcting me. I would not want to say false things, and if you prove me wrong, I immediately change my mind.

Regarding x86 scalability: according to Wikipedia, scalability is the ability to go up and tackle larger and larger workloads. Which x86 cannot do - it stays at 16 sockets, whereas SPARC goes up to 64 sockets. The top SAP spot is held by a 32-socket SPARC with 840,000 saps. The 8-socket POWER8 reaches 436,000 saps, almost as good as the 16-socket HP x86 server at 460,000 saps. So, I would not say that x86 is faster than SPARC or POWER, nor that it scales better.

Another thing: business benchmarks such as SAP are very hard to scale. They don't scale linearly as parallel workloads do, so you cannot say that because a 16-socket reaches X, a 32-socket will therefore reach 2X. There is a reason the SPARC benchmark stayed at 32 sockets; I suspect that a 64-socket SPARC would only be 10% faster or so, because of hard scalability, which means a 64-socket SPARC would look bad. That is my guess as to why there are no 64-socket entries.

Another claim without any supporting evidence. What RAS features and metrics specifically does a state-of-the-art Xeon fall short on? Please provide specifics, and examples where this was a factor in a purchasing decision.
I don't know exactly, as I have not studied RAS. But x86 is notorious for being unreliable among Unix and mainframe sysadmins. Have you heard about x86 servers running for decades? It is like saying "how do you know Windows is unstable compared to Unix, what does Windows fall short on, be specific" - everybody knows that Windows is not the most stable OS. For instance, I hope you are not going to claim that you can hot-swap everything in an x86 server, as you can with SPARC/POWER/mainframes. Why would anyone build such very expensive RAS into a cheap 4-socket server?

Great. I never said otherwise. Note that x86 is the most common processor for UNIX operating systems.
Yes, x86 is the cheapest. That is why.
 
By relevant I suppose you mean the tiny subset of workloads where they are competitive. I've said repeatedly, the majority of business workloads are not those applications. Can you provide any market data that suggests otherwise?
Business workloads, such as accounting. That is where the big money is, that market is the most important.


Citation needed.
Regardless of your lack of evidence,
LINPACK:
The top two positions are held by x86 systems.
Link to Linpack, please?

SAP:
16 socket Xeon benchmark right up there with the 16 socket POWER and SPARC systems. Based on the 8 to 16 socket scaling of a Xeon, a 32 socket system is projected to take the top position. And again, for less money and power.
Again, you cannot take a business software benchmark and project that it scales linearly. If you could, it would be possible to run such workloads on clusters. It is not possible to scale linearly.

BTW, an 8-socket POWER8 gets almost the same SAP score as the 16-socket x86 server. And SPARC holds the top spot with almost twice the score of the x86 server.

What do you think the "AES instructions" are?
Yes, I know. But I also know that SPARC and POWER have more TDP to play with than x86. I have also seen benchmarks where SPARC was faster than x86 on crypto, compression, database queries, etc.
 
Intel apparently does far better as you have proved with 50%.

Anyway, this does not change the fact that the x86 server market stinks and is extremely low margin. That is why everybody tries to exit it.
In the first sentence you agree that Intel has 50% margin on its x86 server market, which is quite good margin. In the second sentence you fall back on your claim that x86 server market is extremely low margin. How can you possibly reconcile these two statements?


What is your point? That Haswell is slower than POWER8 on all(?) benchmarks?
Check my link again. The single-threaded performance is faster on the Xeon than the POWER8.

In your link, an 8-socket POWER8 server reaches 5,130 CFP2006 and the fastest 8-socket x86 reaches 3,980.
Integer performance is basically the same per socket at around ~5000. The business workloads you've been talking about are going to stress integer performance and leave the floating point units mostly idle. And again, at less cost and power.


I don't think I am being clear enough. I am talking about business software used for accounting and such business stuff, like SAP, databases, etc., that is used by large companies, banks, etc. to do their everyday business.
Then don't keep saying "business workloads" or "large business workloads", because what you're talking about is only a tiny subset of "large business workloads".

Google has a very large cluster with 900,000 servers, I've read. But no one is doing accounting or payroll or other business stuff on that cluster. Neither is Facebook.
Ironically, payroll, accounting, etc. are computationally tiny workloads for Google and Facebook. Their largest workloads run on their clusters. SAP is a tiny workload compared to a large data center.

Amazon does a lot of business, and that is served somewhere by a large database on a single scale-up server, most probably a SPARC or POWER server. So the database backend is not an x86 server - I suspect.
No. Amazon uses commodity x86. Stop speculating - these things are easy to look up.

The reason is that Amazon generates many transactions, so you need a large database for that.
Amazon and most other sophisticated technology companies know how to architect large hardware and software systems so that they can be coarsely distributed. There is no reason that different users buying items on Amazon need to be served by a coherent database.

Their software engineers have failed if me buying an item on Amazon is somehow stalling a thread serving another user buying an item.
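
A minimal sketch of the kind of coarse partitioning being described (my own illustration; the hash function and shard count are arbitrary): route each customer's orders to a shard by customer ID, and two different customers' purchases never have to touch the same database node.
Code:
#include <stdio.h>
#include <stdint.h>

#define NUM_SHARDS 16   /* arbitrary number of independent database nodes */

/* Simple FNV-1a hash of the customer id; any stable hash works for routing. */
static uint32_t fnv1a(uint64_t key) {
    uint32_t h = 2166136261u;
    for (int i = 0; i < 8; i++) {
        h ^= (uint8_t)(key >> (8 * i));
        h *= 16777619u;
    }
    return h;
}

/* Each customer's orders always land on the same shard, so ordinary purchases
 * by different customers never contend for one coherent scale-up database. */
static int shard_for_customer(uint64_t customer_id) {
    return (int)(fnv1a(customer_id) % NUM_SHARDS);
}

int main(void) {
    uint64_t customers[] = { 1001, 1002, 987654321, 42 };
    for (int i = 0; i < 4; i++)
        printf("customer %llu -> shard %d\n",
               (unsigned long long)customers[i], shard_for_customer(customers[i]));
    return 0;
}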

So when I say business software, I do not mean virtualization, HPC, etc. I mean accounting and such business stuff.
But virtualization, HPC, web services, and data centers *are* business software. You are using an incorrect definition. If you are talking about SAP only, then say SAP only.

Also, even among SAP workloads, you appear to be talking about the very tail-end. Most organizations don't require a multi-million dollar computer to run their SAP services.

So when you say "business workloads", you actually mean a tiny fraction of SAP workloads.

And such code branches heavily, so you cannot run [...]
What? This simply isn't true. Can you provide a source and benchmarks for why branch heavy programs don't execute well on distributed systems? Can you even explain why this would be the case in terms of the program flow or system architecture?

When clusters perform poorly it is generally because of slow coherency and data sharing, not branches. Branches and memory accesses are different.

The Xeons are regarded as having the best branch prediction and ILP of any microprocessor on the market. These sorts of optimizations are evident when you look at benchmarks that are single-threaded, or more generally benefit from per-core IPC. See the Anandtech link again for an illustration of this performance; the Xeon appears to have higher per-core IPC.

Of course, if Google replaced the 900,000 x86 servers in their cluster with SPARC or POWER servers, it would certainly not be slower, as each server is faster than x86.
Utter speculation on your part. You have no idea how Google's data center would perform on SPARC or POWER. I am interested in actual facts.


If we talk about strongest general purpose cpus, then x86 lags behind.
I am talking about all computing. I have no interest in arbitrarily narrowing the scope of the discussion.

In any case, the Xeon Phi will execute standard x86 binaries, and the upcoming Xeon Phi is general purpose and not add-in. So even by your arbitrary criteria, x86 wins on a naive flops comparison.

But still, 150 watts is 150 watts. Intel cannot surpass that barrier, which hampers performance.
SPARC and POWER probably use 250 watts or even more.
They can burn a lot more juice.
1) Stop guessing about things like power usage. If you want to make an argument go look up the actual facts rather than speculating.

2) Intel has higher performance per watt, so your point is irrelevant.

3) Using more power is not a selling point. Ceteris paribus almost everyone prefers lower power.

Have you seen benchmarks where they limit a GPU to 200 watts and compare it to the 300-watt version of the same GPU? The 300-watt version is easily faster, because it can draw more power. The same goes for SPARC/POWER.
As you touch on, the comparison is only valid if the performance per watt is the same, which it isn't.

Regarding x86 scalability: according to Wikipedia, scalability is the ability to go up and tackle larger and larger workloads. x86 cannot do that; it stops at 16 sockets, whereas SPARC goes up to 64 sockets. The top SAP spot has a 32-socket SPARC with 840,000 SAPS. The 8-socket POWER8 reaches 436,000 SAPS, almost as good as the 16-socket HP x86 server at 460,000 SAPS. So I would not say that x86 is faster than SPARC or POWER, nor does it scale better.
The fact that x86 doesn't have a 32 socket system yet doesn't mean the architecture won't scale; it means nobody has built the system (presumably because the market size is so tiny). The trend from 8 socket to 16 socket showed good scaling, so by all evidence we can expect it to scale to 32 socket in a similar fashion.

Also your definition of scaling is bizarre. x86 goes from sub-watt embedded applications all the way up to 16 sockets with unified memory, and further into distributed memory systems with thousands of cores. The range of systems and workloads that x86 is deployed in far exceeds what people are using SPARC in.

Your argument that it won't scale because there's no 64-socket system is silly. This segment is like the 3-sigma tail of the SAP market. How many of these systems are sold every year? Setting 32 sockets as the cut-off point for what we consider scalable is arbitrary. Xeon, SPARC, and POWER all show up in the SAP top 10. They are all viable options for that workload.

I could similarly say that SPARC doesn't scale because it doesn't have a 1024-socket system. Or that it doesn't scale because it doesn't exist in milliwatt embedded silicon applications. Or that it doesn't scale because it's not used in distributed systems.

Also, you cannot say that because a 16-socket system reaches X, a 32-socket system will therefore reach 2X.
I never claimed the scaling would be linear. In fact, I explicitly said it was sub-linear. The trend still indicates that a 32 socket Xeon would take top spot or close to it.
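As a back-of-envelope sketch of what sub-linear socket scaling implies, assuming a purely illustrative power-law model (the 0.85 exponent is my assumption, not a measured figure):

# Hypothetical scaling model: saps(n) = saps(16) * (n / 16) ** alpha, with alpha < 1 (sub-linear).
saps_16 = 460_000  # the 16-socket x86 SAP result cited above
alpha = 0.85       # assumed scaling exponent, purely illustrative
saps_32 = saps_16 * (32 / 16) ** alpha
print(f"Projected 32-socket result: {saps_32:,.0f} SAPS")  # roughly 830,000 under this assumption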


I don't know exactly, as I have not studied RAS. But x86 is notorious among Unix and mainframe sysadmins for being unreliable.
If you don't know about it, then don't make claims. I have no interest in hearing your speculation.


Linpack Top 500 is here.
 
But that is not the case. Using 150 watts you can only extract so much performance, no matter how many cores. If you have many cores, they must be weak to fit into 150 watts. If you have few cores, they can be stronger. But still, 150 watts is 150 watts. Intel cannot surpass that barrier, which hampers performance.
Your comments about power are confused.

Intel's (CMOS) manufacturing process is considered two years ahead of the rest of the industry. The per-transistor power consumption is lower than anyone else's. This is a big factor in why Intel has lower power: it has better silicon.

Intel targets a TDP based on market demands. It has nothing to do with the cores being "weak". It has to do with the manufacturing process, die area, supply voltage, and clock frequency, and how they think they can maximize profit. Performance scales sublinearly with respect to power consumption, so a careful tradeoff is made here in selecting a TDP.

Look at:
http://www.bit-tech.net/hardware/2014/09/03/intel-core-i7-5930k-and-core-i7-5820k-revie/8
For one of the i7s, increasing the clock frequency by ~50% (3 to 4.5GHz) roughly doubles the power consumption under load (~220W to ~436W). Intel has done their market research and determined this is where the sweet spot is.

If they wanted to sell parts with higher clock frequency, performance and power consumption, they could do so with existing silicon.
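A rough sketch of why power climbs faster than clock: dynamic power goes roughly as P ~ C*V^2*f, and higher frequency usually requires higher voltage too. The voltage figures below are assumptions for illustration, not measured Haswell-E values.

# Dynamic power: P ~ C * V^2 * f. A 50% clock bump plus an assumed 20% voltage bump more than doubles power.
f0, v0 = 3.0, 1.00   # baseline clock (GHz) and relative voltage (assumed)
f1, v1 = 4.5, 1.20   # overclocked point (assumed voltage increase)
relative_power = (f1 / f0) * (v1 / v0) ** 2
print(f"Relative dynamic power: {relative_power:.2f}x")  # about 2.2x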
 
It is very difficult to parallelize some workloads. Some workloads are not even parallelizable; it is known they are impossible (known as P-complete problems).
Are you seriously suggesting that most business workloads or most database procedures are P-complete problems? Or is this just a non sequitur meant to distract us? Complexity theory speaks to the inherent serialization of instructions in a program based on unavoidable data dependencies; it says nothing about the optimal memory architecture of a system, i.e., distributed memory vs. a low-latency shared memory space.

The SAP "business workloads" you've been talking about don't constitute a single decision problem. They are multiple problems, and they are slow because of the traditional difficulties with coarse concurrency that requires 1) a coherent view of data, and 2) random data access over a large working data set. It's not because they are P-complete problems.

Ironically, actual P-complete problems are going to run better on processors with a higher single-threaded throughput because they don't benefit from multiple cores/threads. The benchmarks I linked above indicate that this is the Xeon.
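For the genuinely serial part of a program (which is what the complexity argument is actually about), Amdahl's law is the relevant back-of-envelope; the 10% serial fraction below is chosen purely for illustration:

def amdahl_speedup(serial_fraction: float, n: int) -> float:
    # Speedup = 1 / (s + (1 - s) / n): the serial fraction s caps the benefit of adding cores or sockets.
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / n)

for cores in (8, 64, 512):
    print(cores, round(amdahl_speedup(0.10, cores), 1))  # with 10% serial work, speedup tops out near 10x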

Embarrassingly parallel workloads are very specialized and typically only fit for scientific calculations, as explained by SGI.
In your quote SGI never said that only scientific workloads are parallelizable. You are misrepresenting what they said.

There are lots of non-scientific workloads that are highly parallelizable, at different coarseness levels.

Off the top of my head: running multiple programs at the same time, virtualizing multiple systems, and separate threads for the different components of a game engine (audio, rendering, AI, etc.).

Your claim about only scientific computing being highly parallelizable is totally incorrect.
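A trivially parallel, distinctly non-scientific sketch: independent items (log files here, but render jobs or requests work the same way) processed concurrently with no coordination between them. The file names are hypothetical.

from concurrent.futures import ThreadPoolExecutor

def word_count(path: str) -> int:
    # Each file is processed on its own; no item depends on any other.
    with open(path, encoding="utf-8") as f:
        return sum(len(line.split()) for line in f)

if __name__ == "__main__":
    logs = ["access-0.log", "access-1.log", "access-2.log"]  # hypothetical input files
    with ThreadPoolExecutor() as pool:
        totals = list(pool.map(word_count, logs))
    print(dict(zip(logs, totals)))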
 
It is very difficult to parallelize some workloads. Some workloads are not even parallelizable; it is known they are impossible (known as P-complete problems). Typically, business workloads are not parallelizable. As explained by SGI, talking about their large Linux servers with tens of thousands of cores, the UV2000 and Altix:

No one knows what the future will bring. Maybe deep learning will become obsolete. Maybe someone will create a specialized chip that outperforms GPUs at a fraction of the wattage and cost? I would not be so sure about deep learning's abilities.


It seemed that you claimed NVIDIA fares better than AMD because of the supercomputer market?


GPGPUs are not general-purpose CPUs; you need to rewrite your code. SPARC, with 1,100 GFLOPS, is general purpose: it runs the same code, just several times faster than x86.


It is true that SPARC and POWER are niched toward business workloads, because that is where the big money is. And that is why IBM and Oracle are more successful as companies than Intel or large x86 vendors such as SGI. No one wants to touch x86 servers.


True. But business workloads can only run on large scale-up servers, and such x86 servers hardly exist. So you have no choice other than SPARC/POWER.

Not to resurrect a long dead thread, but I found this to be quite interesting in spite of what you have stated: https://groups.google.com/forum/#!topic/distsys-discuss/ZBPOe7pCV5M

Despite all that, this project on GitHub looks really interesting: https://github.com/antonmks/Alenka/blob/master/GPUvsSparc.txt


System configuration 1: [1]
SPARC T4-4 server with 4 SPARC T4 3GHz processors, 32 cores, 256 threads
512 GB memory
4 Sun Storage F5100 Flash Arrays w/ 80 24GB FMODs each
Software: Oracle Database 11g Release 2 Enterprise Edition
Total cost: $925,000

System configuration 2:
Pentium G620 2.6GHz 2-core CPU
16 GB memory
1 x 120GB internal SSD
1 NVIDIA C2050 GPU
Software: Alenka GPU database [2]
Total cost: $2,350



SQL for the test query looks like this:

select l_returnflag, l_linestatus, sum(l_quantity) as sum_qty, sum(l_extendedprice) as sum_base_price,
    sum(l_extendedprice*(1-l_discount)) as sum_disc_price, sum(l_extendedprice*(1-l_discount)*(1+l_tax)) as sum_charge,
    avg(l_quantity) as avg_qty, avg(l_extendedprice) as avg_price, avg(l_discount) as avg_disc, count(*) as count_order
from lineitem
where l_shipdate <= date '1998-12-01' - interval '[DELTA]' day (3)
group by l_returnflag, l_linestatus
order by l_returnflag, l_linestatus;




Results in seconds for query 1:

SPARC: 189s
GPU: 173s


I find it interesting that a $2,350 x86 system paired with a GPU can outperform a nearly $1 million SPARC system with "far superior" specs, both from 2012; modern GPUs from 2015 would run circles around these systems as well.
Just thought I would share this for those that come across this.
 
IMO, NVIDIA and AMD could become big players in this kind of large-scale computing market if they invested a little more in it, particularly NVIDIA with their Tesla compute units.
 
Not to resurrect a long dead thread, but I found this to be quite interesting in spite of what you have stated: https://groups.google.com/forum/#!topic/distsys-discuss/ZBPOe7pCV5M




I find it interesting that a $2,350 x86 system paired with a GPU can outperform a nearly $1 million SPARC system with "far superior" specs, both from 2012; modern GPUs from 2015 would run circles around these systems as well.
Just thought I would share this for those that come across this.

Interesting but it's a test of a single SQL query. That PC isn't going to be running a massive database of any kind with that configuration. I'd rather see a comparison to an Intel system that can run a large database and do some real-world tests. That would provide more beneficial results. It seems there's a bit more to GPU-based databases though. Also from that link:

GPU databases require some restructuring - it's not about building smart query plans. It's about getting bulk data that many queries will need to scan onto the GPU, then having all running queries interested in that data run against that data. Pretty obvious: the transactional cost of pulling in data is high, the query cost of running against the on-GPU data set is low; it's just a matter of job-tracking all the queries that would benefit from running against the range of the current data set.
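A toy sketch of that access pattern, with NumPy standing in for device-resident columns (the point is the pattern, not a GPU API): pay the expensive load once, then let many cheap queries scan the resident data.

import numpy as np

# One-time, expensive step: pull the bulk column data into (GPU) memory.
rng = np.random.default_rng(0)
quantity = rng.integers(1, 50, size=10_000_000)
discount = rng.random(size=10_000_000)

# Many cheap queries then scan the already-resident columns.
q1 = quantity[discount < 0.05].sum()       # total quantity on deeply discounted lines
q2 = (quantity * (1 - discount)).mean()    # average discounted quantity
q3 = int((quantity > 40).sum())            # count of large line items
print(q1, q2, q3)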
 
Interesting but it's a test of a single SQL query. That PC isn't going to be running a massive database of any kind with that configuration. I'd rather see a comparison to an Intel system that can run a large database and do some real-world tests. That would provide more beneficial results.

No need for real-world experience here: just read the Oracle marketing papers like what's-his-face.
 