Haswell: Don't expect significant performance gains

"01:56PM - 4x the peak FP throughput of Nehalem" - http://hardforum.com/showthread.php?t=1715867&page=2

Four times the FP performance of Nehalem? I'll take it! Keep in mind I am running an i7-950.

But it won't translate to Folding performance improvements because Gromacs doesn't even support the original Sandy Bridge AVX yet:

http://www.gromacs.org/Documentation/Acceleration_and_parallelization#SSE.2c_AVX.2c_etc

Yes folks, just like SSE before it, AVX must have software support to bring you these huge gains. It may be some time before your application of choice gets the boost.
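To make "software support" concrete, here is a minimal sketch of what an application has to do before AVX helps it: ship a separate AVX code path and choose it at run time. The function names are made up for illustration. `__builtin_cpu_supports` and the `target` attribute are GCC/Clang features on x86, so this assumes one of those compilers.

```cpp
#include <immintrin.h>
#include <cstddef>

// Baseline path: plain scalar loop, runs on any x86 CPU.
float sum_scalar(const float* data, std::size_t n) {
    float total = 0.0f;
    for (std::size_t i = 0; i < n; ++i) total += data[i];
    return total;
}

// AVX path: adds 8 floats per iteration. The target attribute lets this one
// function use AVX even when the rest of the file is built without -mavx.
__attribute__((target("avx")))
float sum_avx(const float* data, std::size_t n) {
    __m256 acc = _mm256_setzero_ps();
    std::size_t i = 0;
    for (; i + 8 <= n; i += 8)
        acc = _mm256_add_ps(acc, _mm256_loadu_ps(data + i));
    alignas(32) float lanes[8];
    _mm256_store_ps(lanes, acc);
    float total = 0.0f;
    for (float lane : lanes) total += lane;  // fold the 8 partial sums
    for (; i < n; ++i) total += data[i];     // leftover elements
    return total;
}

// Pick the widest path that the CPU running the program actually supports.
float sum_dispatch(const float* data, std::size_t n) {
    if (__builtin_cpu_supports("avx")) return sum_avx(data, n);
    return sum_scalar(data, n);
}
```

A binary that only contains the scalar path runs the same on every chip, which is why new vector hardware does nothing until the application is updated.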
 

That's also why the 8150 can completely demolish the 2600k and 3770k when the software is compiled to take advantage of its new instruction sets, but very little software is actually compiled that way.
 
The best gains will probably come from something handwritten as compilers are pretty temperamental about what they decide to vectorize. Writing some fast, vectorized code is some very tricky stuff, but yes, in certain situations you can generate a pretty enormous speed gain.

I wrote a software renderer a while ago and vectorized the geometry transformation at one point for SSE. Basically I went from processing a single vertex at a time to 4. There's some overhead and gotchas but if you can just nom on a stream of ready data then good things can happen. It wasn't a 4 fold increase in performance, but it was quite large for more or less "free."
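A rough sketch of the "one vertex at a time to 4" idea, using SSE intrinsics. This is not the renderer's actual code. The `Vertex4` and `Affine` types and the structure-of-arrays layout are assumptions chosen so that four x values, four y values and four z values each fit in one 128-bit register.

```cpp
#include <xmmintrin.h>

// Four vertices stored "structure of arrays": all x together, all y together,
// all z together, each group aligned for a single 128-bit load.
struct Vertex4 {
    alignas(16) float x[4];
    alignas(16) float y[4];
    alignas(16) float z[4];
};

// Row-major 3x4 affine transform: out = M * (x, y, z, 1).
struct Affine {
    float m[3][4];
};

// One output coordinate for four vertices at once: a*x + b*y + c*z + d.
static inline __m128 transform_row(const float* row, __m128 x, __m128 y, __m128 z) {
    __m128 r = _mm_mul_ps(_mm_set1_ps(row[0]), x);
    r = _mm_add_ps(r, _mm_mul_ps(_mm_set1_ps(row[1]), y));
    r = _mm_add_ps(r, _mm_mul_ps(_mm_set1_ps(row[2]), z));
    return _mm_add_ps(r, _mm_set1_ps(row[3]));
}

// Transform four vertices per call instead of one.
void transform4(const Affine& t, const Vertex4& in, Vertex4& out) {
    const __m128 x = _mm_load_ps(in.x);
    const __m128 y = _mm_load_ps(in.y);
    const __m128 z = _mm_load_ps(in.z);
    _mm_store_ps(out.x, transform_row(t.m[0], x, y, z));
    _mm_store_ps(out.y, transform_row(t.m[1], x, y, z));
    _mm_store_ps(out.z, transform_row(t.m[2], x, y, z));
}
```

The overhead mentioned above shows up mostly in getting data into this layout. If vertices arrive interleaved as x, y, z, x, y, z, the shuffling needed to regroup them eats into the gain.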
 
Hand coding is king. Compilers can do nothing in comparison to the programmer knowing exactly what needs to be done in the code and coding it as such.

Being able to have multiple threads work on a single large piece of data without using any locks whatsoever can get you almost linear performance increases when multi-threading.
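One common way to get that lock-free behavior is to give each thread its own slice of the data, so nothing is shared and nothing needs protecting. A minimal sketch follows. The function name `scale_parallel` and the scaling workload are placeholders for illustration.

```cpp
#include <algorithm>
#include <cstddef>
#include <thread>
#include <vector>

// Scale every element in place using up to `workers` threads. Each thread
// owns a disjoint slice, so no two threads ever touch the same element and
// no locks or atomics are needed.
void scale_parallel(std::vector<float>& data, float factor, unsigned workers) {
    if (workers == 0) workers = 1;
    const std::size_t chunk = (data.size() + workers - 1) / workers;  // ceiling division
    std::vector<std::thread> pool;
    for (unsigned w = 0; w < workers; ++w) {
        const std::size_t begin = std::min(data.size(), w * chunk);
        const std::size_t end = std::min(data.size(), begin + chunk);
        if (begin == end) break;  // nothing left to hand out
        pool.emplace_back([&data, factor, begin, end] {
            for (std::size_t i = begin; i < end; ++i) data[i] *= factor;
        });
    }
    for (std::thread& t : pool) t.join();  // the only synchronization point
}
```

Scaling stays close to linear only while the slices are large. With tiny slices, thread start-up cost and cache lines shared at slice boundaries start to dominate.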

Looks like Intel is adding some hardware to try and help coders who don't know how to optimize their code really well.

The same goes for Microsoft. Visual Studio 2012 has auto-vectorizing ability as well.

Automation of this type is most likely never going to beat well hand coded stuff, but it will at least help code that is not optimized at all.

Of course this will probably lead to even poorer code, since people will think that all they need to worry about is logic: "as long as the code does what I need it to, the compiler, and even the processor, will take care of the rest"... GRRRRR :mad:
 
Intel is choosing to focus less on powerful desktop gaming CPUs and more on low power applications in the segments that are growing the fastest (tablet, ultrabook, mobile). I'm sure if you took into account that the power consumption of the low power CPUs is dramatically less than that of the gaming CPUs, you'd find the performance per watt is still increasing according to Moore's Law.
 
Exactly (about the 8150 needing software compiled for its new instruction sets), but it does bring extra heat when stressing. ;)
 
Man, from what I read the architectural improvements and additions are outstanding. Two more execution ports (making the chip capable of 8 µops per cycle), 1 cycle L2, increased bandwidth for L1 and L2, larger OoO buffers, a larger L2 TLB, improved virtualization latency, new instructions including TSX, and more. This is just a pinch of the improvements and additions, not including the thermal improvements.

No, this will be an excellent processor.
 
+1

It looks to be a pretty big step forward.
 
In addition to my statements above, after thoroughly reading multiple articles it's become clear that Haswell only offers double the peak FP throughput of AVX on Sandy/Ivy if you are using the new fused multiply-add (FMA) instructions that arrive alongside AVX2. For all other usage, FP performance remains roughly the same as Sandy/Ivy.
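The arithmetic behind the "4x Nehalem" and "double, but only with FMA" figures can be sketched like this. The per-core unit and lane counts are my reading of the public microarchitecture descriptions, not numbers from this thread, so treat them as assumptions.

```cpp
#include <cmath>

// Peak single-precision FLOPs per cycle per core:
// FP units that can issue each cycle * SIMD lanes * FLOPs per lane per instruction.
constexpr int peak_flops_per_cycle(int units, int lanes, int flops_per_lane) {
    return units * lanes * flops_per_lane;
}

// Assumed configurations (not from the thread):
constexpr int nehalem = peak_flops_per_cycle(2, 4, 1);  // 128-bit SSE, one add + one multiply unit
constexpr int sandy   = peak_flops_per_cycle(2, 8, 1);  // 256-bit AVX, one add + one multiply unit
constexpr int haswell = peak_flops_per_cycle(2, 8, 2);  // 256-bit FMA on two units, multiply and add fused

// Dot product written so that every step is a single fused multiply-add.
// std::fma only becomes a hardware FMA instruction when the build targets
// a CPU that has one (for example -mfma with GCC/Clang).
float dot_fma(const float* a, const float* b, int n) {
    float acc = 0.0f;
    for (int i = 0; i < n; ++i) acc = std::fma(a[i], b[i], acc);
    return acc;
}
```

Under these assumptions the only factor that changes between Sandy Bridge and Haswell is the last one, so code that issues separate multiplies and adds sees no higher peak.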

And Shikami, all those improvements are nice, but they're really only increasing the cache/instruction bandwidth to better feed AVX/AVX2. You're not going to see the benefits of that on the day of release (because few mainstream applications support AVX yet), and it could take years to realize the investment in those improvements.
 
I do know that the improvements and additions can affect AVX2, and some are even there to offset TSX's burden on L1. However, these architectural changes will affect all computing done by the processor. E.g., TLB increases alone almost always increase performance. So, to say that you will not see the benefits is incorrect (cf. Sandy Bridge).
 
Intel's latest leaked slides say up to 10% IPC improvement over SB.
 
That is a good increase, actually. Wonder why they compared it to SB instead of IB? There are differences between the two in computation speed, but not always much.
 
Actually, I think it was IB. Just misstated that, as I tend to mix up SB and IB.
 
Regardless of how much faster Haswell will be over Ivy Bridge, Haswell is what I'm going to buy.

My i7 920 and X58 chipset need a refresh, and I want native SATA 6Gb/s and USB 3.0. A healthy gain in performance wouldn't hurt either (maybe not so much in games, but whatever).

ditto
 
I just hope they don't repeat the Ivy Bridge mistake of moving away from fluxless soldering of the IHS.
 
That really was a dumb move, but to Intel it's insignificant, because overclockers are not who Intel cares about.
 
It's a 10% performance bump on average, not IPC increase. So Haswell looks to be another Sandy-to-Ivy + graphics.
 
That's a nice bump in average performance for me personally, though, coming from Bloomfield (roughly 20-30%). Plus there's all the stuff I'm missing on my old X58 board (mainly native SATA 6Gb/s ports, the lack of which holds back my SSD). I can wait (can't wait)... lol
 
you could've had that with z77/IB.

This is very true, I'm just having a battle convincing myself to get Ivy Bridge. And since Haswell isn't that far out, and the new AMD video cards should drop around the same time (supposedly), that's what I'll wait for.
 
I'm pretty sure it said 10% increase clock for clock on the slide. I could be mistaken though.
 
You're mistaken.

On the CPU side you can expect a ~10% increase in performance on average over Ivy Bridge.

It's a mobile architecture, not a desktop one. Efficiency and GPU gains were the targets.
 
How about some actual benchmarks to see who is right or wrong? :D
 
Looks like Haswell will be more about low power and integrated graphics, to compete with ARM in the mobile/tablet space.

Is it time for Moore's Law to R.I.P?
Moore's law isn't about performance exactly, it's about transistor density increasing. Performance just usually comes along for the ride.

A 10-15% performance increase at Haswell's target clock speed may be disappointing. That performance per watt will go up much more than that probably isn't interesting to most enthusiasts.

It's becoming clear that Intel is focusing on performance per watt while it has a pretty insurmountable lead in x86(32/64) CPU performance and no viable high performance competitors in that market. HPC had been one of AMD's shining spots, but with AVX2, which finally gains FMA and other improvements, Haswell may crush AMD there too.

TBH, a 10W Haswell variation for tablets does sound interesting. And the GPU doesn't sound too bad either. The top version may be very competitive with the one in the Piledriver parts currently out (or possibly exceed it in several applications when the on-package RAM is used).

Once the TIM problem is fixed (as Intel promised), maybe it will be easier to overclock Haswell to higher frequencies. IB seemed to disappoint some people.
 