Haswell: Don't expect significant performance gains

Deads0uls · Sep 11, 2012

"01:56PM - 4x the peak FP throughput of Nehalem" - http://hardforum.com/showthread.php?t=1715867&page=2

Four times the FP performance of Nehalem? I'll take it! Keep in mind I am running an i7-950.

DejaWiz · Sep 11, 2012

Deads0uls said:
"01:56PM - 4x the peak FP throughput of Nehalem" - http://hardforum.com/showthread.php?t=1715867&page=2

Four times the FP performance of Nehalem? I'll take it! Keep in mind I am running an i7-950.

You know you linked the second page of this thread, right?

Tsumi · Sep 11, 2012

DejaWiz said:
You know you linked the second page of this thread, right?

Someone got too excited

EngrChris · Sep 11, 2012

It was from here.
http://www.anandtech.com/show/6263/intel-haswell-architecture-disclosure-live-blog

defaultluser · Sep 11, 2012

Deads0uls said:
"01:56PM - 4x the peak FP throughput of Nehalem" - http://hardforum.com/showthread.php?t=1715867&page=2

Four times the FP performance of Nehalem? I'll take it! Keep in mind I am running an i7-950.

But it won't translate to Folding performance improvements because Gromacs doesn't even support the original Sandy Bridge AVX yet:

http://www.gromacs.org/Documentation/Acceleration_and_parallelization#SSE.2c_AVX.2c_etc

Yes folks, just like SSE before it, AVX must have software support to bring you these huge gains. It may be some time before your application of choice gets the boost.

Tsumi · Sep 11, 2012

defaultluser said:
But it won't translate to Folding performance improvements because Gromacs doesn't even support the original Sandy Bridge AVX yet:

http://www.gromacs.org/Documentation/Acceleration_and_parallelization#SSE.2c_AVX.2c_etc

Yes folks, just like SSE before it, AVX must have software support to bring you these huge gains. It may be some time before your application of choice gets the boost.

Also the same reason why the 8150 can completely demolish the 2600k and 3770k when the software is compiled to take advantage of its new instruction sets, but very little software is actually compiled that way.

socK · Sep 11, 2012

Tsumi said:
Also the same reason why the 8150 can completely demolish the 2600k and 3770k when the software is compiled to take advantage of its new instruction sets, but very little software is actually compiled that way.

The best gains will probably come from something handwritten as compilers are pretty temperamental about what they decide to vectorize. Writing some fast, vectorized code is some very tricky stuff, but yes, in certain situations you can generate a pretty enormous speed gain.

I wrote a software renderer a while ago and vectorized the geometry transformation at one point for SSE. Basically I went from processing a single vertex at a time to 4. There's some overhead and gotchas but if you can just nom on a stream of ready data then good things can happen. It wasn't a 4 fold increase in performance, but it was quite large for more or less "free."
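For anyone who hasn't seen the "one vertex at a time to four" idea, here is a rough sketch of the data flow. It is not the renderer's actual code: Python has no SIMD, so each 4-element list just stands in for one 128-bit SSE register, and the function names and the 3x4 row-major matrix layout are made up for illustration. In C the per-lane arithmetic would map onto intrinsics such as `_mm_mul_ps` and `_mm_add_ps`.

```python
LANES = 4  # four 32-bit floats fit in one 128-bit SSE register


def transform_scalar(m, v):
    """Transform a single (x, y, z) vertex by a 3x4 row-major matrix m."""
    x, y, z = v
    return tuple(r[0] * x + r[1] * y + r[2] * z + r[3] for r in m)


def transform_batch4(m, xs, ys, zs):
    """Transform four vertices at once, stored as structure-of-arrays.

    Each output component is the same multiply-add chain applied across all
    four lanes, which is the part a SIMD unit runs as single instructions.
    """
    out = []
    for r in m:
        out.append([r[0] * xs[i] + r[1] * ys[i] + r[2] * zs[i] + r[3]
                    for i in range(LANES)])
    return out  # [new_xs, new_ys, new_zs]


def transform_stream(m, vertices):
    """Process a vertex list four at a time, with a scalar loop for the tail."""
    result = []
    full = len(vertices) - len(vertices) % LANES
    for start in range(0, full, LANES):
        chunk = vertices[start:start + LANES]
        # Repacking into per-component lanes is part of the overhead.
        xs, ys, zs = (list(c) for c in zip(*chunk))
        nx, ny, nz = transform_batch4(m, xs, ys, zs)
        result.extend(zip(nx, ny, nz))
    # Leftover vertices (fewer than four) fall back to the scalar path.
    result.extend(transform_scalar(m, v) for v in vertices[full:])
    return result
```

The repacking step and the scalar tail are the kind of overhead that keeps the real-world gain below a clean 4x, and keeping the vertex data in the per-component layout to begin with avoids most of the repacking.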

cyclone3d · Sep 11, 2012

socK said:
The best gains will probably come from something handwritten as compilers are pretty temperamental about what they decide to vectorize. Writing some fast, vectorized code is some very tricky stuff, but yes, in certain situations you can generate a pretty enormous speed gain.

I wrote a software renderer a while ago and vectorized the geometry transformation at one point for SSE. Basically I went from processing a single vertex at a time to 4. There's some overhead and gotchas but if you can just nom on a stream of ready data then good things can happen. It wasn't a 4 fold increase in performance, but it was quite large for more or less "free."

Hand coding is king. Compilers can do nothing in comparison to the programmer knowing exactly what needs to be done in the code and coding it as such.

Being able to have multiple threads work on a single large piece of data without using any locks whatsoever can get you almost linear performance increases when multi-threading.
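A minimal sketch of that lock-free pattern, assuming the work can be split into disjoint slices: every thread reads only its own range and writes only its own result slot, so the final join is the only synchronization. The function name and the sum-of-squares workload are placeholders. CPython's global interpreter lock means this particular script will not show the near-linear scaling; it only illustrates the partitioning, which carries over directly to C or C++ threads.

```python
import threading


def parallel_sum_of_squares(data, num_threads=4):
    """Sum x*x over data using worker threads that share no mutable state."""
    partial = [0] * num_threads          # one private output slot per worker
    chunk = (len(data) + num_threads - 1) // num_threads

    def worker(tid):
        lo = tid * chunk
        hi = min(lo + chunk, len(data))
        total = 0
        for i in range(lo, hi):          # read-only access to a disjoint slice
            total += data[i] * data[i]
        partial[tid] = total             # only this worker writes slot tid

    threads = [threading.Thread(target=worker, args=(t,))
               for t in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()                         # join is the only synchronization point
    return sum(partial)
```

Because no two workers ever touch the same element or the same output slot, there is nothing to lock and nothing to contend over until the final reduction.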

Looks like Intel is adding some hardware to try and help coders who don't know how to optimize their code really well.

The same goes for Microsoft. Visual Studio 2012 has auto-vectorizing ability as well.

Automation of this type is most likely never going to beat well hand coded stuff, but it will at least help code that is not optimized at all.

Of course this will probably lead to even poorer code since people will think that all they need to worry about is logic.. "as long as the code does what I need it to, the compiler.. and even processor will take care of the rest"... GRRRRR

michilius · Sep 11, 2012

Intel is choosing to focus less on powerful desktop gaming CPUs and more on low-power applications in the segments that are growing the fastest (tablet, ultrabook, mobile). I'm sure if you took into account that the power consumption of the low-power CPUs is dramatically less than that of the gaming CPUs, you'd find that performance per watt is still increasing according to Moore's Law.

ShuttleLuv · Sep 12, 2012

Tsumi said:
Also the same reason why the 8150 can completely demolish the 2600k and 3770k when the software is compiled to take advantage of its new instruction sets, but very little software is actually compiled that way.

Exactly, but it does bring extra heat when stressing.

Deleted member 214115 · Sep 12, 2012

Man, from what I read the architectural improvements and additions are outstanding. Increased the ports by two making the chip capable of 8ops, 1 cycle L2, increased BW for L1 and L2, increased OoO buffers, increased L2 TLB, improved virtual latency, new instructions including TSX, and more. This was just a pinch of improvements and additions not including the thermal improvements.

No, this will be an excellent processor.

cyclone3d · Sep 12, 2012

Shikami said:
Man, from what I read the architectural improvements and additions are outstanding. Increased the ports by two making the chip capable of 8ops, 1 cycle L2, increased BW for L1 and L2, increased OoO buffers, increased L2 TLB, improved virtual latency, new instructions including TSX, and more. This was just a pinch of improvements and additions not including the thermal improvements.

No, this will be an excellent processor.

+1

It looks to be a pretty big step forward.

defaultluser · Sep 12, 2012

In addition to my statements above, after thoroughly reading multiple articles it's become clear that AVX2 only offers double the performance over AVX if you are using the new Fused Multiply Add instruction. For all other usage, FP performance remains roughly the same as Sandy/Ivy.
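The back-of-the-envelope arithmetic behind the "4x Nehalem" and "2x only with FMA" figures can be sketched as below. The port counts are the commonly cited per-core layouts and are assumptions here, not numbers from this thread, and the helper name is made up.

```python
def peak_sp_flops_per_cycle(vector_bits, units, flops_per_lane):
    """Peak single-precision FLOPs per cycle per core.

    vector_bits    -- SIMD register width in bits
    units          -- FP execution ports that can issue every cycle
    flops_per_lane -- 1 for a plain add or multiply, 2 for a fused multiply-add
    """
    lanes = vector_bits // 32            # 32-bit floats per register
    return lanes * units * flops_per_lane


# Assumed port layouts (commonly cited, not taken from this thread):
nehalem = peak_sp_flops_per_cycle(128, units=2, flops_per_lane=1)        # SSE: 1 mul + 1 add port
sandy_bridge = peak_sp_flops_per_cycle(256, units=2, flops_per_lane=1)   # AVX: 1 mul + 1 add port
haswell_fma = peak_sp_flops_per_cycle(256, units=2, flops_per_lane=2)    # 2 FMA ports
haswell_no_fma = peak_sp_flops_per_cycle(256, units=2, flops_per_lane=1) # same ports, no fusing

print(nehalem, sandy_bridge, haswell_fma, haswell_no_fma)
```

Under those assumptions the doubling over Sandy/Ivy comes entirely from `flops_per_lane=2`, so code that never issues an FMA stays at the Sandy Bridge peak.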

And Shikami, all those improvements are nice, but they're really only increasing the cache/instruction bandwidth to better feed AVX/AVX2. You're not going to see the benefits of that on the day of release (because few mainstream applications support AVX yet), and it could take years to realize the investment in those improvements.

Deleted member 214115 · Sep 12, 2012

defaultluser said:
And Shikami, all those improvements are nice, but they're really only increasing the cache/instruction bandwidth to better feed AVX/AVX2. You're not going to see the benefits of that on the day of release (because few mainstream applications support AVX yet), and it could take years to realize the investment in those improvements.

I do know that the improvements and additions can affect AVX2, and some even address TSX's burden on L1. However, these architectural changes will affect all computing done by the processor; e.g., TLB increases alone always increase performance. So to say that you will not see the benefits is incorrect (q.v. Sandy Bridge).

Tsumi · Sep 12, 2012

Shikami said:
I do know that the improvements and additions can affect AVX2, and some even address TSX's burden on L1. However, these architectural changes will affect all computing done by the processor; e.g., TLB increases alone always increase performance. So to say that you will not see the benefits is incorrect (q.v. Sandy Bridge).

Intel's latest leaked slides say up to 10% IPC improvement over SB.

Deleted member 214115 · Sep 12, 2012

Tsumi said:
Intel's latest leaked slides say up to 10% IPC improvement over SB.

That is a good increase, actually. Wonder why they compared it to SB instead of IB? There are differences between the two in computation speed, but not always much.

Tsumi · Sep 12, 2012

Shikami said:
That is a good increase, actually. Wonder why they compared it to SB instead of IB? There are differences between the two in computation speed, but not always much.

Actually, I think it was IB. Just misstated that, as I tend to mix up SB and IB.

Deads0uls · Sep 12, 2012

Tsumi said:
Someone got too excited

lol, I guess I did!

wrangler · Sep 12, 2012

Dudeyourlame said:
Regardless of how much faster Haswell will be over Ivy Bridge, Haswell is what I'm going to buy.

My i7 920 and X58 chipset need a refresh, and I want native SATA 6Gb/s and USB 3.0. A healthy gain in performance wouldn't hurt (maybe not so much in games, but whatever).

ditto

DarkStryke · Sep 12, 2012

I just hope they don't repeat the Ivy Bridge mistake of moving away from fluxless soldering of the IHS.

ShuttleLuv · Sep 12, 2012

DarkStryke said:
I just hope they don't repeat the Ivy Bridge mistake of moving away from fluxless soldering of the IHS.

That really was a dumb move, but to Intel it's insignificant, because overclockers are not who Intel cares about.

pelo · Sep 12, 2012

Tsumi said:
Intel's latest leaked slides say up to 10% IPC improvement over SB.

It's a 10% performance bump on average, not IPC increase. So Haswell looks to be another Sandy-to-Ivy + graphics.

MasonD · Sep 12, 2012

That's a nice bump in average performance though, for me personally coming from Bloomfield (20-30%, give or take). Plus there's all the stuff I'm missing on my old X58 board (mainly native SATA 6Gb/s ports, the lack of which holds back my SSD). I can wait (can't wait)...lol

Dudeyourlame · Sep 12, 2012

Dudeyourlame said:
Regardless of how much faster Haswell will be over Ivy Bridge, Haswell is what I'm going to buy.

My i7 920 and X58 chipset need a refresh, and I want native SATA 6Gb/s and USB 3.0. A healthy gain in performance wouldn't hurt (maybe not so much in games, but whatever).

blahblahyoutoo said:
you could've had that with z77/IB.

This is very true. I'm just having a battle with myself over getting Ivy Bridge. Since Haswell isn't that far out, and the new AMD video cards should drop around the same time (supposedly), that's what I'll wait for.

Tsumi · Sep 12, 2012

pelo said:
It's a 10% performance bump on average, not IPC increase. So Haswell looks to be another Sandy-to-Ivy + graphics.

I'm pretty sure it said 10% increase clock for clock on the slide. I could be mistaken though.

Hagrid · Sep 13, 2012

pelo said:
It's a 10% performance bump on average, not IPC increase. So Haswell looks to be another Sandy-to-Ivy + graphics.

So nothing to really get excited about... hehe

pelo · Sep 13, 2012

Tsumi said:
I'm pretty sure it said 10% increase clock for clock on the slide. I could be mistaken though.

You're mistaken.

On the CPU side you can expect a ~10% increase in performance on average over Ivy Bridge.

It's a mobile architecture, not a desktop one. Efficiency and GPU gains were the targets

boxleitnerb · Sep 13, 2012

He is not mistaken:
http://www.fudzilla.com/home/item/28318-haswell-to-be-10-percent-faster-than-ivy-bridge

Now Intel tells its partners to expect that Haswell should end up at least 10 percent faster than Ivy Bridge based cores at the same clock.

pelo · Sep 13, 2012

boxleitnerb said:
He is not mistaken:
http://www.fudzilla.com/home/item/28318-haswell-to-be-10-percent-faster-than-ivy-bridge

I'd trust Anand over Fudzilla. Anand was at the IDF and frequently speaks with their engineers. Fudzilla starts with FUD.

Hagrid · Sep 13, 2012

How about some actual benchmarks to see who is right or wrong?

pxc · Sep 13, 2012

dbr1 said:
Looks like Haswell will be more about low power and integrated graphics, to compete with ARM in the mobile/tablet space.

Is it time for Moore's Law to R.I.P?

Moore's law isn't about performance exactly, it's about transistor density increasing. Performance just usually comes along for the ride.

A 10-15% performance increase at Haswell's target clock speed may be disappointing. That performance per watt will go up much more than that probably isn't interesting to most enthusiasts.

It's becoming clear that Intel is focusing on performance per watt while it has a pretty insurmountable lead in x86(32/64) CPU performance and no viable high performance competitors in that market. HPC had been one of AMD's shining spots, but with AVX2, which finally gains FMA and other improvements, Haswell may crush AMD there too.

TBH, a 10W Haswell variation for tablets does sound interesting. And the GPU doesn't sound too bad either. The top version may be very competitive with the one in Piledriver currently out (or possibly exceed it in several applications when the on-package RAM is used).

Once the TIM problem is fixed (as Intel promised), maybe it will be easier to overclock Haswell to higher frequencies. IB seemed to disappoint some people.
