GTX 1080 NVIDIA presentation leaked

Overclocking headroom on the 1080 looks to be massive, if that report is to be believed. Over 2 GHz with a water block.
 
Depends on how far past 2 GHz it goes. It needs to hit at least 2100 MHz to overclock as well as a 980 Ti (+25%).
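
Napkin math on that, assuming the ~1700 MHz stock boost clock floating around in the leaks (that figure is the assumption here):

```python
# What clock does the 1080 need to match a 980 Ti-style +25% overclock?
# Assumes a ~1700 MHz stock boost clock from the leaked slides.
stock_boost_mhz = 1700
oc_headroom = 0.25

target_mhz = stock_boost_mhz * (1 + oc_headroom)
print(f"needs roughly {target_mhz:.0f} MHz")  # -> needs roughly 2125 MHz
```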
 
If it hits 2 GHz at all, that would be pretty impressive.

We'll see.
 
I'm shocked they changed the SM configuration... If each SM has double the number of SPs as GP100, then the number of registers per SM is exactly the fucking same as Maxwell hahaha
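
For anyone who wants the napkin version (the 64K registers per SM is from the public Maxwell/GP100 whitepapers; GP104 keeping 128 cores per SM is the leaked/assumed part):

```python
# Registers per CUDA core. 64K 32-bit registers per SM is the whitepaper
# figure for Maxwell SMM and GP100; GP104's 128-core SM is assumed from the leak.
regs_per_sm = 64 * 1024

for name, cores in [("Maxwell SMM", 128), ("GP100 SM", 64), ("GP104 SM (assumed)", 128)]:
    print(f"{name}: {regs_per_sm // cores} registers per core")
# Maxwell SMM: 512 registers per core
# GP100 SM: 1024 registers per core
# GP104 SM (assumed): 512 registers per core  <- same ratio as Maxwell
```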
 
Yep they were, pretty sure this card will be able to do more than 2 GHz on water, if you can deliver enough power to it.
 

Exactly. One 8-pin isn't enough if you want to hit 2.5 GHz, which some people claim is possible.

Bring on the MSI Lightning card!
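
Rough sketch of why one 8-pin looks tight for 2.5 GHz. The 180 W TDP and ~1733 MHz boost are the figures in the leaks; the cubic power scaling is pure napkin assumption:

```python
# Very rough power estimate at 2.5 GHz. Assumes P scales ~cubically with
# clock (P ~ f * V^2, with voltage rising roughly linearly with frequency).
base_power_w = 180.0      # leaked TDP
base_clock_mhz = 1733.0   # leaked boost clock
target_clock_mhz = 2500.0

est_power_w = base_power_w * (target_clock_mhz / base_clock_mhz) ** 3
budget_w = 150 + 75       # one 8-pin (150 W) plus the PCIe slot (75 W)
print(f"~{est_power_w:.0f} W estimated vs {budget_w} W of in-spec power")
# -> ~540 W estimated vs 225 W of in-spec power
```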
 
I think this is how it will work: Pascal SMs still cannot do graphics + compute concurrently because of the expensive context switch.

The solution is to partition the GPC (i.e., at the level of individual SMs) between graphics and compute.

The problem with Maxwell is that the SM partitioning (afaik) could only be altered at each drawcall, whereas with Pascal's pixel-, triangle-, and instruction-level preemption the repartitioning can be done at finer granularity.

I expect this to work on Maxwell as well, but it will need careful profiling to avoid pipeline stalls.
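
Toy model of the difference, with all numbers invented, just to show why a partition that's fixed for a whole drawcall leaves SMs idle:

```python
# Static vs. dynamic SM partitioning, with made-up work units.
def static_time(gfx_sms, cmp_sms, gfx_work, cmp_work):
    # Split is fixed for the whole drawcall; finished SMs can't switch queues.
    return max(gfx_work / gfx_sms, cmp_work / cmp_sms)

def dynamic_time(total_sms, gfx_work, cmp_work):
    # Fine-grained preemption: any idle SM picks up whatever work remains.
    return (gfx_work + cmp_work) / total_sms

SMS, GFX, CMP = 20, 90.0, 30.0
print(f"static 50/50 split: {static_time(SMS // 2, SMS // 2, GFX, CMP):.1f}")
print(f"dynamic:            {dynamic_time(SMS, GFX, CMP):.1f}")
# static 50/50 split: 9.0   (compute SMs finish at 3.0 and idle for the rest)
# dynamic:            6.0
```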

Also, anyone notice how The Witcher 3 is featured on their async compute slide?
 
Yeah, but it coulda been 96 pixels per clock... at 2000 MHz.

That would require increasing the ROP:bus-width ratio, which they already did with Maxwell.

This will be fine, man; it's memory bandwidth that's the question mark.
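
The bandwidth worry in numbers, using the leaked 64 ROPs and the rumored 10 Gbps GDDR5X on a 256-bit bus (all of it unconfirmed):

```python
# Peak pixel write traffic vs. memory bandwidth, napkin version.
rops = 64
core_clock_ghz = 2.0      # the hoped-for overclock
bytes_per_pixel = 4       # 32-bit color, ignoring Z/blending traffic

fill_gb_s = rops * core_clock_ghz * bytes_per_pixel
mem_gb_s = 10 * 256 / 8   # 10 Gbps GDDR5X on a 256-bit bus

print(f"{fill_gb_s:.0f} GB/s of pixel writes vs {mem_gb_s:.0f} GB/s of bandwidth")
# -> 512 GB/s of pixel writes vs 320 GB/s of bandwidth
```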
 
Pascal's SMs can now run both compute and graphics kernels at the same time; it's in the slides. This was not doable with Maxwell. As a whole you could have different queues on different SMs, but if you tried to force one Maxwell SM to do both, you'd end up with major underutilization of the ALUs whenever the scheduler predicted incorrectly, hence the performance penalties on heavy compute tasks, as the graphics queue would have to wait.
 
Can you link to the specific slide? I didn't see anything to this effect :p

This suggests the context switch penalty has been reduced.
 
Well, check the dynamic load slides.

There is more... it just hasn't been leaked yet.

The dynamic load slide mentions GPU partitioning; nothing explicitly references SM-level concurrency and/or a reduced context switch penalty.

Can't wait for more info :D

If there's more that hasn't leaked yet... then start leaking already! XD
 
I think it's clear from the slides that there's no more static partitioning with Pascal. The slide also explains why Maxwell performance tanks with async.
 
Yeah, I got that, but how does this relate to the context switch cost on the SMs? Am I having a brainfart?

Afaik Maxwell can only repartition at the drawcall.
 
The pre-emption slide mentions an under-100-microsecond switching cost for games. It didn't really address SM-level concurrency, which is not necessarily something you want anyway, i.e. compute and graphics fighting over the same L1 cache.

Still don't quite get how pre-emption is a general solution for async. Pre-emption is useful if you want to interrupt a running task to make way for a higher priority kernel (like VR time warp). But async is more about running multiple kernels concurrently to take full advantage of all available execution resources.
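
For scale, here's what a <100 µs switch costs against a frame budget (refresh rates picked arbitrarily):

```python
# A <100 microsecond preemption hit as a fraction of the frame budget.
switch_cost_us = 100.0

for hz in (60, 90):  # desktop and VR refresh rates
    frame_budget_us = 1e6 / hz
    print(f"{hz} Hz: {switch_cost_us / frame_budget_us:.1%} of a frame")
# 60 Hz: 0.6% of a frame
# 90 Hz: 0.9% of a frame
```

So it's cheap enough for one-off interruptions like time warp, but it still says nothing about keeping two kernels resident at once.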
 
My understanding is that you won't have SM-level concurrency, but that concurrency arises from having multiple SMs (each working on either graphics or compute) executing g+c asynchronously and concurrently. Actually, if they're different SMs it will be in parallel.

The preemption should allow for repartitioning the SMs, no?
 
Preemption and context switching should only be used for prioritizing or syncing; think of the critical path.
Speaking of the critical path, notice how Nvidia wrote "path optimization" on the slide with the clocks. It ties in nicely to all our rants about clock speed being mainly an architectural limitation.
 
According to Nvidia's slides, the 1080 should be 70% faster than a 980. Hot damn.
Show me a 2.5 GHz OC and we have a winner.
 
A Titan with 3840 SPs @ 2 GHz would be 15 TFLOPS :D
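
The math checks out, assuming that rumored 3840-SP part exists and actually holds 2 GHz:

```python
# Peak FP32: shader processors x 2 FLOPs per clock (FMA) x clock speed.
sps, clock_ghz = 3840, 2.0
print(f"{sps * 2 * clock_ghz / 1000:.2f} TFLOPS")  # -> 15.36 TFLOPS
```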

Yup, plus some nice HBM2 tossed into the mix. That's the true Pascal; this is all just midrange fluff ;) If I were in the market for midrange Pascal, the $380 1070 in SLI seems a way better value than the $600 1080.
 
Halfway there to make that FS/FT post quota! Keep those two-word non sequiturs coming, you're almost there!

Edit: Holy smokes, just noticed you joined yesterday. You must really want to buy/sell something.

Almost! But no, I don't know why it says I just joined yesterday; I've been here for over a month, I believe. Definitely didn't post 50+ in a day lol

Edit: It's fucking May lmao
 