Vega GPU announced with some details, HPC so far

Worth remembering Fiji could not do fast FP64, which was one reason AMD kept the older generation around, just like Nvidia, whose Maxwell could not do fast FP64 either.
It was a 1/16 ratio for Fiji I think (you needed Hawaii GPUs if FP64 was a requirement), and Nvidia was 1/32 with Maxwell (which required Kepler, either a K40 or K80, for FP64).
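To put rough numbers on those ratios, here is a back-of-the-envelope sketch using the public Fury X and Maxwell Titan X specs (treat it as illustrative, not a vendor disclosure):

```latex
\begin{align*}
\text{Fiji (Fury X):} &\quad 2 \times 4096 \times 1.05\,\text{GHz} \approx 8.6\ \text{TFLOPs FP32},
  & 8.6 \times \tfrac{1}{16} &\approx 0.54\ \text{TFLOPs FP64} \\
\text{Maxwell (Titan X):} &\quad 2 \times 3072 \times 1.00\,\text{GHz} \approx 6.1\ \text{TFLOPs FP32},
  & 6.1 \times \tfrac{1}{32} &\approx 0.19\ \text{TFLOPs FP64}
\end{align*}
```

Compare that with the ~1.4 TFLOPs FP64 a Kepler K40 gets from its 1/3 rate, and it is obvious why both vendors kept the older parts around for FP64 work.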

Cheers

The announced specs of Nvidia's Volta-based SoC also tell us that there will be at least one Volta flavor with native FP64 and packed 32/16/8.
 
Isn't GP100 ~20 TFLOPs FP16?

As far as bandwidth efficiency is concerned, I'm not sure it's directly comparable, because GP100 is a DP chip and there must be headroom for mixed 32/64 operation.

On the other hand, I agree it cannot be a 600mm² die if it's only 64 CUs like Fiji.

I'm much more inclined to believe it's two low-clocked chips rather than one 1500MHz chip, frankly.
Polaris 10 XT2 seems pretty obvious to me: double the Zauba price, double the performance, etc.
 
Anarchist4000, the Doom numbers are really meaningless because we have no idea what scene was tested; the Titan XP could be struggling at 50fps there for all we know.
 
So the MI25 is dual-GPU? I'm confused by the clock rates.

Call me a party pooper, but I don't believe the 1500MHz, haha.

Lemme check B3D.
I mentioned in another thread here that, in theory, if AMD can increase the core count by roughly 25%, they would not need a massive clock speed (there's a quick arithmetic check further down this post).
Anarchist and I discussed this a bit in the other thread; I was coming from the context of a larger die (say around 500mm²) with more CUs, while his thoughts are around an architecture change with an increased number of scalar units per CU combined with smaller dies.
https://hardforum.com/threads/possi...-with-high-end.1918951/page-3#post-1042685349
https://hardforum.com/threads/possi...-with-high-end.1918951/page-3#post-1042685545
Read both of my posts as they need to be seen together, and you can also see Anarchist's thoughts in there for a counterpoint perspective.

BTW, while they are reporting 12.5 TFLOPs FP32 for the MI25, that could be the full Vega core and HPC-only, as they do not mention the number of cores; the prosumer/gaming market would get the lower-yield 4096-core part that is not 12.5 TFLOPs and would come out first.
Which would tie in with the delay in launching the MI25, IMO.
Worth noting the MI25 is slightly ahead of the GP102-based Tesla P40 in terms of FP32 compute; the P40 has just under 12 TFLOPs at 250W.
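For reference, the arithmetic behind these TFLOPs figures: peak FP32 is just 2 ops per clock per shader, so (with the Vega shader counts being pure speculation at this point) 12.5 TFLOPs can be reached either way:

```latex
\begin{align*}
\text{TFLOPs}_{\text{FP32}} &= 2 \times N_{\text{shaders}} \times f \\
12.5\ \text{TFLOPs} &\approx 2 \times 4096 \times 1.53\,\text{GHz} \;\approx\; 2 \times 5120 \times 1.22\,\text{GHz} \\
\text{P40:}\quad 2 \times 3840 \times 1.53\,\text{GHz} &\approx 11.8\ \text{TFLOPs}
\end{align*}
```

So the 12.5 TFLOPs figure needs either a very high clock on 4096 shaders, or the ~25% core increase mentioned above at a much tamer clock.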
Cheers
 
You guys are discussing multiple cards from the launch, as well as this bench, and you think they are not using Polaris in one?

Polaris is the smallest/least powerful one, Fiji is the mid-tier one.
 
It's a single chip. Stop saying it's two. We know you hate AMD, move along.
The dual-die discussion was based on Vega 10 supposedly being smaller than Vega 11, according to an AMD employee a while back, I believe. The opposite of the Polaris situation. If that's the case, the little Vega is besting P100 by 30% and Vega is radically more efficient than Polaris.
 
BTW, while they are reporting 12.5 TFLOPs FP32 for the MI25, that could be the full Vega core and HPC-only, as they do not mention the number of cores; the prosumer/gaming market would get the lower-yield 4096-core part that is not 12.5 TFLOPs and would come out first.
Which would tie in with the delay in launching the MI25, IMO.
Worth noting the MI25 is slightly ahead of the GP102-based Tesla P40 in terms of FP32 compute; the P40 has just under 12 TFLOPs at 250W.
Just realized an error in my reading skills. While the FP16 performance is in theory faster than P100, it was benchmarked against GP102, which lacks the double-rate FP16, not the P100. MI25 was therefore ~46% faster than GP102 in deep learning, and likely slower than P100 (not shown), following the bandwidth limits. That also lines up with Vega 10 being little Vega and likely ~300mm², on par with GP104, albeit with HBM. If fabbed at TSMC with a slightly higher ideal clock speed, that 4096+1024 (scalar-equivalent) configuration at 1.25GHz makes a lot of sense. The scalar portion wouldn't be very efficient for deep learning and nicely parallel benchmarks, so 20% of theoretical performance is left on the table. Eliminate the scalars and the benchmark is 46% higher FP16 performance for MI25 versus 55% higher theoretical TFLOPs. Well within what I'd consider a margin of error, and reasonably efficient. That lines up much better with realistic expectations.

So this wouldn't be the enthusiast-class product, just a high-end part around the 1080 and likely approaching GP102 graphics performance. A dual Vega 10 could in theory still exist, but we'd be looking at double the performance, not accounting for thermal issues. That also leaves room for a Vega 11 around 450mm² along the lines of GP102 that is still likely small enough to fit onto an APU.
 
Just realized an error in my reading skills. While the FP16 performance is in theory faster than P100, it was benchmarked against GP102, which lacks the double-rate FP16, not the P100. MI25 was therefore ~46% faster than GP102 in deep learning, and likely slower than P100 (not shown), following the bandwidth limits. That also lines up with Vega 10 being little Vega and likely ~300mm², on par with GP104, albeit with HBM. If fabbed at TSMC with a slightly higher ideal clock speed, that 4096+1024 (scalar-equivalent) configuration at 1.25GHz makes a lot of sense. The scalar portion wouldn't be very efficient for deep learning and nicely parallel benchmarks, so 20% of theoretical performance is left on the table. Eliminate the scalars and the benchmark is 46% higher FP16 performance for MI25 versus 55% higher theoretical TFLOPs. Well within what I'd consider a margin of error, and reasonably efficient. That lines up much better with realistic expectations.

So this wouldn't be the enthusiast-class product, just a high-end part around the 1080 and likely approaching GP102 graphics performance. A dual Vega 10 could in theory still exist, but we'd be looking at double the performance, not accounting for thermal issues. That also leaves room for a Vega 11 around 450mm² along the lines of GP102 that is still likely small enough to fit onto an APU.

They showed two versions, remember: Doom probably on the cut-down version, and separately the reported performance of the MI25. Here we need to give some leeway, because it is a competitor setting it up (a well-set-up cuDNN/library stack and the versions used are important and relevant in this case) or comparing selectively, the same way Intel ran their own benchmark statistics against Nvidia hardware, which prompted Nvidia to respond about the benchmarking. Not sure they will bother with AMD, as for now AMD does not represent the same threat to them as Intel does in the HPC business.
To date DeepBench uses FP32 training, and by default its settings are aligned towards the Maxwell architecture (they need modifying for Pascal).
So to me the MI25 is the full-core Vega, Doom was the cut Vega, and yeah, I agree that down the line they may release a dual Vega, at least for prosumers.

Anyway, the benchmark does not fully replicate what clients would buy the cards for in deep learning and more generally, and I do not think it is yet configured optimally for the P40 or P100; the P40 is what the MI25 is really up against for FP32, and the P100 for FP16.
If I get time I will try to hunt down Baidu's results for the Maxwell Titan and Pascal Titan in the benchmark; maybe they used Baidu's figures, and maybe an older one *shrug*.
Anyway, there are other neural-network benchmarks that AMD should have considered that are currently more developed with regard to deep learning and have broader support.
AlexNet is less ideal in some ways but better in others; they should have done both.
Cheers
 
They showed two versions, remember,
The only confirmed difference was the memory pool though: 16GB versus 8GB, in what are probably 4 and 2 stacks, as I think 4-Hi HBM2 is the only configuration being produced. Likely just a clamshell setup to get some more capacity on the pro board. It may be a cut-down part, but there could also be a 490/490X differentiating between those. Vega 11 would in theory fall as some sort of Fury-tier product. I have no idea how many games were already hinting at FP16 math, even if only to conserve register space, but that could be a rather significant performance boost for the Vegas that Nvidia can't easily match. GP100 is the only card that in theory could approach that, and that's not even the big Vega.
 
This IS the big Vega ;)

There are 0 FP16 games in Windows, because it's simply not supported yet. The basic support comes with the SM6.0 add-on.

[Image: GDC slide on Shader Model 6.0 (PCGH)]
 
Yeah, I am keen to see FP16 in games, but it will still be a little way off for PC gaming (I still think this will cause slight headaches when porting rendering engines and post-processing, as it is not trivial to do without errors within a set number of cycles), further compounded by discrete Polaris GPUs not having the accelerated packed-FP16 functions; keeping consoles separate from this.
The packed-FP16 acceleration is really needed more by mainstream Polaris owners than enthusiast Vega ones, so its adoption in PC development will also be slowed, combined with Pascal not supporting fast FP16 for consumers.

The AoTS result (it seems it was Vega, going by reports now) and the Doom result (cannot really read too much into either, tbh) suggest we are dealing with a reduced-core Vega for PC gaming, and both are perfect games for AMD in different ways, while the MI25 with those FP32 results is the full core releasing slightly later, both using the same die. This makes perfect sense to me because it is the optimal way to launch a new GPU die, and exactly what Nvidia did with GP102.
For now everything seems to be aligning with what I mentioned in the other thread about it being a single large-ish die, both in the evolution from Fiji and in relative performance against both that and Nvidia.

Cheers
 
If that is the case then AMD really screwed the pooch with their marketing again.
My mistake; it looks like they were actually showing the gaming GPU (and reports suggest the AoTS score was the Vega GPU as well) while just reporting the MI25 and its benchmark results.
I tend to think the consumer model is cut down while the MI25 is the full core, both using the same die of around 470-500mm², IMO.
How they structure enthusiast Vega will be interesting, as it is fair to say they had too many products overlapping or sitting close together in the performance/enthusiast tiers last gen (390/390X/Nano/Fury/Fury X, where Nvidia had the 970/980/980 Ti, keeping Titan separate as the 980 Ti was really the PC-gaming-focused product).
Cheers
 
What was really weird to me (in a negative way) was how they used a Titan XP for the Battlefield 1 presentation on their new CPU today. I assumed they'd have their new Polaris GPU in there, or at least CrossFired RX 480s or a Fury X.
 
Just realized an error in my reading skills. While the FP16 performance is in theory faster than P100, it was benchmarked against GP102, which lacks the double-rate FP16, not the P100. MI25 was therefore ~46% faster than GP102 in deep learning, and likely slower than P100 (not shown), following the bandwidth limits. That also lines up with Vega 10 being little Vega and likely ~300mm², on par with GP104, albeit with HBM. If fabbed at TSMC with a slightly higher ideal clock speed, that 4096+1024 (scalar-equivalent) configuration at 1.25GHz makes a lot of sense. The scalar portion wouldn't be very efficient for deep learning and nicely parallel benchmarks, so 20% of theoretical performance is left on the table. Eliminate the scalars and the benchmark is 46% higher FP16 performance for MI25 versus 55% higher theoretical TFLOPs. Well within what I'd consider a margin of error, and reasonably efficient. That lines up much better with realistic expectations.

So this wouldn't be the enthusiast-class product, just a high-end part around the 1080 and likely approaching GP102 graphics performance. A dual Vega 10 could in theory still exist, but we'd be looking at double the performance, not accounting for thermal issues. That also leaves room for a Vega 11 around 450mm² along the lines of GP102 that is still likely small enough to fit onto an APU.


It's not like that; Vega 10 will be bigger than GP104, somewhere in the middle between GP104 and GP102, closer to the latter. Scaling up from Polaris, if they want to keep the same TMU/ROP/shader ratios, that is an 85% increase in die size minus ~15% for the bus, so it's going to end up around 400mm² or greater. I don't see them cutting down TMUs and ROPs this time; they NEED to keep them at similar ratios to Polaris, otherwise they end up with another chip like Fiji, where the shader array just goes unused because of other bottlenecks. What would be the use of all those front-end changes if they are going to bottleneck it again somewhere else?

Pretty easy to see in GPUs there won't be anything more than 4096 ALUs for Vega, at least for consumer graphics cards in the near term, cause they would be showing that in their presentations and not a card that seems to perform around a 1080.

There is no blindsiding nV here, cause nV already has their chips out; the only one left is the 1080 Ti, and that we can say is probably going to be around the Titan X Pascal. There is no strategy AMD can use to take an advantage, cause they are so late that it won't matter.

Added to this, AMD will also be planning a cut-down version that spans down to the lower-end performance chips by cutting CUs, unless they think all their chips will be fully functional, which is not going to happen.

And just look at past history: when did AMD ever show anything that wasn't the best they could show? I can't think of a single time.

The only way that works is if Vega 11 is nowhere near ready for mass production, which is what you are getting at, and that I don't believe, cause it sure sounds like they both taped out a quarter and a half ago.
 
What was really weird to me (in a negative way) was how they used a Titan XP for the Battlefield 1 presentation on their new CPU today. I assumed they'd have their new Polaris GPU in there, or at least CrossFired RX 480s or a Fury X.


They wouldn't want to give away the performance yet, cause if they showed it with BF1 the mystery would be gone; that's why they showed it with an unreleased DLC...
 
There are 0 FP16 games in Windows, because it's simply not supported yet. The basic support comes with the SM6.0 add-on.
FP16 already exists as a packed format. The compiler, as an optimization, could assume that FP16 inputs want FP16 math, which was the fuzziness mentioned. It won't always work, but shader replacement and game optimizations are a thing. Current games could patch it in easily enough.
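To make the storage-format point concrete, here's a minimal CUDA sketch (an illustrative kernel, not anything from AMD's or Nvidia's drivers): two FP16 values share one 32-bit register, while the arithmetic itself can still be done in FP32 after unpacking.

```cuda
#include <cuda_fp16.h>

// Packed FP16 as a *storage* format: a __half2 holds two halves in one
// 32-bit register, halving register and bandwidth cost. The math here is
// still plain FP32; no double-rate FP16 hardware is required for this.
__global__ void scale_packed(const __half2 *in, __half2 *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        float lo = __low2float(in[i]);   // unpack to FP32
        float hi = __high2float(in[i]);
        out[i] = __floats2half2_rn(0.5f * lo, 0.5f * hi);  // compute, repack
    }
}
```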

Yeah, I am keen to see FP16 in games, but it will still be a little way off for PC gaming (I still think this will cause slight headaches when porting rendering engines and post-processing, as it is not trivial to do without errors within a set number of cycles), further compounded by discrete Polaris GPUs not having the accelerated packed-FP16 functions; keeping consoles separate from this.
The packed-FP16 acceleration is really needed more by mainstream Polaris owners than enthusiast Vega ones, so its adoption in PC development will also be slowed, combined with Pascal not supporting fast FP16 for consumers.
Vega isn't locked to enthusiast though. For discrete, yes, atm, but all the upcoming Zen APUs will be Vega as well, so it will exist alongside the current Polaris lineup. Doubling effective compute power there could be huge for that market and is not insignificant.

What was really weird to me (in a negative way) was how they used a Titan XP for the Battlefield 1 presentation on their new CPU today. I assumed they'd have their new Polaris GPU in there, or at least CrossFired RX 480s or a Fury X.
I think they used two in SLI, probably to push the CPU more. Demoing a new CPU at 20% utilization isn't a great way to show off its merits.

It's not like that; Vega 10 will be bigger than GP104, somewhere in the middle between GP104 and GP102, closer to the latter. Scaling up from Polaris, if they want to keep the same TMU/ROP/shader ratios, that is an 85% increase in die size minus ~15% for the bus, so it's going to end up around 400mm² or greater. I don't see them cutting down TMUs and ROPs this time; they NEED to keep them at similar ratios to Polaris, otherwise they end up with another chip like Fiji, where the shader array just goes unused because of other bottlenecks. What would be the use of all those front-end changes if they are going to bottleneck it again somewhere else?
Those ratios could change with an architecture improvement. We know geometry has seen significant improvements from that PS4 interview. TMU/ROP/shader ratios could be adjusted if they are expecting higher usage of compute in the future, or if performance changed with the architecture. As for the size of the die, I think that may be really hard to peg; AMD has been rather vague about it as well. My scalar idea could make the logic much more dense, but there are far too many variables there in the implementation used. The biggest size limitation I still perceive is that one, ideally both, Vegas need to be small enough to fit on an interposer with Naples, along with what are likely 4 stacks of HBM2. That's a 16-core/32-thread Zen plus a full Vega die and HBM2. That's a lot of real estate, but directed at the server market, although it would probably be really fun to game on for an enthusiast.

There is no blindsiding nV here, cause nV already has their chips out; the only one left is the 1080 Ti, and that we can say is probably going to be around the Titan X Pascal. There is no strategy AMD can use to take an advantage, cause they are so late that it won't matter.
The blindsiding may have already happened with the double-rate FP16 on the PS4 Pro, likely Scorpio, and it getting pushed out with Vega. Once devs take advantage of that, and they will because even Nvidia benefits from the register savings, there will be a lot of added compute capacity. That likely translates directly into higher resolutions without stressing ROPs/TMUs. Blindsided may not be a good description, but the move may have been deliberate, as the capability does exist with GP100.

Pretty easy to see in GPUs there won't be anything more than 4096 ALUs for Vega, at least for consumer graphics cards in the near term, cause they would be showing that in their presentations and not a card that seems to perform around a 1080.
...
Added to this, AMD will also be planning a cut-down version that spans down to the lower-end performance chips by cutting CUs, unless they think all their chips will be fully functional, which is not going to happen.
They haven't shown anything with regard to actual core counts; even 4096 is a guess. The card they demoed looks like it will be well ahead of a 1080, but all comparisons are highly speculative right now. At the very least we know a 480 and a 1060 are roughly comparable with a 4:3 bandwidth ratio, and Vega should have an even larger advantage around 8:5 with lower-latency HBM2 on top of architecture improvements. So long as Vega doesn't perform worse than Polaris as an architecture, it seems likely it's faster.
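Those ratios check out against the public specs, with the usual caveat that Vega's bandwidth is still an assumption here (two HBM2 stacks at the full 256 GB/s each):

```latex
\frac{\text{RX 480}}{\text{GTX 1060}} = \frac{256\ \text{GB/s}}{192\ \text{GB/s}} = \frac{4}{3},
\qquad
\frac{\text{Vega 10}\ (2 \times \text{HBM2})}{\text{GTX 1080}} = \frac{512\ \text{GB/s}}{320\ \text{GB/s}} = \frac{8}{5}
```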

The only way that works is if Vega 11 is nowhere near ready for mass production, which is what you are getting at, and that I don't believe, cause it sure sounds like they both taped out a quarter and a half ago.
Even the Vega 10s they demoed were only weeks old, according to some AMD employee. One of their engineers for ROCm even said they're still fighting with the graphics guys to get cards. Everything shown had an actual product to back it up. It's entirely possible Vega 11 wasn't ready yet if Vega 10 is only starting to arrive. Given the changes to the WSA and the HBM2 supply situation, there could be fab changes.
 
FP16 already exists as a packed format. The compiler, as an optimization, could assume that FP16 inputs want FP16 math, which was the fuzziness mentioned. It won't always work, but shader replacement and game optimizations are a thing. Current games could patch it in easily enough.


Vega isn't locked to enthusiast though. For discrete, yes, atm, but all the upcoming Zen APUs will be Vega as well, so it will exist alongside the current Polaris lineup. Doubling effective compute power there could be huge for that market and is not insignificant.
.....

The blindsiding may have already happened with the double-rate FP16 on the PS4 Pro, likely Scorpio, and it getting pushed out with Vega. Once devs take advantage of that, and they will because even Nvidia benefits from the register savings, there will be a lot of added compute capacity. That likely translates directly into higher resolutions without stressing ROPs/TMUs. Blindsided may not be a good description, but the move may have been deliberate, as the capability does exist with GP100.

.....
Even the Vega 10s they demoed were only weeks old, according to some AMD employee. One of their engineers for ROCm even said they're still fighting with the graphics guys to get cards. Everything shown had an actual product to back it up. It's entirely possible Vega 11 wasn't ready yet if Vega 10 is only starting to arrive. Given the changes to the WSA and the HBM2 supply situation, there could be fab changes.

That is a good point about Vega and APUs, but it is still going to be limited, as the largest footprint would sit outside of the Vega-Ryzen APU and the enthusiast Vega discrete GPU. It could be a nice way to access FP16 like you say, but I am still not convinced how well it would work using mGPU and depending upon the APU for all the FP16 calculations of the rendering engine/post-processing effects while the more powerful dGPUs do the rest of the work.
Even if it can work, it is going to take a long time for developers to get this working well in such a setup, and then we need to consider Vulkan and DX12 and how well those are going with developers in such a context (lots of time needed again).

I really cannot see the PS4 or Scorpio pushing PC games towards FP16; developers just will not include it in PC ports, as it is not worth it in terms of development resources vs accessibility for PC gamers. The PS4 Pro implementation is also a fair bit away from a 'Zen'-Vega APU and more custom, but then consoles are highly custom relative to PC APU products even now.

Vega weeks old...
It will still have had current driver development, and it's strange how it was mature and stable enough to show the unreleased Rogue One game to the world at the event :)
So I'm not sure I buy the AMD employee comment.

Nothing to do with you, Anarchist, but I find it amusing how Nvidia and Jen-Hsun are attacked by various sites for not showing the GP100/Drive PX2 on stage, with many sources saying they do not exist, and yet we get a mock-up cube from AMD with a pie-in-the-sky figure used to compare against the P100 (which in reality is designed to sit in an 8-GPU node like the DGX-1; the context is that the sales pitch is the same for both).
Anyway, I do think that was a really bad decision by AMD, to show the Intel comparison with the Titan Pascal; that segment should have been removed, leaving the narrative more about AMD, as they did at the end with Rogue One.

But here is a thought: maybe that was the CPU division getting their own back on Raja's division and using what they wanted without bothering to support the GPU division, as it was a CPU team demonstration (so the setup was their call) :)
More realistic is that the Vega card could not drive the BF1 comparison at a minimum of 60fps (for whatever reason; we need to see how it pans out), but the truth could be somewhere between the two.
Cheers
 
OK, I am laughing, because I think Raja is going to be irritated with the CPU division over the Intel-vs-AMD comparison using the Titan Pascal, while they themselves are probably irritated about the Intel deal the Radeon group organised.
Not only was it in the Ryzen event yesterday, but that information is also part of their official news brief :)
Again at 3.4 GHz, Ryzen was shown beating the game framerates of a Core i7 6900K playing Battlefield™ 1 at 4K resolution, with each CPU paired to an Nvidia Titan X GPU.
http://ir.amd.com/phoenix.zhtml?c=74093&p=irol-newsArticle&ID=2229447

Yeah, I definitely think not everything is happy families at AMD at the moment between the two divisions (Raja's team being the new one, called the Radeon Technologies Group), now split with separate VPs and more strongly focused briefs for their areas.
Remember, it was not long ago that Radeon started the great Intel i5 + MSI 480 bundle deal: http://radeon.com/en-us/radeon-intel-bundle/

Cheers
 
I really cannot see the PS4 or Scorpio pushing PC games towards FP16; developers just will not include it in PC ports, as it is not worth it in terms of development resources vs accessibility for PC gamers. The PS4 Pro implementation is also a fair bit away from a 'Zen'-Vega APU and more custom, but then consoles are highly custom relative to PC APU products even now.
Fairly sure devs are already going after FP16. Packing is too obvious an optimization technique, and once that occurs, execution could happen through trial and error or compiler optimizations. The packing is already reasonably supported on current cards. As a compiler optimization or shader replacement from AMD it could happen. Take sebbbi, for instance, who already seems to be going crazy with it.

It will still have had current driver development, and it's strange how it was mature and stable enough to show the unreleased Rogue One game to the world at the event :)
So I'm not sure I buy the AMD employee comment.
If Vega is 4096 cores it's likely near binary-compatible with Fiji. Most GPUs work well enough firing up old games, although they could likely perform better if optimized for the new architecture. I buy the comment because moving Vega to TSMC or another fab may have been justified. We always assumed it was TSMC; maybe they moved to Intel, lol. If it's an HBM shortage, which seems likely, getting an actual sample produced could still be difficult. Having the actual dice is only half the battle.

Anyway, I do think that was a really bad decision by AMD, to show the Intel comparison with the Titan Pascal; that segment should have been removed, leaving the narrative more about AMD, as they did at the end with Rogue One.
Only if they wanted to make the narrative solely about themselves. There is something to be said for showing your products working well with competitors and other leaders in the market.
 
Fairly sure devs are already going after FP16. Packing is too obvious an optimization technique, and once that occurs, execution could happen through trial and error or compiler optimizations. The packing is already reasonably supported on current cards. As a compiler optimization or shader replacement from AMD it could happen. Take sebbbi, for instance, who already seems to be going crazy with it.


If Vega is 4096 cores it's likely near binary-compatible with Fiji. Most GPUs work well enough firing up old games, although they could likely perform better if optimized for the new architecture. I buy the comment because moving Vega to TSMC or another fab may have been justified. We always assumed it was TSMC; maybe they moved to Intel, lol. If it's an HBM shortage, which seems likely, getting an actual sample produced could still be difficult. Having the actual dice is only half the battle.


Only if they wanted to make the narrative solely about themselves. There is something to be said for showing your products working well with competitors and other leaders in the market.

Yes, I agree FP16 will be used, but only where it provides tangible benefits relative to development costs, and there is no benefit for PC gaming when the majority of products do not support fast FP16; even if AMD sells Ryzen moderately well, it is still marginal compared to the existing Intel CPU gaming base, and that is not going to change anytime soon even with strong sales.
It needs to gain traction and a notable userbase in the mainstream and mainstream-performance PC gaming hardware sectors; this means Polaris/lower Pascal and possibly up to i5-equivalent CPUs, which probably puts the 8C/16T Ryzen, the first CPU product it seems they are launching, outside of this target audience as well.
And that is ignoring how effectively FP16 would even work in mGPU for PC gaming, as the Vega dGPU is not mainstream and will not be for quite a while (it would need to replace all the Polaris dGPU products sold).

Sebbbi is a console developer and I get why he is keen, especially with the low-level control developers have over consoles and their very custom designs; it will be pushed the same way with the Nintendo Switch.

The headache this will cause for porting will be interesting, as it is far from trivial to do accurately and within a certain number of cycles when there are critical dependencies with rendering/post-processing operations, but that is for another thread and for when we see the first console game making heavy use of FP16.
Cheers
 
The headache this will cause for porting will be interesting, as it is far from trivial to do accurately and within a certain number of cycles, but that is for another thread and for when we see the first console game making heavy use of FP16
Porting should actually be simple enough. Just a matter of running two passes for the packed data. Compilers or devs should be able to unwind that easily enough. No different than if you didn't have double the execution rate: simply two instructions as opposed to one if the data was packed.
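A hedged CUDA illustration of that unwinding (a generic packed add, nothing game-specific): on hardware with native FP16 math the packed pair is one instruction; everywhere else the compiler simply emits two scalar operations for the same result.

```cuda
#include <cuda_fp16.h>

__global__ void add_packed(const __half2 *a, const __half2 *b, __half2 *c, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
#if __CUDA_ARCH__ >= 530
        // Double-rate path: one instruction adds both packed halves.
        c[i] = __hadd2(a[i], b[i]);
#else
        // "Unwound" path: the same work done as two scalar FP32 adds,
        // producing the same packed result, just without the 2x throughput.
        float lo = __low2float(a[i]) + __low2float(b[i]);
        float hi = __high2float(a[i]) + __high2float(b[i]);
        c[i] = __floats2half2_rn(lo, hi);
#endif
    }
}
```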

Yes, I agree FP16 will be used, but only where it provides tangible benefits relative to development costs, and there is no benefit for PC gaming when the majority of products do not support fast FP16
Packed FP16 should be reasonably well supported. I don't recall when Nvidia gained support, but I thought Maxwell2 supported it. AMD should have support for GCN 1.2 (Tonga/Fiji) and newer. At the very least it reduces register pressure for everyone.
 
Anarchist, you are giving AMD too much credit for something that hasn't come out yet, for something you don't know what it is. It's all fine to speculate, but yeah, most of the stuff you are saying I don't think is going to happen, lol, because none of what has happened or been stated so far really fits with anything you stated.

There are only two valid reasons for Vega's launch timing: HBM2, and that they aren't going to compete at the top-end enthusiast level. We know that if it's only HBM2 they will launch in Q3 of 2016; if it launches after that, the second part is going to be real too.

Just go by history and what has happened in the past, no need to try to make convoluted reasons for something so simple.
 
Porting should actually be simple enough. Just a matter of running two passes for the packed data. Compilers or devs should be able to unwind that easily enough. No different than if you didn't have double the execution rate: simply two instructions as opposed to one if the data was packed.


Packed FP16 should be reasonably well supported. I don't recall when Nvidia gained support, but I thought Maxwell2 supported it. AMD should have support for GCN 1.2 (Tonga/Fiji) and newer. At the very least it reduces register pressure for everyone.
Nvidia has supported accelerated native FP16 for quite a while, and was the first to do so :)
Look back at the Maxwell Tegra X1 SoC.

As I mentioned in a different post, the packed FP16 container has been supported for a while by both, but that is not the same thing.
Sorry to disagree, but it is not trivial to do the conversion accurately when there are critical dependencies within a certain cycle time; you waste quite a few cycles (relative to working natively) when it is not native maths, and more so when looking to do the maths with a high level of accuracy, as it then becomes a more complex operation.
In the past I have linked on B3D (it will be tricky to find) the actual work on dot-product functions pertaining to this by some very smart developers working with CUDA; they will have greater experience, as this is something that has been part of the science world for a while compared to modern gaming.
I think sebbbi has also hinted it is not trivial to do accurately, but he did not touch on the cycles required nor the potential complexity this has with the diverse critical dependencies you get in rendering engines and post-processing effects, compounded by mixing FP32 and FP16 operations.
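To illustrate the cycle-cost point with a hedged CUDA sketch (not the dot-product work referenced above, just the general shape of emulated FP16 math): every operation picks up conversions on both sides, and the extra rounding step is where the accuracy subtleties come from.

```cuda
#include <cuda_fp16.h>

// Emulated FP16 math on hardware without native FP16 ALUs: a single FMA
// becomes convert-in / compute / convert-out, several instructions instead
// of one. The result is also rounded twice (once to FP32 inside fmaf, once
// on the repack to FP16), which can differ from a correctly rounded native
// FP16 FMA; that is the accuracy subtlety being discussed.
__device__ __half emulated_fma(__half a, __half b, __half c)
{
    float fa = __half2float(a);
    float fb = __half2float(b);
    float fc = __half2float(c);
    return __float2half(fmaf(fa, fb, fc));
}
```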

That is why we need to wait and see the first game that can actually make heavy use of FP16 on the consoles and go from there, along with insight into how it may impact porting to PC gaming.
Cheers
 
What was really weird to me (in a negative way) was how they used a Titan XP for the Battlefield 1 presentation on their new CPU today. I assumed they'd have their new Polaris GPU in there, or at least CrossFired RX 480s or a Fury X.

I assume they wanted to show a serious scenario not limited by GPU that could be replicated to prove they weren't fudging the numbers.
 
I assume they wanted to show a serious scenario not limited by GPU that could be replicated to prove they weren't fudging the numbers.
Think about this though:
They carefully released figures for AoTS and Doom (with the Vulkan API using bespoke AMD shader extensions to get the Doom performance), but decided to use an Nvidia GPU to prove they weren't fudging the numbers :)
Also remember how they did slightly fudge the numbers when comparing 480 CrossFire to the Nvidia GTX 1080 at the Polaris launch.
So if they could, AMD would not use an Nvidia product, as it detracts from the narrative they are trying to build.

At launch/early press events, any company will do anything to present their product in the best light, and will only mention competitors if it shows them being worse.
However, I do think part of this is the CPU division doing some payback against the Radeon Technologies Group managed by Raja, and why they happily mention the Nvidia Titan not just at the Ryzen event but also in the official news publication (which seems to have been created with the CPU/gaming division VP and not more universally).
I do not think Lisa Su has the same level of control over these senior personalities as Rory Read did.

Cheers
 
Think about this though:
They carefully released figures for AoTS and Doom (with the Vulkan API using bespoke AMD shader extensions to get the Doom performance), but decided to use an Nvidia GPU to prove they weren't fudging the numbers :)
Also remember how they did slightly fudge the numbers when comparing 480 CrossFire to the 1080 at the Polaris launch.
So yeah, if they could, AMD would not use an Nvidia product, as it detracts from the narrative they are trying to build.

At launch, any company will do anything to present their product in the best light, and will only mention competitors if it shows them being worse.
However, I do think part of this is the CPU division doing some payback against the Radeon Technologies Group managed by Raja, and why they happily mention the Nvidia Titan not just at the Ryzen event but also in the official news publication (which seems to have been created with the CPU/gaming division and not more universally).
I do not think Lisa Su has the same level of control over these senior personalities as Rory Read did.

Cheers

Honestly I'm getting tired of this; it doesn't make any sense, man, this Blender thing...

I'm going to reset my overclock just to see what my CPU will boost to in the Blender benchmark; I don't imagine it will boost to its maximum single-core clock.

In an ideal world they would cut the bullshit and just compare at locked clock parity. This applies to Intel and Nvidia as well, the latter playing irritating games with base/boost clocks. Remember, the 980 Ti has a boost of 1075MHz or something, and Nvidia alternated between using base and boost for TFLOPs ratings, completely skewing comparisons with Pascal.

Not to mention the now industry-standard misleading bar charts, lol.
 
Not to mention the now industry-standard misleading bar charts, lol.
Yeah the bar charts give me the giggles, I swear Nvidia/AMD must outsource this to the same company :)
And a very good point about Blender, which they also used, as that fits perfectly with my point.
Cheers
 
OK, OK

I assume they wanted to ~~show~~ at least look like they were showing a serious scenario not limited by GPU that could be replicated to ~~prove~~ make people think they weren't fudging the numbers. :borg:

EDIT: and to keep solid Vega numbers for closer to release
 
OK, OK

I assume they wanted to ~~show~~ at least look like they were showing a serious scenario not limited by GPU that could be replicated to ~~prove~~ make people think they weren't fudging the numbers. :borg:

EDIT: and to keep solid Vega numbers for closer to release
Are they really that dumb though? Seems like a very stupid path to take
 
Anarchist, you are giving AMD too much credit for something that hasn't come out yet, for something you don't know what it is. It's all fine to speculate, but yeah, most of the stuff you are saying I don't think is going to happen, lol, because none of what has happened or been stated so far really fits with anything you stated.

There are only two valid reasons for Vega's launch timing: HBM2, and that they aren't going to compete at the top-end enthusiast level. We know that if it's only HBM2 they will launch in Q3 of 2016; if it launches after that, the second part is going to be real too.

Just go by history and what has happened in the past, no need to try to make convoluted reasons for something so simple.
I'm just going off what AMD engineers, including Raja, have said. Just because you don't like it or can't accept it isn't my fault. I'm not making convoluted reasons out of anything, just pointing out rather obvious issues that appear to be affecting the entire market. Granted, the original roadmap had Vega launching in Q1.

From what I've seen, my predictions are looking increasingly accurate, although I may have underestimated the gains a bit. The theoretical performance matches my design at expected clocks. Benchmarked performance seems reasonably true to the design, although apparently with the double-rate FP16 not really implemented yet, according to Raja. There's a bunch of anecdotal evidence around suggesting that may be the case, and the design seems like a solid improvement. I've still been pondering the design and coming up with new possibilities. It seems likely it will have a rather significant (256K L0/L1) cache increase to support the scalars though. I've yet to see any evidence my theory was inaccurate. The only mistake was misinterpreting that original benchmark: that it was coming out well ahead of P100, which shouldn't have been possible, not without a pair of chips or grossly understated FP16 performance along with bandwidth.

As I mentioned in a different post, the packed FP16 container has been supported for a while by both, but that is not the same thing.
Sorry to disagree, but it is not trivial to do the conversion accurately when there are critical dependencies within a certain cycle time; you waste quite a few cycles (relative to working natively) when it is not native maths, and more so when looking to do the maths with a high level of accuracy, as it then becomes a more complex operation.
In the past I have linked on B3D (it will be tricky to find) the actual work on dot-product functions pertaining to this by some very smart developers working with CUDA; they will have greater experience, as this is something that has been part of the science world for a while compared to modern gaming.
I think sebbbi has also hinted it is not trivial to do accurately, but he did not touch on the cycles required nor the potential complexity this has with the diverse critical dependencies you get in rendering engines and post-processing effects, compounded by mixing FP32 and FP16 operations.
A sweeping accurate conversion, maybe not, but that isn't required. For a dev or engineer to sit down, look at a costly shader, and decide if the conversion is worthwhile is pretty simple. No need to perfectly optimize, just hit a few critical sections; a single programmer can probably knock that out in a day. As I'm sure you're well aware, this is a widespread practice in the industry already, otherwise we wouldn't constantly have people complaining about having optimized game-ready drivers on day one, or driver updates breaking things for that matter. There are entire categories of math that could be rather safely converted. Most normals and fragment data should be safe. Internal mechanisms for backface culling would be fairly safe; that may very well be where the PS4 Pro is getting the geometry boost over Polaris. Nearly any floating-point value falling between -1 and 1 should work, and those cases are somewhat easy to find.
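The reason values in [-1, 1] are the safe category falls straight out of the FP16 format (standard IEEE half precision, nothing vendor-specific):

```latex
x = (-1)^{s} \times (1.m_9 m_8 \ldots m_0)_2 \times 2^{e-15}, \qquad
\text{ulp}(x) = 2^{\lfloor \log_2 |x| \rfloor - 10}
```

So with 10 mantissa bits the spacing between representable values never exceeds 2^-11 ≈ 0.0005 below 1.0, which is ample for normals, colors, and fragment data, while at 2048 the spacing is already 2, which is why things like world-space positions stay FP32.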

Yeah, sebbbi did a breakdown of what's required, but any programmer familiar with floating-point math should be familiar with that. It's simple significant figures, which engineering and computer science students learn freshman year if not in high school. He's also promised to make use of it in a lot of his games, and I doubt he's alone. I'd expect it to show up shortly after the launch of Vega, possibly along with SM6 and a handful of optimizations there, likely followed by major titles patching in support.

Sure, it won't be 2x performance everywhere, but any gains are welcome. Even a fraction of a 100% theoretical performance increase is a nice gain for anyone trying to optimize. Packed and double-rate are different, but if a developer has already determined which variables are suitable for storage as FP16 as opposed to FP32 to save registers, that's a great spot to start. I can't think of any cases where values unsuitable for packed FP16 would work for double-rate FP16. While the conversion may take more cycles, that's still preferable to a cache miss or lower occupancy because you're out of register space. That's the very reason packed FP16 already exists in the absence of double-rate FP16 on many products from all IHVs.
 
You need Windows, aka DirectX, to support it first, and then engines, etc. So maybe in 2 years.

FP16 is standard on smartphones however. Everything there is FP16.
 
Sorry Anarchist, but that is not a breakdown; it does not cover cycles, the costs incurred on resources (how much gain you can actually get for said operation), or finally the frequency/applicability of the accelerated dot-product functions.
Again, this is an agree-to-disagree.
Also remember he is a console developer, and I covered some of this earlier, including how we need to consider how this will be different compared to PC: low-level control is much greater on console, and the custom design with FP16 is just one aspect.
Let's see how it pans out with the first console game to make heavy use of this on PS4/PS4 Pro/Xbox One/Scorpio, and the implications/insights this has not just for consoles but also for PC gaming and porting; the key point in mentioning all the consoles is to see how development pans out even just for them.

Cheers
 
I'm just going off what AMD engineers, including Raja, have said. Just because you don't like it or can't accept it isn't my fault. I'm not making convoluted reasons out of anything, just pointing out rather obvious issues that appear to be affecting the entire market. Granted, the original roadmap had Vega launching in Q1.

From what I've seen, my predictions are looking increasingly accurate, although I may have underestimated the gains a bit. The theoretical performance matches my design at expected clocks. Benchmarked performance seems reasonably true to the design, although apparently with the double-rate FP16 not really implemented yet, according to Raja. There's a bunch of anecdotal evidence around suggesting that may be the case, and the design seems like a solid improvement. I've still been pondering the design and coming up with new possibilities. It seems likely it will have a rather significant (256K L0/L1) cache increase to support the scalars though. I've yet to see any evidence my theory was inaccurate. The only mistake was misinterpreting that original benchmark: that it was coming out well ahead of P100, which shouldn't have been possible, not without a pair of chips or grossly understated FP16 performance along with bandwidth.


A sweeping accurate conversion, maybe not, but that isn't required. For a dev or engineer to sit down, look at a costly shader, and decide if the conversion is worthwhile is pretty simple. No need to perfectly optimize, just hit a few critical sections; a single programmer can probably knock that out in a day. As I'm sure you're well aware, this is a widespread practice in the industry already, otherwise we wouldn't constantly have people complaining about having optimized game-ready drivers on day one, or driver updates breaking things for that matter. There are entire categories of math that could be rather safely converted. Most normals and fragment data should be safe. Internal mechanisms for backface culling would be fairly safe; that may very well be where the PS4 Pro is getting the geometry boost over Polaris. Nearly any floating-point value falling between -1 and 1 should work, and those cases are somewhat easy to find.

Yeah, sebbbi did a breakdown of what's required, but any programmer familiar with floating-point math should be familiar with that. It's simple significant figures, which engineering and computer science students learn freshman year if not in high school. He's also promised to make use of it in a lot of his games, and I doubt he's alone. I'd expect it to show up shortly after the launch of Vega, possibly along with SM6 and a handful of optimizations there, likely followed by major titles patching in support.

Sure, it won't be 2x performance everywhere, but any gains are welcome. Even a fraction of a 100% theoretical performance increase is a nice gain for anyone trying to optimize. Packed and double-rate are different, but if a developer has already determined which variables are suitable for storage as FP16 as opposed to FP32 to save registers, that's a great spot to start. I can't think of any cases where values unsuitable for packed FP16 would work for double-rate FP16. While the conversion may take more cycles, that's still preferable to a cache miss or lower occupancy because you're out of register space. That's the very reason packed FP16 already exists in the absence of double-rate FP16 on many products from all IHVs.


I don't think anyone has talked about the things you have been speculating on yet. I haven't heard anything about them directly from AMD. Please link. No one at B3D has been talking about any of these things either, and many of them are in the game industry, but they don't talk about it? Why is that? Cause yeah, not going to happen.

Yes, you are predicting, and no, I haven't seen anything accurate about them yet, your Fiji predictions and async predictions alike. You are just guessing blind with a dartboard.

FP16 is not supported in any API (well, SM 6.0, yeah, but no engine supports that yet), so what do you think the time frame will be once Vega comes out, if it has full-speed FP16 support? Maybe 2 or 3 or 4 years? And devs will support it when AMD has 25% or lower market share? Use your head, man. Speculating is all fine if you start from the business tactics that would influence the need for those changes.

So come again?
 