Volta Rumor Thread - Volta spotted in the Wild

Discussion in 'nVidia Flavor' started by Dayaks, Jul 25, 2017.

  1. GoldenTiger

    GoldenTiger 3.5GB GTX 970 Slayer

    Messages:
    17,304
    Joined:
    Dec 2, 2004
    They aren't that far ahead really; the 10-core Intel at the same price as the 16-core Threadripper is barely behind except in rare fully threaded loads, and we aren't talking by much. It can also be OC'd, which Threadripper has no real room for (it maxes out at the 4GHz wall). In gaming and anything else, the 7900X wins handily.
     
  2. Maddness

    Maddness Gawd

    Messages:
    685
    Joined:
    Oct 24, 2014
    I just meant compared to the past 10-odd years. AMD hasn't even been in the same state, let alone ballpark, compared to Intel.
     
    GoldenTiger likes this.
  3. Shintai

    Shintai [H]ardness Supreme

    Messages:
    4,991
    Joined:
    Jul 1, 2016
    Maybe they are not as competitive as you think. Same with people trying to compare theoretical flops between cards and vendors.

    Zen in a lot of cases has a tendency to drop to SB-level IPC, plus there's its latency issue.
     
    Dayaks likes this.
  4. Maddness

    Maddness Gawd

    Messages:
    685
    Joined:
    Oct 24, 2014
    Well imo, they are more competitive in the CPU market than they currently are with GPUs.
     
  5. Shintai

    Shintai [H]ardness Supreme

    Messages:
    4,991
    Joined:
    Jul 1, 2016
    They are also worse off in the GPU segment now than they were with Bulldozer in the CPU segment.
     
  6. tangoseal

    tangoseal [H]ardness Supreme

    Messages:
    5,193
    Joined:
    Dec 18, 2010
    Why?? Mini DP is much easier than the big retard connector. Plus you can get mini DP cables for like 3 bucks on Monoprice.
     
  7. {NG}Fidel

    {NG}Fidel [H]ardness Supreme

    Messages:
    5,703
    Joined:
    Jan 17, 2005
    Ugh, wait on Volta or upgrade to the Ti? My performance at 2050 MHz is great but I still have that itch.
     
    Maddness likes this.
  8. Maddness

    Maddness Gawd

    Messages:
    685
    Joined:
    Oct 24, 2014
    I'm sure your doctor can sort that.
     
    Aireoth, razor1 and {NG}Fidel like this.
  9. delita

    delita [H]ard|Gawd

    Messages:
    1,054
    Joined:
    Mar 10, 2014
    I feel ya. I've got the OG Titan Xp and the XP and Ti are marginally faster but not enough to be worth me spending the money. But I switched over to 4k and I neeeeeed dem extra frames. O Volta, Where Art Thou?
     
    {NG}Fidel likes this.
  10. razor1

    razor1 JustReason is my Lover

    Messages:
    9,278
    Joined:
    Jul 14, 2005
    [Image: http://media.bestofmicro.com/Z/W/705020/original/202.JPG]

    For anyone that thinks this looks like GCN lol, it doesn't. It's much more granular! Every single sub-ALU and thread is scheduled separately.

    Unlike today's GCN or other unified architectures, where threads are handled per block, SMX or array.

    Just in straight-out flops Volta will be 50% faster than where Pascal is at, plus a 50% increase in the power efficiency of the SMs.

    [Image: http://media.bestofmicro.com/0/4/705028/original/210.JPG]
     
    Last edited: Aug 29, 2017
    delita likes this.
  11. Anarchist4000

    Anarchist4000 [H]ard|Gawd

    Messages:
    1,585
    Joined:
    Jun 10, 2001
    GCN is 4 independently scheduled sub cores sharing memory and texture units. The picture you linked would be a good example of a GCN CU. Minus the scalar, LDS, etc of course.

    For a 7nm part on an even larger die it had better. Those improvements seem in line with a node change and a larger chip. Theoretical numbers in all likelihood won't be a good indicator of Volta performance. At least not for FP32-derived results. All the hardware scheduling will be more about using those FLOPs efficiently.

    I believe an Nvidia engineer commented somewhere that the hardware scheduling was counter-intuitively more efficient. Makes sense as a compiler has to guess while hardware can decide.
     
  12. razor1

    razor1 JustReason is my Lover

    Messages:
    9,278
    Joined:
    Jul 14, 2005
    Do you know what a sub core is? GCN doesn't have sub cores; it has a VLIW unit that is comprised of..... each part can't act separately, and if it tried to, the parts can't be fully utilized at the same time (not the same as sub cores). Volta doesn't have this restriction: each sub core in Volta is a scalar unit that is fully capable of doing any type of scalar operation independent of the others, just like Pascal's units before it! SIMD vs SIMT, look it up. This is why thread-level granularity is important to Volta, and why it solves the problems that Maxwell had and much, much more than what GCN has.

    Why even talk about a 7nm part for Volta, Volta won't use GF or Samsung, nor is it going to go to 7nm, nV will have another architecture ready for sub 10nm processes.

    Having dedicated units was unnecessary, their Gigathread engine handles all of that now, still hardware driven, just not a separate unit.
     
    Last edited: Aug 29, 2017
  13. Anarchist4000

    Anarchist4000 [H]ard|Gawd

    Messages:
    1,585
    Joined:
    Jun 10, 2001
    The only difference with GCN was that there was a 4 cycle cadence allowing some components to be shared. If duplicated they'd just be sitting idle wasting space.

    I think you misunderstood what the scalar and thread synchronization is. If each lane is an independent scalar, then SIMT no longer exists, as Single Instruction Multiple Thread makes little sense with many instructions. The independent part only makes sense with temporal SIMT with each lane being a different warp and instructions repeating. Only requirement is the ability to put a lot of warps in flight and have a lot of cache to facilitate it. I'll take a second look at the hot chips presentation when I have a full monitor available, but SIMT doesn't make sense in the context given, it'd be MIMD.

    Thread level granularity in that context would be the same workgroup level of synchronization that has existed for some time.
     
  14. razor1

    razor1 JustReason is my Lover

    Messages:
    9,278
    Joined:
    Jul 14, 2005
    Someone want to tell him the definition of SIMD and SIMT? He missed going to class that day.......

    Since AMD uses SIMD not SIMT (nV uses SIMT), AMD has no advantage of doing thread-level independence in their GCN (not even that it can't do it, SIMDs are not capable of this) lol, wow, just keeps flowing out doesn't it dude. Don't even know the basics of the architectures and making things up as you go along......

    Just horrible man, reading all those white papers and you don't know simple concepts like this, what is the use of all that reading? It's like trying to do calculus without knowing how to multiply........

    Now if you comment anymore on this, it's just your folly, 'cause the differences between SIMD and SIMT are well known: one works at an instruction/block level and one works at a thread level with access to the same instruction and block level (it has both, giving it more flexibility but at the cost of latency), guess which ones.....

    SIMT is just an extension of SIMD, with its control starting at the thread level.




    Simple for anyone to understand

    This is the difference, not the crock you just posted lol.

    http://yosefk.com/blog/simd-simt-smt-parallelism-in-nvidia-gpus.html

    This is the difference as I said one is scheduled at a thread level and one at the Block level. Man just gets better and better as you post.


    Now AMD's ALUs are scheduled per warp/wave AT THE SAME TIME; there is no independence, 'cause at the warp/wave level it's doing the same thing as Maxwell and Pascal: it gets all the info, breaks it down and spills it over to the units. Maxwell and Pascal have one additional step prior to this: as I stated, they look at the threads and place them based on the SMX blocks. GCN doesn't do that (hence why it's a SIMD), and this gives it more granularity at a block level; because it doesn't shuffle the threads around, it takes everything as the same. This is also why divergence doesn't happen as much in GCN.

    Volta takes it a step further than current SIMT GPUs: it takes those threads, looks at each individual thread, analyzes and predicts them, and then assigns them on a per-block/ALU level. That is why the sub cores need to work independently from each other, since with Volta's model things are going out of sync even at a thread level (async, to be exact), so then at an instruction level it's also asynchronous all the time.

    Now since I got the definitions of the different shader arrays out of the way for you, can you now understand what went wrong with your assumptions about Volta? It doesn't look anything like GCN 'cause it's not a DAMN SIMD, it's a SIMT. DAMN, trying to change definitions to your own twisted view too lol, funny as FUCK.

    If you can't understand that, just understand this: SIMT is a model between SMT and SIMD. It has both capabilities, both their weaknesses and both their strengths. Volta reduces those weaknesses by increasing the SIMT's granularity at a thread level, and when looking at the block or ALU level GCN doesn't even come close to the type of granularity that Volta achieves.

    What nV did from Pascal to Volta, you can look at it as what happened with branching performance back in the day. The 6800 was not good at dynamic branching; it worked but wasn't that great. When ATi created a chip that was great at dynamic branching because they had more granularity on their 1x00 chips, nV was still using a 6x00-type architecture in their 7x00 series chips. The extra dynamic branching performance didn't do squat for ATi chips at the time; the 7x00 chips kept up for the most part. Just like now with Pascal, it keeps up (equal-priced cards) with Vega when it comes to async compute. Once the G80 came out, its dynamic branching performance was leagues ahead of the dynamic branching performance of ATi's 1x00 chips and better than that of the r600 too. With Volta the entire pipeline, starting from the threads, is asynchronous, not just the instructions (what we see with GCN).

    Keep this in mind: GCN is closer to Fermi's architecture than it is to Maxwell, and much further away from Volta. But where scheduling is concerned, GCN never worked at a thread level like Maxwell, Pascal or Kepler did/does, let alone Volta.


    Back on my ignore list.
     
    Last edited: Aug 30, 2017
    GoldenTiger, DLGenesis and Raendor like this.
  15. Anarchist4000

    Anarchist4000 [H]ard|Gawd

    Messages:
    1,585
    Joined:
    Jun 10, 2001
    SINGLE Instruction Multiple Data/Thread

    That means ALL lanes are on the same instruction (single instruction), but with SIMT the location of the data need not be consecutive. So while looping through repetitive ADD, MUL, etc., SIMT can jump around a bit more. As your blog clearly says, SIMD does the exact same thing with permutations: DPP, swizzle, and LDS with GCN, when that behavior is occasionally beneficial. That "high latency" is hidden by overlapping waves anyway, and "high" is ultimately determined by the caching subsystem. It's only practical if indexing relatively coherent addresses.
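
A rough Python sketch of the point being made here, not a model of any real GPU: every function below applies one instruction (a multiply) to all lanes, and the only thing that changes is how each lane picks its operand. The names `simd_mul`, `simt_mul`, `permuted_simd_mul` and the lane count of 4 are made up for illustration.

```python
# Hypothetical illustration: one instruction per step, different addressing.
LANES = 4  # assumed lane count, chosen only to keep the example small


def simd_mul(memory, base, scalar):
    """All lanes run the same multiply on consecutive addresses."""
    return [memory[base + lane] * scalar for lane in range(LANES)]


def simt_mul(memory, addresses, scalar):
    """All lanes still run the same multiply, but each lane brings its own address."""
    assert len(addresses) == LANES
    return [memory[addr] * scalar for addr in addresses]


def permuted_simd_mul(memory, base, permutation, scalar):
    """Consecutive SIMD load followed by a permute/swizzle of the loaded vector.

    This matches the gather above only when every wanted address falls
    inside the one vector that was loaded.
    """
    vector = [memory[base + lane] for lane in range(LANES)]
    return [vector[p] * scalar for p in permutation]


memory = [10, 11, 12, 13, 14, 15, 16, 17]
print(simd_mul(memory, 4, 2))                       # consecutive operands
print(simt_mul(memory, [7, 4, 6, 5], 2))            # per-lane addresses
print(permuted_simd_mul(memory, 4, [3, 0, 2, 1], 2))  # load + permute
```

In this toy case the gather and the load-plus-permute produce the same list, which is the equivalence being argued; once the per-lane addresses scatter outside one vector, the permute version would need extra loads.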

    Now if you even went to a class on basic computer architectures you still failed it. Then went on a forum, broadcasted said ignorance, and made a fool of yourself. You need only count to one to understand this shit!

    If you're going to go on a forum and try to educate people, at least get the basics right! All those words to still not understand wtf is actually happening.
     
  16. razor1

    razor1 JustReason is my Lover

    Messages:
    9,278
    Joined:
    Jul 14, 2005
    yeah I never stated otherwise, I stated the threads are set up differently lol, do you now understand? nV's architecture also has thread scheduling that is quite different from GCN, which has been there since Kepler.

    GCN doesn't thread schedule; from an instruction level, or block level, the chips work very similarly, that is what I SAID, god, reading comprehension is gone too.

    here is the quote for ya dude


    IN GREEN!

    WOW just unbelievable.

    at this point you can't even count to 1 cause you can't even read what I typed ;)

    I knew you wouldn't understand it, because at the base SIMT is SIMD, but it has additional features at a thread level. You didn't know that, you just assumed that nV is going to use SIMD. They did in Fermi; they went away from that because thread scheduling was needed for better throughput. You would have known this if you had used the utilization calculators while doing CUDA programming. Of course you didn't, because you aren't a programmer, you just sit there and tell EEs and programmers that they have no common sense. Yet the end result is I needed to go back to definitions of what the different architectures are to show you Volta is not like GCN, nowhere FUCKIN NEAR.
     
  17. Anarchist4000

    Anarchist4000 [H]ard|Gawd

    Messages:
    1,585
    Joined:
    Jun 10, 2001
    The only difference is the addressing per lane as opposed to incrementing by lane. Scheduling is the exact same beyond the overlapping nature of GCN.
     
  18. DooKey

    DooKey [H]ard DCOTM x4

    Messages:
    6,080
    Joined:
    Apr 25, 2001
    Why the fuck are we talking about GCN in a Volta thread. Please ban this AMD fan parody account.
     
  19. razor1

    razor1 JustReason is my Lover

    Messages:
    9,278
    Joined:
    Jul 14, 2005

    That is where you are completely wrong, thread scheduling is completely different. I suggest you read up on the Fermi SIMD to SIMT papers; nV and others (colleges) have written 2 or 3 of them, actually even more.

    Tesla was SIMD, Fermi was already SIMT, got mixed up,

    https://courses.cs.washington.edu/courses/cse471/13sp/lectures/GPUsStudents.pdf

    This goes through all the different types of GPU architectures currently available, and MIMD, since you mentioned that too.

    You can't tell me they are the same or even close; the differences between them are night and day.

    MIMD/SPMD seems like what Volta has, but nV papers still call it SIMT, so most likely I would believe it's SIMT; there have to be some conditionals in there that stop it from being a full MIMD. We will know that in the future. But do you see even SIMT being close to SIMD? Or if it's closer to MIMD, which is so far different from SIMT, how do you think Volta looks like GCN?

    AMD isn't the center of the universe when it comes to GPU design, they haven't been since the G80 was released. Architecturally, for GPGPU they have been pretty much insignificant since then. They actually have been one step behind since G80; remember the r600 was still a VLIW design all the way up to the HD6xxx series. Prior to that they did have some serious advantages when it came to their pipeline when ATi was there, but they couldn't capitalize on it because there was no market. Who created that market? nV did. (This reflects their HPC and DL market share, and Raja stating they were just happy they can have something on paper now as a comparison, yet they can do everything Volta can, yeah ok *sarcasm*. They can't even do what Pascal or Maxwell can do at a thread level, but you make it sound like Volta is GCN and then it has independent thread scheduling, which NO GPU has right now.)
     
    Last edited: Aug 30, 2017
  20. KazeoHin

    KazeoHin [H]ardness Supreme

    Messages:
    6,507
    Joined:
    Sep 7, 2011
    You mean after Vega fumbles and trips out the door?
     
  21. Dayaks

    Dayaks [H]ardness Supreme

    Messages:
    4,974
    Joined:
    Feb 22, 2012
    The 1080 launched two months after P100. I'm going to get myself all delusional that V104 will launch in a month.
     
    {NG}Fidel, Shintai and KazeoHin like this.
  22. Anarchist4000

    Anarchist4000 [H]ard|Gawd

    Messages:
    1,585
    Joined:
    Jun 10, 2001
    Threads aren't scheduled, warps are as groups of threads following similar paths. If they diverge only portions are executing with lanes masked off. SIMD means all grouped threads share the same instruction with different inputs. Not unlike multiplying across columns in a spreadsheet. If those colleges say otherwise then they are simply wrong. SIMT is still SIMD, but able to vary those inputs slightly. Often that occurs when all lanes are indexing the same data. There are no independent threads being scheduled.
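
To make the "lanes masked off" point concrete, here is a small Python sketch under toy assumptions: a 4-lane "warp", a made-up branch, and a hypothetical `run_warp` helper. It only shows the bookkeeping idea (both sides of a divergent branch get issued to the whole group, and a mask decides which lanes commit); it says nothing about how any actual hardware implements it.

```python
# Hypothetical model of one warp executing a branch with an active-lane mask.
def run_warp(values):
    """Run `if v is even: v //= 2 else: v = 3*v + 1` across all lanes.

    Returns the per-lane results and how many instruction passes the
    warp as a whole had to issue (1 if coherent, 2 if divergent).
    """
    mask = [v % 2 == 0 for v in values]  # per-lane branch condition
    issued = 0
    out = list(values)

    if any(mask):
        # "then" side: lanes with mask False are masked off and keep their value
        out = [v // 2 if m else o for v, m, o in zip(values, mask, out)]
        issued += 1
    if not all(mask):
        # "else" side: lanes with mask True are masked off and keep their value
        out = [o if m else 3 * v + 1 for v, m, o in zip(values, mask, out)]
        issued += 1
    return out, issued


print(run_warp([2, 4, 6, 8]))  # coherent: every lane takes the same path
print(run_warp([2, 3, 4, 5]))  # divergent: both paths issued, half the lanes idle each time
```

The divergent case costs two passes for the same amount of useful work, which is the throughput loss the whole independent-scheduling argument is about.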

    As I said before, SIMT is the equivalent of feeding permutations or swizzling the inputs. Has absolutely nothing to do with scheduling. In the case of SIMD, other hardware can pack a vector and achieve the same result. That's why SIMT never existed prior to Nvidia as it's essentially SIMD with a marketing name.

    It's likely still the same SIMD/T design, but as I explained above external hardware is handling the inputs. Probably a form of dynamic warp packing from a thread group to ensure all threads in a warp are executing the same instruction instead of masking them off as I suggested when that first Volta blog released. That's not too much of a stretch from what already exists. In the case of MIMD, some lanes would have different instructions, which isn't the case here.

    Volta looks like GCN in how warps get to the execution units. Hardware to facilitate async behavior and fences/barriers. Pausing a queue until conditions are met to avoid involving the CPU and latency that would incur. That's been part of the problem with Maxwell and Pascal. That will be a good thing as the programming model will finally be consistent for devs, especially coming from console and increasingly asynchronous behavior with VR and decoupled, GPU driven rendering. Then a hardware ability to switch context at a warp level, but packing blurs the lines a bit. That's where the independently scheduled line comes from. It's a bit different in that Volta likely works with thread groups as opposed to GCNs independent waves, but that's irrelevant to the programming model as it's handled in hardware and largely transparent to the programmer. AMD and Intel could do something similar if they felt the trade-off was worthwhile. The concurrent execution will be derived from sub cores in Volta and within a SIMD for GCN.
     
  23. Factum

    Factum Gawd

    Messages:
    933
    Joined:
    Dec 24, 2014
    Kinda doubt it...VEGA doesn't really put pressure on NVIDIA...so they can take their time....and earn even more profits on the PASCAL R&D....to be fueled into more R&D.
    But I will bet that the VOLTA launch in consumer space will be very different from the VEGA launch.
     
  24. Shintai

    Shintai [H]ardness Supreme

    Messages:
    4,991
    Joined:
    Jul 1, 2016
    I think the GeForce Volta line is mainly waiting for GDDR6 across the entire lineup. I wouldn't be surprised if GV106 was back to a 128bit controller, yet still featuring 224/256GB/sec.
     
  25. razor1

    razor1 JustReason is my Lover

    Messages:
    9,278
    Joined:
    Jul 14, 2005

    Depends on the size of the chip I don't see Volta being smaller than Pascal for the same tier chips. We already know TSMC 12nm is cheaper per wafer than 16nm too.

    This is also why I don't think we will see GDDR6 in anything lower than the enthusiast segment. Those will be the only chips that will need more than 500 GB/s (probably ~750 GB/s).

    So if the chip is larger, no need to go with a smaller bus and use more expensive vram, that is actually counter productive against margins. And we have seen nV having no issue in throwing in any type of ram across entire line ups, doesn't matter if a 1060 or higher 1070, they will use what is necessary for the bus size.
     
  26. Shintai

    Shintai [H]ardness Supreme

    Messages:
    4,991
    Joined:
    Jul 1, 2016
    I don't see how using a narrower bus has anything to do with that? :)

    Being able to skip 64 or 128bit GDDR5 would be great for costs. Also why GV104 is 256bit and GV102 384bit.

    Remember you have 3 companies with full scale GDDR6 production.
     
  27. razor1

    razor1 JustReason is my Lover

    Messages:
    9,278
    Joined:
    Jul 14, 2005
    There haven't even been rumors about those chips yet, and we have no clue outside of estimates of the GVxxx chip sizes done by looking at V100. And GV104 is larger than GP104, there is no way around it; it's closer to the GP102 in size.

    Initial costs of GDDR6, like any newly made RAM, are always high because volume will be low, and the only one that will have GDDR6 in early 2018 is Samsung. All others will come later on in the year.
     
  28. Shintai

    Shintai [H]ardness Supreme

    Messages:
    4,991
    Joined:
    Jul 1, 2016
    But what has that to do with GDDR6?

    We know GV100 uses 4 stacks of HBM2 for 900GB/sec.
    We know GV102 will use 384bit GDDR6 at 16Gbps for 768GB/sec (Hynix says so).
    We pretty much know GV104 will use 256bit GDDR6 and have 448-512GB/sec.

    For GV106 there are 2 options. 256bit GDDR5. Or 128 or 192bit GDDR5X/GDDR6.
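
For anyone who wants to check the bandwidth figures being thrown around, a quick Python sketch of the usual arithmetic: bus width in bits times per-pin data rate in Gbps, divided by 8 to get GB/s. The function name is made up, and the 14 Gbps rate is an assumption back-derived from the 448 and 224 figures, not something stated in the thread.

```python
def peak_bandwidth_gb_s(bus_width_bits, data_rate_gbps_per_pin):
    """Theoretical peak memory bandwidth in GB/s.

    bus_width_bits: total memory bus width (e.g. 256 for a 256bit bus)
    data_rate_gbps_per_pin: effective per-pin data rate in Gbit/s
    """
    return bus_width_bits * data_rate_gbps_per_pin / 8  # 8 bits per byte


# Bus widths and the 16 Gbps rate come from the posts; 14 Gbps is an assumed lower bin.
for label, width, rate in [
    ("384bit @ 16 Gbps", 384, 16),
    ("256bit @ 14 Gbps", 256, 14),
    ("256bit @ 16 Gbps", 256, 16),
    ("128bit @ 14 Gbps", 128, 14),
    ("128bit @ 16 Gbps", 128, 16),
]:
    print(f"{label}: {peak_bandwidth_gb_s(width, rate):.0f} GB/s")
```

These are theoretical peaks only; they line up with the 768, 448-512 and 224/256 GB/sec numbers quoted in this thread but say nothing about what any unreleased chip will actually ship with.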
     
  29. razor1

    razor1 JustReason is my Lover

    Messages:
    9,278
    Joined:
    Jul 14, 2005

    GP104 if it has the space to use a bigger bus (more pads), it can use GDDR5x and achieve enough bandwidth to sustain its performance. And also it will improve nV's margins. Simple, less cost, same performance.

    The only reason to do a larger bus, is because the chip already has the die space for the extra pads. The bus itself is not costly nor are the pads, just need the space for the pads.
     
  30. Shintai

    Shintai [H]ardness Supreme

    Messages:
    4,991
    Joined:
    Jul 1, 2016
    A bigger bus means more die space, more I/O power draw and that's cost.

    And GDDR5X is replaced by GDDR6. GDDR5X is essentially just a prototype for GDDR6 with very tiny differences.
     
  31. razor1

    razor1 JustReason is my Lover

    Messages:
    9,278
    Joined:
    Jul 14, 2005

    The die space is already there; going from a 256 bit bus to a 384 bit bus is nothing, the silicon that it will take up is less than 5% difference when we start looking at the. I/O power draw does increase yes, but that can be equalized by the speed of the vram
     
  32. Shintai

    Shintai [H]ardness Supreme

    Messages:
    4,991
    Joined:
    Jul 1, 2016
    The die space isn't there. You have to increase the die.

    As long as you cost wise can simply go with faster memory then that's the route. Just as GP104 didn't use a 384bit GDDR5 bus for obvious reasons.

    History has shown this over and over, again and again.
     
  33. razor1

    razor1 JustReason is my Lover

    Messages:
    9,278
    Joined:
    Jul 14, 2005
    GV104 will have the die space.

    The bus doesn't increase it much, like less than a 5% increase. The pads are what is more important: the space needed for them. If the die is not big enough to support the space needed for the pads, then the die space will increase A LOT. Then it becomes useless to do if the option of faster vram is there.
     
  34. Shintai

    Shintai [H]ardness Supreme

    Messages:
    4,991
    Joined:
    Jul 1, 2016
    Yes if you add another 15mm2 or so and 10-15W instead of using faster memory. And that's all without counting for the ROPs you need to increase too and L2.

    GV104 will be 256bit.
     
  35. razor1

    razor1 JustReason is my Lover

    Messages:
    9,278
    Joined:
    Jul 14, 2005
    My estimates put GV104 at 380 to 420mm2; it can use a bigger bus easily.

    So yeah, 15mm2 is right. Now what is the cost of that 15mm2 on a per-chip level? It's not much on 16nm; then compare that to the cost of the vram. We already know the process is cheaper for nV than 16nm too.

    As I stated Samsung will be the only vendor of GDDR6 for the first half of 2018. Supply will be limited.

    And as I stated, nV doesn't stick to any one thing, it all depends on what the final cost of the product is and what margins they can get out of it.
     
  36. Shintai

    Shintai [H]ardness Supreme

    Messages:
    4,991
    Joined:
    Jul 1, 2016
    Micron and Hynix disagree..
     
  37. razor1

    razor1 JustReason is my Lover

    Messages:
    9,278
    Joined:
    Jul 14, 2005

    Sorry it was Hynix not Samsung, err Micron stated early 2018, I don't trust em. Samsung slipped from 2017 to 2018, so that is tenuous too.

    And anyway, even with 3 companies starting mass production in 2018, they still need to ramp up, which you and I know takes 1 quarter. Now if the performance segment launches first like nV has been doing for the past 2 gens, there will not be enough volume of GDDR6 to sustain that launch; it will end up like what we see with Vega. Supply will be that limited. You think nV will do that? I haven't seen them do that since the 6800 Ultra launch.
     
  38. Dayman

    Dayman Limp Gawd

    Messages:
    195
    Joined:
    Jul 12, 2017
  39. Dayman

    Dayman Limp Gawd

    Messages:
    195
    Joined:
    Jul 12, 2017
    razor1 and Shintai like this.
  40. Shintai

    Shintai [H]ardness Supreme

    Messages:
    4,991
    Joined:
    Jul 1, 2016
    The HS colour makes all the difference :D
     
    Dayman likes this.