https://youtu.be/8hnuj1OZAJs?t=91
https://youtu.be/8hnuj1OZAJs?t=136
I was just watching some gameplay footage with this card at 4K. Look at what happens at the 1:39 and 2:20 marks: massive stutter. It's like the card is running out of VRAM, because usage drops a lot when this happens; it seems like it's swapping new assets in and out.
This was the first thing that popped into my head when I learned that HBM would achieve its high bandwidth by being extremely wide.
Now it seems that the Fury X has 8x 512-bit-wide memory controllers, totalling 4096 bits. It's also dual issue.
For comparison, the 980 Ti has 12x 32-bit memory controllers (the vanilla 980 has 8).
GDDR5 chips are 32 bits wide (fitting nicely) and transfer two chunks of data per clock, so a minimum of 64 bits of data is written per clock to one area of memory.
In the case of the Ti, that means each controller writes a minimum of 64 bits per clock to one page of memory. It cannot write to multiple pages simultaneously, so to get the most out of its controllers it has to coalesce memory accesses into contiguous 64-bit lumps. That's obviously not so difficult, but memory granularity is precisely why we have 12x 32-bit controllers instead of 1x 384-bit controller (those controllers cost die space and transistors!). It's not so difficult because everything is 32-bit nowadays, and when you write a framebuffer pixel plus a z-buffer value you are writing 64 bits of data (without compression).
However, whack the bus width up to 512 bits while remaining dual issue, and suddenly every time you write to memory you need to write 1,024 bits in one go, or else you are wasting bandwidth. If you have a ton of stuff to write and it's scattered all over different areas of memory, you're boned, because you can only write to one memory page at a time. GDDR5 has a page size of 2KB (no idea about the HBM implementation), and changing your memory page incurs a latency penalty.
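As a back-of-the-envelope check on those numbers (a minimal sketch using the figures claimed in this thread, not vendor-confirmed specs):

```python
# Minimum contiguous write per controller under the simplified model above:
# (bus width per controller) x (transfers per clock). These are the thread's
# claimed figures, not confirmed specs.

def min_write_bits(bus_width_bits, transfers_per_clock=2):
    """Smallest burst a controller moves per clock, in bits."""
    return bus_width_bits * transfers_per_clock

# 980 Ti: 12 controllers x 32-bit, dual issue -> 64 bits per controller per clock
print(min_write_bits(32))    # 64

# Fury X per the post: 8 controllers x 512-bit, dual issue -> 1,024 bits per go
print(min_write_bits(512))   # 1024

# A single uncompressed colour + Z pixel write is 64 bits, so on the wide bus
# only 64/1024 = 6.25% of a minimum burst is useful data in the worst case.
print(64 / min_write_bits(512))  # 0.0625
```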
Quite clearly, the memory granularity issue just got a whole lot more severe with HBM. I wanted to know what AMD has done to alleviate this. How much extra cache does Fiji have to handle it?
Can somebody do some testing with single-pixel polygons? That's how you'd show memory granularity tanking the efficiency of a memory bus: throw millions of single-pixel polygons at it that aren't connected to each other, so they're separate objects with separate draw calls.
I'm willing to bet that a test like this would hurt both cards a lot (single-pixel polygons are bastards for efficiency everywhere; a worst-case scenario, I suppose), but I think it might absolutely kill the Fury.
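A rough sketch of what such a test could look like, assuming the Python moderngl package is available (the shader code and counts are illustrative; in Python the CPU cost of issuing the draw calls will dominate long before the memory bus does, so a serious version would drive the draws from C++):

```python
import time
import numpy as np
import moderngl

ctx = moderngl.create_standalone_context()

prog = ctx.program(
    vertex_shader="""
        #version 330
        in vec2 in_pos;
        void main() { gl_Position = vec4(in_pos, 0.0, 1.0); }
    """,
    fragment_shader="""
        #version 330
        out vec4 f_color;
        void main() { f_color = vec4(1.0); }
    """,
)

W, H = 1920, 1080
N = 100_000  # disconnected single-pixel triangles (scale up as tolerated)

# Random centres, each triangle roughly one pixel across in clip space.
rng = np.random.default_rng(0)
centres = rng.uniform(-1.0, 1.0, size=(N, 1, 2))
offsets = np.array([[0.0, 0.0], [2.0 / W, 0.0], [0.0, 2.0 / H]])
verts = (centres + offsets).astype("f4")  # shape (N, 3, 2)

vbo = ctx.buffer(verts.tobytes())
vao = ctx.simple_vertex_array(prog, vbo, "in_pos")

fbo = ctx.simple_framebuffer((W, H))
fbo.use()
fbo.clear()

# One draw call per triangle, so the framebuffer writes scatter everywhere.
t0 = time.perf_counter()
for i in range(N):
    vao.render(moderngl.TRIANGLES, vertices=3, first=i * 3)
ctx.finish()  # wait for the GPU before stopping the clock
print(f"{N} draws in {time.perf_counter() - t0:.3f}s")
```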
If it does, then AMD really need to work on their caching to make best use of the memory bus.
Or... I could be talking a whole lot of hot air
The normal Fury might prove to be a lot better for those reasons as well, because it would probably have better cooling on the VRMs.
Makes you wonder, but I'm sure Nvidia has run some weird tests to boost things.
Our focus was not spent entirely on 4K, either.
I for one will not ignore 1440p gamers. That is the resolution where you can mostly maximize settings in most games with new cards, and pretty much the resolution to use if you want great-looking PC gaming with maximum game settings at acceptable performance. At 4K you cannot max out games; the GPUs aren't powerful enough yet. 1080p is mostly a given; every game will perform great at 1080p on high-end cards. But 1440p still presents a challenge for some, and is IMO the best test of a graphics card. 4K gaming, as I said, is growing, but it is nowhere near the saturation levels of 1440p and 1080p PC gaming.
Someone on Reddit talked about the differences between GDDR5 and HBM in terms of how data is fed to the GPU:
https://www.reddit.com/r/Amd/comments/3b6n1c/absolutely_nobody_is_talking_about_fury_x_and/
I'm inclined to think it's a driver issue and that AMD still has a lot of optimization work to do for HBM usage; otherwise I assume we'd have heard complaints about stutter from 2x GTX 980 or 2x 290X setups, with their 4GB frame buffers, by now.
Nvidia didn't need any help. AMD just helped Nvidia sell a ton of video cards.
I don't think the stutter in that video is a VRAM issue at all, or at least not strictly. Even if you look at [H]'s own data, Fury is not as far behind a 980 Ti at 4K as it is at 1440p. If it were strictly a VRAM issue it would tank harder at 4K, but instead the gap narrows.
I can run 200% resolution scale in BF4, which is for all intents and purposes 4K (actually a bit higher, since I'm at 1200p), and it won't pause like it did in that video, and that's with 2GB cards. I have NEVER seen a game pause like that because VRAM was saturated.
It's strange, because you can see his VRAM usage drop by around 1GB when it happens. Maybe Nvidia's drivers handle these situations better after the 970 fiasco? (bahaha) His VRAM climbs to around 3.6GB, then dumps down to around 2.7GB when the stuttering happens.
Thought I saw an AMD slide saying that Fury X was the "gateway to 5K gaming." 5K would be quad 1440p. They must have been referring to 5K solitaire. From what I understand, the hierarchy will be Fury X, Fury, Fury Nano, then 390X. I also recall them saying that the Fury Nano will be substantially more powerful than the 390X. The problem is that the Fury X is not much better than the 390X, so it will be interesting to see how the other two cards thread their way in there.
It's definitely a superior technology to GDDR5.
Yes, and who needs that superiority when GDDR5 still bests it today?
HBM is an improvement in just about every respect. It's important not to confuse Fury's performance with HBM; the only reason the card is even performing as well as it is, is HBM. A lot more bandwidth, less power consumption, and a much smaller package. It's definitely a superior technology to GDDR5.
Except it doesn't. 980 Ti > Fury X is not the same thing as GDDR5 > HBM; otherwise Nvidia wouldn't be bothering with HBM for Pascal. If the Fury X were using GDDR5 it would be slower and consume way more power.
A delayed game is eventually good, but a rushed game is forever bad. I'm tired of the drivers excuse. [H] has to review the card as it is, not as it could be in 1, 3, 6, or 12 months. It's not like AMD just got the cards; they've had them for many months, and plenty of time to work on drivers. Unless it's a really small team, which would be a shame.
There is a downside that may be behind some of the performance issues: memory granularity, which was mentioned before.
The size of a data write is much larger on HBM, and a write may not complete in one pass, because when the data has to be placed in more than one page, only one page can be written to at a time.
This can lower maximum bandwidth and increase latency; a toy model of the effect follows the link below.
More info.
https://www.reddit.com/r/Amd/comments/3b6n1c/absolutely_nobody_is_talking_about_fury_x_and/
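To make the penalty concrete, here is a sketch only: the 1,024-bit minimum burst and the 2KB page figure come from this thread, and the one-burst page-change penalty is a made-up stand-in for the real timing.

```python
import random

# Toy model: every write costs at least one minimum burst, plus one extra
# burst-time whenever it lands in a different page than the previous write.
MIN_BURST_BITS = 1024   # 512-bit bus x dual issue, per the post above
WRITE_BITS = 64         # one uncompressed colour + Z pixel

def effective_fraction(n_writes, n_pages, seed=0):
    """Useful bits delivered / total burst-times spent."""
    rng = random.Random(seed)
    switches, page = 0, None
    for _ in range(n_writes):
        p = rng.randrange(n_pages)
        if p != page:
            switches += 1
            page = p
    spent = (n_writes + switches) * MIN_BURST_BITS
    return (n_writes * WRITE_BITS) / spent

print(effective_fraction(100_000, 4096))  # scattered writes: ~3.1% of raw bandwidth
print(effective_fraction(100_000, 1))     # all in one page:  ~6.25%
```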
That all depends on the instruction widths supported and on whether there's a delayed-write cache implemented.
I can byte-align, word-align, DWORD-align, or QWORD-align my code. DWORD is the most efficient in terms of speed, but then I waste memory if I'm writing single-byte structures.
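For example, here's that trade-off shown with Python's ctypes (field names are arbitrary):

```python
import ctypes

class PackedRecord(ctypes.Structure):
    _pack_ = 1  # byte-aligned: no padding, smallest footprint
    _fields_ = [("flag", ctypes.c_uint8), ("value", ctypes.c_uint32)]

class AlignedRecord(ctypes.Structure):
    # natural (DWORD) alignment: 'flag' gets padded out to 4 bytes
    _fields_ = [("flag", ctypes.c_uint8), ("value", ctypes.c_uint32)]

print(ctypes.sizeof(PackedRecord))   # 5 bytes: no wasted memory, slower access
print(ctypes.sizeof(AlignedRecord))  # 8 bytes: 3 padding bytes buy aligned loads
```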
How does the drivers excuse affect you? You're using an Nvidia card till it dies (according to your signature). Go about your day and don't let a company whose products you've probably never used bother you.
Well, in reality there is not that much time between final silicon and launch. They do the best they can, and unfortunately for BOTH Nvidia and AMD, there is no way their launch drivers will ever be truly polished.
The AMD 290X shows a HUGE improvement between launch and now in terms of driver performance.
The Jan 14 HOCP review of the 290X had it managing only 1080p in Dying Light; now (as of last week) it runs 2560x1440 with the same settings and frame rates (well within 2 fps).
Doubt it?
http://hardforum.com/showpost.php?p=1041694207&postcount=74
2,073,600 pixels (1920x1080) x 59.3 fps = 122,964,480 pixels per second
3,686,400 pixels (2560x1440) x 58.1 fps = 214,179,840 pixels per second
So the launch drivers exposed only 57.41% of the card's actual performance compared to today.
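The arithmetic does check out (quick verification, using the fps figures quoted above):

```python
# Pixel throughput at each review's settings.
launch = 1920 * 1080 * 59.3   # 1080p on launch drivers
now    = 2560 * 1440 * 58.1   # 1440p on current drivers

print(f"{launch:,.0f} px/s at launch")  # 122,964,480
print(f"{now:,.0f} px/s now")           # 214,179,840
print(f"launch = {launch / now:.2%} of today's throughput")  # 57.41%
```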
Hmm, you can't compare those reviews: different level, different settings.