RX 6800 problem

1_rick

So I picked up a Gigabyte RX 6800 right before the price drops this spring, meaning I paid too much for it, of course. It was originally in a Gigabyte Z690-I Ultra DDR4, which was one of the boards that had widespread problems with the primary PCIe slot (I don't know if this is relevant or not.) Gigabyte replaced it for free with a new version, the Ultra Plus DDR4, but it took forever to power on, power off, and sleep/wake (like, 30 seconds to sleep or wake, and twice as long to power on or off.) So I replaced the new board with an Asus ROG Strix Z690-F Gaming Wifi.

At some point the GPU started crashing/freezing. Sometimes it would just crash; other times (much more often) I'd get a popup from the Adrenalin software saying the GPU had stopped responding. This would crash a game if one was running, and if I had browsers open, the browser windows would just go all black, and clicking would reload all the tabs.

I RMAd the video card to Gigabyte, who basically ran Furmark and a few other benchmarks, observed no crashes, and said it was "fixed". Obviously, since I'm posting this, it wasn't--it crashed a day or so after I reinstalled it.

It doesn't do it very often--mostly--maybe once or twice a week, but sometimes it'll go on a tear and do it several times in one afternoon. I guess I could live with it, but I really don't want to.

Anyone have any ideas on how to narrow this down further, or if there's anything I could do that might stop it? I did set the PCIe slot to Gen 3 for a while, to no effect. I'm wondering if I should just can the card, which I can afford, but it would suck because of the stupidly-inflated price I paid. (And, of course, there are almost no RX 6800s left, so I'd have to go up or down to a 69x0 or 67x0, or wait for the 7000 series to come out, or go back to nVidia.)
 
Have you tried submitting another RMA or just getting a refund from the vendor (probably too late for that)? If you don't have to waste time troubleshooting, let them deal with it.

For now, maybe try underclocking the memory & core and seeing if that will make it stable. You can also modify the memory timings temporarily with something like AMDmemtweak. By default they shouldn't be running at fast timing in Adrenalin, but there is also an option to generically do that there which could make it unstable as well. My 6900XT would sometimes blackscreen and crash periodically, but that's because I was running it with modified memory timings & overclocked memory for mining while I used it as a PC. It was pretty rare, but was basically what you described. I also had a 5700XT (bought new from Micro Center) that was unstable at stock clocks but seemed to do alright in gaming mostly.
 
Have you tried submitting another RMA or just getting a refund from the vendor (probably too late for that)? If you don't have to waste time troubleshooting, let them deal with it.
I just got it back like two days ago from the first RMA, so not yet. As for the vendor, it was Micro Center and I got it in February so it's long past the return window unless they're willing to take it back for being defective, that avenue's closed.

It's running completely stock unless Adrenalin boosted anything without telling me.
 
I just got it back like two days ago from the first RMA, so not yet. As for the vendor, it was Micro Center and I got it in February so it's long past the return window unless they're willing to take it back for being defective, that avenue's closed.

It's running completely stock unless Adrenalin boosted anything without telling me.
Yeah, I guess what I'm saying is that you might've lost the silicon lottery and even at stock it's unstable. That's basically how my 5700XT was. You can lower the memory clocks and max boost targets to test that, and if that still doesn't work you can try modifying the memory timings although that gets more complex.
 
do you have any WHEA errors in event viewer? if so AMD gpus dont like memory errors
None since I put the card back in the system. But the reason the original motherboard was replaced was because it was spewing hundreds of WHEA errors a second--like I said, known problem with some Z690s.

IIRC I tried running this Asus mobo with the PCIe slot set to Gen 3 for a week or so but it didn't help.
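
For anyone who wants to check for WHEA entries without clicking through Event Viewer, here is a rough sketch that asks Windows' `wevtutil` for the newest events from the `Microsoft-Windows-WHEA-Logger` provider and counts them per Event ID. The function names and the 200-event cap are just illustrative choices, and the script only queries the log on Windows; treat it as a starting point rather than a tested tool.

```python
import re
import subprocess
import sys
from collections import Counter

# XPath filter selecting events written to the System log by the WHEA logger.
WHEA_QUERY = "*[System[Provider[@Name='Microsoft-Windows-WHEA-Logger']]]"


def build_command(max_events=200):
    """Build a wevtutil call returning the newest WHEA events as XML."""
    return [
        "wevtutil", "qe", "System",
        f"/q:{WHEA_QUERY}",
        "/f:xml",             # one <Event> element per log entry
        f"/c:{max_events}",   # cap on the number of events returned (arbitrary)
        "/rd:true",           # newest events first
    ]


def count_event_ids(xml_text):
    """Count how many events appear per Event ID in wevtutil's XML output."""
    ids = re.findall(r"<EventID[^>]*>(\d+)</EventID>", xml_text)
    return Counter(int(i) for i in ids)


if __name__ == "__main__" and sys.platform == "win32":
    result = subprocess.run(build_command(), capture_output=True, text=True)
    counts = count_event_ids(result.stdout)
    if not counts:
        print("No WHEA-Logger events found.")
    for event_id, count in sorted(counts.items()):
        print(f"Event ID {event_id}: {count}")
```

An empty result after a crash would suggest the freezes aren't being reported as hardware errors, which is consistent with what I'm seeing on the Strix.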
 
Huh. Fired up the Radeon software (22.10.1, just upgraded from 22.7.something when I put the card back in) and looked at the performance. I've got a pair of 1440p monitors running at 144 hz, and the software reports the card's vram clocks at 1988 or so, with the graph yellow instead of red like all the others on that tab. Tried clicking the vram overclock button--out of 3 tries, 1st and 3rd caused driver timeouts, and the second was able to push the clock to 2150MHz. Also tried setting both monitors to 60Hz after seeing something on Reddit suggesting high refresh rates can push the vram clock high. Letting the machine sit for 5-10 seconds didn't do anything to the clock rate.

This was all more or less at idle--I had a couple of browsers open with a total of 4 tabs, Arduino and VS Code IDEs open, and the Curseforge app, but no games running; GPU speeds were reported as like 25MHz and 50-60C junction temps. 60-second stress test seemed fine, with the GPU junction temp going into the low 70s.
 
None since I put the card back in the system. But the reason the original motherboard was replaced was because it was spewing hundreds of WHEA errors a second--like I said, known problem with some Z690s.

IIRC I tried running this Asus mobo with the PCIe slot set to Gen 3 for a week or so but it didn't help.
the whea error thing is a known issue, i ran into it on that same strix board, cant remember the fix though. googling z690 and whea should get it though.
i also had issues where i had to install an old gpu to get the board to clock down to pcie4.
AND it needed bios updates.
also, your vram thing above, thats normal, mine does it too.
 
the whea error thing is a known issue, i ran into it on that same strix board, cant remember the fix though. googling z690 and whea should get it though.
Well, again, the WHEA errors I was seeing were on a Gigabyte board, not this Asus. Gigabyte recalled it. I don't see them on the Strix, and I'm current with the BIOS.

also, your vram thing above, thats normal, mine does it too.
That's good to know, at least.
 
Any friends/family with a gaming rig you can pop it into and test? It sounds more like it's a compatibility or platform issue than the GPU itself. I went around and around and around with this with my 6900XT and ended up building a whole second system to test it, and it ended up being a faulty 5800X that caused it.
 
Well, again, the WHEA errors I was seeing were on a Gigabyte board, not this Asus. Gigabyte recalled it. I don't see them on the Strix, and I'm current with the BIOS.
yeah it wasnt just a GB thing, its a z690 thing and one bios setting fixed it. the strix may have been updated since, my issues happened in Aug. as Lig just suggested, id try it in another system if you can.
 
Testing the PSU next is a good idea. What many people aren't aware of is that the RM series isn't the same quality as the Corsair RMx series. Not that they are bad units, they aren't. They are more of a budget PSU when compared to the RMx units.
If you find that the PSU isn't the problem, I can give you some pointers on dealing with Gigabyte's tech support people that may help.
 
Any friends/family with a gaming rig you can pop it into and test? It sounds more like it's a compatibility or platform issue than the GPU itself. I went around and around and around with this with my 6900XT and ended up building a whole second system to test it, and it ended up being a faulty 5800X that caused it.
Well, I do have a 5800X that's currently got a 2070 in it, and I could try swapping cards.
 
yeah it wasnt just a GB thing, its a z690 thing and one bios setting fixed it.
Right, I think at least one other vendor had a problem board.

I had a 1070 in this one for a week or two while the 6800 was being RMAd, and had no problems at all, FWIW.
the strix may have been updated since
I checked this morning, and I'm on the newest, 2004, Strix bios.
 
Is there anything beyond using a PSU tester or following the instructions here? https://help.corsair.com/hc/en-us/articles/360025085372-How-to-Test-a-power-supply-unit

The last time I was in the BIOS the voltage readings looked OK although it obviously only lists one value for each of the three main voltages.
That's basically it. I doubt the PSU is bad since it's only a year old, so maybe check that later (unless maybe you've had local power utility issues that could've damaged it?). The PSU tester they have listed is pretty handy because you don't have to check each set of pins one by one.
 
Is there anything beyond using a PSU tester or following the instructions here? https://help.corsair.com/hc/en-us/articles/360025085372-How-to-Test-a-power-supply-unit

The last time I was in the BIOS the voltage readings looked OK although it obviously only lists one value for each of the three main voltages.
Checking the rails under load with a multimeter would be best. That's most likely to show any anomalies. Testing while running your game, particularly if you can catch it black screening, would be ideal.
I'm not familiar with PSU testers and don't know if you can use them under load or not.
Testing your GPU in your spare rig, as has been suggested, is the quickest, easiest way to find out if it's a GPU or PSU issue tho.
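
If you do take multimeter readings, here is a quick sketch for checking them. It assumes the usual ATX tolerance of plus or minus 5% on the +12V, +5V and +3.3V rails; the function name and the example readings are made up for illustration.

```python
# Nominal voltages for the three main positive ATX rails.
NOMINAL_RAILS = {"+12V": 12.0, "+5V": 5.0, "+3.3V": 3.3}

# Assumed tolerance: +/-5% of nominal on the positive rails.
TOLERANCE = 0.05


def rail_in_spec(rail, measured_volts):
    """Return True if a measured voltage is within tolerance for that rail."""
    nominal = NOMINAL_RAILS[rail]
    low = nominal * (1 - TOLERANCE)
    high = nominal * (1 + TOLERANCE)
    return low <= measured_volts <= high


if __name__ == "__main__":
    # Hypothetical readings taken under load, not real measurements.
    readings = {"+12V": 11.9, "+5V": 5.02, "+3.3V": 3.28}
    for rail, volts in readings.items():
        status = "OK" if rail_in_spec(rail, volts) else "OUT OF SPEC"
        print(f"{rail}: {volts:.2f} V {status}")
```

A reading that is in range at idle but sags below the lower limit during a game is the kind of anomaly to look for.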
 
Oh, another thing I forgot to mention is sometimes I'll see a weird flickering, mostly on what I'll call overlays, like when you mouse over a playing Youtube video and the timeline and gui appear over the video. The overlay part will look kind of like the video below, which I've cued up to 19:00 or so. The video is something I just happened to see, and it's entirely irrelevant to this except the cued-up portion is flashing a high-speed LED in slow motion (so, potential seizure warning, too, I guess.) A couple of other things that function as layers over other stuff will do it but I can't remember what was doing it--maybe the windows outlines you get in Win 11 when dragging a window around and potentially triggering aero snap or whatever they call it now.

 
Oh, another thing I forgot to mention is sometimes I'll see a weird flickering, mostly on what I'll call overlays, like when you mouse over a playing Youtube video and the timeline and gui appear over the video. The overlay part will look kind of like the video below, which I've cued up to 19:00 or so. The video is something I just happened to see, and it's entirely irrelevant to this except the cued-up portion is flashing a high-speed LED in slow motion (so, potential seizure warning, too, I guess.) A couple of other things that function as layers over other stuff will do it but I can't remember what was doing it--maybe the windows outlines you get in Win 11 when dragging a window around and potentially triggering aero snap or whatever they call it now.


if you have hdr enabled try turning it off.
 
So the flickering or issues with video playback on Chromium-based browsers with extended monitors is a known issue. AMD is aware of it, and it was in the driver release notes.

If you switched boards while using the same OS install, I'd try a different install on a different drive with a different SATA cable if possible. To start narrowing the possible causes, you need to start eliminating items and see if anything changes.

I would also try a complete factory reset uninstall and reinstall for the AMD drivers. As a hail mary I'd also try the card in a different PCIE slot.

If I had to guess, I'd guess it's the card or some conflict of drivers.
 
So the flickering or issues with video playback on Chromium-based browsers with extended monitors is a known issue. AMD is aware of it, and it was in the driver release notes.
didn't know about that, but it wasn't happening with the 1070. It has been happening for a while, pretty sure when I still had 22.7.

So far, it does not seem to be happening on the other computer, although I've only been using it for an hour, and have only watched ~30s of youtube.

Computer only had m.2 nvme drives in it. When I sent the card out for RMA, I uninstalled the Radeon software & driver, shut it down, pulled the card, put in the 1070, and then installed the nvidia drivers. Putting it back, I did the same.

Putting the 6800 in this computer, I ran DDU to uninstall drivers for the 2070 that was in it, shut down, swapped cards, immediately ran the AMD driver installer.
 
The other computer has only a single 1080p 75Hz display. I just checked at desktop (one browser w/3 tabs, and the launcher for Guild Wars 2 running), and the vram clock speed is 192MHz.

When I put the card back, assuming a problem doesn't reveal itself as obvious enough to try to RMA, I'll DDU.
 
GPU speeds were reported as like 25MHz and 50-60C junction temps. 60-second stress test seemed fine, with the GPU junction temp going into the low 70s.
did you try repasting? temps seem high
 
did you try repasting? temps seem high
Hadn't even thought of that, might look into it if nothing else shows up first.

PSU tester should show up tomorrow. Haven't seen any issues so far with the card in the other computer, although all I've played is Guild Wars 2 and modded Minecraft so far.
 
Half an hour of Doom 2016, no problems. 1080p, pretty much pegged at 200fps the whole time.
 
I checked both the GPU connectors, both the 8-pin EPS connectors, 2 of the SATA connectors on the SATA cable (I'm only using one connector, for the fan hub), and the main 24-pin. The manual says that the range for PG should be 100-500ms, and shouldn't blink, but it was blinking. Everything else was solid.

20221018_100004.jpg
 
The card froze up twice today in the other system, dang it--one "the display adapter stopped responding" and it shook it off and kept going, and then, a few hours later, the whole computer froze up.
 
It's locked up the computer twice more since then, hard--nothing but a power cycle works. Anyone wanna buy a mildly-defective card, cheap? :)
 
Is there anything beyond using a PSU tester or following the instructions here? https://help.corsair.com/hc/en-us/articles/360025085372-How-to-Test-a-power-supply-unit

The last time I was in the BIOS the voltage readings looked OK although it obviously only lists one value for each of the three main voltages.
You can test voltages with GPU-Z. This way you can test under load and record the minimum voltage across the entire time.
In GPU-Z, go to the sensors tab. All of the fields showing voltages, click once in each field and it will change to "Min" which means minimum value. Clicking in the fields repeatedly toggles between Min, Max, Avg, and current reading which is the default.
You might also click in the temp fields and set them to Max, check on the hotspot temp if that GPU reports it.

Now, fire up your game and play for an hour or 2. When you exit the game, check the minimum voltages seen.
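
If you'd rather not rely on the on-screen Min/Max fields, the Sensors tab also has a "Log to file" option that writes a comma-separated text log. Here is a rough sketch for pulling the minimum and maximum of every numeric column out of such a log. The column names in the sample are placeholders (what actually shows up depends on the card and GPU-Z version), and the function name is my own.

```python
import csv
import io


def column_extremes(log_text):
    """Return {column: (min, max)} for each numeric column in a sensor log."""
    reader = csv.reader(io.StringIO(log_text))
    header = next(reader, None)
    if header is None:
        return {}
    names = [h.strip() for h in header]
    extremes = {}
    for row in reader:
        for name, cell in zip(names, row):
            cell = cell.strip()
            if not name or not cell:
                continue  # padding left by trailing commas
            try:
                value = float(cell)
            except ValueError:
                continue  # non-numeric cell, e.g. the timestamp
            low, high = extremes.get(name, (value, value))
            extremes[name] = (min(low, value), max(high, value))
    return extremes


# Made-up sample in the general shape of a sensor log; not real data.
SAMPLE_LOG = """Date , GPU Clock [MHz] , 12V [V] ,
2022-10-18 10:00:04 , 2100.0 , 11.98 ,
2022-10-18 10:00:05 , 2150.0 , 11.71 ,
"""

if __name__ == "__main__":
    for column, (low, high) in column_extremes(SAMPLE_LOG).items():
        print(f"{column}: min={low} max={high}")
```

To use it on a real log, read the file into a string first (for example with `open(path, errors="replace").read()`) and pass that to `column_extremes`.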

It looks like you bought a tester, which you didn't really need to do, but it's cool to have. If it doesn't test under load, then check with GPU-Z.

Since you have determined that it is in fact the card, have you tried RMA'ing it again? Let them know it takes a few hours to happen, and that you have observed it in 2 completely different PCs. I'm interested in hearing how Gigabyte treats their customers.
 