ASUS Vega 64 STRIX Crashing

Joined
Apr 8, 2020
Messages
7
Hey all, I have a Vega 64 ASUS Strix (ROG-STRIX-RXVEGA64-O8G-GAMING) and I'm struggling with instability: after 1-2 hours of gaming the computer shuts down and reboots immediately. I can't get this issue solved no matter what I try.

my system spec -

z390 designare gigabyte

4x16 ram corsair lpx

i9 9900k

Asus vega 64

hx1000i

3xhdd

2xm.2

corsair commander + 6 fans

Noctua dh CPU cooler



- Mobo BIOS is the latest, F9b (also happened with F7). Tried default settings and my Hackintosh BIOS settings, no difference.

- AMD latest driver 20.5.1 (also happened with the official ASUS driver from their website and the AMD stable version 20.4.2)

- Windows 10 20H1 version

- Power connected with 2 different cables to PSU

- I set my PSU to multi-rail mode in the iCUE software, also tried single-rail mode, no difference.

- Tried 2 sticks of RAM, swapped them, tried 1 stick, no difference.

- I ran MemTest86, everything was fine.

- Tried different profiles in ASUS GPU Tweak, also tried switching BIOS with the card's dual-BIOS option, no difference.

- Tried updating the firmware from the ASUS website, no difference.

- Tried updating the firmware from https://www.techpowerup.com/vgabios/ , again, no difference.

- Temperatures are fine, around 70-80 Celsius



I really don't know what else to try. ASUS support told me that they can't do anything and that I can send the card to them, but I need this card for my working OSX machine, so I want to try to pinpoint the problem myself.



I want to replace the thermal pads and the thermal paste, then test again to see whether there are any temperature issues, like in this post


please help! :)
 

noko

Supreme [H]ardness
Joined
Apr 14, 2010
Messages
5,580
Underclock the GPU and GPU memory and see if that makes a difference, or just set the Power Limit to something like -50% and, if the problem stops, work your way back up.
  1. Probably a good idea to repaste. The temperature sensor registers only one part of the GPU, and dried-up paste can transfer heat unevenly, so another part of the GPU could be much hotter than normal.
  2. Underclocking or lowering the Power Limit reduces the heat load on the GPU, which also reduces the heat load on other parts of the computer. That will tell you whether the problem is heat/power related.
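
The "start low, then work your way up" approach can be written down as a small test plan. This is only a sketch: the 10% step size and the function names are assumptions for illustration, and the pass/fail results have to come from your own gaming or stress sessions at each setting.

```python
def power_limit_steps(start=-50, stop=0, step=10):
    """Return the Power Limit offsets (in percent) to try, lowest first."""
    if step <= 0:
        raise ValueError("step must be positive")
    if start > stop:
        raise ValueError("start must not exceed stop")
    steps = list(range(start, stop, step))
    steps.append(stop)  # always finish at the stock limit
    return steps


def highest_stable_limit(results):
    """Given {offset: survived}, return the highest offset that survived.

    Stops at the first crash, so every lower offset must also have
    survived. Returns None if even the lowest offset crashed.
    """
    best = None
    for offset in sorted(results):
        if not results[offset]:
            break
        best = offset
    return best


if __name__ == "__main__":
    print(power_limit_steps())
    # Hypothetical outcome of three sessions, for illustration only:
    print(highest_stable_limit({-50: True, -40: True, -30: False}))
```

If the machine already crashes at the lowest offset, the power limit is probably not the lever, which is useful information in itself.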

What makes you think it is the GPU?

You can pull the GPU and reseat it; another option is to try the other PCIe slot for testing purposes. Is another graphics card available to try?

What power supply do you have, how old is it, and how many watts? Does the power supply's fan speed up under load? Keep it single rail; I know of no reason to go dual rail.
  1. As a note, I have a Seasonic 850W power supply on a 6700K plus 2x 1080 Ti setup which now does this as well when gaming in SLI with both GPUs working hard. After about 1 year of mining 24/7, the power supply appears to have gotten weak. You could reduce the GPU's power, see the problem go away, and conclude it is the GPU when it is actually a weak power supply.
  2. If anything is shorting (USB device, motherboard, GPU, card etc.), most power supplies will automatically shut down, so your power supply could be protecting against a short or low ground.
    1. Heat on a component, or the motherboard itself warping with heat and causing a short, is not out of the realm of possibility
    2. Take everything out of the computer, put it on a box and test using minimal cards, hookups etc.
    3. Any extra power supply cables possibly grounding out? Look for odd things, even something shorting out pins on the motherboard. Blow out the computer with air, flip it upside down etc. Even shake it and listen for anything loose.
  3. Blow out the power supply and clean any filters (most of my cases have filters for the power supply; on one of them I actually have to take the power supply out to clean it! errrr)
  4. Try a different power supply if you have one; currently new ones are very hard to find and/or very expensive
Anyways, you have to isolate the actual cause.
 

Furious_Styles

[H]ard|Gawd
Joined
Jan 16, 2013
Messages
1,947
You should try to isolate the problem. Run a CPU bench like Prime95 (P95) or OCCT, see if it crashes, and check temps. Then load up the GPU with a bench from Unigine (Heaven/Superposition) and do the same with temps. You can use MSI Afterburner for this. I highly recommend uninstalling all of the ASUS software.
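
Because the machine powers off outright, whatever the monitoring tool was showing at the moment of the crash is lost unless it was written to disk. Below is a minimal sketch of a logger that forces every sample to disk, so the last row in the file is the last reading before the shutdown. `read_sensors` is a hypothetical callable, not a real library call; you would wire it to whatever tool exposes your temperatures.

```python
import csv
import os
import time


def log_samples(path, read_sensors, samples, interval_s=1.0, sleep=time.sleep):
    """Append timestamped sensor readings to a CSV, one row per sample.

    read_sensors is a hypothetical callable returning a dict such as
    {"gpu_temp_c": 74, "hotspot_c": 91}.
    """
    with open(path, "a", newline="") as f:
        writer = csv.writer(f)
        for i in range(samples):
            reading = read_sensors()
            row = [f"{time.time():.1f}"]
            row += [f"{key}={value}" for key, value in sorted(reading.items())]
            writer.writerow(row)
            # Push the row through Python's buffer and the OS cache so it
            # survives a sudden power-off.
            f.flush()
            os.fsync(f.fileno())
            if i < samples - 1:
                sleep(interval_s)


def last_row(path):
    """Return the final logged row, i.e. the last reading before a crash."""
    with open(path, newline="") as f:
        rows = list(csv.reader(f))
    return rows[-1] if rows else None
```

After a crash, `last_row("gpu_log.csv")` (the file name is just an example) shows what the sensors read right before the shutdown, which helps separate a thermal problem from a power one.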
 
Joined
Apr 8, 2020
Messages
7
Thanks for the reply!
noko, great advice. I'll try underclocking the GPU and see how it reacts.
I've done most of those already. I don't have a spare power supply to test with, but I'll disconnect and reconnect everything and see if I can pinpoint something. This machine is one year old and was seldom used for gaming; only in the last four months have I started using the GPU more, so everything is still "fresh".
My first thought was that the power supply might be causing the issue. Still, some people told me that a machine crashing and rebooting immediately does not sound like a PSU problem. I checked all the wiring and compared with people who built the same spec, and I can't find any problem. I don't know how I can measure the PSU without external hardware...

I'll run the CPU bench and see what happens. About the GPU bench, I don't understand what you mean: should I run the benchmark while changing values in MSI Afterburner or in the AMD tuning section, or just check temps and stability? In general, when I run a GPU benchmark it passes every time.
 

blackmomba

Limp Gawd
Joined
Dec 5, 2018
Messages
402
I had your card and also had problems, but mine was a driver issue.

What happens when the machine crashes? How many monitors?
 
Joined
Apr 8, 2020
Messages
7
Hey! Thanks for the comment.
Interesting, so how did you fix your problem? By installing a specific driver, or by tuning settings?

When it happens, the computer shuts down completely and immediately starts again, like a restart. I'm using one 3440x1440 monitor, and lately I added a small monitor. But this issue happened before I added the small monitor.
 

PontiacGTX

Gawd
Joined
Aug 9, 2013
Messages
756
Try an older driver version (read the unsolved problems for the version you are using and for the one you plan to install); if that doesn't help, RMA it...
 
Joined
Apr 8, 2020
Messages
7
It's an excellent idea. I tried that with the official driver from the ASUS website for that specific card, which I remembered as being stable, but it crashed too. I can try another old version; do you remember which one worked for you?
 

DrDoU

2[H]4U
Joined
Jun 4, 2007
Messages
2,624
Is it a green-screen crash or a black-screen crash before the reboot? After the 20H1 update, were the video drivers reinstalled? Have you tried installing just the driver, leaving out all of the other features? Maybe it's an overlay problem. Have you updated to the latest motherboard chipset drivers? Trying to think of something other posters have left out.
 

ManofGod

[H]F Junkie
Joined
Oct 4, 2007
Messages
12,031
Well, I own the exact same card, an RMA replacement that I bought from someone on here last September, and I have had zero issues with it. My guess is that's because it already has the proper thermal pads installed. So I would replace the pads, if you have had the card for a few years.
 

PontiacGTX

Gawd
Joined
Aug 9, 2013
Messages
756
It's an excellent idea. I tried that with the official driver from the ASUS website for that specific card, which I remembered as being stable, but it crashed too. I can try another old version; do you remember which one worked for you?
I am using 20.3.1

https://www.amd.com/en/support/kb/release-notes/rn-rad-win-20-3-1

Radeon RX Vega series graphics products may experience a system crash or TDR when playing games with Instant Replay enabled. A workaround for users experiencing these issues is to disable Instant Replay.

I don't have this issue, but you can install this driver and try the workaround. If it doesn't work, try undervolting and reducing the clock speed a bit. It could be a voltage problem (the voltage might be too low under load), or you might be experiencing overheating (use HWiNFO to read the GPU, HBM and VRM temperatures).

If that doesn't work, register your card for warranty and start the RMA process (I don't know how the process works with ASUS).
 

N4CR

Supreme [H]ardness
Joined
Oct 17, 2011
Messages
4,687
PSU most likely, if it's VGA related. Try using MSI Kombustor at native resolution, or something else that loads the GPU up; it's how I diagnosed my PSU/V64 setup issues. Watch the power lights on the side of the card.

V64 crashes are most likely when you don't have an FPS limiter, e.g. in menus, where the FPS shoots to 1000 and power spikes. Power spikes/transients are most of the issue with powering a V64. Undervolting and limiting FPS will make a big difference if that is your issue and you are waiting on a new PSU.
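
To make the FPS limiter point concrete, here is a rough sketch of what a limiter does: each frame gets a time budget, and the loop sleeps away whatever is left instead of immediately starting the next frame. It is an illustration only, not how any particular game or driver implements it, and `render_frame` is a hypothetical stand-in for the real work. In practice you would just enable the limiter in the game or in the driver settings.

```python
import time


def frame_budget_s(fps_cap):
    """Seconds available per frame at the given FPS cap."""
    if fps_cap <= 0:
        raise ValueError("fps_cap must be positive")
    return 1.0 / fps_cap


def run_capped(render_frame, frames, fps_cap,
               clock=time.perf_counter, sleep=time.sleep):
    """Call render_frame() `frames` times, never faster than fps_cap."""
    budget = frame_budget_s(fps_cap)
    for _ in range(frames):
        start = clock()
        render_frame()  # hypothetical: the actual rendering work
        remaining = budget - (clock() - start)
        if remaining > 0:
            # Idle instead of rendering another frame right away, which is
            # what keeps a trivial menu scene from running at 1000 FPS.
            sleep(remaining)


if __name__ == "__main__":
    run_capped(lambda: None, frames=5, fps_cap=60)
```

A menu frame that takes 1 ms to draw leaves the GPU idle for most of a 60 FPS budget, which is why capping tends to smooth out the load.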
 
Joined
Apr 8, 2020
Messages
7
Hey again. Well, I changed the thermal pads and paste, and underclocked the GPU in ASUS GPU Tweak II to these settings -
GPU Clock (MHz) - 1400
GPU Voltage (mV) - 1100
Memory Clock (MHz) - 1900
Memory Voltage (mV) - 1100
Fan Speed - Default
Power Target (%) - 90
GPU Temp Target (C) - 78
External Fan Speed - Default

The default settings are -
GPU Clock (MHz) - 1630
GPU Voltage (mV) - 1200
Memory Clock (MHz) - 1900
Memory Voltage (mV) - 1100
Fan Speed - Default
Power Target (%) - 100
GPU Temp Target (C) - 78
External Fan Speed - Default

I can play a bit longer, but I've had the crash-reboot a few times since then ... I'm out of ideas.
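
For a rough sense of how much the underclocked profile lowers the load compared with the defaults, a first-order estimate is that dynamic power scales with frequency times voltage squared. The sketch below applies that rule of thumb to the two GPU clock/voltage pairs listed above. It is an approximation under stated assumptions: it ignores leakage, the HBM, and the fact that the card varies its real clocks and voltages under load.

```python
def relative_dynamic_power(freq_mhz, volt_mv, ref_freq_mhz, ref_volt_mv):
    """First-order estimate: dynamic power scales with f * V^2.

    Returns the power of (freq_mhz, volt_mv) relative to the reference.
    """
    return (freq_mhz / ref_freq_mhz) * (volt_mv / ref_volt_mv) ** 2


# GPU core values taken from the two settings lists in this post.
default = {"clock_mhz": 1630, "voltage_mv": 1200}
underclocked = {"clock_mhz": 1400, "voltage_mv": 1100}

ratio = relative_dynamic_power(
    underclocked["clock_mhz"], underclocked["voltage_mv"],
    default["clock_mhz"], default["voltage_mv"],
)

if __name__ == "__main__":
    # Roughly 0.72, i.e. about 28% less core dynamic power by this estimate.
    print(f"{ratio:.2f}")
```

By this estimate the underclock trims the core's dynamic power by a bit over a quarter, which fits with "I can play a bit longer" but is not a large enough cut to rule out a power delivery problem.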
 

N4CR

Supreme [H]ardness
Joined
Oct 17, 2011
Messages
4,687
Voltage is really high for GPU core speed. Do you have another GPU to test?

Also try an HBM underclock.
 

bluestang

Limp Gawd
Joined
Dec 14, 2018
Messages
171
Undervolt that Vega! Default volts are way too high.

GPU no more than 1050 mV. I'm between 1000 and 1050 depending on what I'm using the GPU for.
HBM can be set at 1000 mV and still hit stock MHz.

And get rid of ASUS GPU Tweak and use Afterburner, or just use Global Wattman in Radeon Settings. I just use Wattman, as Afterburner doesn't let me control some of the things I want to.


I have the Sapphire Nitro+ Vega 64 and my settings in Global Wattman:

Performance / Watt Profile set to Custom
GPU Frequency (MHz) set to Dynamic and at 1632 in State 7
Voltage Control (mV) set to Manual and 1000 mV
Memory Frequency (MHz) set to Dynamic and at 1000 in State 3
Voltage Control (mV) set to Manual and 1000 mV
Fan Speed (RPM) set to Manual and 1750 min / 2750 Target
Temperature (C) Control set to Manual and 80 Max / 75 Target
Power Limit set to 20%

Setting to Dynamic allows the card to self-regulate based on temps and load. You will most likely end up with higher MHz than with stock volts too, as stock volts cause high HBM temps and high GPU Hot Spot temps.
 

ThreeDee

[H]F Junkie
Joined
Sep 5, 2001
Messages
10,944
What exactly are you running for a PSU? I had similar issues with totally different hardware, and it was my PSU that was failing. Swapped it out and the machine was back to being stable again.
 

PontiacGTX

Gawd
Joined
Aug 9, 2013
Messages
756
If undervolting and underclocking don't work, then RMA the card if possible; if not, it's time to upgrade to an RX 5700 (XT).
 
Joined
Apr 8, 2020
Messages
7
Hey again, back here after another test run and still no solution.

here is a screenshot from stock settings, and default bios -
amit asus.gif


Also, I tried to undervolt my GPU based on this guide - https://www.reddit.com/r/Amd/comments/a17lp7/vega_64_definitive_undervoltoverclock_guide_for/

But I still get the same crash after 20-30 minutes. I also noticed that if I run another stress test after a stress test and crash, it crashes much quicker, so I suspect temperature or voltage, from either the GPU or the PSU.
I also tested my RAM by trying each stick on its own in slot one, but it crashed with all of them.

Still need help :(

P.S. When I say crash, I mean the PC powers off completely and then restarts and boots again by itself.
 

sirmonkey1985

[H]ard|DCer of the Month - July 2010
Joined
Sep 13, 2008
Messages
22,225
Could be the VRM temps. Fan speed seems really low for those temps though, almost like it's in silent/quiet/power saver mode (or whatever ASUS calls it), seeing as how the fans ramped up slightly before you took the picture and then ramped back down to 26%.
 

noko

Supreme [H]ardness
Joined
Apr 14, 2010
Messages
5,580
Some ad hoc checks that may or may not be applicable:

  • If your case has a filter for your power supply, make sure it is clean. Some people forget about the P/S filter
  • Open the side of the case up and have a fan blow air into it
  • You gotta isolate the problem better: get that GPU into another machine, take it to a shop, a friend etc.
  • Reinstall Windows, but I would wait until finding out if the card is good or not
  • Test a different P/S, though it's better to just put the GPU in another rig to test
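
The swap tests in this list work by elimination, which can be sketched as below. This is a simplification that assumes a single faulty part, and the three suspect labels are just for illustration. A marginal PSU combined with a power-hungry card can blur the picture, so treat the output as a pointer rather than a verdict.

```python
def remaining_suspects(gpu_crashes_in_other_rig=None, crashes_with_other_psu=None):
    """Narrow down the suspects from the results of the swap tests.

    Each argument is True/False for a test outcome, or None if that test
    has not been run yet. Assumes only one part is at fault.
    """
    suspects = {"GPU", "PSU", "rest of system"}

    if gpu_crashes_in_other_rig is True:
        # The card misbehaves in a known-good machine: it is the card.
        return ["GPU"]
    if gpu_crashes_in_other_rig is False:
        suspects.discard("GPU")

    if crashes_with_other_psu is True:
        # Still crashing on a different power supply clears the PSU.
        suspects.discard("PSU")
    elif crashes_with_other_psu is False:
        # Stable on a different power supply points at the original PSU.
        suspects = {"PSU"}

    return sorted(suspects)


if __name__ == "__main__":
    print(remaining_suspects())  # nothing tested yet
    print(remaining_suspects(gpu_crashes_in_other_rig=False))
```

This is why putting the GPU in another rig comes first: a crash there settles the question in a single test.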
 
Joined
Apr 8, 2020
Messages
7
Could be the VRM temps. Fan speed seems really low for those temps though, almost like it's in silent/quiet/power saver mode (or whatever ASUS calls it), seeing as how the fans ramped up slightly before you took the picture and then ramped back down to 26%.

Well, it's the default settings, no silent/quiet/gaming mode...

Some ad hoc checks that may or may not be applicable:

  • If your case has a filter for your power supply, make sure it is clean. Some people forget about the P/S filter
  • Open the side of the case up and have a fan blow air into it
  • You gotta isolate the problem better: get that GPU into another machine, take it to a shop, a friend etc.
  • Reinstall Windows, but I would wait until finding out if the card is good or not
  • Test a different P/S, though it's better to just put the GPU in another rig to test

Thanks!
- I don't have any filter on my power supply, and it still shines like the first day I got it.
- I'll try that!
- That's exactly what I should do, to find out whether it's the GPU or the PSU.
- Did that before, it did not help.
- I'll try this card in a different machine.

By the way, if I flash firmware from another vendor, can it destroy my GPU? I was thinking of flashing the Sapphire firmware to see what happens.
 