Random reboots: PSU or GPU at fault?

Discussion in 'Power Supplies' started by cheesechoker, Nov 14, 2014.

  1. cheesechoker

    cheesechoker n00b

    Messages:
    13
    Joined:
    Nov 14, 2014
    • Windows 8.1
    • Intel Core i7-3770 3.40GHz
    • Gigabyte B75M-D3H mobo
    • Corsair CX600M 600W power supply
    • Zotac GeForce GTX 770 4GB (ZT-70304-10P) @ 344.65 driver
    • Stock everything, no overclocking
    ^ I just upgraded my video card & PSU to the CX600M and GTX 770 shown above.

    Since then I've been getting random restarts while playing Skyrim. These are hard reboots, not driver failures. (I turned off Automatic Restart, and there's no bluescreen -- PC just shuts off.)

    At first I suspected overheating, so I took some GPU-Z logs of the crashes (see below). GPU temp is <50°C. Power consumption is also <40% of TDP. This seems quite modest. Yet my PC dies regularly after 10-30 minutes of playing.
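
    For anyone wanting to pull the last readings before a crash out of a sensor log, here is a minimal sketch. It assumes the log is a comma-separated text file with a header row; the column names and the sample rows below are invented for illustration and are not the actual log from this PC.

```python
import csv
import io

def last_rows(log_text, n=3):
    """Return the last n data rows of a comma-separated sensor log as dicts."""
    reader = csv.reader(io.StringIO(log_text))
    # Strip padding around each cell and skip completely empty lines.
    rows = [[cell.strip() for cell in row] for row in reader
            if any(cell.strip() for cell in row)]
    header, data = rows[0], rows[1:]
    # Ignore the empty column produced by a trailing comma.
    return [{h: v for h, v in zip(header, row) if h} for row in data[-n:]]

# Hypothetical sample (made-up values, assumed column names).
SAMPLE = """Date , GPU Temperature [C] , Power Consumption [% TDP] ,
2014-11-14 20:01:00 , 47.0 , 35.2 ,
2014-11-14 20:01:01 , 48.0 , 38.9 ,
"""

for row in last_rows(SAMPLE, n=2):
    print(row)
```

    If the final rows before the shutdown show normal temperature and power draw, as they did here, overheating of the GPU itself becomes a less likely explanation.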

    (Skyrim @ 2175x1527 DSR, Ultra settings, hi-res texture pack)

    I tried swapping in my old GTX 650 and it runs fine for hours with no crashes. But that doesn't necessarily mean the 770 is at fault, since the power requirements are different. It could still be the PSU not putting out enough juice for the 770, right?

    Should I return the PSU or the video card? Both? I have another 1-2 days left for in-store return. After that I'd have to return on warranty to the mfr.
     
  2. Araxie

    Araxie [H]ardness Supreme

    Messages:
    6,312
    Joined:
    Feb 11, 2013
    Definitely a faulty PSU. RMA it. But just out of curiosity, check your chip voltage and temps. Does it only happen in Skyrim, or in other games too?
     
  3. cheesechoker

    cheesechoker n00b

    Messages:
    13
    Joined:
    Nov 14, 2014
    So far I've only been testing in Skyrim but I'll try some other games to make sure.
    Is this any different from checking the "VDDC" and "GPU temp" fields that GPU-Z outputs? I looked at the values it logged just before my PC crashed and they seemed fine...
     
  4. Araxie

    Araxie [H]ardness Supreme

    Messages:
    6,312
    Joined:
    Feb 11, 2013
    Sorry bud, forgot to say processor: I meant the voltage and temps from the processor.
     
  5. cheesechoker

    cheesechoker n00b

    Messages:
    13
    Joined:
    Nov 14, 2014
    So.. part of the trouble was again my own stupidity: I forgot the PC was plugged into a UPS, which is only rated for 400W. Power consumption at the UPS was approaching 400W just before the PC shut off. So that's no good. I moved the PC outside the UPS. I also switched my tests from Skyrim to Furmark, because it triggers the reboot problem a lot faster.

    Despite those changes my rig still isn't stable. Furmark stress test results are below.

    Code:
    Target temp   Power usage    Result
    ================================================
     79C           100% TDP     Crash after 12 mins
     65C            59% TDP     Crash after 12 mins 
     60C            52% TDP     Crash after 14 mins
    
    (Furmark "Furry Donut v2" @ 1920x1080, 8xMSAA)
    
    I'm using EVGA Precision and it won't let me set the target temp any lower than 60C. At 60C the GPU is only running @ 731MHz, only drawing about 120W, and it STILL can't last 15 minutes??
    At this point I'm leaning toward the GPU being faulty.

    My friend will be back next week, so I'll try the GPU in his rig and verify.
     
  6. cheesechoker

    cheesechoker n00b

    Messages:
    13
    Joined:
    Nov 14, 2014
    Good point, I'll try another stress test and log the CPU stats this time.
     
  7. cheesechoker

    cheesechoker n00b

    Messages:
    13
    Joined:
    Nov 14, 2014
    I tried stress testing the processor by itself using Prime95.

    I found that certain Prime95 tests will cause the PC to shut off too. So now there are 2 ways I can cause the PC to shut off: running Furmark on the GTX 770, or running Prime95 solo.

    Even weirder: it's always the same Prime95 tests that kill it.
    "Inplace Large FFT" or "Blend" = crash in 15 mins or less.
    But "Small FFT" seems to work fine (I've run it for 30+ mins, several times, with no problems).

    Here's a shot of Hwmon showing temps & voltage: http://i.imgur.com/i0zz4qA.png
    CPU temps are 85-88C at peak.

    The more tests I run, the less I understand... Will have to start swapping out components until Prime95 becomes stable.
     
    Last edited: Nov 20, 2014
  8. Tsumi

    Tsumi [H]ardForum Junkie

    Messages:
    12,967
    Joined:
    Mar 18, 2010
    I believe Large FFT and Blend use more power than Small FFT. Since your crashes are in high-power scenarios, it's likely you have a defective power supply: 600 watts should be more than sufficient to power your system even with overclocking, yet it's crashing with only one of the two main power consumers loaded.
     
  9. Sith'ari

    Sith'ari Gawd

    Messages:
    573
    Joined:
    Oct 13, 2013
    Can't you borrow a power supply from a friend? That would easily settle the matter!!
     
  10. cheesechoker

    cheesechoker n00b

    Messages:
    13
    Joined:
    Nov 14, 2014
    Thanks for the advice. You guys are probably right. I will strong-arm my friend (well, acquaintance) to let me borrow his PSU.

    Even if he says no, since I've established that CPU load alone can crash the PC in Prime95, I can also swap in my old 500W PSU and compare. If it gets stable with the old PSU, then probably my new PSU is defective.

    The only other possible cause that occurred to me is memory: I believe Small FFT tests fit into cache, but Large and Blend do not, so they test the RAM to some extent. Since those were crashing, maybe the RAM or the CPU-memory bus becomes unstable at high temps?
     
  11. cheesechoker

    cheesechoker n00b

    Messages:
    13
    Joined:
    Nov 14, 2014
    Well, I returned the PSU to the store and got a replacement (same model: CX600M). Tested it with Prime95 and it rebooted in less than 5 minutes of testing.

    So that narrows it down: it's either the motherboard, the memory, or the CPU.

    I am surprised it wasn't the PSU though -- just before I returned it today, I swapped in my old 500W Antec, and ran the same Prime95 test for 35 minutes with no problems at all. That proved that the CX600M was faulty, in my mind. Guess I was wrong.
     
    Last edited: Nov 22, 2014
  12. Sith'ari

    Sith'ari Gawd

    Messages:
    573
    Joined:
    Oct 13, 2013
    The replacement unit they sent you was brand new in a sealed box, I presume?
    Or did they return the one you sent them in the first place (supposedly fixed!!)?
     
  13. cheesechoker

    cheesechoker n00b

    Messages:
    13
    Joined:
    Nov 14, 2014
    I didn't have to send it out: the store replaced it on the spot. I got a brand-new CX600M off the shelf, in retail box. There's no chance that 2 of these PSUs in a row could be faulty (??) so I'm going to try removing sticks of RAM now.

    And if that doesn't make a difference, I guess I'll RMA my motherboard :(
     
  14. cheesechoker

    cheesechoker n00b

    Messages:
    13
    Joined:
    Nov 14, 2014
    I installed an aftermarket CPU cooler I bought earlier. Temps are 15-20C lower than before: idle 30-40C, peak 68-70C. And that (knock on wood) seems to have fixed my problems with CPU stability. I can run any Prime95 test for 30+ mins with no issues. IntelBurnTest works fine too.

    However, it is still not stable under combined CPU+GPU load.

    Here are my OCCT results:
    • CPU test: works OK for 30+ mins
    • GPU test: crashes in under 10 mins
    • PSU test: crashes in under 3 mins

    Tried some memory-related troubleshooting: removing some sticks, using only 2 slots instead of 4, boosting the DRAM voltage from 1.5V to 1.6V, and increasing the clock from 1333MHz to 1600MHz. None of this made any difference (except the PSU test died even faster @ 1600MHz).

    This is frustrating. Guess I'll RMA the video card?? I don't want to RMA the mobo unless I absolutely have to -- it would put my PC completely out of commission.
     
    Last edited: Nov 23, 2014
  15. Sith'ari

    Sith'ari Gawd

    Messages:
    573
    Joined:
    Oct 13, 2013
    What I don't understand is why everything worked without problems with the Antec PSU. That's why I asked you about the RMA in my previous post!! :confused:
    If everything works like clockwork with the Antec, then no matter how bizarre it might seem, the 2nd PSU might also be faulty!! :(
     
  16. cheesechoker

    cheesechoker n00b

    Messages:
    13
    Joined:
    Nov 14, 2014
    Yeah, that confused me too: if the CPU was crashing under Prime, why did it appear to work with the Antec? :confused::confused:

    Anyway... at this point, the CPU tests run fine.. for whatever reason. But as soon as I bring the GTX 770 into the picture, I get shutoffs. So I guess I'll focus on the GPU.
     
  17. cheesechoker

    cheesechoker n00b

    Messages:
    13
    Joined:
    Nov 14, 2014
    Back again. I borrowed a GeForce 8800 GTX (768MB). It sucks down ~180W of power, so it's midway between the GTX 650 (65W) and 770 (230W). And with the 8800 my PC still shuts off, exactly the same as before.

    This thread's getting long, so here's a summary of the troubleshooting I've tried:
    • Replaced the PSU
    • Tested every RAM slot in isolation
    • Tested every RAM stick in isolation
    • Tried underclocking RAM
    • Tested both PCIe ports (x4 and x16)
    • Tested 3 different video cards: GTX 650 (seems ok), 8800 GTX (dies), GTX 770 (dies)

    And my status is:
    • Run memtest for 60+ mins = no issues
    • Run Prime95 for 35+ mins = no issues
    • Run OCCT CPU test for 30+ mins = no issues
    • Run OCCT GPU test for < 8 mins = PC shuts off. :(

    Anyway! At this point, I should blame the motherboard, right? I think I've done enough tests to rule out the RAM, CPU, PSU, and video card.

    But I'm still puzzled that swapping in the 650 makes a difference-- if the mobo is faulty, why would it tolerate a 650 but not the other 2 cards?

    Edit: Apparently the GTX 650 does NOT make a difference, it still fails. It just takes longer for the computer to reboot during the GPU stress test. But eventually, it does.
     
    Last edited: Dec 1, 2014
  18. Dead Parrot

    Dead Parrot 2[H]4U

    Messages:
    2,378
    Joined:
    Mar 4, 2013
    If I have read this thread correctly and not missed anything, your rig works fine with the old Antec. If that is correct, then the problem is with the new model of PS. My guess is that some aspect of the new PS can't provide the amount of power needed on a specific rail while the Antec can. Before randomly replacing system parts other than the PS, I would find out the specific power each of the video cards needs at each voltage. Since it works with the 650 and not the 8800 or 770, find out the difference in power needs between the cards. NOT the total power, but the power needed at each voltage. You will also need similar power numbers for the MB/CPU combo. Add them up, then compare those numbers to the output specs of both the Corsair and Antec PS. And verify the number of rails each PS has at each voltage.
     
  19. cheesechoker

    cheesechoker n00b

    Messages:
    13
    Joined:
    Nov 14, 2014
    What I said earlier in the thread was wrong -- I was able to kill the PC using the 650, but it took a lot longer (30 mins vs. 5 mins of stress testing). And in real-world use the 650 never died on me while gaming, whereas the 8800 and 770 did. I did not stress-test the 650 on the Antec PSU for very long, so I dunno if it would've eventually failed there too.

    But I did look up the voltages!

    Old PSU (Antec EA-500)
    Code:
    ---------------------------------------------------------------
     DC Output    | +3.3V | +5V  | +12V 1 | +12V 2 | -12V | +5Vsb |
    --------------|-------|------|--------|--------|------|-------|
     Max          |  24A  |  24A |   17A  |   17A  | 0.8A |  2.5A |
    ---------------------------------------------------------------
    5V, 3.3V max load     = 130W
    +12V1, +12V2 max load = 408W

    New PSU (CX600M)
    Code:
    ----------------------------------------------------
     DC Output    | +3.3V | +5V  | +12V | -12V | +5Vsb |
    --------------|-------|------|------|------|-------|
     Max Load     |  25A  |  25A |  46A | 0.8A |   3A  |
    --------------|--------------|------|------|-------|
     Max combined |     130W     | 552W | 9.6W |  15W  |
     wattage      |              |      |      |       |
    ----------------------------------------------------
                                Total Power : 600W

    And the power requirements of my hardware (couldn't find values for CPU/mobo)
    • Core i7-3770: ??
    • Gigabyte GA-B75M-D3H: ??
    • Zotac GTX 770: min. 42A +12V ("based on a Core i7 3.2GHz configuration")
    • ASUS 8800GTX: min. 30A +12V
    • MSI GTX 650: min. 20A +12V

    So... the Corsair has 46A on +12V, which should be enough for a 770 + Core i7. And the 8800 should've been a breeze. As for the Antec, it does have 2x +12V rails, but only 17A on each. The Corsair seems better all around.
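
    The on-paper comparison can be sketched in a few lines. The numbers are the label ratings and card-vendor minimums listed above. Two assumptions: the Antec's two 17A rails are simply added (which matches its 408W combined limit at 12V), and each card vendor's minimum is treated as a whole-system recommendation, which only the Zotac figure states explicitly.

```python
# +12V ratings in amps, copied from the PSU label tables above.
PSU_12V_AMPS = {
    "Antec EA-500": 17 + 17,   # two +12V rails at 17A each (assumed additive: 408W / 12V = 34A)
    "Corsair CX600M": 46,      # single +12V rail (552W / 12V = 46A)
}

# Card vendors' minimum recommended +12V current, as listed above.
# Assumption: these are whole-system figures, not the card's own draw.
CARD_MIN_12V_AMPS = {
    "GTX 650": 20,
    "8800 GTX": 30,
    "GTX 770": 42,
}

def headroom(psu, card):
    """Amps of +12V capacity left on paper (negative means below the recommendation)."""
    return PSU_12V_AMPS[psu] - CARD_MIN_12V_AMPS[card]

for psu in PSU_12V_AMPS:
    for card in CARD_MIN_12V_AMPS:
        print(f"{psu:15s} {card:9s} {headroom(psu, card):+d}A")
```

    This only compares label ratings. It says nothing about whether a particular unit actually delivers its rated current under sustained load.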

    Anyway, at this point I don't even have the PC with me -- I finally gave up & dropped it at the local PC repair shop. Haven't heard back yet. Thanks for your reply though, it spurred me to do some more research.

    I will post again when the shop gets back to me.
     
    Last edited: Dec 8, 2014
  20. Dead Parrot

    Dead Parrot 2[H]4U

    Messages:
    2,378
    Joined:
    Mar 4, 2013
    Here is a data point that may be of some use: My system is a Gigabyte GA-Z77X-UD3H with an i7-3770 @ 3.4GHz, 16GB RAM, several SSDs, 1 HD, and a GTX 460 based video card. I am using an Antec EA-500D PS. Power usage as measured with a Kill-A-Watt meter is 227W with 2 CPU folding processes and 1 GPU folding process running. The machine has been 100% stable doing this since I purchased it a couple of years ago. It runs 24/7. Power usage with just the 2 CPU processes and normal OS stuff running is about 108W. So, as a real-world measurement, the GTX 460 draws about 115W more when running GPU work than when just displaying normal OS stuff. Firing up a World of Tanks session results in a draw of around 220W. All folding is set to stop while gaming.

    Measurements are of power going into the PS and do not include things like monitors and such.
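
    The arithmetic behind those numbers, as a rough sketch. The wall readings are the ones quoted above. The efficiency value is a placeholder assumption for illustration only, not a measured or published figure for the EA-500D.

```python
def gpu_delta_watts(loaded_wall_w, baseline_wall_w):
    """Extra wall draw attributable to the GPU load (loaded minus CPU-only baseline)."""
    return loaded_wall_w - baseline_wall_w

def dc_estimate_watts(wall_w, efficiency):
    """Rough DC-side load: wall draw times PSU efficiency (efficiency is an assumption)."""
    return wall_w * efficiency

WALL_CPU_AND_GPU = 227    # W at the wall: 2 CPU folding processes + 1 GPU process
WALL_CPU_ONLY = 108       # W at the wall: 2 CPU folding processes only
ASSUMED_EFFICIENCY = 0.85 # placeholder, not a spec for this PSU

print(gpu_delta_watts(WALL_CPU_AND_GPU, WALL_CPU_ONLY))        # extra wall watts under GPU load
print(dc_estimate_watts(WALL_CPU_AND_GPU, ASSUMED_EFFICIENCY)) # estimated load on the PSU outputs
```

    Because the meter sits on the input side, the load the PSU actually has to supply on its DC rails is somewhat lower than the wall reading.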
     
  21. SavageThrash

    SavageThrash Limp Gawd

    Messages:
    243
    Joined:
    May 16, 2007
    Generally a straight-up reboot, as opposed to a crash, is a PSU issue. If you were getting lockups or GPU-related crashes, such as odd lines appearing on screen followed by the rest of the system locking and crashing, that would point to the GPU instead. My best guess is the Corsair isn't up to the task power-wise. The Antec isn't either, but you expected that. I never was a fan of Corsair PSUs personally. If you wanted to get fancy, you could try using both PSUs but powering different components. If it's stable that way, then neither of your PSUs is up to the task as a single unit.

    If this were a motherboard issue, I would expect it to crash equally fast with either PSU.
     
    Last edited: Dec 10, 2014
  22. cheesechoker

    cheesechoker n00b

    Messages:
    13
    Joined:
    Nov 14, 2014
    Just got off the phone with the shop. The PSU (CX600M) is defective... again. It folds like a leaf under sustained power usage.

    That means I got 2 bad CX600Ms in a row. Most likely the entire batch from the factory is defective. Lesson learned: don't buy 2 identical pieces of hardware from the same shipment!! And trust Hardforum posters -- y'all blamed the PSU and you were right.

    I'm getting an EVGA instead & gonna live happily ever after. Thanks for all your help!