
MAJOR system issues, need advice

TheBlank

Limp Gawd
Joined
Nov 15, 2006
Messages
164
Sorry in advance for the "book," this has been two weeks of hell.

Okay, so I built a machine for someone several months ago. I'd built a karaoke machine for her bar, and she was impressed enough with the work that she wanted me to replace her other machine with something similar. So, some of these specs will seem overkill for a general use machine, but that's what she wanted:

Case: Corsair Obsidian 750D
CPU: Intel Core i7 6700K 4ghz
Mobo: MSI Z170A GAMING M9
RAM: G.SKILL TridentZ 16 GB DDR4-3200
CPU Cooler: Corsair Hydro H60
PSU: Corsair RM850X (Replaced with Corsair HX750i)
Primary SSD: Samsung 850 Pro 512gb
Secondary HDD: HGST Deskstar 3TB
GPU: MSI GeForce GTX 950
Optical Drive: LG 16x Blu Ray Burner
Operating System: Windows 10 Pro 64

With Dual Dell U2415 monitors


Anyway, the machine worked flawlessly for the first few months. Then, about two months ago, it suddenly would not wake from sleep mode. Yes, I know. Just disable sleep mode, but it's a feature that she wants to be able to take advantage of.

The tower would appear to be powered, but the monitors would never come out of sleep mode. I would have to hard boot to begin using the machine again, although this would almost always lead to a BSOD, usually WHEA_UNCORRECTABLE_ERROR.

I began by unplugging any unnecessary peripherals. No change. I clean uninstalled and updated the NVidia drivers. No change. I updated the BIOS. No change. I disconnected the NVidia GPU, and used the onboard GPU. No change.

By this point, Windows would no longer boot at all, and every attempt to enter safe mode would result in a locked-up Windows loading animation.

I used a system restore image I had created right after the initial base install, but the problems persisted. In fact, I started getting a new BSOD message: MACHINE_CHECK_EXCEPTION.

When I went to take the tower apart, I noticed that there had been a sandal underneath the tower, perfectly placed to block the air intake of the PSU. Opening the tower, I noticed a smell that could have been either burned electronics or simply warm components.

I re-seated every cable, packed up the tower, and tried to boot into Windows. After a completely successful boot I began testing the sleep mode, and both a 10 minute test, and a 5 hour test were successful. So, I began a much longer test.

When I returned 17 hours later, the problems were back in full force. Windows would not load at all, and I had to hard boot and perform a system restore. Now I wasn't getting WHEA BSODs; I was just getting a ton of MACHINE_CHECK_EXCEPTION and the previously unseen CLOCK_WATCHDOG_TIMEOUT.

Going more short form here:

Tested the PSU:
Voltages were out of spec (way too low)

Replaced the PSU with a Corsair HX750i: System performance briefly improved, but it quickly went right back to the old problems.

Ran memory test: Consistently passed. Even with individual sticks installed.

(Edit: forgot a step)

Swapped SSD for known working HDD. Windows installation failed, BSOD on restarts during installation.

Replaced the Mobo and CPU:
Improved stability of the machine, but there is still major Windows installation corruption despite several reinstallations, and the sleep mode issue persists. The system will work until Windows updates are being applied, and then at some random point a major crash during restart, boot, or sleep will corrupt something and the whole installation becomes unrecoverable.

Removed GPU, Front Panel headers, secondary hard drive, optical drive.

I'm currently running sleep mode tests with the machine stripped down as in the last step in that list. While short sleep mode tests work, Windows Update corrupts and has to be restarted, the monitor flashes on and off randomly, and Event Viewer already lists 142 errors, 24 of them being "license install failed for type: 1".

I just ran an hour long sleep test. The tower powered up, but the monitor never turned on. Had to hard boot, and while the MSI loading screen came up, it's just sitting here with a black screen instead of Windows.

I'm worried that the SSD might be shot, which I really hope it isn't because I've had to replace so much of this PC already. Every time I've replaced a component, some portion of the myriad issues has improved, but it is still far from stable. The problems seem to be almost random, and right now they seem to be data related. There was also a weird encounter before the latest installation: while doing a clean installation of Windows 10 Pro, upon reboot a screen came up asking me to choose one of 6 different Windows installations to continue with. It wasn't up long enough for me to make a selection, and then it proceeded as normal, but that was completely unexpected.

I did a complete repartition before this current installation and I haven't seen the issue again.

I'm about to do a HDD scan of the secondary drive using my personal laptop, and then a scan of the SSD as well.

If anyone has any suggestions or recommendations, or notices something I've missed, please let me know. I'm at the point where I'm going to have to get a new SSD, RAM, GPU, and secondary HDD piece by piece and test each new part. Time consuming, to say the least. I really don't want to have to cost this person even more in parts, and I'm starting to feel really stressed out, like I'm coming across to my client as though I don't know what I'm doing. I realize that it is quite possible that that's exactly what needs to happen (as I've had to do it twice with my own machines when a PSU went kamikaze), but I'm trying really hard to avoid that for her.

I have four other machines that I've built with very similar setups, and all of them are running completely rock solid. I'm really tempted to blame that sandal for killing the PSU, and then the PSU for killing... well... everything else, but it's honestly somewhat jarring to have to constantly go back to her and say that we have to replace yet another part.

Thank you in advance.
 
Not your fault if the customer blocked the air vent and the box overheated.
Depending what failed, I'd say warranty is void after blocking the vents.

Is the box stuffed into a computer desk or small confined space with poor
ventilation too?

Is it the PSU that stinks/smells burnt or the mobo?
Get your nose right up to it, sometimes that's what it takes.

I'd take a machine like that back to my shop and strip it down and rebuild it.
Any iffy parts get swapped out with spares to troubleshoot.

Also, she needs to lighten up about demanding sleep mode. There are many
brand new OEM systems that will not wake up properly from sleep mode.
Just had a customer that bought a couple of high end Dell XPS systems and
the mouse freezes after it wakes up. I turned off sleep mode.
It's been that way forever on sleep mode issues.

I'd say no overclocking.... you may even want to underclock it slightly for stability.

Good luck.

ETA: Make sure it has enough case fans running.
 
Call it crazy if you like, but for what it's worth, just try a quick install of Windows 7. You've done everything I would have done except that. At least maybe rule out the OS too. Another possibility might be the monitors. It's worth a try to see if the monitors just don't like the system (unlikely, I know, but you never know). Also unlikely but still possible (I've had it happen, though) is a faulty SATA cable. It's a nice build, but I would have gone with an ASUS mobo. Had a couple of MSI mobos myself and they crapped out after a while (their graphics cards still kick ass).
 
Not your fault if the customer blocked the air vent and the box overheated.
Depending what failed, I'd say warranty is void after blocking the vents.

Yeah, I know that it isn't the fault of my build or my experience, it's just difficult when people start looking to you as a computer guy (that they're willing to pay). They start having unreasonable expectations of how quickly or cheaply you can fix something, and that's both difficult to combat and hard on the ego.

Is the box stuffed into a computer desk or small confined space with poor
ventilation too?

Fortunately it is not. Originally she wanted it in something like that, but listened when I explained that that was a terrible idea. The only place it could be placed was in the somewhat large footwell of her desk, with plenty of room up top, and plenty of room to the back as the back of the footwell is only partially obstructed. Incidentally, this is undoubtedly why a sandal got kicked underneath the tower to begin with.

Is it the PSU that stinks/smells burnt or the mobo?
Get your nose right up to it, sometimes that's what it takes.

It's the PSU that smelled, albeit very faintly. I did get right up on it, and the smell was faint enough that I wasn't able to completely conclude that it was burned electronics or not. Also, I don't know how long that sandal was under the PSU, so the "burning" could've happened awhile before I was called in to check it out. In either case, the voltages tested low, so I replaced it.
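For anyone following along, here is a rough sketch of what "tested low" means in practice. It assumes the usual ATX tolerances of ±5% on the +3.3V, +5V, and +12V rails and ±10% on the -12V rail. The sample readings are made-up placeholders for illustration, not the values measured on this PSU, and the function names are hypothetical.

```python
# Nominal rail voltage and allowed fractional deviation (assumed ATX tolerances).
ATX_TOLERANCE = {
    "+3.3V": (3.3, 0.05),
    "+5V": (5.0, 0.05),
    "+12V": (12.0, 0.05),
    "-12V": (-12.0, 0.10),
}


def rail_in_spec(rail: str, measured: float) -> bool:
    """Return True if the measured voltage is within tolerance for that rail."""
    nominal, tolerance = ATX_TOLERANCE[rail]
    return abs(measured - nominal) <= abs(nominal) * tolerance


def out_of_spec(readings: dict) -> list:
    """Return the names of the rails whose readings fall outside tolerance."""
    return [rail for rail, volts in readings.items() if not rail_in_spec(rail, volts)]


if __name__ == "__main__":
    # Placeholder multimeter readings, NOT the numbers from this thread.
    sample = {"+3.3V": 3.28, "+5V": 4.70, "+12V": 11.20}
    print(out_of_spec(sample))  # ['+5V', '+12V']
```

Readings taken with the PSU under load are more telling than idle readings, since a damaged unit can look fine until it is asked to deliver current.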

I'd take a machine like that back to my shop and strip it down and rebuild it.
Any iffy parts get swapped out with spares to troubleshoot.

Yeah, that's pretty much what I've been doing. Unfortunately, I don't have any spare DDR4 RAM lying around, so I just went and purchased some just in case. Having to buy parts a smattering at a time, and having to recommend more and more of them, adds stress to the situation.

Also, she needs to lighten up about demanding sleep mode. There are many
brand new OEM systems that will not wake up properly from sleep mode.
Just had a customer that bought a couple of high end Dell XPS systems and
the mouse freezes after it wakes up. I turned off sleep mode.
It's been that way forever on sleep mode issues.

I know that sleep mode has been finicky for a long, long time. However, all of my machines have been able to use it, as well as this one until the sandal fiasco. Seeing as that was a stable capability beforehand, I've somewhat been using it as a way to determine if I've weeded out all of the damaged components. It seems to start an avalanche of unreasonable errors and expose other problem areas. For instance, at one point I had the machine running sleep mode and almost everything else just fine. 5 minute sleep mode was fine. 5 hours was fine too. Somewhere between 5 hours and 17 hours something went haywire, and that installation of Windows never recovered.
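One way to narrow down that 5-to-17-hour window is to bisect it: test the midpoint, then halve whichever side still contains the failure. The sketch below only does the bookkeeping, and it assumes the failure actually depends on how long the machine sleeps. If the crashes are random, the results will not converge and the approach tells you little. The function names are hypothetical, and each pass/fail result still comes from a real sleep test.

```python
def next_sleep_test(longest_pass_h: float, shortest_fail_h: float) -> float:
    """Return the sleep duration (in hours) to test next: the window midpoint."""
    if longest_pass_h >= shortest_fail_h:
        raise ValueError("longest passing duration must be below shortest failing one")
    return (longest_pass_h + shortest_fail_h) / 2


def narrow_window(longest_pass_h: float, shortest_fail_h: float, results: list) -> tuple:
    """Shrink the window using observed results (True = machine woke correctly)."""
    for woke_ok in results:
        midpoint = next_sleep_test(longest_pass_h, shortest_fail_h)
        if woke_ok:
            longest_pass_h = midpoint   # failure happens later than this
        else:
            shortest_fail_h = midpoint  # failure happens at or before this
    return longest_pass_h, shortest_fail_h


if __name__ == "__main__":
    print(next_sleep_test(5, 17))                # 11.0
    print(narrow_window(5, 17, [True, False]))   # (11.0, 14.0)
```

Each step costs a long sleep test, so in practice two or three rounds are probably the limit before swapping parts becomes the faster route.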

I'd say no overclocking.... you may even want to underclock it slightly for stability.

Yeah, I never planned to do any overclocking of the system, as longevity was definitely one of the overall goals of the build. That being said, I did try to run the RAM with its XMP profile. Just recently, on a reboot while testing a spare hard drive, the BIOS kicked out an "overclock error" and reset the RAM back down to 1.2V and 2133 MHz. Part of the reason I picked up more RAM was because I'm worried that this indicates damage and instability in the RAM in question, even though it passes memory testing.


ETA: Make sure it has enough case fans running.

Hah, oh yes. That's one reason I made sure to get the 750D case. Three out-venting 120mm fans up top, two 120mm intakes in the front, and the CPU cooler's 120mm fan is installed as an intake through its radiator.


Call it crazy if you like, but for what it's worth, just try a quick install of Windows 7. You've done everything I would have done except that. At least maybe rule out the OS too. Another possibility might be the monitors. It's worth a try to see if the monitors just don't like the system (unlikely, I know, but you never know). Also unlikely but still possible (I've had it happen, though) is a faulty SATA cable. It's a nice build, but I would have gone with an ASUS mobo. Had a couple of MSI mobos myself and they crapped out after a while (their graphics cards still kick ass).

I certainly understand ruling out the OS, and to some degree I've done that. Windows 10 was working fine until this fiasco, and I made several attempts to rule out a driver/software/windows update causing the issue. System restores, new installations only to certain points, and all have met with failure. I might have to give installing a different OS just to test the components a try, but I'm wary of changing the working environment too much and missing other components that are acting out in a very specific way.

I've replaced the SATA cables already (sorry that wasn't in the OP). I'm just about to swap monitors just to rule that out as well. A bad monitor is rare enough, but two bad monitors? I've already swapped between the two in the dual setup, and that didn't change anything. She has said her luck with computers is epically bad, however... so...

I too was an ASUS fan for quite a while. I used several ASUS boards and video cards in several styles of builds, but eventually all of them crapped out. Hard drive controller issues, all of them. Two of my ASUS GTX 570 DirectCU II cards died as well. When I started reading reports of LGA 2011-v3 motherboards being shipped with bent pins and ASUS refusing to acknowledge anything other than user error, I made the switch. Other than this motherboard, every MSI board I've used has been nearly flawless (although I hate the Killer network cards, or rather the software that's included with them). That covers 2 high-end gaming/production rigs, one home server, one media-heavy karaoke machine, and two gaming laptops.

ASUS burned me over a decade ago, and they started burning me again, so I've moved away from their components. I still use their networking equipment, though.


Currently, I've got a new 4TB HGST HDD installed for testing purposes (it was originally intended for a NAS box RAID, but first things first). The installation was actually faster than on the SSD, and it included several steps that I haven't been seeing with the SSD: setting up the network immediately, gathering updates during the installation, proper display resolution during the installation process, and having sleep mode as an option directly after the initial installation. When using the SSD, I would have to apply several rounds of updates before sleep mode even became an option. All of these things are damning the SSD in my eyes.

I tried connecting the SSD to my laptop to run Samsung Magician software and check the SMART values (I know, not conclusive with an SSD), but despite it identifying the drive as a Samsung SSD, it couldn't find SMART values, and also refused to fully connect to it as it insisted it wasn't a Samsung SSD.

I'll be connecting a different monitor to the system, and if that doesn't change anything I'll reconnect the old GPU to see if it's just Intel's shoddy graphics interfering.

Edit:

Nothing working so far. Just rolled the motherboard BIOS back, and replaced the RAM. Waiting on Windows to reset itself so I can test.



Thanks for replying.
 