Sorry in advance for the "book"; this has been two weeks of hell.
Okay, so I built a machine for someone several months ago. I'd built a karaoke machine for her bar, and she was impressed enough with the work that she wanted me to replace her other machine with something similar. So, some of these specs will seem overkill for a general-use machine, but that's what she wanted:
Case: Corsair Obsidian 750D
CPU: Intel Core i7 6700K 4 GHz
Mobo: MSI Z170A GAMING M9
RAM: G.SKILL TridentZ 16 GB DDR4-3200
CPU Cooler: Corsair Hydro H60
PSU: Corsair RM850X (Replaced with Corsair HX750i)
Primary SSD: Samsung 850 Pro 512 GB
Secondary HDD: HGST Deskstar 3TB
GPU: MSI GeForce GTX 950
Optical Drive: LG 16x Blu-ray Burner
Operating System: Windows 10 Pro 64-bit
Monitors: Dual Dell U2415
Anyway, the machine worked flawlessly for the first few months. Then, about two months ago, it suddenly would not wake from sleep mode. Yes, I know: just disable sleep mode. But it's a feature that she wants to be able to take advantage of.
The tower would seem powered, but the monitors would never come out of sleep mode. One would have to hard boot to begin using the machine again, although this would almost always lead to a BSOD, usually a WHEA_UNCORRECTABLE_ERROR.
I began by unplugging any unnecessary peripherals. No change. I clean uninstalled and updated the NVIDIA drivers. No change. I updated the BIOS. No change. I disconnected the NVIDIA GPU and used the onboard GPU. No change.
By this point, Windows would no longer boot at all, and every attempt to enter safe mode would result in a locked-up Windows loading animation.
I used a system restore image I had created right after the initial base install, but the problems persisted. In fact, I started getting a new BSOD message: MACHINE_CHECK_EXCEPTION.
When I went to take the tower apart, I noticed that there had been a sandal underneath the tower, perfectly placed to block the air intake of the PSU. Opening the tower, I noticed a smell that could have been either burned electronics or simply warm components.
I re-seated every cable, packed up the tower, and tried to boot into Windows. After a completely successful boot I began testing the sleep mode, and both a 10-minute test and a 5-hour test were successful. So, I began a much longer test.
When I returned 17 hours later, the problems were back full force. Windows would not load at all, and I had to hard boot and perform a system restore. Now I wasn't getting WHEA BSODs; I was just getting a ton of MACHINE_CHECK_EXCEPTION and the not-seen-before CLOCK_WATCHDOG_TIMEOUT.
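For reference, here is a small Python sketch that maps the three stop errors I've seen so far to their bug check codes. The codes and one-line summaries are paraphrased from Microsoft's bug check reference as I understand it; the table and the `describe` helper are my own illustration, not output from any tool. What stands out to me is that all three are processor- or hardware-level errors rather than driver-specific ones.

```python
# Stop errors seen on this machine, mapped to their bug check codes.
# Summaries are my paraphrase of Microsoft's bug check reference.
BUGCHECKS = {
    "WHEA_UNCORRECTABLE_ERROR": (0x124, "fatal hardware error reported via WHEA"),
    "MACHINE_CHECK_EXCEPTION": (0x9C, "fatal machine check exception from the CPU"),
    "CLOCK_WATCHDOG_TIMEOUT": (0x101, "a processor missed an expected clock interrupt"),
}


def describe(name):
    """Return a one-line description for a stop error name."""
    # Accept the TIME_OUT spelling as well as the official TIMEOUT one.
    key = name.strip().upper().replace("TIME_OUT", "TIMEOUT")
    if key not in BUGCHECKS:
        return f"{key}: not in this table"
    code, summary = BUGCHECKS[key]
    return f"{key} (0x{code:08X}): {summary}"


for stop_error in BUGCHECKS:
    print(describe(stop_error))
```

Seeing only hardware-class codes is part of why I kept swapping parts instead of chasing drivers.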
Going more short form here:
Tested the PSU: Voltages were out of spec (way too low)
Replaced the PSU with Corsair HX750i: Briefly improved system performance, but quickly went right back to the old problems.
Ran memory test: Consistently passed. Even with individual sticks installed.
(Edit: forgot a step)
Swapped SSD for known working HDD. Windows installation failed, BSOD on restarts during installation.
Replaced the Mobo and CPU: Improved stability of the machine, but there is still major Windows installation corruption despite several reinstallations, and the sleep mode issue persists. The system works until Windows updates are being applied, and then at some random point a major crash during restart, boot, or sleep corrupts something and the whole installation becomes unrecoverable.
Removed GPU, Front Panel headers, secondary hard drive, optical drive.
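To make the "out of spec" PSU result above concrete, here is a rough Python sketch of the kind of check I mean. It assumes the usual ATX tolerance of ±5% on the +12V, +5V, and +3.3V rails. The readings in the example are made-up placeholders, not my actual measurements, and `check_rails` is just a name I picked for illustration.

```python
# Nominal voltage and allowed tolerance per rail (assumed: the usual ATX +/-5%).
ATX_RAILS = {
    "+12V": (12.0, 0.05),
    "+5V": (5.0, 0.05),
    "+3.3V": (3.3, 0.05),
}


def check_rails(readings):
    """Classify each measured rail voltage as LOW, OK, or HIGH."""
    results = {}
    for rail, measured in readings.items():
        nominal, tolerance = ATX_RAILS[rail]
        low = nominal * (1 - tolerance)
        high = nominal * (1 + tolerance)
        if measured < low:
            results[rail] = "LOW"
        elif measured > high:
            results[rail] = "HIGH"
        else:
            results[rail] = "OK"
    return results


# Hypothetical readings for illustration only (not the values I measured).
example = {"+12V": 11.2, "+5V": 4.9, "+3.3V": 3.31}
for rail, status in check_rails(example).items():
    print(f"{rail}: {example[rail]} V -> {status}")
```

A rail sagging below its lower limit, like the +12V placeholder here, is the sort of result that made me replace the PSU first.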
I'm currently running sleep mode tests using the last step in that list. While short sleep mode tests work, Windows Update gets corrupted and has to be restarted, the monitor flashes on and off randomly, and Event Viewer already lists 142 errors, 24 of them being "license install failed for type: 1".
I just ran an hour-long sleep test. The tower powered up, but the monitor never turned on. I had to hard boot, and while the MSI loading screen came up, it's just sitting here with a black screen instead of Windows.
I'm worried that the SSD might be shot, which I really hope it isn't because I've had to replace so much of this PC already. Every time I've replaced a component, some portion of the myriad of issues has improved, but it is still far from stable. The problems seem to be almost random, and right now they seem to be data related. There was also a weird encounter before the latest installation: during a clean installation of Windows 10 Pro, upon reboot a screen came up asking me to choose one of 6 different Windows installations to continue with. It wasn't up long enough for me to make a selection, and then the install proceeded as normal, but that was completely unexpected.
I did a complete repartition before this current installation and I haven't seen the issue again.
I'm about to do an HDD scan of the secondary drive using my personal laptop, and then a scan of the SSD as well.
If anyone has any suggestions or recommendations, or notices something I've missed, please let me know. I'm at the point where I'm going to have to get a new SSD, RAM, GPU, and secondary HDD piece by piece and test each new part. Time consuming, to say the least. I really don't want to cost this person even more in parts, and I'm starting to feel really stressed out, like I'm coming across to my client as if I don't know what I'm doing. I realize that it is quite possible that replacing everything is exactly what needs to happen (I've had to do it twice with my own machines when a PSU went kamikaze), but I'm trying really hard to avoid that for her.
I have four other machines that I've built with very similar setups, and all of them are running completely rock solid. I'm really tempted to blame that sandal for killing the PSU, and then the PSU for killing... well... everything else, but it's honestly somewhat jarring to have to constantly go back to her and say that we have to replace yet another part.
Thank you in advance.