How prevalent are the Skylake stability issues in real world use?

samuelmorris

I recently replaced aging hardware with Z170-based platforms in two machines, as both were having stability issues that I was fairly sure were attributable to the original hardware. After a very drawn-out installation period attempting to get either system to boot at all, I eventually traced the issue to a faulty DisplayPort cable stopping the machines from entering POST, which amusingly enough could power the graphics card and light the motherboard LEDs off the monitor without the PC's PSU even having a power cable plugged into it. With the DisplayPort cable gone, both machines were upgraded and by and large they work. However, neither machine is much more stable than the one it replaced, though for totally different reasons (better in one case, worse in the other).
I find myself wondering: is there any likelihood that the stability issues affecting Skylake CPUs can affect regular usage (i.e. not running Prime95), or is it a technical impossibility? I wasn't sure whether the Prime calculation was one of the only rare cases to trigger the issue, or simply a test that would prompt it (for the same reason you might run it as a stress test to examine an overclock). Neither of the two PCs in question does anything particularly out of the ordinary, and they are used for entirely different purposes (both software and hardware configuration). The stability problem, however, manifests itself in exactly the same way on both PCs. The PC will 'freeze': the image on the screen remains static; sound stops and loops (at about 50Hz, effectively sounding like a very loud 50Hz square wave; it can sometimes be loud enough to make me reflexively throw my headphones off my head due to the sudden increase in volume); the PC will no longer respond to a ping on its IP address (this is allowed normally); and even the reset button, which normally works on both PCs, will do nothing. The only fix is to either hold the power button for 5s, or turn off the PSU at the back and then re-power the machine.
(Note I have not tested sound on the file server, but all other symptoms are identical)

The gaming PC is shut down every night so typically has a maximum uptime of around 15 hours. The issue very rarely comes up at the desktop, I think I may have seen it once. The only game which seems to trigger it is Warframe, which is the game I play most often, so not a very fair test, but it must have happened maybe a dozen or so times, over the span of 2-3 months. The file server was only rebuilt at the beginning of March and is not of course used for games, but runs 24/7, and so far has had this happen three times. The PC is occasionally used as a terminal server (hence running Windows 10 rather than a server operating system), but at present the only real applications running are FileZilla Server, Backblaze, NZXT CAM and Synology DiskStation. I closed NZXT CAM immediately after boot on one occasion in case it was CAM responsible, to no avail.

Perhaps unsurprisingly, there are no application event logs of note at the time when the issue occurs, other than the usual 'previous system shutdown was unexpected' after reboot.
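
Since the event log is silent at the moment of a hard freeze, one way to pin down when each freeze happens is a heartbeat file: a small script appends a timestamp every few seconds, and any long gap between consecutive timestamps marks a period when the machine was frozen or powered off. The following is only a minimal sketch in Python; the function names, the 10-second interval and the `heartbeat.log` filename are all assumptions made for illustration, not anything used in this thread.

```python
import os
import time
from datetime import datetime, timedelta
from pathlib import Path


def write_heartbeat(log_path: Path, now: datetime) -> None:
    """Append one ISO timestamp and force it to disk straight away."""
    with log_path.open("a", encoding="utf-8") as handle:
        handle.write(now.isoformat() + "\n")
        handle.flush()
        os.fsync(handle.fileno())  # a hard freeze would lose buffered writes


def read_heartbeats(log_path: Path) -> list:
    """Load every timestamp previously written by write_heartbeat."""
    lines = log_path.read_text(encoding="utf-8").splitlines()
    return [datetime.fromisoformat(line.strip()) for line in lines if line.strip()]


def find_gaps(stamps: list, interval_s: float = 10.0, slack_s: float = 5.0) -> list:
    """Return (last_seen, next_seen) pairs where logging stalled for too long."""
    limit = timedelta(seconds=interval_s + slack_s)
    return [(a, b) for a, b in zip(stamps, stamps[1:]) if b - a > limit]


def run(log_path: Path, interval_s: float = 10.0) -> None:
    """Write a heartbeat forever; hypothetical entry point, start it at boot."""
    while True:
        write_heartbeat(log_path, datetime.now())
        time.sleep(interval_s)
```

Running `run(Path("heartbeat.log"))` in the background and later calling `find_gaps(read_heartbeats(Path("heartbeat.log")))` would list each stall. The first timestamp of a pair is the last moment the machine was known to be alive, which can then be compared against what was running at the time. Normal shutdowns show up as gaps too, so on a machine that is powered off nightly those need to be discounted by hand.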

Is this what people would expect to see from Skylake CPUs, or is the known bug something different?
The Windows 10 installation on the file server was a clean install onto a reformatted disk, with a new license, at the time the hardware was upgraded.
Previous stability issues on both machines were BSOD related, 'Memory Management' on the gaming PC which had always been triggered by its overclock but started to carry on at stock speeds, hence replacement.
'Unexpected Kernel Mode Trap' came up on the server on average every 1-2 weeks, but could sometimes happen three times in a day, usually citing ntoskrnl.exe. I was less convinced this one was hardware, but as it happened more often in warmer weather, I figured given the age of the hardware I'd replace it anyway.
Clearly the previous behaviour was preferable as a PC that automatically reboots periodically is better than one that freezes and requires manual intervention to bring back.

Current Specifications

Voyager (Gaming PC)
i5 6600
16GB Corsair Vengeance LPX 2400 (2x8GB)
MSI Z170 Gaming Pro
Corsair RM1000i
Gainward GTX970
Crucial M500 480GB
WD40EZRX, WD2002FAEX
Windows 8.1 Pro x64

Intrepid (File Server)
i5 6500
16GB Corsair Vengeance LPX 2400 (2x8GB)
Gigabyte Z170-HD3P
Corsair RM650
Adaptec 71605E
Samsung 830 128GB
WD20EARS x2, WD20EARX x2, WD30EZRX x6, WD30EFRX x2
WD40EFX x4, WD60EFRX x2
NZXT GRID+ v2
Windows 10 Pro x64


Previous Specifications

Voyager (Gaming PC)
i5 750
12GB Corsair XMS3 (2x4GB + 2x2GB)
Gigabyte P55A-UD4
Zalman ZM850-HP
Gainward GTX970
Crucial M500 480GB
WD40EZRX, WD2002FAEX
Windows 8.1 Pro x64

Intrepid (File Server)
Core 2 Q9550
4GB Corsair XMS2 Dominator 8500 (2x2GB)
Gigabyte X48-DS4
Corsair RM650
Adaptec 71605E
Intel 320 series 40GB SSD
WD20EARS x2, WD20EARX x2, WD30EZRX x6, WD30EFRX x2
WD40EFX x4, WD60EFRX x2
NZXT GRID+ v1
Windows 8.1 Pro x64

Does anyone think there is any likelihood that the known issues with Skylake CPUs could be causing this? If I'm looking at something else, I'm not entirely sure where to start.
Environmental issues can probably be ruled out as the PCs do not freeze at the same time, and there are two other older systems in the same room which are entirely unaffected by these issues (i5 3470 based, and i5 4690S based).

I will stress that the BIOS has not been updated on the MSI board as of yet. I understand that can potentially improve this issue, IF it's the cause. The Z170-HD3P BIOS was updated during the initial install issue with the DisplayPort cable, as at the time I was unaware what the issue was.
I'm reluctant to blame hardware damage from the DisplayPort cable incident, as both systems are capable of often (but not always) running for over a week without this issue occurring. Running stress tests does not seem to provoke it.

Grateful for any advice on this one! Thanks in advance

Sam
 
Cliff notes: Two Skylake builds with different hardware both experience a hard lockup where the screen freezes and sound loops. The only fix is to hold the power button for five seconds. What's up with that?!?
 
Are the computers plugged into the same outlet and/or sharing the mains with other high power draw things in your residence?

Edit: are you also reusing the PSU from the original builds? I find it hard to believe that two independent systems will display the same symptoms.
 
1. No - the PCs are actually on separate trailing sockets, each fed from sockets on opposite sides of the same room.
2. In one case (Voyager) no, the PSU was replaced at the same time as changing the CPU/board (it was 7 years old and had suspected stability problems on two of its rails, then unused due to downsizing my graphics), in the other case yes, the RM650 is common to the previous build but was only purchased a year or so ago as the original PSU in that system (a Nexus NX-5000) had a very noisy fan bearing so was replaced before the fan failed (and to provide more molex connections to avoid the use of splitters).

I too find two machines with the same symptoms an odd situation, so given that they didn't before, I can only really point the finger at the things that have changed.
 
I believe a lot of motherboards require almost perfect BIOS settings (mine included). Voltage settings being too low or too high here or there... yeah, it's not going to be stable, simple as that. It could be memory settings with timings too tight or speeds too fast... it won't be stable.
 
Are you saying that with respect to Skylake or just PCs in general? That's a given with an overclocked CPU, but on a CPU model with a locked multiplier that's effectively not overclockable to begin with, that seems crazy. Previous generations were able to do this without issue. Essentially, what I'm trying to establish in this thread is whether this is a potential liability with Skylake. If it is, I think I will try to get Haswell components to replace the Z170s in both machines and have done with it. However, that's a lot of effort and up-front expense to go to without cause (the performance difference is not likely to be significant).
 
Believe what you want lol... You mentioned having 2 out of 2 unstable PCs. Whatever you do, don't go changing any BIOS settings that might fix the issues. But yeah, I'm talking in general: it's almost always the BIOS settings (unless you have faulty hardware), and no, I don't believe all Skylakes are faulty lol
 
I believe you're having BIOS issues. On the HD3, make sure you have the latest BIOS; I was looking at getting that board and was warned about that.

"First off, I wish I could rate this at 4.5 stars as the board needs a beta UEFI to be stable, but other than that it is perfect for what you pay for. **Be sure to update to F5i for memory compatibility**"

From a post at Micro Center.

Might need to do a BIOS update for both boards. Also possibly the memory choice, as it's the only constant across both builds. Could be some incompatibility issue.

Good luck.
 
I'm on my second Skylake system.
Both are rock stable.
However, I had memory issues for a while with Corsair LPX RAM like yours.

Keep your RAM at 2133MHz and see if things improve. Work from there.
If the OS has corrupted due to memory problems or previous issues, you will have compound problems.
It might be wise to try a fresh install of Windows if problems continue.
 
My Skylake build has proven to be the most stable of any of my builds so far. Skyrim with 100+ mods not playing with each other nicely is the only thing that has crashed it.
 
I would find it very surprising if there's a widespread stability issue with Skylake CPUs. Intel does a very good job with engineering samples and working with PC manufacturers to ensure nearly everything related to CPU and platform is 1000% stable before releasing.
 
I too would find that very surprising, but it's hard to get the measure of exactly whether the 'Skylake bug' is in fact confined to specialist applications or otherwise. I'll definitely try the BIOS update on the HD3P machine and see what happens. One thing I have noticed on the file server is that after the last time it happened, Sleep Mode is now enabled in Windows even with the power settings disabling it across the board, so I've had to start running Caffeine to stop it going to sleep overnight. Not sure if the machine attempting to enter sleep state before would have been causing the freezes unknowingly. Doesn't explain the issue happening mid-game on the other PC though...
 
I got one of the first batches, and mine has been running super well. No problems whatsoever.
 
Freezing is now becoming more regular on the file server, so I have updated the BIOS to version F5, removed a now-redundant 2-port PCIe 1x SATA card and removed the NZXT GRID+ V2 from the system entirely (I tried just removing the USB cable, but alas doing this isolates the power from the molex connector so the fans don't spin at all!). I will see if this clears the issue. If not, I probably still won't put the GRID+ back, as I'm not happy that it needs to actually seek every hard disk once a second in order to see its temperature. Apart from the annoying light show, that's just extra wear on the drives (and noise) that isn't needed, so I have ordered a Corsair Link Commander to potentially replace it. Reviews of the software seem equally scathing but I can only experiment; there aren't really any other software options out there, and in a 4U case manual controllers aren't viable (not that I want them). I'm contemplating hiding some sort of micro-PC inside the case just to run fan controller software, so that if it crashes it doesn't take down the whole system, but I wouldn't know where to start with wiring up its reset button externally, and going inside the case to reset it every time would be a real drag.
 
On my second Skylake 6700K build here. Apart from a faulty sound chip on the first board, everything has been smooth sailing, and that's with a total of 9 HDDs in one of them, connected with an extra PCIe controller to make up for the lack of built-in SATA ports.

Rock solid stable. Even at default BIOS settings I never had any real problems, and if I recall I've only reset the CMOS once on each board.

In my experience Skylake is not inherently unstable, far from it to be honest.
 
OP, exactly what speed is your memory running at in the BIOS? There have been several Skylake threads regarding memory speed settings in the BIOS causing instability.
 
I tried to help him sort out if it's a memory problem but he doesn't seem to care.
 
I think it's safe to consider this issue solved, and maybe even just a troll thread.
 
So a little update on this (no it is not a troll thread) - I'm somewhat closer to working out what the issues are here, or at least I believe I am:

With the file server I decided to remove the NZXT CAM software (not just close it) controlling the GRID+ V2 fan controller, as I had read reports that the application could cause PCs to go unresponsive. No luck there. I then decided to remove the controller entirely and plug all the fans in directly: the 2-pin fans off the PSU, and the 3-pin and CPU fans off the motherboard. No stability issues on that machine for almost a whole month. Yesterday I decided to install the Corsair Link Commander unit (it arrived a while back but I never got round to installing it), and within hours the PC stopped responding again. This gives me two ideas: either both of these units use similar controllers which are incompatible with either Windows 10 or my hardware config (both seem quite unlikely), or the front USB header(s) on the board are faulty. I'm going to try swapping the front USB header the device uses with the one used by the front USB ports, and leave those ports disconnected (in case physical connection itself is enough to cause the issue). If that fails I might try getting a USB header to USB-A adapter and run the cable out the back. If that works I'll probably replace the board. If not, back to the drawing board I suppose!

On the gaming PC issue I haven't fully proven it yet, but it occurred to me that there is a strong correlation between having run GZDoom (Brutal Doom specifically) on the system and then running another game, and seeing the crash issue. I ran through a few weeks without playing it and don't recall seeing the crash happen. On a day I definitely did, I experienced the issue again, with a new title (Overwatch beta) so that leads me to think that perhaps there is something being left behind in memory from running it that causes issues when running other games (have never seen the crash while playing in GZDoom itself). That could be GZDoom, the Brutal Doom mod, or it could be Bandicam as BD is currently the only title I record. From now on I'm going to reboot the machine after running BD before playing any other title and see if the issue disappears. Not an ideal workaround but it'll have to do. It is nonetheless puzzling that this issue only came about when upgrading from i5 750 to the 6600, however, so the change of CPU must have had something to do with the change in behaviour... I intend to upgrade to a GTX1080 as soon as I can lay my hands on one, so that may eliminate the GPU as a potential suspect.
 
What actual memory are you running on those machines? Because last I checked the minimum DDR4 speed is 2133MHz, not 1600.
 
Wow sorry, only just noticed that! should be 2400, not 1600 (too stuck with the memory I used in my previous build) - it's on the invoice as:
16GB (2x8GB) Corsair DDR4 Vengeance LPX Black, PC4-19200 (2400), Non-ECC Unbuffered, CAS 14-16-16-31, XMP 2.0, 1.2V

I will amend the first post.
 
Ok so here is my thought. There have been a lot of compatibility issues with DDR4 between the X99 platform and Skylake platform especially with early build BIOSes (more so with Haswell-E not liking the tighter timings of the Skylake-ready stuff but it goes both ways in my own experience, I had tremendous stability problems when I bought an EVGA 2400 kit for X99 and put it on a Gigabyte Z170 board). You have X99 kits there, and you have the same kit on both machines and both machines are displaying the same problem. Have you tried running them at default 2133 settings instead of XMP?
 
I'm not really sure both machines do display exactly the same issue. A similar problem maybe, but the symptoms aren't identical. With either of these fan controller units plugged into a front USB header (the machine went down again earlier, proving that changing which header is used makes no difference), the file server PC will crash even when left to its own devices. The gaming PC does not; it will only happen in-game. However, I'll check the memory settings on the server in a moment.

Interesting that you say these are specifically X99 memory kits - what would the equivalent Z170 product be?


Edit: Checked, and the memory is already set to 2133MHz, XMP disabled.

To save having to redo all the cabling yet again I've left the Corsair fan controller in place, disconnecting it from the USB header but leaving the power attached. With the NZXT controller, doing that disabled the unit entirely and the fans were not powered; this one seems to fare better and is powering the 2-pin and CPU fans at a constant speed. The 3-pin fans are hunting between two speeds continuously, which is a bit strange, but livable for now (I can live with temporary extra noise, but not no fans!). On the assumption that this is stable for the next few days without incident, I will probably order another board, possibly not another Gigabyte, as with this being the second faulty Z170 in a row from them I want to avoid having to repeat the process a third time.

From a little further testing, although the system is unstable by itself with a fan controller attached, I can manually induce the crash by transferring data to and from a USB3 hard disk docking station, which implies that it may not be just the front USB headers that are dodgy. This same USB3 dock transferred multiple terabytes a day earlier without incident. It was only after I stopped to install the fan controller that its use caused stability issues, which makes me suspect all USB activity on the board in general.
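
To make the "transfer data over USB until it falls over" test repeatable, the load can be scripted rather than driven by hand. The following is a hedged sketch in Python, not the method used in this thread: the function name, file names, sizes and round count are all hypothetical. It writes random data to a directory on the USB-attached disk, reads it back and compares SHA-256 digests, printing progress so the last completed round is visible if the machine locks up.

```python
import hashlib
import os
from pathlib import Path


def write_and_verify(target_dir: Path, size_mb: int = 64, rounds: int = 3) -> int:
    """Write random data into target_dir, read it back and compare digests.

    Returns the number of rounds whose read-back matched what was written.
    Note: the read-back may be served from the OS cache rather than the
    device, so this mainly exercises the write path.
    """
    chunk = 1024 * 1024  # 1 MiB per block
    matched = 0
    for index in range(rounds):
        path = target_dir / f"usb_stress_{index}.bin"
        written = hashlib.sha256()
        with path.open("wb") as out:
            for _ in range(size_mb):
                block = os.urandom(chunk)
                written.update(block)
                out.write(block)
            out.flush()
            os.fsync(out.fileno())  # push the data out to the device

        read_back = hashlib.sha256()
        with path.open("rb") as src:
            for block in iter(lambda: src.read(chunk), b""):
                read_back.update(block)
        path.unlink()  # clean up the temporary test file

        ok = written.hexdigest() == read_back.hexdigest()
        matched += int(ok)
        print(f"round {index + 1}/{rounds}: {'ok' if ok else 'MISMATCH'}", flush=True)
    return matched
```

Pointing `target_dir` at a folder on the docked drive and running the same call with the fan controller attached and then detached would give a like-for-like comparison. A hard freeze will simply stop the output, so the script cannot report the failure itself; it only makes the trigger consistent.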
 