4790k BSOD after delid (but normal use okay?)

Discussion in 'Overclocking & Cooling' started by cstrx, Jun 17, 2018.

  1. cstrx

    cstrx n00bie

    Messages:
    6
    Joined:
    Jun 17, 2018
    Posting this in OC rather than Intel since this issue has only come up since delidding/relidding with CLP. I'm hoping some of you guys with delidding experience might help me cover any troubleshooting prior to tearing it all back down again.

    Build: 4790k (always stock, yes, really), 32GB Mushkin Stealth (stock XMP 1.2 profile), ASRock Z97M OC Formula, GTX 750ti, Seasonic Prime Ultra Titanium 750W. Cooler is Cryorig C1. This is all in a well ventilated MicroATX case.

    This is a hackintosh, and I usually run various outboard audio gear and attached RAID over USB, since this PC is used primarily for audio production. However, Win 10 2016 LTSB is also installed for occasional gaming. All testing has been done in Windows so far. When I first set up the machine back in 2015, I tried Prime95 but gave up when temps kept creeping above 90C, even with a Cryorig H7 (my cooler at the time). Since the system ran fine for what I was doing, I let it go. The 4790k has been fine since 2015, but over the last year I've noticed that Core 1 has been running way hotter than the others under load. Idle with the C1 at about 23-24C ambient was mid-30s, and load with audio processing seldom crept above 70C.

    So I had to go and delid. I used Rockit's tool and followed their procedure exactly but used CLP instead of CLU. I did particular research beforehand on the amount of liquid metal to apply and watched quite a few tutorials very closely, and my application results matched the (good) pics online perfectly. To cover the sensitive areas of the substrate, I used a thin coat of liquid electrical tape instead of nail polish. I left it in the relidding clamp for about 12 hours, ran the machine normally for another 12, then at about 24 hours began testing with Prime95. And here's where we get weird.

    Idle temps had evened out across all cores and were now upper 20s/low 30s. Prime95 29.4 started in the mid-50s, gradually crept as high as 74, then hovered in the low 70s for about 15 minutes. Then BSOD (INTERRUPT_EXCEPTION_NOT_HANDLED). Upon reboot, the system ran for another minute or two, then another BSOD (SYSTEM_SERVICE_EXCEPTION). So I completely powered it off. Uttered all the bad words I know.

    Then I powered back on and ran normally for a bit, then tried Prime95 26.6. Temps hovered around upper 50s/lower 60s. Awesome! Just kidding: BSOD (SYSTEM_SERVICE_EXCEPTION). Then the same BSOD shortly after reboot. Full power-off again and some more bad words and incoherent sounds of frustration.

    To make this story shorter, the same pattern repeated when I ran Intel's own IPDT. The initial run of that tool was a pass, so I decided to do the burn-in test and set it on a loop while I went to bed. But... BSOD again just as the second loop had gotten going.

    Right now, I'm typing this on the same machine, so if I'm not stressing things, I seem okay. If I had screwed up with the liquid metal, I'm guessing that I would not be running right now, but I don't get why I can't even run Prime95 now where before I could, just with stupidly high temps. Maybe this chip has actually been a turd since 2015 and I just didn't know it? The only thing that has changed in the BIOS over the past few years was the anti-Spectre microcode update a few months back, which I am running (ASRock's BIOS update, not manually updated).

    Sorry for the lengthy post, but if anyone has any ideas to offer before I get out the acetone and unlid that chip again to examine my work, I'd appreciate it. I kind of doubt that my application was a failure because I did everything so damn closely to all the guides, etc. Was very, VERY careful not to use too much!
     
    Last edited: Jun 17, 2018
  2. pendragon1

    pendragon1 [H]ardForum Junkie

    Messages:
    10,087
    Joined:
    Oct 7, 2000
    Look in your minidump file to see which driver crashed. Maybe bump the voltage a bit. Also test your CMOS battery; I have a 4790k/Z97 system here that was having all sorts of weird issues due to a dying battery.
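    For anyone following the minidump suggestion later: a minimal Python sketch that lists the newest crash dumps so you know which file to open. It assumes Windows is configured to write small memory dumps to the default %SystemRoot%\Minidump folder, and newest_minidumps is a hypothetical helper name for illustration. It only locates the files; figuring out which driver faulted still happens in a debugger.

```python
import os
from pathlib import Path

# Assumption: small memory dumps go to the default %SystemRoot%\Minidump folder.
DEFAULT_DIR = Path(os.environ.get("SystemRoot", r"C:\Windows")) / "Minidump"


def newest_minidumps(folder: Path = DEFAULT_DIR, limit: int = 5) -> list[Path]:
    """Return up to `limit` .dmp files in `folder`, newest first (hypothetical helper)."""
    if not folder.is_dir():
        # No dump folder means no dumps were written (or dumps are disabled).
        return []
    dumps = [p for p in folder.glob("*.dmp") if p.is_file()]
    # Sort by last-modified time so the most recent crash comes first.
    dumps.sort(key=lambda p: p.stat().st_mtime, reverse=True)
    return dumps[:limit]


if __name__ == "__main__":
    for dump in newest_minidumps():
        print(dump)
```

    Open the newest file in WinDbg and run `!analyze -v`; the MODULE_NAME and IMAGE_NAME lines in its output usually point at the faulting driver.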
     
  3. Brian_B

    Brian_B [H]ard|Gawd

    Messages:
    1,408
    Joined:
    Mar 23, 2012
    Wasn't there some particular version of P95 that just went bonkers on AVX or something?
     
  4. pendragon1

    pendragon1 [H]ardForum Junkie

    Yup, anything over 26.6.
     
  5. cstrx

    cstrx n00bie

    Whoa... thanks for the quick replies!
    I should have saved the minidump but I was impatient this morning and reset the BIOS and also reimaged from a recovery image from last week, so I'll have to reinvoke the BSODs again (which shouldn't be much trouble).

    I wouldn't have suspected the CMOS battery, but since this is still the original one, I might as well replace it with a fresh one. I'll give that a try in a little bit here.

    And exactly, regarding Prime95: that's why I got both the current version and 26.6 from their FTP. But both of those, and even Intel's tool, killed me. I'll post an update in a bit on that minidump.
     
  6. Furious_Styles

    Furious_Styles Gawd

    Messages:
    689
    Joined:
    Jan 16, 2013
    Your problem sounds like you have been applying too much voltage for stock speeds or you're putting too much/too little thermal compound between the CPU/HSF or that mobo could just be bad.
     
  7. cstrx

    cstrx n00bie

    Okay, the dump points to the kernel and some driver, but I couldn't tell which one specifically.
    But there are only so many possibilities, so I started by pulling the graphics card and rebooting on the iGPU, and both versions of Prime95 are running fine.
    My own results for the 4790k and Cryorig C1 are 84-89C peak in Prime95 29.4 and 70C peak in version 26.6. This is consistent with the roughly 20-degree drop under load that I've seen reported by other Haswell owners.

    Granted, that's running only a couple of hours, but no BSODs within the first few minutes like before. Since then, I reattached my soundcard (RME Babyface Pro for the sake of anyone who might look up this thread later) and no issues with current drivers or firmware.

    I'll reattach the 750ti later to confirm, but everything is pointing to that now. I also went ahead and replaced the CMOS battery afterward, just to hopefully prevent any future weirdness like pendragon was talking about.

    Furious_Styles: voltage is on auto by the motherboard, but given the temps, the thought had crossed my mind to try undervolting it, since I don't have much to gain by overclocking in my use cases. But pulling the graphics card has definitely stabilized this system. I didn't realize running Prime95 would have an effect like that on other components; I thought it was just an isolated CPU stress test...
     
  8. pendragon1

    pendragon1 [H]ardForum Junkie

    You're narrowing it down. If you put the GPU back in and it starts again, try another PSU too to rule that out; it could be either.
     
  9. deaedius

    deaedius Gawd

    Messages:
    757
    Joined:
    Jun 18, 2014
    Check for a bent pin on the CPU socket.
     
  10. Chapeau

    Chapeau Gawd

    Messages:
    728
    Joined:
    Jul 17, 2016
    This would be my bet too...
     
  11. cstrx

    cstrx n00bie

    Ugh, as much as I hate to say it, I think both of you may be onto it. Very frustrating, since the CPU has only been out of this board and back in once and it was done very slowly and gingerly.

    I've only attempted fixing bent pins once before, when a board I received on eBay had four or five out of whack. I failed. I still got my money back, since the seller shipped it stupidly, but I still remember how it just wouldn't work no matter how "even looking" I got those pins.
    Crap. I'll wait until this weekend. I tried a different graphics card just now, but more BSODs. Different errors this time, but same overall routine.

    What would be the logic behind the pins? Something with a bad connection related to the memory controller or a PCI lane?
     
  12. pendragon1

    pendragon1 [H]ardForum Junkie

    Messages:
    10,087
    Joined:
    Oct 7, 2000
    A PCIe lane, I assume, if that's the issue.
     
  13. Furious_Styles

    Furious_Styles Gawd

    Auto voltage is a mixed bag; your mobo might be applying too much voltage. Have you checked how much it is at load?
     
  14. cstrx

    cstrx n00bie

    Well, it looks like those who suggested pins may be right. Took the thing completely apart and saw that my liquid metal application had been perfectly fine but I noticed three pins that were very slightly pressed, so I raised them back up a bit. But I also saw that somehow, there was TIM or something else that had gotten on a few of the pads on the bottom of the CPU, maybe at some point during my initial delidding, so I cleaned them off thoroughly with alcohol.

    I also took the opportunity to order and install Rockit's copper IHS, and at stock I'm now seeing temperatures about 3-5 degrees lower than before. As for voltage, after running Prime95 26.6 overnight and the current version for a couple of hours yesterday, I decided to try undervolting. It appears to be totally stable at vcore 1.15, adaptive with a 0.025 offset, and cache at 1.15, but I haven't done the overnight test yet.

    Between the thorough re-delidding with the copper IHS and a complimentary cleaning kit from Rockit (!?), plus the undervolting, even with the mediocre cooling from my Cryorig C1, my 4790k idles at 26-29 at stock and Prime95 26.6 maxes out at 64-68 (core 4 is the lowest). I'll probably try 4.5 or 4.6 GHz at some point.

    Thanks for all the help, guys!
     
  15. cstrx

    cstrx n00bie

    PS: Previous load voltage had been 1.23 V, much more than my chip needed.
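    To put 1.23 V vs 1.15 V in perspective, here's a rough first-order estimate using the textbook model that dynamic CPU power scales with V²·f. This is a sketch under assumptions: it treats the 1.15 V setting as the new load voltage (with an adaptive offset the actual load voltage may differ), assumes the same clock speed, and ignores leakage power. relative_dynamic_power is a hypothetical helper name.

```python
def relative_dynamic_power(v_new: float, v_old: float,
                           f_new: float = 1.0, f_old: float = 1.0) -> float:
    """Ratio P_new / P_old under the first-order model P_dyn ~ C * V^2 * f.

    Assumes the switched capacitance C is unchanged and ignores leakage,
    which also drops with voltage but does not follow this formula.
    """
    return (v_new / v_old) ** 2 * (f_new / f_old)


if __name__ == "__main__":
    # Old load voltage 1.23 V vs. the 1.15 V undervolt, same clocks.
    ratio = relative_dynamic_power(1.15, 1.23)
    print(f"dynamic power ratio: {ratio:.3f} ({(1 - ratio) * 100:.1f}% lower)")
```

    With those inputs it prints a ratio of 0.874, i.e. roughly 12.6% less dynamic power at the same clocks under this simplified model: one plausible contributor to the lower load temps, alongside the copper IHS and the cleanup.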