Titan Z Locking Up

Discussion in 'nVidia Flavor' started by CuriousGeorge, Apr 24, 2018.

  1. CuriousGeorge

    CuriousGeorge n00bie
    Messages: 36
    Joined: Apr 8, 2012

    I'm a bit stumped by this one. I have a Titan Z that's locking up under SLI gaming loads. The symptoms (all on Windows 10 version 1709) are:

    1) Hard locks (no BSOD, have to reset) under the latest 391.35 drivers after a minute or two of Unigine Heaven.

    2) Runs Unigine Heaven for hours in single-GPU mode with 391.35.

    3) Runs Unigine Heaven for hours in SLI mode with older 382.33 drivers.

    So it must be a driver issue with SLI, right? Except the same machine with 2x Titan Blacks in SLI can run Unigine Heaven for hours under the newest 391.35 drivers, and the Titan Z is basically a couple of Titan Blacks on a stick.

    Overall I don't much care if Unigine Heaven crashes, as I'm using these for DP (double-precision) compute. However, I'm worried that the 391.35 drivers might be exposing some kind of hardware fault in the Titan Z, and I'd rather figure out what's going on now than have to troubleshoot crashy code caused by flaky hardware later on. Nvidia support has been pretty useless so far; their latest suggestion was to try the Titan Z in different PCIe slots (which, predictably, changed nothing).

    Anyone have any thoughts or tests I might run to figure out if this is a software or hardware issue?
     
  2. CuriousGeorge

    It appears I'm not the only one having this issue:
    https://forums.geforce.com/default/...vlddmkm-sys-after-385-69/2/?offset=19#5332384
    The weird thing is that it appears to affect only dual-GPU cards (the GTX 690 is also affected), not separate cards in SLI. I've no idea what's "special" about a dual-GPU card, but obviously there's something.

    At any rate, I've tested both Titan Z GPUs for over 24 hours with different CUDA codes, and both GPUs appear to work fine, so I guess I'll just chalk this one up to bad drivers and move on. Hopefully Nvidia fixes this one soon, though. It's kind of embarrassing that some of their ultimate halo GPUs (low-volume and aging though they may be) apparently aren't being tested much, if at all.
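A repeat-and-compare stability check of the kind described above can be sketched as follows: run the same deterministic double-precision workload many times on each GPU and flag any run whose output differs from that GPU's first run. This is a minimal sketch, not the poster's actual test. `run_workload` is a hypothetical callable standing in for whatever CUDA code is launched on a given device, and the check assumes that code is bit-for-bit deterministic (no atomics or other order-dependent floating-point reductions); a nondeterministic kernel would produce false alarms.

```python
from typing import Callable, Dict, List, Sequence


def find_unstable_gpus(
    run_workload: Callable[[int], Sequence[float]],  # hypothetical: runs the CUDA job on one device
    device_ids: Sequence[int],
    iterations: int,
) -> Dict[int, List[int]]:
    """For each device, list the iteration numbers whose output differed from run 0."""
    mismatches: Dict[int, List[int]] = {}
    for dev in device_ids:
        reference = list(run_workload(dev))  # run 0 serves as this device's reference result
        bad: List[int] = []
        for i in range(1, iterations):
            # Exact equality on purpose: a deterministic kernel on healthy
            # hardware should reproduce its output bit for bit.
            if list(run_workload(dev)) != reference:
                bad.append(i)
        mismatches[dev] = bad
    return mismatches
```

A device with a non-empty list produced at least one result that didn't match its own reference run, which would point toward flaky hardware (or a nondeterministic kernel) rather than a driver problem.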
     
  3. criccio

    criccio Fully Equipped
    Messages: 11,522
    Joined: Mar 26, 2008

    Try the latest 397.31 driver?
     
  4. horrorshow

    horrorshow [H]ardness Supreme
    Messages: 5,269
    Joined: Dec 14, 2007

    [GIF image]
     
  5. CuriousGeorge

    ^ :ROFLMAO:

    I did try 397.31, and it actually makes things worse. Not only does it not fix the problem, but since its release, 382.33 is no longer available from Nvidia's advanced driver search. So GTX 690/Titan Z owners may no longer have access to a working driver.
     
  6. horrorshow

    http://www.guru3d.com/files-details/geforce-382-33-whql-driver-download.html

    There you are sir ;)
     
  7. Neapolitan6th

    Neapolitan6th Gawd
    Messages: 889
    Joined: Nov 18, 2016

    This is probably unrelated to your issue, but I think it's something you should be aware of as a Titan Black owner (not sure if it applies to the Z, however).

    There's an overclocker, Buildzoid, who warns that the memory inductors on reference 780/780 Ti/Titan/Titan Black/980 Ti/Titan X (Maxwell) cards tend to bite the dust prematurely.

    Video link

    I'm not sure if there's a way to mitigate that issue, though; probably by keeping thermals in check somehow. Something to keep in mind, at least.
     
  8. CuriousGeorge

    Thanks! Just saved an extra copy in case this doesn't get sorted.

    Good to know. I did have a Titan Black die on me a while back; it would boot, but just barely, and installing a graphics driver was impossible. It was under warranty, so I didn't worry too much about it, but that could have been the cause: it didn't seem like a catastrophic failure so much as something minor but critical going bad.
     
  9. CuriousGeorge

    Update:

    It appears that PLX chips (PCIe switches) might be the issue. When two GPUs are on different PLX chips there's no problem, but when they're on the same PLX chip they crash. So it might not really be a 690/Titan Z issue so much as a more generic issue with SLI across a shared PLX chip; I only encountered it with the Titan Z because on that card both GPUs are necessarily attached to the same PLX chip.
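One way to check which GPUs share a PCIe switch is the topology matrix printed by `nvidia-smi topo -m`, whose legend uses `PIX` for a connection traversing at most a single PCIe bridge (for example, one PLX chip) and `PXB` for a connection traversing multiple PCIe bridges. The sketch below parses that matrix and lists the `PIX` pairs. It assumes the installed `nvidia-smi` supports `topo -m` (not every platform and driver version does) and that the output uses the usual layout of a `GPU0 GPU1 ...` header followed by one row per GPU; the function names are hypothetical.

```python
import re
from typing import Dict, List, Tuple

ANSI_ESCAPE = re.compile(r"\x1b\[[0-9;]*m")  # some nvidia-smi versions add text formatting codes
GPU_NAME = re.compile(r"GPU\d+")


def gpu_links(topo_text: str) -> Dict[Tuple[str, str], str]:
    """Map each unordered GPU pair to its link type (PIX, PXB, PHB, NODE, SYS, NV#, ...)."""
    gpu_names: List[str] = []
    links: Dict[Tuple[str, str], str] = {}
    for line in ANSI_ESCAPE.sub("", topo_text).splitlines():
        tokens = line.split()
        if not tokens or not GPU_NAME.fullmatch(tokens[0]):
            continue  # skip blank lines, the Legend section, and non-GPU rows
        if not gpu_names:
            # Header row: collect the GPU<n> column labels, ignore CPU/NUMA columns.
            gpu_names = [t for t in tokens if GPU_NAME.fullmatch(t)]
            continue
        if tokens[0] not in gpu_names:
            continue
        row = gpu_names.index(tokens[0])
        cells = tokens[1:1 + len(gpu_names)]  # only the GPU-to-GPU columns
        for col, link in enumerate(cells):
            if col > row:  # upper triangle only, so each pair is recorded once
                links[(gpu_names[row], gpu_names[col])] = link
    return links


def pairs_behind_one_switch(links: Dict[Tuple[str, str], str]) -> List[Tuple[str, str]]:
    """GPU pairs whose path crosses at most a single PCIe bridge ("PIX")."""
    return sorted(pair for pair, link in links.items() if link == "PIX")
```

The text can be captured with `subprocess.run(["nvidia-smi", "topo", "-m"], capture_output=True, text=True).stdout`. A `PIX` pair only says the two GPUs sit behind one switch; whether that switch is a PLX part has to come from the motherboard or card documentation.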
     
  10. CuriousGeorge

    CuriousGeorge n00bie

    Messages:
    36
    Joined:
    Apr 8, 2012
    Update 2:

    It looks like Nvidia has figured out the problem; there's a fix posted in the GeForce thread linked above that should be baked into a future driver release.

    I also want to give some recognition to Nvidia's higher-level tech support; once I fought through the Indian CSR drones and got to somebody in the US (or at least somebody who was fluent in English), they were pretty awesome.