VM with passthrough "freezes" entire ESXi box when shutdown/rebooting guest

Discussion in 'Virtualized Computing' started by praetorian, Apr 10, 2016.

  1. praetorian

    praetorian [H]Lite

    Messages:
    84
    Joined:
    Sep 9, 2003
    Hi all,

    A VERY strange issue that I've come across and can't seem to get my head around .. so far I've spent nearly 12 hours straight trying to diagnose it without getting to the bottom of it.

    I'm running ESXi 6.0u2 build 3620759 (free license) on a custom-built Asus P9X79 PRO with an E5-2660 Xeon and 64GB of DDR3 RAM. The system has four cards in it: an IBM/LSI M1015 passed through to a FileServer VM, an ATI X1300 boot graphics card, 2x Intel PT1000 Dual NIC cards and an AMD HD6450 graphics card for passthrough. I have a Windows 10 VM configured to use the AMD HD6450 in passthrough (both audio and video) as a test bench for a new type of system that we're currently building but don't have the actual hardware for yet .. so we're shortcutting to get the software development done.


    The VM has the latest VMware Tools installed (via the console) and runs absolutely perfectly ... except that when you either reboot or shut down the VM, the actual ESXi host becomes unresponsive. It stops responding to network traffic, and the ESXi thick client and HTML consoles don't respond (naturally), but there are absolutely no errors on the screen, such as a PSOD. It still displays the normal yellow console, which is also unresponsive, so a very hard lock.

    I've looked through the knowledgebase and found KB1030265, which I've followed, and I have now disabled interrupt remapping, but this hasn't made the slightest bit of difference.

    Can anybody point me in a direction to either get logs from this thing or suggest anything to try to debug this? I appreciate that it's a tough call, especially since not everybody will be running this hardware, but any similar experiences and things I can change/tune would be appreciated.

    I'm tempted to drop back to ESXi 5.5 and see if that exhibits the same problem, which would indicate hardware faults, but I would have thought loading up the VM with 1080p graphics/sound would have caused a bigger issue than shutting down the VM.



    Thanks

    Dean
     
  2. rtangwai

    rtangwai [H]ard|Gawd

    Messages:
    1,236
    Joined:
    Jul 26, 2007
    I am running ESXi 6.0 w/M1015 in passthrough without shutdown issues. I have 3x Intel NICs (single, not dual) as well, but no video cards in passthrough. I would try starting up the hypervisor and *NOT* run the VMs that are passing through the video cards, then do a shutdown to see. That should help isolate what hardware is causing the lockup.
     
  3. praetorian

    praetorian [H]Lite

    Messages:
    84
    Joined:
    Sep 9, 2003
    Cheers rtangwai. Already tried that and it's looking like the graphics card that I'm trying to passthrough. Think I'll invest in another from eBay to see if that's the fault or not
     
  4. RyC

    RyC [H]Lite

    Messages:
    113
    Joined:
    Apr 20, 2013
    Try leaving out the audio device when passing through to the VM if you don't use HDMI audio
     
  5. praetorian

    praetorian [H]Lite

    Messages:
    84
    Joined:
    Sep 9, 2003
    Thanks RyC, I'll give that a shot as it could be the audio causing an issue but I kinda need that as well :( I'm also going to try reverting back to 6.0GA to see if it's a Update2 issue; if not, back down to 5.5 which seems to work without issue

    Just spoken to a colleague about this who's doing the same thing but on a SuperMicro board and is getting exactly the same behavior when passing through a GPU to a Windows VM. He's also running Update2 as well. Difference being, he's passing through a 290X.
     
  6. brocclee

    brocclee n00bie

    Messages:
    7
    Joined:
    Mar 31, 2016
    I have the same problem with an R9 Fury on a Supermicro X10SRH mobo. I'm convinced that the root of this problem is that when I reboot my Windows VM, the VM is unable to release/reset the GPU. Thus when the Windows VM reboots and continues to load up again, it tries to access a non-reset GPU, and the VM locks up. Performing a hard reboot of the physical machine of course releases the GPU and everything works again.

    My rationale for believing this is that my Sapphire R9 Fury has LED lights on it that only turn on when the GPU is active. In other words, the LEDs are OFF when the VM is off but the ESXi physical host is ON. Once I power up the VM, the LED lights turn on as the Windows VM boots up.

    There is a bunch of literature around the 'net about using different methods of resetting the card using FLR, D3D0 reset, etc, but none of them worked for me either.
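    For anyone wanting to experiment with those reset methods, here is a minimal sketch of drafting an override line for /etc/vmware/passthru.map. It is illustrative only and not something tested in this thread. It assumes the file's column layout is "vendor-id device-id resetMethod fptShareable", that the accepted reset methods are flr, d3d0, link, bridge and default, and that "ffff" acts as a device-ID wildcard. The helper name passthru_map_entry is made up for this example; 1002 is AMD/ATI's PCI vendor ID.

```python
# Hypothetical helper for drafting /etc/vmware/passthru.map entries.
# Assumed column layout: vendor-id device-id resetMethod fptShareable

# Reset method names assumed to be accepted by ESXi.
VALID_RESET_METHODS = {"flr", "d3d0", "link", "bridge", "default"}


def passthru_map_entry(vendor_id, device_id, reset_method, fpt_shareable=False):
    """Return one space-separated passthru.map line for a PCI device."""
    method = reset_method.lower()
    if method not in VALID_RESET_METHODS:
        raise ValueError(f"unknown reset method: {reset_method!r}")
    for value in (vendor_id, device_id):
        int(value, 16)  # raises ValueError if the ID is not hexadecimal
    shareable = "true" if fpt_shareable else "false"
    return f"{vendor_id.lower()} {device_id.lower()} {method} {shareable}"


if __name__ == "__main__":
    # 1002 = AMD/ATI vendor ID; "ffff" as "all devices" is an assumption.
    print(passthru_map_entry("1002", "ffff", "d3d0"))
```

    The generated line would be appended to passthru.map on the host, followed by a host reboot. As noted above, none of these methods helped with the R9 Fury, so treat it as one more thing to try rather than a fix.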

    --BroccLee
     
    Last edited: Apr 12, 2016
  7. praetorian

    praetorian [H]Lite

    Messages:
    84
    Joined:
    Sep 9, 2003
    I've been speaking to a user on ServeTheHome's forums, xienze, who gave me a good idea on how to check whether it's the PCIe reset that's causing the problem.

    VM with passthrough "freezes" entire ESXi box when shutdown/rebooting guest

    Unfortunately it didn't work for me, as the box hung even though I'd disabled the card inside the VM, but it may work for others. It may be a good idea to drop him a post in that thread so he can give you some guidance if his solution works.

    For me, I'm definitely going to be downgrading to 6.0 GA and then 5.5 to see if I can solve the problem. I'd raise it with VMware but since I'm a free user, I doubt they'll pay any attention :(
     
  8. praetorian

    praetorian [H]Lite

    Messages:
    84
    Joined:
    Sep 9, 2003
    Finally got this working! Unfortunately had to drop back to ESXi 5.5u3 but at least the damn thing works and gives me everything that I need including FLING :) Shame that I can't get it working with 6 though :(
     
  9. Rody

    Rody Limp Gawd

    Messages:
    216
    Joined:
    Jun 14, 2003
    That's awesome. I wonder what the difference is. I have read about some issues with KVM hard locking on a VM reboot, but not ESXi, and not for everyone. There must be a flag or check box or something that we can configure if we find the right way to go about it.
     
  10. MrGuvernment

    MrGuvernment Pick your own.....you deserve it.

    Messages:
    19,463
    Joined:
    Aug 3, 2004
    Just a note if anyone finds this thread: passthrough seems broken on ESXi 6 and up.

    I have 6.5 with the January update, and a GPU and even enterprise-level dual-port gigabit NICs from Dell cause the same freezing behaviour, causing the entire ESXi server to reboot.

    https://communities.vmware.com/message/2670345#2670345
     
  11. MrGuvernment

    MrGuvernment Pick your own.....you deserve it.

    Messages:
    19,463
    Joined:
    Aug 3, 2004
    Same on the latest ESXi 6.5 Update 1, even with a CentOS guest: if you do a shutdown, the ESXi host reboots entirely.
     
  12. Angry

    Angry Limp Gawd

    Messages:
    441
    Joined:
    Feb 27, 2006
    I'm currently running ESXi 6.0.0-20160302001
    and passing through a Dell PERC (flashed to IT mode) to a FreeNAS VM, with no issues.

    However, GPU passthrough freezes the host.
    And that's on the version I listed above and up to the latest; on another system I can't even pass a NIC through without the host quitting responding. It doesn't freeze, and there's no PSOD, but I have to smack F12 and make it shut down and/or reboot. I assumed it was the X58 board causing the issue even though I have a Xeon 5640.
     
  13. Angry

    Angry Limp Gawd

    Messages:
    441
    Joined:
    Feb 27, 2006
    ESXi 5.5 passed through my r7 250x without a SINGLE issue.

    I tried ESXi 6.0, 6.5, Unraid, Proxmox and Xen, and none could pass through the GPU on my test setup, a Giggy X58-UDR3 with Xeon 5640.

    ESXi 5.5, worked like a charm.
     
  14. koolaidkitten

    koolaidkitten Gawd

    Messages:
    625
    Joined:
    Apr 28, 2006
    That just seems to be the nature of ESXi and GPU passthrough. Some setups will work with 5.5, others with 6.0+, some only with certain BIOS revisions of their chosen motherboard. I myself am running 6.0u2 build 3620759 and it's happy as can be. I tried updating to 6.5 a week ago. The GPU seemed fine but 6.5 had a fit with my SSDs. I tried a patch that VMware released not long ago, and that failed to fix the issue. So I rolled back to 6.0u2, and I guess that is where I will stay 'cause frankly I'm tired of messing with the darn thing lol.
    And actually, I'm kind of glad to be back on 6.0. I still hate the web config interface. I love the fat client.
     
  15. Hakaba

    Hakaba Limp Gawd

    Messages:
    135
    Joined:
    Jul 22, 2013
    Haven't tried ESXi; I currently use Proxmox and have used KVM in the past. In my personal experience, Fury (and reportedly Vega) wouldn't release the video card whenever you shut down/restart the guest VM. The only way to get the video card back was to reset the whole box.
     
  16. danswartz

    danswartz 2[H]4U

    Messages:
    3,575
    Joined:
    Feb 25, 2011
    When I was trying an all-in-one appliance a few months ago (ESXi 6.5), I wanted to pass two Samsung NVMe drives through to a CentOS storage appliance. It worked just fine, except that rebooting the host would sometimes cause one of the NVMe drives to disappear from the guest's HW list. Looking at the ESXi passthrough settings, that drive was marked as not enabled for passthrough. Given that the 'fix' when this happens is another reboot, I said screw it, and went with two separate boxes (one ESXi and one CentOS storage host) with dedicated 40Gb InfiniBand enet between them, so performance was excellent anyway...
     
  17. Mackintire

    Mackintire 2[H]4U

    Messages:
    3,012
    Joined:
    Jun 28, 2004
    You should look into the various options of implementing vSGA versus vDGA and vGPU.

    VMware conveniently only tells you about it in the Horizon documentation, but there's nothing stopping you from using it without Horizon licensing.

    You'd have to load the VIB manually as well.