Does vSphere support any kind of hardware watchdog?

Discussion in 'Virtualized Computing' started by danswartz, Dec 30, 2017.

  1. danswartz

    danswartz 2[H]4U

    Messages:
    3,584
    Joined:
    Feb 25, 2011
    Running a 6.5 host, Build 6765664. Running fine since the update to this build on 12/10. This morning around 10AM, it became unresponsive. I was out of town, so I was unable to check anything until just now. The IPMI console showed everything apparently okay, except for the host being unresponsive (not even to pings). I rebooted it, and it came up fine, but I'd kinda like to avoid hangs in the future. I've been searching via Google, but I don't see any kind of hardware watchdog support. Is there such a thing? It's a single host, so HA won't help me here. Thanks!
     
  2. k1pp3r

    k1pp3r [H]ardness Supreme

    Messages:
    7,741
    Joined:
    Jun 16, 2004
    There are for some IPMI, iDRAC, and iLO setups, but it's typically a vendor-specific ISO you have to install.

    That said, your issue is likely not hardware. Do a fresh install of ESXi on the host.
     
  3. danswartz

    danswartz 2[H]4U

    This is a whitebox in my home lab, so there is no vendor involved. I'm not saying this is a HW issue - I'd just like insurance that it won't lock up again when no-one is here to push the reset button :)
     
  4. REDYOUCH

    REDYOUCH [H]ardness Supreme

    Messages:
    4,541
    Joined:
    Mar 17, 2001
    This is bad advice. You have no idea what the issue is until you review the logs and perform some level of diagnostics.
     
  5. k1pp3r

    k1pp3r [H]ardness Supreme

    It's not bad advice if your alerting is set up correctly.

    However, OP's issue was the hypervisor locking up. Unless you have outside monitoring for that host, it's not going to alert you in any fashion.
     
  6. REDYOUCH

    REDYOUCH [H]ardness Supreme

    His host locked up one time and you're telling him to re-install ESXi. He should at least spend a few minutes taking a look at the server logs to see if he can find out what occurred.
     
  7. k1pp3r

    k1pp3r [H]ardness Supreme

    I'm assuming troubleshooting has already been done. Besides, reinstalling ESXi can easily be done without affecting the VMFS volumes. Then you just re-import your machines and move on.
     
  8. danswartz

    danswartz 2[H]4U

    I'm not sure which logs to look at. I noticed a new build was available, so since I was suspicious something might have gotten corrupted, I installed that build. In case this happens again, where do you suggest looking? e.g. which logfiles? Thanks!
     
  9. lopoetve

    lopoetve Imhotep

    Messages:
    29,229
    Joined:
    Oct 11, 2001
    /var/log/vmkernel.log. If your logs are on reliable storage, that will still be there. See what the last few messages were. /var/log/vmkwarning.log is a warn/error version of the same log.
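
For anyone who wants to script the "last few messages" check, here is a minimal sketch. It assumes the two log files named above are readable from wherever the script runs (on the host, or as copies pulled off the datastore). The helper name `last_lines` and the default of 20 lines are made up for illustration, and nothing here parses the log format itself.

```python
from collections import deque
from pathlib import Path


def last_lines(path, count=20):
    """Return the final `count` lines of a text file as a list of strings."""
    with Path(path).open("r", errors="replace") as handle:
        # A deque with maxlen keeps only the newest `count` lines in memory.
        return [line.rstrip("\n") for line in deque(handle, maxlen=count)]


if __name__ == "__main__":
    # Paths taken from the post above; adjust if you copied the logs elsewhere.
    for name in ("/var/log/vmkernel.log", "/var/log/vmkwarning.log"):
        if Path(name).exists():
            print(f"== {name} ==")
            print("\n".join(last_lines(name)))
```

If the logs live on non-persistent storage they will not survive the reboot, so this only helps once logging points somewhere that does.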
     
  10. danswartz

    danswartz 2[H]4U

    Thanks! vSphere is installed on a small (8GB) DOM, so the logs should still be available. Hasn't happened again, so far (fingers crossed...)
     
  11. lopoetve

    lopoetve Imhotep

    Eh, the DOM may be big enough and stable enough for ESXi to have tagged it as stable.
     
  12. danswartz

    danswartz 2[H]4U

    Turns out it wasn't, so that's why I saw nothing useful in the logs :( I changed the syslog global settings to stash the logs on one of the NFS datastores.
     
  13. ChRoNo16

    ChRoNo16 [H]ard|Gawd

    Messages:
    1,216
    Joined:
    Feb 3, 2011
    I had so many bugs and errors with ESXi 6.5 that I went back down to 5.5. Less support for newer stuff, but it's at least stable.
     
  14. Grimlaking

    Grimlaking [H]ard|Gawd

    Messages:
    2,012
    Joined:
    May 9, 2006
    Oh that statement isn't exactly heartwarming.
     
  15. Orddie

    Orddie 2[H]4U

    Messages:
    2,249
    Joined:
    Dec 20, 2010
    I do not see how you can tell a locked-up VMware host to reboot :)

    If anything, another system would run API commands via your IPMI to reboot the host.
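
A rough sketch of that "another system" idea, assuming a separate Linux box with `ping` and `ipmitool` installed and a BMC reachable over the network. The function names, the five-failure threshold, and the 60-second interval are illustrative assumptions, not anything VMware or a BMC vendor ships.

```python
import subprocess
import time


def host_answers_ping(host):
    """True if a single ICMP echo gets a reply (Linux `ping` flags assumed)."""
    result = subprocess.run(
        ["ping", "-c", "1", "-W", "2", host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def ipmi_reset_command(bmc_host, user, password):
    """Build the ipmitool call that asks the BMC for a hard reset."""
    return ["ipmitool", "-I", "lanplus", "-H", bmc_host,
            "-U", user, "-P", password, "chassis", "power", "reset"]


def should_reset(recent_results, threshold):
    """True when the last `threshold` checks all failed."""
    return len(recent_results) >= threshold and not any(recent_results[-threshold:])


def watch(host, bmc_host, user, password, threshold=5, interval=60):
    """Poll the host forever; reset it via IPMI after `threshold` misses in a row."""
    results = []
    while True:
        results.append(host_answers_ping(host))
        results = results[-threshold:]  # keep only the newest checks
        if should_reset(results, threshold):
            subprocess.run(ipmi_reset_command(bmc_host, user, password), check=False)
            results = []  # start counting again after the reset
        time.sleep(interval)
```

Calling something like `watch("esxi.example.lan", "bmc.example.lan", "admin", "secret")` (all placeholder values) would run the loop. Keep threshold times interval longer than the host's boot time, or the monitor may reset a host that is still coming up.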
     
  16. Grimlaking

    Grimlaking [H]ard|Gawd

    There are hardware management solutions that work with devices like the iDRACs in Dell servers to allow remote management and reboots and such. Had to use one today because my stupid server suddenly couldn't see its onboard flash drives. Took a hard reboot. I did NOT want to fly to NY for something like that.
     
  17. lopoetve

    lopoetve Imhotep

    And a hardware watchdog, if one exists for your platform, will heartbeat to that IPMI solution and trigger a reboot if it goes away for long enough (generally several minutes, for obvious reasons). Not always a GOOD idea, mind you, but possible.
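
To make the heartbeat logic concrete, here is a toy model of a watchdog countdown. It is only a simulation of the idea described above, not an ESXi or IPMI API; the class name and the 300-second timeout are assumptions for illustration, and times are passed in explicitly so the behaviour is easy to follow.

```python
class WatchdogTimer:
    """Toy watchdog: expires if no heartbeat arrives within the timeout."""

    def __init__(self, timeout_seconds, now=0.0):
        self.timeout_seconds = timeout_seconds
        self.last_heartbeat = now

    def heartbeat(self, now):
        # The monitored system calls this periodically while it is healthy.
        self.last_heartbeat = now

    def expired(self, now):
        # The management controller would trigger a reset when this is True.
        return now - self.last_heartbeat > self.timeout_seconds


if __name__ == "__main__":
    dog = WatchdogTimer(timeout_seconds=300)
    dog.heartbeat(now=60)
    print(dog.expired(now=200))  # False: only 140 s since the last heartbeat
    print(dog.expired(now=400))  # True: 340 s of silence exceeds the 300 s timeout
```

The long timeout is the point: a host that is merely busy for a minute should not get reset, which is why several minutes is the usual choice.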
     
  18. mstaab

    mstaab n00bie

    Messages:
    32
    Joined:
    Mar 31, 2016
    I have never tried 6.5 but have been getting EOL advisements from VMware that we need to update. My only comment is that we have been running 5.5+ at our office with MS guests: DC/file server, Exchange with 40 mailboxes, SQL/ERP, an RDP server with 15 users, and one other Win7 guest. It has NEVER locked up or given any issues at all. We have logged over 1 year of uptime and only needed to restart the host because of the Spectre/Meltdown patches for ESXi a couple of months ago.

    This is my big 2 cents
     
  19. lopoetve

    lopoetve Imhotep

    6.5 later releases are great (U1 included). The early releases are always a bit iffy for any software company.
     
  20. Grimlaking

    Grimlaking [H]ard|Gawd

    I do not like not having thick client access to my hosts. That feature alone has bailed me out of mistakes my director has made in the past. Director: "I shut down this VM... what was it for?" Me: "Oh, just the vSphere appliance." Director: "Oh... well, shit." Me: "It's fine, I'm already booting it back up."

    Now I'll have to do the command line dance... and I dislike that.
     
  21. k1pp3r

    k1pp3r [H]ardness Supreme

    No you don't. . . .

    ESXi 6.5 hosts have a web interface built in now. You can log in via the web interface and boot a VM.
     
  22. Dan_D

    Dan_D [H]ardOCP Motherboard Editor

    Messages:
    53,416
    Joined:
    Feb 9, 2002
    This. You need some type of third-party monitoring tool that would alert you when a server becomes unresponsive.
     
  23. cyclone3d

    cyclone3d [H]ardForum Junkie

    Messages:
    12,205
    Joined:
    Aug 16, 2004
    Try adding a second management port. And then try removing it. Locks up the whole web interface until you do a hard reboot. This is on the latest build.

    And the only way I have found to get rid of the second management port is to tell it to reset everything to default settings via the local interface (start over from scratch).

    Removing the vSwitch fails, removing the management port fails; basically there is nothing you can do once you add a second management port.
     
  24. Grimlaking

    Grimlaking [H]ard|Gawd

    Ah that explains... so adding the second management port then trying to remove it breaks it. Understandable.

    I have noticed some odd behavior with the flash console as well.
     
  25. Grimlaking

    Grimlaking [H]ard|Gawd

    Well color me purple and call me barney. Hot damn! I dig that!