Hard freezes - vSphere 5 and Supermicro x9scm-f

madcoder

Ever since a fresh install of vSphere 5, the host itself randomly hard-freezes and needs a warm reset. vSphere 4.x was fine.

My configuration is pretty simple, except for an Areca 1680ix-24 that is passed through in the advanced configuration.

Is anyone else experiencing lock ups with this board?
 
Not with that specific board but one of my R810s did that to me yesterday morning. All its VMs were fine, but couldn't manage the host by any means, totally frozen. Any insight would be nice.
 
/var/log/vmkernel.log.

Enable SSH on your boxes and, when it happens, tell me what that log says.
 
2011-10-24T18:44:53.749Z cpu1:4745)WARNING: iscsi_vmk: iscsivmk_ConnProcessReject: Sess [ISID: 00023d000005 TARGET: iqn.1992-08.com.netapp:sn.142241528 TPGT: 7d0 TSIH: 0]
2011-10-24T18:44:53.749Z cpu1:4745)WARNING: iscsi_vmk: iscsivmk_ConnProcessReject: Conn [CID: 0 L: 192.168.60.20:49624 R: 192.168.60.1:3260]
2011-10-24T18:44:53.749Z cpu1:4745)WARNING: iscsi_vmk: iscsivmk_ConnProcessReject: vmhba38:CH:0 T:0 CN:0: iSCSI pdu rejected: itt 0x2127, opcode TMF Request, reason Immediate Command Reject
2011-10-24T18:44:53.749Z cpu1:4745)WARNING: iscsi_vmk: iscsivmk_ConnProcessReject: Sess [ISID: 00023d000005 TARGET: iqn.1992-08.com.netapp:sn.142241528 TPGT: 7d0 TSIH: 0]
2011-10-24T18:44:53.749Z cpu1:4745)WARNING: iscsi_vmk: iscsivmk_ConnProcessReject: Conn [CID: 0 L: 192.168.60.20:49624 R: 192.168.60.1:3260]
2011-10-24T18:44:53.749Z cpu1:4745)WARNING: iscsi_vmk: iscsivmk_ConnProcessReject: vmhba38:CH:0 T:0 CN:0: iSCSI pdu rejected: itt 0x2128, opcode TMF Request, reason Immediate Command Reject

This repeats a few hundred times in my log.
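
With a few hundred repeats, counting the entries is easier than reading them. Here is a rough Python sketch that tallies the rejected iSCSI PDUs by module, opcode and reason. It assumes each log entry sits on one unwrapped line in the layout shown above. The names `summarize`, `LINE_RE`, `REJECT_RE` and `SAMPLE` are made up for this example and are not part of any VMware tool.

```python
import re
from collections import Counter

# Layout assumed from the excerpt above:
# <timestamp> cpuN:WORLD)LEVEL: module: function: message
LINE_RE = re.compile(
    r"^(?P<ts>\S+)\s+cpu(?P<cpu>\d+):(?P<world>\d+)\)"
    r"(?P<level>[A-Z]+):\s+(?P<module>\w+):\s+(?P<func>\w+):\s+(?P<msg>.*)$"
)
# The "pdu rejected" entries end with "opcode <name>, reason <text>".
REJECT_RE = re.compile(r"opcode (?P<opcode>.+?), reason (?P<reason>.+)$")


def summarize(lines):
    """Count rejected-PDU entries per (module, opcode, reason)."""
    counts = Counter()
    for line in lines:
        entry = LINE_RE.match(line.strip())
        if entry is None:
            continue  # not in the assumed layout, skip it
        reject = REJECT_RE.search(entry.group("msg"))
        if reject is not None:
            key = (entry.group("module"),
                   reject.group("opcode"),
                   reject.group("reason"))
            counts[key] += 1
    return counts


# Sample entries copied from the excerpt above (unwrapped).
SAMPLE = [
    "2011-10-24T18:44:53.749Z cpu1:4745)WARNING: iscsi_vmk: iscsivmk_ConnProcessReject: "
    "Conn [CID: 0 L: 192.168.60.20:49624 R: 192.168.60.1:3260]",
    "2011-10-24T18:44:53.749Z cpu1:4745)WARNING: iscsi_vmk: iscsivmk_ConnProcessReject: "
    "vmhba38:CH:0 T:0 CN:0: iSCSI pdu rejected: itt 0x2127, opcode TMF Request, "
    "reason Immediate Command Reject",
    "2011-10-24T18:44:53.749Z cpu1:4745)WARNING: iscsi_vmk: iscsivmk_ConnProcessReject: "
    "vmhba38:CH:0 T:0 CN:0: iSCSI pdu rejected: itt 0x2128, opcode TMF Request, "
    "reason Immediate Command Reject",
]

for key, count in summarize(SAMPLE).items():
    print(count, key)
```

To run it against a real log copied off the host, pass an open file instead of `SAMPLE`, e.g. `summarize(open("vmkernel.log"))`.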
 
You must be writing to C7J0yc3, as my original post in this thread is not helped by this advice. Obviously, if the machine freezes, the SSH port is also unresponsive when that happens, and it is not possible to check the log.
 
I was hoping SSH would still be responsive. If not, set up a syslog server and configure ESXi 5 to log to it; then you'll have a copy of vmkernel.log on that server, written right up to the moment of the freeze.
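
If no syslog server is handy, something like the following Python sketch might do as a stopgap on another machine. It listens for syslog datagrams over UDP and appends each one to a file, so the last messages before a freeze survive off the host. The file name `esxi-remote.log` and the helpers `format_record` and `serve` are assumptions made for this example. Pointing the ESXi host at the listener is done on the host side and is not shown here. Binding to port 514 normally needs root; any free port works as long as the host is told to use it.

```python
import socketserver

LOG_PATH = "esxi-remote.log"  # hypothetical output file on the receiving machine


def format_record(sender_ip, datagram):
    """Turn one raw syslog datagram into a single line prefixed with the sender IP."""
    text = datagram.decode("utf-8", errors="replace").rstrip()
    return f"{sender_ip} {text}\n"


class SyslogHandler(socketserver.BaseRequestHandler):
    def handle(self):
        # For UDP servers, self.request is a (data, socket) pair.
        datagram, _sock = self.request
        with open(LOG_PATH, "a", encoding="utf-8") as log_file:
            log_file.write(format_record(self.client_address[0], datagram))


def serve(host="0.0.0.0", port=514):
    """Receive syslog datagrams until interrupted (blocks the caller)."""
    with socketserver.UDPServer((host, port), SyslogHandler) as server:
        server.serve_forever()
```

Calling `serve()` starts the listener. After the next freeze, the tail of `esxi-remote.log` shows whatever the host managed to send before it locked up.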
 
Once you restart your box you should be able to SSH in and get the logs. That's what I did.
 
madcoder, does it freeze before launching VMs? If not, what guest OS are you running? Maybe a driver in that guest is causing the lockups.

Also, you can try removing/swapping RAM just to see if that is the issue.
 
There should be no way for a single guest to lock up a host. That would violate a basic rule of virtualization: things run in isolation. It would be a critical bug.
 
Is that true in passthrough mode? Is it possible to completely lock up the PCIe bus when one of the devices is handed over to the guest, or does VMware protect that path?

Although I guess even with that protection an errant driver could possibly send improper commands to a PCI device that could cause the device (through a firmware bug) to lock the bus, right?
 
Should be impossible. Passthrough mode takes the device away from the vmkernel entirely - the host, to put it simply, is unaware that the device even exists, or has ever existed. :)

This has happened once briefly with an ~extremely~ old and out of date OS that shall remain nameless, and it was fixed with extreme prejudice. VT-d is supposed to assist with isolation as well, which is why it's required for passthrough: it keeps the device isolated from the rest of the system.
 