Hard freezes - vSphere 5 and Supermicro x9scm-f

madcoder · Oct 25, 2011

Ever since a fresh install of vSphere 5, the host itself randomly hard-freezes and needs a warm reset. vSphere 4.x was fine.

My configuration is pretty simple, except for an Areca 1680ix-24 that is passed through in the advanced configuration.

Is anyone else experiencing lock ups with this board?

danswartz · Oct 25, 2011

No, but I have an M1015, not the Areca. Any chance you can try a different HBA?

C7J0yc3 · Oct 25, 2011

Not with that specific board, but one of my R810s did that to me yesterday morning. All its VMs were fine, but I couldn't manage the host by any means; it was totally frozen. Any insight would be nice.

lopoetve · Oct 25, 2011

/var/log/vmkernel.log.

Enable ssh on your boxes, when it happens, tell me what that log says.

C7J0yc3 · Oct 25, 2011

2011-10-24T18:44:53.749Z cpu1:4745)WARNING: iscsi_vmk: iscsivmk_ConnProcessReject: Sess [ISID: 00023d000005 TARGET: iqn.1992-08.com.netapp:sn.142241528 TPGT: 7d0 TSIH: 0]
2011-10-24T18:44:53.749Z cpu1:4745)WARNING: iscsi_vmk: iscsivmk_ConnProcessReject: Conn [CID: 0 L: 192.168.60.20:49624 R: 192.168.60.1:3260]
2011-10-24T18:44:53.749Z cpu1:4745)WARNING: iscsi_vmk: iscsivmk_ConnProcessReject: vmhba38:CH:0 T:0 CN:0: iSCSI pdu rejected: itt 0x2127, opcode TMF Request, reason Immediate Command Reject
2011-10-24T18:44:53.749Z cpu1:4745)WARNING: iscsi_vmk: iscsivmk_ConnProcessReject: Sess [ISID: 00023d000005 TARGET: iqn.1992-08.com.netapp:sn.142241528 TPGT: 7d0 TSIH: 0]
2011-10-24T18:44:53.749Z cpu1:4745)WARNING: iscsi_vmk: iscsivmk_ConnProcessReject: Conn [CID: 0 L: 192.168.60.20:49624 R: 192.168.60.1:3260]
2011-10-24T18:44:53.749Z cpu1:4745)WARNING: iscsi_vmk: iscsivmk_ConnProcessReject: vmhba38:CH:0 T:0 CN:0: iSCSI pdu rejected: itt 0x2128, opcode TMF Request, reason Immediate Command Reject

This repeats a few hundred times in my log.
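
For anyone wanting to quantify "a few hundred times", here is a rough Python sketch that tallies rejected iSCSI PDUs by opcode and reason. It assumes the log has been copied off the host to a machine with Python 3, and that lines are unwrapped (a terminal paste that wraps at 80 columns would need rejoining first). The names `REJECT_RE`, `summarize_rejects`, and `summarize_file` are made up for illustration and are not part of any VMware tooling.

```python
import re
from collections import Counter

# Assumed line layout, taken from the vmkernel.log excerpt in this thread:
# "... iSCSI pdu rejected: itt 0x2127, opcode TMF Request, reason Immediate Command Reject"
REJECT_RE = re.compile(
    r"iSCSI pdu rejected: itt (?P<itt>0x[0-9a-fA-F]+), "
    r"opcode (?P<opcode>.+?), reason (?P<reason>.+)$"
)


def summarize_rejects(lines):
    """Count rejected iSCSI PDUs per (opcode, reason) pair."""
    counts = Counter()
    for line in lines:
        match = REJECT_RE.search(line.strip())
        if match:
            counts[(match["opcode"], match["reason"])] += 1
    return counts


def summarize_file(path):
    """Read a copied vmkernel.log and return the same per-pair counts."""
    with open(path, errors="replace") as handle:
        return summarize_rejects(handle)
```

Calling `summarize_file("vmkernel.log").most_common()` on a local copy lists the most frequent opcode/reason pairs first, which makes it easier to see whether the TMF rejects dominate or are one symptom among several.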

lopoetve · Oct 25, 2011

What version of Data ONTAP are you on?

madcoder · Oct 26, 2011

lopoetve said:
/var/log/vmkernel.log.

Enable ssh on your boxes, when it happens, tell me what that log says.

You must be writing to C7J0yc3, as my original post in this thread is not helped by this advice. Obviously, if the machine freezes, the SSH port is also unresponsive when that happens, and it is not possible to check the log.

lopoetve · Oct 26, 2011

madcoder said:
You must be writing to C7J0yc3, as my original post in this thread is not helped by this advice. Obviously, if the machine freezes, the SSH port is also unresponsive when that happens, and it is not possible to check the log.

Was hoping SSH would still be responsive. If not, set up a syslog server and point ESXi 5 at it. Then you'll have a copy of the host's vmkernel.log on that server, even after a freeze.
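
If no syslog server is handy, a throwaway receiver is enough to catch the last messages before a freeze. The Python sketch below is only an illustration, not a hardened daemon: it listens for UDP syslog datagrams and appends them to a file. `SyslogHandler`, `make_server`, and the `esxi-remote.log` file name are hypothetical. It also assumes the host has been pointed at this machine on the ESXi side (for example through the `Syslog.global.logHost` advanced setting with a `udp://` target), which is not shown here.

```python
import socketserver


class SyslogHandler(socketserver.BaseRequestHandler):
    """Append each received syslog datagram to a local file."""

    def handle(self):
        # For UDP servers, self.request is a (data, socket) pair.
        data, _sock = self.request
        message = data.decode("utf-8", errors="replace").rstrip("\n")
        with open(self.server.log_path, "a", encoding="utf-8") as log:
            # Prefix with the sender's IP so several hosts can share one file.
            log.write(f"{self.client_address[0]} {message}\n")


def make_server(host="0.0.0.0", port=514, log_path="esxi-remote.log"):
    """Build a UDP syslog receiver; call serve_forever() on the result."""
    server = socketserver.UDPServer((host, port), SyslogHandler)
    server.log_path = log_path  # read by SyslogHandler.handle
    return server
```

Running `make_server().serve_forever()` starts the listener. Binding to port 514 normally needs root or administrator rights, so a higher port can be used if the sender is configured to match.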

C7J0yc3 · Oct 26, 2011

madcoder said:
You must be writing to C7J0yc3, as my original post in this thread is not helped by this advice. Obviously, if the machine freezes, the SSH port is also unresponsive when that happens, and it is not possible to check the log.

Once you restart your box you should be able to SSH in and get the logs. That's what I did.

C7J0yc3 · Oct 26, 2011

lopoetve said:
What version of Data ONTAP are you on?

8.0.2

lopoetve · Oct 26, 2011

C7J0yc3 said:
8.0.2

standard swiscsi or the BNX2i cards?

Rectal Prolapse · Oct 26, 2011

madcoder, does it freeze before launching VMs? If not, what guest OS are you running? Maybe the driver in that guest is causing the lockups.

Also, you can try removing/swapping RAM just to see if that is the issue.

lopoetve · Oct 26, 2011

Rectal Prolapse said:
madcoder, does it freeze before launching VMs? If not, what guest OS are you running? Maybe the driver in that guest is causing the lockups.

Also, you can try removing/swapping RAM just to see if that is the issue.

There should be no way for a single guest to lock up a host. That would violate a basic rule of virtualization: things run in isolation. It would be a critical bug.

Rectal Prolapse · Oct 26, 2011

Is that true in passthrough mode? Is it possible to completely lock up the PCIe bus when one of the devices is handed over to the guest, or does VMware protect that path?

Although I guess even with that protection, an errant driver could possibly send improper commands to a PCI device that could cause the device (through a firmware bug) to lock the bus, right?

lopoetve · Oct 26, 2011

Rectal Prolapse said:
Is that true in passthrough mode? Is it possible to completely lock up the PCIe bus when one of the devices is handed over to the guest, or does VMware protect that path?

Although I guess even with that protection, an errant driver could possibly send improper commands to a PCI device that could cause the device (through a firmware bug) to lock the bus, right?

Should be impossible. Passthrough mode takes the device away from the vmkernel entirely - the host, to put it simply, is unaware that the device even exists, or has ever existed.

This has happened once, briefly, with an ~extremely~ old and out-of-date OS that shall remain nameless, and it was fixed with extreme prejudice. VT-d is supposed to assist with isolation as well, which is why it's required for passthrough: it keeps the device isolated from the rest of the system.
