ESXI 5.5 U1 Host non-responsive, some VMs still running

Hey guys, I'm in need of some advice on troubleshooting my ESXi host. I'm running the following config:

Single ESXi host configured with PCI passthrough for FreeNAS, multiple NICs, and various Linux & Windows VMs.

This issue has happened several times now:

- vSwitch0, which carries my primary VM network and Management, stops responding.
- The NIC lights are flashing, so link is up.
- vSwitch1 & 2 still function properly, and so does the console to the host.

When this happened a few weeks ago I thought Veeam was to blame, so I stopped using it for a while. The issue has now happened twice today.

vSwitch0 was using a PCI Intel NIC (82574L), which is supported. It's connected directly to a Netgear desktop switch, along with other NICs that work fine.

If a bad cable dropped and then re-established link, would ESXi eventually force the link down?

I have since moved that network to a different physical interface to isolate the NIC in question.

Would love some pointers! Thanks guys.
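Not the OP's procedure, but a sketch of what I'd run from the ESXi shell (or over SSH) to check whether the host is seeing link flaps or errors on the suspect uplink. `vmnic3` is an assumed name here; substitute whichever vmnic backs your vSwitch0.

```shell
# Per-vmnic link status, driver, and speed (works on ESXi 5.x):
esxcli network nic list

# RX/TX error and drop counters for the suspect uplink
# (vmnic3 is a placeholder -- use your actual uplink name):
esxcli network nic stats get -n vmnic3

# Link up/down transitions are normally logged by the vmkernel,
# so a flapping cable should leave a trail here:
grep -i 'vmnic3.*link' /var/log/vmkernel.log | tail -20
```

If the counters climb or the log shows repeated link transitions on just that vmnic, that points at the cable, switch port, or NIC rather than the vSwitch config.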
 
Depends on how the ports are configured on the vSwitch.
 
So without knowing exactly what I'm looking for, or which log might have the relevant data, I did find this error message throughout vmkernel.log:

2014-04-19T08:12:07.099Z cpu5:35277)WARNING: LinNet: map_pkt_to_skb:2069: This message has repeated 93 times: vmnic3: dropping packet due to parsing failure

And since the problem occurred I've moved the management network to its own NIC and the VM network to another NIC, removing the Intel PCI NIC from the equation. Since then nothing unusual has occurred.
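For anyone hitting the same warning, a quick sketch of how to see whether those drops are confined to one adapter: count the "dropping packet" warnings per vmnic in vmkernel.log. The log path is the ESXi 5.x default; adjust if your logs are redirected to a datastore or syslog.

```shell
# Tally "dropping packet due to parsing failure" warnings by vmnic,
# most-affected adapter first. A single vmnic dominating the output
# suggests a bad NIC rather than a vSwitch-wide problem.
LOG=/var/log/vmkernel.log
grep -o 'vmnic[0-9]*: dropping packet due to parsing failure' "$LOG" \
  | sort | uniq -c | sort -rn
```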
 
Sounds like a bad NIC doing something stupid. Firmware on them can crash after too many errors; I've seen that on bad Intel and Broadcom NICs when they're dying.
 
Agreed. Since it's been days since I moved to the other NICs, I'm satisfied the problem was related to that one NIC.
 
Now that this system has been up for 8 days without incident I'm going to consider the issue solved due to a bad NIC. Time to toss it in the parts bin.
 