VMDirectPath Locking up HOST

luv2chill (n00b; joined Mar 2, 2008; 35 messages)
Has anyone who's used VMDirectPath ever seen a case where starting up a VM with a passthrough device immediately hard-locks the host?

I'm seeing it on two different machines (both Dell--one is a PE T110 and the other a Precision T5500) with two different cards--one is a Brooktrout PCI fax card and the other a PCIe TV tuner card.

The only common thread between them: in the "mark devices for passthrough" list, both are subordinate to a PCI bridge (which is automatically checked when I check the subordinate device). After rebooting, both cards show up properly in the list of enabled passthrough devices, and I can assign one or both to virtual machines.
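For context, the commonly cited reason the bridge gets pulled in is isolation: a conventional PCI card sitting behind a PCIe-to-PCI bridge usually can't be separated from its neighbors by the IOMMU, because DMA from devices behind that bridge is attributed to the bridge rather than to the individual card. So the bridge and everything behind it get assigned as a unit. Below is a toy Python model of that grouping rule, not ESXi's actual algorithm; `isolation_group` and the hand-filled `parent_of` map (device address to the bridge directly above it) are hypothetical, and the model only looks one bridge level up.

```python
from typing import Dict, Optional, Set


def isolation_group(dev: str, parent_of: Dict[str, Optional[str]]) -> Set[str]:
    """Toy model: return the set of addresses that must be passed through together.

    parent_of maps each device address to the address of the bridge directly
    above it, or None if the device sits on a root bus. Assumes every listed
    parent is a PCIe-to-PCI bridge whose downstream devices can't be isolated.
    """
    bridge = parent_of.get(dev)
    if bridge is None:
        # Directly on a root bus: in this model it can be assigned on its own.
        return {dev}
    # The bridge plus every device behind it (including dev) go together.
    behind_bridge = {d for d, parent in parent_of.items() if parent == bridge}
    return {bridge} | behind_bridge
```

Fed the two addresses the OP posts later in the thread (bridge `00:1e.0`, fax card `0b:04.0`), the model returns both, which matches the bridge being checked automatically along with the card.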

The problem comes when trying to boot the VMs. With the fax card, the boot task gets to 100% but then the client loses connection, because the host locks up--permanently. There is no response even to the locally connected keyboard on the host. After I power off and back on, I can get back into the client, but if I try to start that VM with the fax card passed through, I get the same behavior.

With the TV tuner it seems like the "power on virtual machine" task doesn't even complete--it gets to 95% and hangs there. Then, the same symptoms--client loses connection because the host is locked up and the only thing I can do is hard power off the host and power it back on.

I have tried setting pciPassthru0.msiEnabled = "FALSE" in the .vmx, as suggested in a pretty thin official VMDirectPath troubleshooting document, but it made no difference.
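For anyone who wants to apply that setting to every passed-through device in a VM at once, here is a minimal Python sketch that rewrites the text of a .vmx file. It assumes each passthrough device appears as a `pciPassthruN.present` entry (matching the `pciPassthru0` prefix above); `disable_msi` is a hypothetical helper, not a VMware tool. Edit the file only while the VM is powered off, and keep a backup.

```python
import re

# A .vmx line looks like: key = "value"
VMX_LINE = re.compile(r'^\s*([A-Za-z0-9_.:]+)\s*=\s*"(.*)"\s*$')
# Assumed marker for a passthrough device slot, e.g. pciPassthru0.present
PRESENT_KEY = re.compile(r"^pciPassthru(\d+)\.present$", re.IGNORECASE)
MSI_KEY = re.compile(r"^pciPassthru\d+\.msiEnabled$", re.IGNORECASE)


def disable_msi(vmx_text: str) -> str:
    """Return vmx_text with pciPassthruN.msiEnabled = "FALSE" appended for every
    pciPassthruN.present entry. Existing msiEnabled lines are removed first."""
    slots = set()
    kept = []
    for line in vmx_text.splitlines():
        match = VMX_LINE.match(line)
        if match:
            key = match.group(1)
            present = PRESENT_KEY.match(key)
            if present:
                slots.add(int(present.group(1)))
            if MSI_KEY.match(key):
                continue  # drop the old value; it is re-added below
        kept.append(line)
    for n in sorted(slots):
        kept.append(f'pciPassthru{n}.msiEnabled = "FALSE"')
    return "\n".join(kept) + "\n"
```

Typical use would be reading the .vmx, passing its text through `disable_msi`, and writing the result back. Keys are matched case-insensitively only as a precaution.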

Now, in case someone wants to point out the obvious: I fully realize that a fax card and a TV tuner are a far cry from the types of devices VMware supports and says will work, but I've had success passing through lots of other kinds of unsupported stuff, including a Radeon card and lots of different USB adapters (both onboard and on PCIe cards). I'd just like to figure out what's making these devices hang the host itself, which is baaaaaad. If the only symptom were that the VM wouldn't start or the passed-through device wasn't usable, it wouldn't bug me so much.

Both machines also have a Dell PCIe storage adapter in common: a SAS6ir in one and a PERC6i in the other. I'm wondering if it could be some kind of resource (IRQ?) conflict.
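One low-tech way to test the IRQ theory is to write down which interrupt line each device reports (from the BIOS setup or the host's hardware listing) and look for overlaps. That matters more once `msiEnabled` is set to FALSE, since the device then falls back to legacy line-based (INTx) interrupts, which can be shared. This Python sketch only does the bookkeeping; `shared_irqs` is a hypothetical helper, and the device names and IRQ numbers are whatever you collect by hand.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def shared_irqs(assignments: Iterable[Tuple[str, int]]) -> Dict[int, List[str]]:
    """Group (device name, IRQ) pairs and keep only IRQs used by more than one device."""
    by_irq: Dict[int, List[str]] = defaultdict(list)
    for device, irq in assignments:
        by_irq[irq].append(device)
    # Lines with a single user aren't conflicts; drop them.
    return {irq: devices for irq, devices in by_irq.items() if len(devices) > 1}
```

An empty result doesn't rule out a resource problem; it only says no two of the devices you listed report the same line.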

tl;dr: Has anyone ever tried to pass through a device and had it lock up the host? If so, did you ever solve it? Does anyone have general advice on troubleshooting VMDirectPath beyond the standard stuff covered here: http://www.vmware.com/pdf/vsp_4_vmdirectpath_host.pdf

Thanks!

EDIT: I should have added that this is ESXi 5.0, but I'm pretty sure I saw the same behavior on 4.1.
 
It's normal to also have to check the PCI bridge.

Neither of those cards is supported for VMDirectPath. It's for things like NICs, HBAs, etc. Not surprising.
 

I understand that--and even attempted to "pre-but" that in my first post. But my point was that VMDirectPath works for all kinds of things that aren't supported. Furthermore, when I've read about problems people have with it, it's almost always that the VM boots but the passed-through device doesn't work (or doesn't work properly). To me it just seems like a very severe side effect to have the entire host lock up to the point where it must be physically powered off and back on.

I'm merely trying to see if anyone else has had that kind of behavior from ESXi when passing hardware through, and hoping for some educated guesses that can help me troubleshoot this.

Thanks!
 
Not supported; to put it simply, VMware engineering won't really look at it.

That being said...

TV Tuner - can't help ya ;)

Fax card - there SHOULD be ways to get that to a guest that ARE supported. How does it show up to the host?

Does it PSOD, or just hard lock?

Set up a syslog server / collector on another machine. Do it again, and then get me /var/log/vmkernel.log from the syslog machine - that way I can see where it died.
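If there isn't a syslog server handy, a throwaway UDP collector is enough for this. Below is a minimal Python sketch, assuming the host is pointed at the collector (on ESXi 5.x, via the `Syslog.global.logHost` advanced setting, e.g. `udp://<collector-ip>:514`) and that UDP 514 is open on the collector's firewall; `collect` is a hypothetical helper. It appends every datagram to a file and flushes after each one, so the log on disk stays current up to the point where the host stops sending. The last few messages before a hard lock may still never leave the host.

```python
import socket
from typing import Optional


def collect(sock: socket.socket, out_path: str, max_messages: Optional[int] = None) -> int:
    """Append each UDP syslog datagram received on sock to out_path, one per line.

    Runs forever when max_messages is None; returns the number of messages written.
    """
    count = 0
    with open(out_path, "a", encoding="utf-8") as out:
        while max_messages is None or count < max_messages:
            data, (sender, _port) = sock.recvfrom(65535)
            text = data.decode("utf-8", errors="replace").rstrip("\r\n")
            out.write(f"{sender} {text}\n")
            out.flush()  # keep the file current while the collector is running
            count += 1
    return count


if __name__ == "__main__":
    listener = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    # 514/udp is the conventional syslog port; binding it usually needs root/admin.
    listener.bind(("0.0.0.0", 514))
    collect(listener, "esxi-remote.log")
```

Everything the host forwards lands in one file, so the vmkernel messages still need to be picked out of it afterward.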
 
Not supported, to put it simply, VMware engineering doesn't care.

I have to say, as someone who works on OS kernel stuff, I'd be embarrassed to have that said about me (that I don't care if something causes a crash, even if it's unsupported).
 

Yeah, I have to agree. I think they care; after all, it's a feature they included in their product, unsupported or not. They may not care to address every issue, but they certainly should care that this feature could allow the remaining 10% of servers (fax servers, etc.) to be virtualized and supported. For instance, we have several call monitoring systems with T1 boards, IVRs, etc. that could be virtualized with proper hardware passthrough support.
 
I'm sure there's a difference between caring personally and caring professionally.

Re: They don't get paid to care about that.
 
Adam, he was referring to Engineering collectively, no? I'm not sure I understand your second sentence. Maybe Lopo overstated the case, but if so, saying 'we do not care' doesn't, IMO, send the right message. To be as clear as possible: I don't expect them to support every random board around, but plugging in a card and having it cause a crash/lockup would bother me (and my management). Then again, maybe I'm picky (our customers certainly are, and they have every right to be...)
 
I have to say, as someone who works on OS kernel stuff, I'd be embarrassed to have that said about me (that I don't care if something causes a crash, even if unsupported.)

If it happens with a supported device, they'll care a LOT. If it's an unsupported one, well, to put it simply, there are other priorities. Welcome to enterprise software.

Once I find out what's causing the problem, I'll find out if it's possible to hit it on a supported platform. It may even be unique to the motherboard the OP is using, and if it's not certified, it may be because it ~couldn't~ pass the certification tests due to an issue with the board itself. Therefore, why fix something that wouldn't even be allowed to run it in the first place? Hardware vendors certify to ESX, not the other way around :)
 
I'm sure there's a difference between caring personally and caring professionally.

Re: They don't get paid to care about that.

Pretty much. VMware gets paid to make the software work on supported systems that have been certified by the hardware vendors. They'd LOVE to make it work on everything and anything for the entire world, but that isn't where the $$ is.

There are absolutely supported ways to get fax devices through to a guest - ESX has been doing that in various forms since the 3.0 days. TV tuners, though? That's not a normal enterprise use case, and as a reminder, ESX is an enterprise product. Now, if it were an issue with Fusion/Workstation, which are ~not~ enterprise software packages, that's a different story, but remember - ESX is built for businesses running servers, not really folks playing with it at home. Much like Microsoft isn't fixing weird use cases that people run into with the free version of Hyper-V, or Oracle with the free version of Xen, there are things the software is built for and things it isn't built for. You can get it to do a lot of things it wasn't built for, but if it doesn't do one of those well, there's only so much effort that will be spent looking at it.

Now, what I'd REALLY recommend, is file a feature request - those are taken VERY seriously. Make a justification for an enterprise level use case, and things will be very much looked at.
 
You can virtualize fax servers now; you just need to abstract the fax card. Look into Dialogic media gateways...
 
Adam, he was referring to Engineering collectively, no? I'm not sure I understand your second sentence. Maybe Lopo overstated the case, but if so, saying 'we do not care' doesn't, IMO, send the right message. To be as clear as possible: I don't expect them to support every random board around, but the idea that plugging in a card and having that cause a crash/lockup would bother me (and my management.) Then again, maybe I'm picky (our customers certainly are, and they have every right to be...)

I clarified it, but effectively, little effort will be spent on unsupported devices :), especially when they're used with an experimental feature. (Note: experimental means "We're working on this, you can use it, but if it breaks, all we can do is say 'thank you for letting us know; please wait for it to be supported and let us know if it still doesn't work.'")
 

Why would I need to "buy" another product to virtualize my servers containing T1 boards and fax boards when the basic functionality is already built into the vSphere product? I'm looking to use what's already there and paid for, but fully supported.

I'm sure that VMware is always looking to get more hardware vendors certified. I'm not complaining per se about the functionality; I would just like it supported. And yes, I do understand that's pretty much up to the hardware vendors. If it were supported, we could be pretty much 100% virtualized in our environment as it sits today.
 

What type of fax cards do you have, and how do they show up to the hosts? In many cases there are ways to pass them through that don't use VMDirectPath, and I believe passthrough for several fax cards is in the pipeline, if not already supported (they're really just serial devices).
 
For Call Recording and IVR we have:

Dialogic D/480JCT-2T1-EW

For our RightFax Servers we have:

Brooktrout TR1034+P24H-T1-1N
 
I guess it depends on how you look at this. In my world, customers don't like crashes/lockups even if you plugged in an unsupported board. I was trying to be as clear as possible about the distinction between 'we do not support this, and if it does not work for you, too bad' and 'if you try to use an unsupported board, we might crash, sorry...' In my world (which certainly IS the enterprise world), customers would find the second statement much less acceptable.
 

The support stance on experimental features is very simple: unless specified otherwise, if it doesn't work, there's very little that can be done. Request that the product you want to work with it be made to work, and it'll be investigated if there is a business case to be made. Otherwise, it's experimental - it may work, may not work, or may turn your server into a pumpkin. Results are unknown; it's often completely untested, and it may even be something known to blow up your box. That's why it's labeled experimental. As far as I know, the only things not experimental with VMDP are some network cards.

Many experimental features are there as basic proofs of concept, or "we're working on this, play with it some" bits - and there's a reason it's not out of experimental yet.

Another thought: there's a significant difference between "I made it crash" and "This board you never tested, never approved, never worked with, on this motherboard you never tested, never approved, and never worked with, using this weird feature that is considered experimental only, made your software crash".

Fixing the first is a priority. Fixing the second would take an infinite number of programmers an infinite number of man-hours, as there are an almost infinite number of possible oddities that can be encountered with the almost infinite number of unsupported device combinations out there. We cannot list what is unsupported as that is an infinite set, so we list what is. If you stray from that list, weird may happen. Or it may not. ;) (Lord knows - I run ESX5 on Dell D630 laptops). If I can prove that the same crash can be encountered on supported systems using supported hardware and features, then we've got something that can be worked on. If not - well, that's what happens when you stray ;)
 
Another thing - someone pointed out that we're probably looking at this a bit differently. Does VMware care about the crash? Absolutely - crashes are important to look at. Will it be a priority to fix? Not unless it happens on supported hardware too.
 
Yeah, I get that. That said, I thought DirectPath was supported for at least certain NICs? In any event, maybe I'm thinking too much like a kernel developer, but it seems to me that however this all works, the kernel should be able to map PCI space safely. On the other hand, I freely admit I don't know how it is implemented in the software, so maybe that isn't possible to do 100% safely :)
 

It isn't ;) Especially since some things do REALLY weird things with IRQs and address space. Most of the time that's what these are related to: your card assumed it was in a normal world and asked the PCI bus to do something no one expected, and because DirectPath assumes (for lack of any other way) that you'll play nice, it let you sabotage the host :p

It is supported and tested on a set of network cards, and partner-supported on some FC HBAs for FC tape. All others are experimental only :).
 
Boy this thread really blew up :D

Really appreciate the offer to help, lopoetve--I will work on setting up the syslog collector in the next couple of hours and will repeat the test to see what can be gleaned from the logs.

Good discussion too. Like I said, I have no expectation that VMware will help with this since it's unsupported (which is why I haven't bothered to open a case with support and came here instead :)). But at the same time, as others have said, having the host itself hard-lock is not something I expected from ESXi, and I would be interested in knowing what was causing it.

To answer your question, lopoetve: mostly it's just a hard lock at the ESXi information screen (with the IP address, etc.), although once I did get the PSOD. I wish I had taken a picture of it, but it mainly mentioned a lockup of a PCPU. I think that was when I tried booting with the TV tuner rather than the fax card--can't remember for sure. But now that this has happened so many times, I really don't give it long after it locks up before I manually power it off and back on... maybe if I left it longer in that state the PSOD would eventually appear? I don't know.

Anyway, again thanks for the responses and opinions. To be clear (I don't believe I mentioned it before) this is a Brooktrout TR1034 PCI 4-port analog fax card. In the "Mark Devices for Passthrough" screen it shows like this:

00:1e.0 | Intel Corporation 82801 PCI Bridge
0b:04.0 | Motorola MPC8245 [Unity]
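The addresses in that list use the usual PCI bus:device.function form, in hex. A small Python sketch for parsing rows copied from the passthrough list; `parse_passthrough_line` and the `PciAddress` type are hypothetical names, and the rows are assumed to carry no PCI domain prefix.

```python
import re
from typing import NamedTuple

# bus:device.function in hex, e.g. "0b:04.0" (no PCI domain prefix assumed)
BDF = re.compile(r"^([0-9a-fA-F]{2}):([0-9a-fA-F]{2})\.([0-7])$")


class PciAddress(NamedTuple):
    bus: int
    device: int
    function: int
    description: str


def parse_passthrough_line(line: str) -> PciAddress:
    """Parse a 'bus:dev.fn | description' row copied from the passthrough list."""
    address, _sep, description = line.partition("|")
    match = BDF.match(address.strip())
    if match is None:
        raise ValueError(f"not a bus:device.function address: {address.strip()!r}")
    bus, device, function = (int(part, 16) for part in match.groups())
    if device > 0x1F:  # the device number is only 5 bits wide (0-31)
        raise ValueError(f"device number out of range: {device:#x}")
    return PciAddress(bus, device, function, description.strip())
```

Here the bridge sits on bus 0x00 and the fax card on bus 0x0b. On a typical single-root system a nonzero bus number means the device lives behind a bridge, which fits the bridge being checked along with the card, although the listing alone doesn't prove that 0x0b is this particular bridge's secondary bus.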

Will reply back ASAP with the logs.
 
Give it a bit to see for certain if it'll PSOD :) that'll be the main key, but the pcpu lockup from what I know is almost always that the card asked the bus to let it do something, it did, and vmkernel didn't like losing control of whatever it did :) I'll look as soon as you get them :)
 
Why would I need to "buy" another product to virtualize my servers containing T1 boards and fax boards when the basic functionality is already built into the vSphere product? I'm looking to use what's already there and paid for, but fully supported.

I'm sure that VMware is always looking to get more hardware vendors certified. I'm not complaining per se about the functionality; I would just like it supported. And yes, I do understand that's pretty much up to the hardware vendors. If it were supported, we could be pretty much 100% virtualized in our environment as it sits today.

Don't forget that you also gain vMotion capability if you use a DMG (Dialogic media gateway).
 