LUNs dropping in Nexenta

StarMonkey

Hi,

Can anyone help me diagnose and fix my homebrew SAN? When it works I'm seeing upwards of 700 MB/s, but it only works for half an hour to an hour before the LUNs appear as dead in ESX (which kills vSphere almost completely).

I have tried disabling Hardware Accelerated Locking and the Data Mover properties in ESX with no success. I've set the LUNs to Round Robin.

Nexenta is a VM (6GB RAM, multiple CPUs/cores) and has NMDtrace disabled due to the amount of CPU load it generates (is there any way to disable it automatically on reboot, btw?). I have 4 x 450GB SAS disks connected to an LSI HBA passed through to the VM.

The LUNs are presented to ESX through FC (card passed through to VM).

I'm new to Nexenta and not hugely experienced in Solaris so let me know how to run any diagnostics that may help.

Thanks in advance!
 
What does "zpool status" report while things are working versus when they're not?
How about "stmfadm list-lu -v", "stmfadm list-view -l $MYLU", and "stmfadm list-target -v"?
Does "fmadm faulty" show anything?
 
When it's not working, zpool status seems to freeze while printing the config. It shows the pool as online and then stops responding.

When it's working, everything comes back OK with no errors found.

I'll try the other commands you suggested.
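
Because zpool status can hang when things go wrong, one option is to run each suggested diagnostic with a timeout so that a stuck command does not block the rest. The following is a minimal sketch, not a tested tool: the helper names (`run_with_timeout`, `collect`) and the 30-second limit are assumptions, and `stmfadm list-view` is left out because it needs an LU name.

```python
import subprocess

# Diagnostics named in the thread. "stmfadm list-view" is omitted because it
# needs an LU name as an argument.
DIAGNOSTICS = [
    ["zpool", "status"],
    ["stmfadm", "list-lu", "-v"],
    ["stmfadm", "list-target", "-v"],
    ["fmadm", "faulty"],
]


def run_with_timeout(cmd, timeout_s=30):
    """Run cmd and return (status, output), giving up after timeout_s seconds."""
    try:
        proc = subprocess.Popen(
            cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True
        )
    except FileNotFoundError:
        return ("missing", "")
    try:
        output, _ = proc.communicate(timeout=timeout_s)
    except subprocess.TimeoutExpired:
        # Best effort only: a process stuck in kernel I/O may ignore the kill,
        # so we do not wait for it to exit.
        proc.kill()
        return ("timeout", "")
    status = "ok" if proc.returncode == 0 else "exit %d" % proc.returncode
    return (status, output)


def collect(commands=DIAGNOSTICS, timeout_s=30):
    """Run every diagnostic and return {command line: (status, output)}."""
    return {" ".join(cmd): run_with_timeout(cmd, timeout_s) for cmd in commands}


if __name__ == "__main__":
    for name, (status, output) in collect().items():
        print("== %s [%s]" % (name, status))
        print(output)
```

Saving the output once while the LUNs are healthy and once after they drop gives two snapshots to compare. A "timeout" status is itself a useful data point, since it shows which command got stuck.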
 
Flaky crap like this is the main reason I stopped trying to use nexenta. The GUI had some spiffy things, but just too flaky for my taste...
 
Nexenta using all the CPU it can get is annoying too.

I've tried OI and Solaris 11 but couldn't get the LUNs to show in ESX; I must have been missing a setting or package.

Nexenta is the first OS I've found that I can actually get LUNs presented to ESX on.

When I run the commands while Nexenta is responding, I get all the expected OK responses. fmadm faulty returns literally nothing; it just moves on to a fresh command line. Targets, views, etc. all show as OK.

I'm just waiting for Nexenta to fall in a heap again so I can run the above commands in failed mode, which should happen overnight! I suspect Nexenta will freeze when I try to run any commands.
 
Well, I decided to keep Nexenta running for a few days with no VMs running, and it stayed up with no problems.

Started one of my VMs today and within half an hour my LUNs are showing Dead or Error in vSphere.

Nexenta still responsive - all commands suggested above return normal results. fmadm faulty returns nothing.
 
(Not being facetious here) Switch to something else. A lot of the good people from nexenta seem to have been let go, quit, whatever.
 
Happy to but in OI, Omni and Solaris I can't get my Emulex cards to present visible LUNs to ESXi.

They show connected and I configure everything the same as I do on Nexenta but nothing is seen on vSphere.

Nexenta is the only product I've found that comes close to working so far :(
 
Have you tried passing the LUNs through regular gigabit? I do the same thing (NexentaStor with VT-d passthrough on ESXi) but I just use the E1000 adapters and don't seem to have any issues. I know that won't get you 700MB/s, but stable is better than fast sometimes.
 
iSCSI works fine but it's not fast enough for my needs.

The SAN will be hosting Tier 1 storage for my VMs for both vSphere and Hyper-V and Tier 2 for file storage.
 
Does your FC card support Ethernet? I have an Emulex card which uses the oce driver and does 10GbE just fine. That might solve the problem.

I used iSCSI for a while, but found NFS a lot more flexible. It might be worth investigating that.
 
"(Not being facetious here) Switch to something else. A lot of the good people from nexenta seem to have been let go, quit, whatever."

For the record, that's not only untrue, the reality is quite the opposite.

As for the question of the thread -- I'm going to just come out and admit that Nexenta does absolutely zero testing of FC in Target mode on Community Edition. You'll note there's no support for it, and that FC Target is a plugin costing extra cash on Enterprise Edition. There's no motivation to support this on Community, nor is it tested, and it definitely isn't tested via passthru devices with Nexenta as a VSA with only 6 GB of RAM. YMMV. :(

I have to go with hotcrandel. If you're looking for stability in that situation, use Ethernet, not FC in target mode on Community Edition.
 
"For the record, that's not only untrue, the reality is quite the opposite."

Well, I don't know. Over the last couple of years, it seems from what I have read that some of the people who had been there a while have moved on. I know when I dabbled in NexentaStor, I got frustrated (like a lot of folks) on the forum because there were known crashing and/or outage-type bugs that never seemed to get addressed. I know a public forum is not the same as official support, but when you have paying customers complaining about a serious bug that has not been fixed in 2+ years, it doesn't do a lot for the perception of quality, etc.
 
"For the record, that's not only untrue, the reality is quite the opposite."

Well, I don't know. Over the last couple of years, it seems from what I have read that some of the people who had been there a while have moved on. I know when I dabbled in NexentaStor, I got frustrated (like a lot of folks) on the forum because there were known crashing and/or outage-type bugs that never seemed to get addressed. I know a public forum is not the same as official support, but when you have paying customers complaining about a serious bug that has not been fixed in 2+ years, it doesn't do a lot for the perception of quality, etc.

First, I'm not here in any official capacity. My comments are my own. I am also not in Marketing or Sales. :)

There are unfortunate incidents, definitely, especially in terms of public Community/forum engagement, as opposed to internal ticket-based support. Some of those issues might be resolved, and the problem is that this has never been properly communicated publicly. I will say, commercial Nexenta has thousands of active installs encompassing hundreds of PB of data that are not crashing, including at some very, very large companies in heavy production use-cases. That may not be visible externally, however.

One of the largest historical problems re: Community has been that the hardware chosen for installs of it is almost universally stuff we won't even allow on Enterprise, because of known issues, stability issues, etc. I'm not saying that's always the case, and I'm not actually going to defend my employer's product re: perfection - it is NOT perfect :), and I won't argue that the attention to the Community edition has been lacking on any number of fronts, but I will say Enterprise edition on accepted hardware is generally solid in most environments it ends up in.

But my comment was more about the statement we've lost people. Every company has attrition, of course, but we have lost very few people over the 3 years I've been here that I think were unfortunate. On the whole, IMHO, the skillset and knowledge of the employees at Nexenta has increased significantly over the years. It definitely has not decreased.
 
You would certainly know better than I. All I can say is I saw a pattern over and over where people on the forum would complain about a showstopper bug or whatever and... crickets chirping. Then, we would get the spin that 'nexenta does not provide support in a free forum!'. True enough, but a bug someone runs into (UI freezing, LUNs dropping, whatever) should be cause for concern for any product. I've worked in sustaining engineering one place or another for many years, and I can tell you that the perception people get when someone reports some very bad behavior and is totally ignored, even 2 years later, is not a good one. And a couple of long-time posters on your forum called nexenta out on this and... crickets chirping. My experience has been that if you're going to have a public forum, it needs to be monitored and people need to be replied to, or shut it down. Otherwise, you risk the perception that 'they are not replying to post X because there IS an issue!' Anyway...
 
Valid criticism, I hear you. I'd rather not speculate on what, if anything, the company is (or is not) doing to remedy that (I do agree with you, though).

I will say that lack of response might be for any number of (possibly bad) reasons, but one of those reasons will not be an unwillingness to acknowledge a problem is a problem. Hah! No. NexentaStor has problems. Every software product has problems. Show me one that doesn't. :)

For example: you mention UI freezing. Known problem - NMS can get hung up, causing both the NMV web UI and the NMC SSH UI to cease responding. Root cause is architectural in nature, making a quick fix impossible. Long-term fix IS underway, no ETA available publicly at present. Documented workarounds and steps to recover are available to Enterprise customers. Most importantly: UI freezing does not in any way affect data services. NFS, CIFS, iSCSI and so on all still work fine even if the Nexenta Management System has completely hung up. Sometimes you'll see the UI hang up AND have data service issues, but those are two symptoms of a root cause that is not the UI itself; typically that'll be a hardware fault of some sort.

But I digress.

FC LUN drops are not a known issue on Enterprise, AFAIK, but as I said, considering the environment described is not supported even on Enterprise, and the functionality in question is neither supported nor tested on Community at all, I would advise avoiding it and sticking to iSCSI. Or, as danswartz suggested, try something else! ZFS is robust and portable (that's one of the main selling points!), so you could export the pool, shut down the Nexenta VM, boot up another VM running some illumos distribution that's more up to date than Nexenta 3.x (OmniOS, maybe?), and see if it improves the FC target situation. Or FreeBSD (totally don't know where FC target support sits on FreeBSD, tho). Or ZFS On Linux (though I dunno about that one, IMHO it isn't quite ready for heavy use, especially enterprise production use-cases, and I'm not personally aware of anyone doing FC on it either).
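
The export-and-move sequence described above can be sketched as a short, print-only plan. This is an illustration under assumptions: the pool name `tank` and the helper names are hypothetical, and nothing here runs the commands. It also covers only the pool itself; any FC target or view configuration would presumably have to be set up again on the new OS.

```python
import shlex


def migration_steps(pool):
    """Return (vm, command) pairs for moving a ZFS pool from one VM to another."""
    return [
        ("old", ["zpool", "export", pool]),   # release the pool cleanly
        ("new", ["zpool", "import"]),         # list pools available for import
        ("new", ["zpool", "import", pool]),   # import the pool by name
        ("new", ["zpool", "status", pool]),   # confirm the pool is healthy
    ]


def render(steps):
    """Format each step as a line that says where the command should be run."""
    return [
        "[%s VM] %s" % (vm, " ".join(shlex.quote(arg) for arg in cmd))
        for vm, cmd in steps
    ]


if __name__ == "__main__":
    # "tank" is a placeholder pool name.
    print("\n".join(render(migration_steps("tank"))))
```

Between the export and the import, the old VM is shut down and the passed-through HBA is attached to the new one, so that only one OS ever has the disks.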

This ability to move your data off Nexenta is a selling point. It is one of the primary reasons I love ZFS and the 'open storage' movement. You will see me be sad when someone abandons Nexenta, but quite happy when they did so by simply exporting their pools and moving to another OS. Imagine the time saved there, versus if they'd been on something proprietary and got stuck migrating data using rsync or something. Ugh. Plus, as long as it's still on a ZFS zpool, I know that if Nexenta some day fixes or improves whatever functionality or behavior was important to you, maybe we could win you back, again without you having to do some migraine-inducing migration. :)
 
All valid points! In respect to the last one: I had my data pool (primarily for ESXi VM storage, via NFS) on NexentaCore, then went to OpenIndiana, then to illumos, then to ZFS on Linux, and (at the moment) it is on PC-BSD. All without changing anything (well, technically, I did upgrade the pool to use feature flags when I was on OI - LOL). I gave up on ZoL for the moment because it has issues with device presentation due to the asynchronous nature of the udev subsystem - if you are unlucky, your pool is imported (or attempted anyway) before udev recognizes some/all of your disks. I'm sure they will address that, but for now... I am on PC-BSD because it uses the stable FreeBSD code base, with illumos-like boot environment capability, which has been a lifesaver more than once...
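
The import-before-disks-appear race described above is often worked around by waiting until the expected device nodes exist before attempting the import. The following is a minimal sketch of that idea, not how ZoL itself handles it: `wait_for_devices` is a hypothetical helper, and the device paths in the usage comment are placeholders.

```python
import os
import time


def wait_for_devices(paths, timeout_s=60.0, poll_s=0.5):
    """Poll until every path exists; return the paths still missing at timeout."""
    deadline = time.monotonic() + timeout_s
    missing = [p for p in paths if not os.path.exists(p)]
    while missing and time.monotonic() < deadline:
        time.sleep(poll_s)
        missing = [p for p in missing if not os.path.exists(p)]
    return missing


# Example use before an import (placeholder device names):
#   still_missing = wait_for_devices([
#       "/dev/disk/by-id/scsi-EXAMPLE-DISK-1",
#       "/dev/disk/by-id/scsi-EXAMPLE-DISK-2",
#   ])
#   if not still_missing:
#       ...  # safe to attempt "zpool import" now
```

An empty return value means all the disks showed up in time. A non-empty one names the devices that never appeared, which is more useful than a failed import with no explanation.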
 