Let's talk about replacing an EMC Isilon

Thuleman

In all seriousness, is there a scale-out solution at all that comes even close to the resiliency of the Isilon and can deal with ~230 million files in ~10 million directories for a data volume of several hundred TB?

The most prominent Isilon issue is file locking. When using a share as a traditional mapped network drive in Windows, users often end up locking their own files. For example, I am editing an Excel file and I shut down Excel (X out) rather than closing the file manually. Then I change my mind and want to edit the file again, but I can't, because it says the file is locked by me. The Internet is full of complaints about this; it's a known issue, and there's no solution other than removing the oplock flag.

The ideal replacement wouldn't have these file locking issues, would have versioning, and could deal with the object count.

Not sure whether there are any viable solutions that run on top of block storage formatted via the guest OS in the VM that serves up the shares.

It can't be a hacked-together solution; there needs to be vendor support for it, since this is a 24/7/365 mission-critical environment.

What do you guys use in these situations?
 
Interesting, I'm just in the process of setting up an Isilon and wasn't aware of this issue. Have you contacted EMC?
 
Yeah, we had tickets in, we had SEs on site, etc., etc. There's no solution to the file locking. If you look at the EMC forums (or the Internet at large) you will find a multitude of complaints.

Let me say that file locking is a good thing, especially for MS Office documents, which are not designed for distributed editing. The problem with the Isilon is that it doesn't release the lock fast enough.

Our helpdesk gets 2-3 tickets per day from people unable to access files they closed not long ago. The lock is eventually released, but the delay leads to a lot of frustration all around.

The user is pissed because s/he can't get to his/her own file.
The helpdesk is pissed because they can't unlock the file, which makes the user tell the helpdesk how much IT sucks.
The systems engineers are pissed because by the time they get around to unlocking the file it's already unlocked; they call the user (who is pissed) to confirm and get an earful about how much IT sucks.

To unlock the file you have to have root access, and the helpdesk won't be issued root on the Isilon. It's a mess, and it's been like that for the past 4 years. Now it's time to either replace the old nodes with X410s or maybe look at a different solution.

I believe (hearsay, didn't check this myself) that the locking is less problematic when using MS DFS (presumably on top of block storage shared out via SMB), but I don't think that DFS can handle the object volume we currently have.
 
I have a 650TB Isilon install that I'm currently using as a backup target. I've had a handful of locking issues with Commvault, but for the most part it hasn't been a problem.

You could make it fairly self-serviceable: create a folder on the /ifs share that the helpdesk can access, and in it a text file where they put the names of the locked files. Then have a cron job on an Isilon node that processes that file and removes the locks, running every 5 minutes. I just checked: you can create cron jobs on the nodes as the root user.

You could even write a cron job that scans for existing locks and auto-removes any lock older than X days.
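To make the idea concrete, here is a rough, untested sketch of what that cron job could look like in Python on a node. The queue file path is made up, and the `isi smb openfiles list` / `close` calls are only placeholders for whatever lock-listing and lock-release commands your OneFS version actually provides, so the parsing would need to be adapted to the real output. The age-based variant would be the same loop, just filtering on how long each file has been open instead of on the queue file.

```python
#!/usr/bin/env python
"""Untested sketch of the helpdesk unlock-queue cron job described above.

Assumptions (all need checking against your OneFS version):
  * /ifs/helpdesk/unlock_queue.txt is a made-up path for the text file the
    helpdesk drops full /ifs paths into, one per line.
  * "isi smb openfiles list" / "isi smb openfiles close <id>" are used here
    only as placeholders for whatever commands your OneFS release provides
    to list and release SMB file locks; the parsing below assumes a simple
    "<id> <path> ..." column layout and will need adjusting.

Run from root's crontab on one node, e.g.:
  */5 * * * * /root/unlock_queue.py
"""

import subprocess

QUEUE_FILE = "/ifs/helpdesk/unlock_queue.txt"   # hypothetical location


def read_queue():
    """Return the set of paths the helpdesk has asked to have unlocked."""
    try:
        with open(QUEUE_FILE) as fh:
            return set(line.strip() for line in fh if line.strip())
    except IOError:
        return set()


def run(cmd):
    """Run a command and return its stdout as text."""
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE)
    out, _ = proc.communicate()
    return out.decode("utf-8", "replace")


def list_open_files():
    """Yield (open_file_id, path) pairs from the placeholder list command."""
    for line in run(["isi", "smb", "openfiles", "list"]).splitlines():
        parts = line.split()
        # Placeholder parsing: assumes "<id> <path> ..." columns per row.
        if len(parts) >= 2 and parts[0].isdigit():
            yield parts[0], parts[1]


def main():
    wanted = read_queue()
    if not wanted:
        return
    for file_id, path in list_open_files():
        if path in wanted:
            # Placeholder close command; confirm the syntax before use.
            subprocess.call(["isi", "smb", "openfiles", "close", file_id])
    # Truncate the queue so the same requests are not replayed next run.
    open(QUEUE_FILE, "w").close()


if __name__ == "__main__":
    main()
```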

Hack? Yes. But until EMC makes the lock timeout value settable by the user, this might be the best solution. Actually, it might already be in a config file; you just have to find it. I changed a lot of my Isilon settings due to InfoSec requirements.

Other than NetApp, I highly doubt you will find a scale-out solution better than Isilon. The closest you may find next to NetApp is some chumpy, cobbled-together ZFS solution.
 
Yeah, we looked at Tegile for a moment, but there are more cons than pros in comparison. The Internet believes there are too many performance problems with ZFS, or perhaps with Tegile in particular, to really use it as Tier 1 file storage.

Also looking at http://www.ddn.com products, though I'm not convinced that pricing will be substantially different from EMC, and it wouldn't be file storage per se.
 
ZFS does have a use for small to mid-size orgs. Outside of that? Call me stupid, tell me I'm wasting money, but when I have critical data exceeding terabytes of storage, the last place you'll find me putting it is on some cobbled-together ZFS solution that runs on any commodity hardware. I personally want a system that runs on a limited hardware stack, with a known-good operability matrix, that doesn't have weird edge-case issues. It needs to work, and I can't be bothered to babysit it. On top of that, I take pride in my environment to the point where I know that if I get hit by a bus on the way to lunch, anyone who's been to EMC training can sit in my seat, look around, say "yep, I got this," and pick up where I left off.
 
There are petabyte ZFS installations around, as this is what ZFS was built for by Sun. If you need full support on dedicated hardware and want to skip cheap, user-configured SuperMicro builds, check Oracle Solaris or NexentaStor (with Nexenta you get SuperMicro as well).

Another option is to ask OmniOS (a free Solaris fork) about supported hardware and order OS support.
 
That's great for someone else, but for the work I do? Nope. I'm not in the business of saving money with NexentaStor or OmniOS. Solaris is dead in my environment and it isn't coming back. When solutions like that fall over, and fall over hard, you are forced to read message boards or call a support line where you run the risk of being told you are in an unsupported config, or a config so unique that they can't help you.

In my world, I don't have time for that. I pay a premium for the hardware I maintain and support. Some call it a waste of money; I call it insurance. I know I will have support from a dedicated human within 4 hours, no hoping and wishing that someone responds to my call for help on a message board, as with most other ZFS solutions. Honestly, the money I spend on support from a major vendor is still cheaper than any amount of money lost to downtime caused by a ZFS solution running on cheap servers and hardware.
 
Then call NetApp.
This is sort of ZFS with everything you want.

SuperMicro is not cheap server hardware; it's premium quality.
It's a toolset for high-end vendors or end users to build top-class server systems.

Solaris and its free forks are the prime ZFS systems, followed by BSD and Linux;
this is how I classify them. Storage solutions are mostly based on one of the three.
You can use them vanilla or with full hardware support, depending on your knowledge,
downtime restrictions, and your IT department.

If you have enough money, you can outsource storage knowledge,
but this does not mean you get better, faster, or cheaper storage;
it only means others are responsible when problems come up.

My position is:
for some use cases this is right, for others it is not.
 
^^ Your view is not wrong at all. But mine is just a little different than yours, which isn't wrong either.

I 100% agree with your position. What makes sense for you, may not make sense for me and vice versa.

In my current gig, a ZFS solution makes zero sense. I could switch gigs and that view could completely change.
 
I work in the edu area.

I cannot comment on Oracle and their database users, but as I see it,
ZFS is a strong player for very-high-capacity storage in the edu area, at media producers,
internet providers, and others with high-capacity needs like processing meteorological data.

Other pro users are quite random; this is probably the fault of formerly Sun, now Oracle, not of ZFS.
If Apple had bought Sun (which was a possibility), we might eventually have seen ZFS everywhere,
as it is a killer invention; ZFS is a huge milestone for storage!
 
Totally hear you and agree about ZFS in edu, media production, and internet providers. Most times those folks just need a huge dumping ground of cheap and deep storage. As long as it stays online and is accessible, you don't need a monster like EMC coming into your environment.

I work in the financial sector. I can't risk being down unexpectedly for more than a few seconds a year, and when I am, I need a vendor and a config that can get me back online very quickly with minimal effort. I also need a solution that enables me to do non-disruptive upgrades. Most, but not all, ZFS solutions fall on their face when it comes time for a code upgrade, so that makes them a poor fit for me.
 
I understand this.
Your problem is not cost or the quality of the hardware, OS, or filesystem, but
the quality of the HA solution with a service-level agreement.
 
Nexenta user here with an HA setup. Getting it up and running was fairly pain-free. When you purchase the HA license it comes with support for the setup; essentially they log in and configure/test everything for/with you.

We had some issues early out of the gate caused by a faulty CPU in our main SAN node. It would cause a kernel panic and Nexenta would fail over to the secondary node. At the time these issues started we were running 3 Hyper-V servers in a cluster with an iSCSI target as the cluster shared storage. For one reason or another, whenever the SAN failed over to the secondary node our cluster shared storage would go belly up and all our VMs would shut down and have to be brought back up manually. Hyper-V was getting very close to being on the chopping block at the time, and this last hiccup was our chance to finally make a change. We took the opportunity to make a case for switching to VMware, and as a result we never really took the time to find out what was causing the issue with Hyper-V. I wouldn't be shocked to find out it was something simple that was missed in our iSCSI setup in Nexenta or in our Hyper-V cluster shared storage configuration.

That being said, our VMware setup (NFS for storage) has been rock solid. The issues we were having with Hyper-V (VMs all shutting down during a SAN node failover) are non-existent in VMware. We had the opportunity to test this on multiple occasions while we fixed the CPU in our main SAN node. Overall we've been very happy with its performance and stability. It sure beats the pants off the HP LeftHand SAN we were running before. Nexenta has been very quick to help us out when we had questions or issues and overall pretty good to deal with.

I'd highly recommend looking into them for your storage.
 
It's good to hear that Nexenta has good support, but I'm with kdh in that I'm running an environment where I can't have the whole "well, we had an issue early on" thing. I can't have issues that would cause an interruption in service.

We looked into it and came to the conclusion that we are sticking with the Isilon. We'll be replacing about a dozen nodes with new X-series shortly and then not think about it for a while.
 
I have a fair amount of experience with X nodes, NL nodes, and accelerator nodes as well. Feel free to ask me any questions you may have. I'm also running the 6.5.x and 7.x branches of OneFS; I really like what they did with the 7.x branch.

The only advice I have: if you get 6 nodes, they'll sell you an 8-port InfiniBand switch; shell out a few extra bucks and get the 16-port InfiniBand switch instead. The cost difference between the 8- and 16-port models is minimal. Upgrading from an 8-port to a 16-port switch later is doable, but it leaves you in a degraded state during the upgrade.
 