Ever deployed a ZFS setup for a production environment?

If you have, please share the pros and cons.

We have around 60+ VMs currently connecting via iSCSI to a 2k3 server target. Everything was fine at first, but as we add more and more VMs, problems arise: disconnection issues, slow boot-ups, etc.

We're looking to upgrade our storage, and ZFS is one of the options, except we're not sure about its long-term reliability/usage. Keep in mind each VM has its own Oracle DB running on 2k8 R2.

thanks :cool:
 
First I would ask myself why on earth I need 60+ databases on SIXTY DIFFERENT OS INSTANCES running off of a SINGLE SERVER.

Maybe then ZFS wouldn't even have to solve the mess.
 
Wouldn't it be a lot cheaper to place each DB in its own schema under a single Oracle instance? :p
You would save so much on license costs that you could have a ZFS SAN for each of your VMs :p
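
Just as a rough sketch of the schema-per-office idea (the connection details, office names and passwords here are made-up placeholders, not anything from your environment):

Code:
# Rough sketch only: one Oracle instance, one schema (user) per medical office.
# Connection string, office names and the password are placeholders.
import cx_Oracle

conn = cx_Oracle.connect("system", "changeme", "dbhost/ORCL")
cur = conn.cursor()
for office in ("office_a", "office_b", "office_c"):
    # DDL can't use bind variables, so the schema names are formatted in directly
    cur.execute(f"CREATE USER {office} IDENTIFIED BY changeme "
                f"DEFAULT TABLESPACE users QUOTA UNLIMITED ON users")
    cur.execute(f"GRANT CREATE SESSION, CREATE TABLE, CREATE VIEW TO {office}")
cur.close()
conn.close()

Each office's app then connects as its own user, and Oracle keeps the schemas separate inside one instance.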

I don't really think ZFS will help with this problem. You need more disk pools / iops
 
Holy crap! Who designed that?

It's unclear from the post: where is the bottleneck, and what problem are you trying to solve?

Depending upon the database sizes/usage, consider running SSDs for the target array.

wrt. your orig. question ("we're not sure about long term reliability/usages"), you should do some reading. It's an enterprise solution; it's as reliable as you make it, and IMHO far, far more so than Server 2k8.
 
Consolidate all of that stuff... no reason for 60 separate VMs, each with its own Oracle instance, when you can most likely have one Oracle server running several databases that all that stuff connects to.
 
LOL, calm down guy. We do virtual hosting for medical offices. Each medical office has its own virtual server, hence each server has its own Oracle DB.

The 60+ VMs are spread out across (4) HP ProLiant DL380 G7s with dual X5670s and 128GB of RAM.

We know the bottleneck is I/O, as currently everything is handled by one big 8TB storage array. We're looking around for a storage solution, so we're open to anything with the best dollar/performance ratio.

Sorry if my post is a bit unclear... it's Thanksgiving :p Happy Thanksgiving, y'all :cool:
 
There are plenty of ZFS success stories around, including my own. I've had drives failing and writing corrupt data, and ZFS took care of it.

ZFS would help a SAN with deduplication (which I believe your VMs will benefit greatly from), compression, and additional cache and log devices ("logzillas").
With 60 VMs I would expect to see a 10x dedup ratio; maybe all you need are a few SSD drives :)
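
If it helps, here's roughly what that looks like. This is only a sketch; the pool name and device names are invented, so check the exact syntax on whatever ZFS platform you pick:

Code:
# Sketch: compression + dedup on the VM dataset, plus SSD cache and log devices.
# 'tank' and the cXtYdZ device names are placeholders.
import subprocess

def run(*cmd):
    print("#", " ".join(cmd))
    subprocess.run(cmd, check=True)

run("zfs", "set", "compression=on", "tank/vms")
run("zfs", "set", "dedup=on", "tank/vms")                          # dedup costs RAM, size accordingly
run("zpool", "add", "tank", "cache", "c2t0d0")                     # SSD read cache (L2ARC)
run("zpool", "add", "tank", "log", "mirror", "c2t1d0", "c2t2d0")   # mirrored slog ("logzilla")
run("zpool", "get", "dedupratio", "tank")                          # see what dedup actually buys you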
 
There may be HIPAA implications for the deduplication due to medical privacy and comingling of data among the VMs.
 
I would not put ZFS into production unless it was a turnkey product sold with full 24x7x4 support. That being said, I know plenty of guys who run businesses off open source, have nothing to rely on but their internal guys, and they are happy as can be.

But for me, I want someone I can get on the phone, and in general, the guys running the companies I have worked for, and consult with, want a major company who their own people can go to if they get stuck.

There is a huge cost to buying those turnkey solutions and support, but I've seen, more than once, where that premium was paid off, in full, and many times over.

On the flip side, if you go a full DIY route, you can afford more layers of redundancy, replication, etc., which makes not having that turnkey support mechanism to fall back on less critical. Ultimately, there are trade-offs either way.
 
There's an old saying in the computer business: no one ever got fired for buying IBM. It means that if you are the responsible party (and your job is the one on the line if something goes wrong), you are safest going with a tried-and-true product from a reputable and reliable company that will stand behind it, with 24x7 support and an SLA if possible, rather than something you built yourself. That being said, as above, there are many people and companies built on and running open source software with no problems. Still, before you bet your job and 60+ medical practices on a solution, I would want to be much more sure than just having some questions answered by a few people online in a thread.
 
Can you give a bit more detail about your current storage? OK, it's Win 2003 with software iSCSI (is that the MS target, i.e. an old version of WSS?). What RAID controller? How many drives, and what RAID level? 8TB: is that used capacity, usable capacity, or native capacity?

I run a similar number of VMs off a software iSCSI target. I don't have a storage bottleneck. BUT...

1) I have two iSCSI boxes doing synchronous replication (i.e. RAID-1 over the network). I can kill an iSCSI box and the VMs don't skip a beat. It does mean I need double the number of drives of a single-box solution, though.
2) I'm using system RAM in the iSCSI boxes for write-back cache. This helps a lot with databases, and means the 512MB cache on the RAID cards doesn't get eaten up.
3) I use 10Gbit Ethernet between the boxes and the VM hosts, dedicated to iSCSI traffic, and another 10Gbit Ethernet link between the two boxes.
4) VMs with high I/O get their own iSCSI LUNs, which map to dedicated RAID sets on the iSCSI boxes. I have a mix of drives: cheap 2TB in RAID 6 for low-IOPS/high-capacity, more expensive 300GB 10Krpm SAS in RAID 1 or 10 for the medium stuff, and SSD for the high end.
5) The iSCSI boxes are Windows 2008 R2, run the StarWind iSCSI target, and have 50W Xeons. They are not taxed very much even with lots of I/O, so I run a few VMs on them as well!

Whether you go for ZFS or something else, make sure that the storage hardware and network are up to the job. I would dedupe the VMs' OS volumes (as they will be nearly identical) but not the data volumes. Ideally give each VM its own spindles for data, even if it's just a RAID-1 pair. If you can't do that, then at least try something like LSI CacheCade, which uses a RAID of SSDs to keep hot data in.
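
In ZFS terms that split is just per-dataset properties. A quick sketch (pool name, dataset names and the zvol size are invented):

Code:
# Sketch of "dedupe the OS volumes, not the data volumes" using one dataset per purpose.
# Pool/dataset names and the 200G size are placeholders.
import subprocess

def zfs(*args):
    subprocess.run(("zfs",) + args, check=True)

zfs("create", "-o", "dedup=on", "-o", "compression=on", "tank/vm-os")     # near-identical 2k8 R2 images
zfs("create", "-o", "dedup=off", "-o", "compression=on", "tank/vm-data")  # unique DB data
# A dedicated iSCSI LUN (zvol) for one heavy database VM, inheriting dedup=off:
zfs("create", "-V", "200G", "-o", "volblocksize=8k", "tank/vm-data/office01-db")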

If your clients can afford to pay you enough for all the Oracle licenses, surely they can also afford a couple of SSDs each? Last time I bought Oracle (a long time ago) they charged by the MHz!
 
I don't remember the exact model of the storage, as we have it hosted at a colo location. Off the top of my head it's an HP StorageWorks P2000 or something. We're running RAID 5 with 8TB of usable storage space. All 4 ESXi boxes are connected to this storage box via 2 switches that are configured for fail-over.

We have about 2TB left, so space is not really an issue...at least not yet. But as we're growing...and growing fast, performance is becoming an issue. SSD is not an option as it's too expensive :D (according to upper management).

We've talked about FC and 10-gig iSCSI, but once again, we're unsure if that will take care of our performance issue. Should we put all the DBs on their own RAID array (like a 2nd storage box)? A co-worker suggested no more than 5 VMs per LUN on the new storage box to ease the I/O issue.

Currently we're not doing any deduplication, just Veeam for backups.

All suggestions are welcome. Thanks.
 
I'm not sure how your upper management works, but my friend and I were talking about it and came up with the following:
  1. As someone suggested above, switch to a single DB instance of Oracle. My friend suggested maybe a single DB instance per server, since you have 4 ProLiants.
  2. Spread the databases across all 4 servers using a storage system of your choice, like ZFS. Move the DB (or the OS handling the DB) onto its own drive, and the rest of the databases onto the ZFS drives. There is a great thread here regarding ZFS, and given that each ProLiant you have has 16 drive bays, you could expand storage when needed through ZFS pools and whatnot (see the sketch after this list).
  3. Double the RAM per server from 128 GB to 256 GB. Someone in another thread here and on another forum suggests that databases need more RAM. Those ProLiants can support 384 GB max. Each X5670 supports a max of 288 GB.
  4. I/O issue... you'll have to talk to someone else here who can help with that. The only thing I can think of is something like what Facebook or EVE Online does with their servers, but that can be costly. Lol.

    http://www.anandtech.com/show/4958/facebooks-open-compute-server-tested
  5. Friend also tells me that you guys are really taxing those Xeon CPUs with 60 VMs (and growing). You should really consider the previous poster's suggestion from above.
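
Here's the sketch I mentioned in 2. Growing the pool as you populate more bays is one command per mirrored pair (pool and disk names here are made up):

Code:
# Sketch for suggestion 2: grow an existing pool by adding another mirrored pair
# as drive bays get populated. Pool name and disk names are placeholders.
import subprocess

subprocess.run(["zpool", "add", "tank", "mirror", "c1t4d0", "c1t5d0"], check=True)
subprocess.run(["zpool", "status", "tank"], check=True)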
 
ZFS dedup is broken. Never use it until it is fixed. You need 1GB of RAM for each TB of disk. One guy deleted a deduped snapshot and it took several days.

Never use ZFS dedup. But ZFS compression works fine.
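
To put very rough numbers on that for the 8TB array in this thread, using the often-quoted ~320 bytes of in-core dedup table per unique block (a rule of thumb, not a measurement; actual usage depends on block size and how much really dedups):

Code:
# Ballpark dedup-table (DDT) memory for an 8 TB pool. ~320 bytes/entry is a
# commonly quoted rule of thumb; DB zvols tend to use small blocks.
POOL_BYTES = 8 * 1024**4        # 8 TiB
DDT_ENTRY_BYTES = 320

for block_kib in (8, 64, 128):
    entries = POOL_BYTES // (block_kib * 1024)
    gib = entries * DDT_ENTRY_BYTES / 1024**3
    print(f"{block_kib:>3} KiB blocks: ~{entries:,} entries -> ~{gib:,.0f} GiB of DDT")

Even at 128 KiB blocks that's noticeably more than 1GB per TB, which is why deleting a big deduped snapshot can grind for days once the table spills out of RAM.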
 
Ditch the HP StorageWorks/Win2k3 server. Get something like a NetApp or EMC VNX with the proper split of SSD and SAS drives. Upgrade your network to 10-gig for iSCSI. Problem solved (well, this problem solved; your environment sounds like there is fail everywhere).
 
An alternative to 10-gig networking is an HP P2000 G3 MSA, SAS-connected. A SAS connection will eliminate any possible concern about network I/O to your storage causing your problems. Plus, it's going to save a TON of money. Unless your current P2000 is less than a year old, you don't have the most current model, which is vastly different from the G1/G2 stuff. It's really an impressive unit and somewhat unique in the enterprise marketplace from a major OEM.

The downside to a SAS connection to your storage is that you can only connect a maximum of 4 hosts if you want redundancy (each host having 2 SAS connections to the P2000 G3, one to each controller). If you are OK without redundancy on some or all of your hosts, you can have up to 8 hosts connected; 8 is the maximum number of SAS connections.

If you outgrow the number of hosts and want to invest in 10-gig iSCSI or FC, you can swap out some parts to go that route as well.

However, I think some of your issues may be with design, not hardware. You should investigate ways to optimize your software design first, then determine if you need new hardware.
 
Your issue is most definitely I/O and CPU; I would look into a new array and more servers.

The Dell EqualLogic stuff is nice for iSCSI. I just bought 2x PS4100XVs for my shop.
 
The current setup was not designed by any of us here, and admittedly it's not the best design: a single storage array hooked up to 2 switches serving 2 ESXi hosts. We just put in two more hosts to handle additional VMs. The plan is that once we have a new storage system in place, we'll migrate most of the VMs over and use the old storage system for backup/HA, etc.

Someone mentioned combining the DBs; this can't be done, as each medical office has to have their own server and their own DB. I like the idea of SSD and SAS. I brought that up with my co-worker but still haven't talked to my boss about it.

Thanks for all the suggestions guys, keep them coming.
 
After reading the thread, I'm still not sure of the reason that "each medical office has to have its own server and db".

Creating a single DB server with sixty instances, with IP addresses bound to individual instances and access to each DB locked down with account policies per instance (and you could go even further with ACLs on your switches and routers to restrict traffic), rids you of the overhead of sixty OSes running.

You should double check your security requirements on that one.


But to answer your question: something like two servers with 14-drive DAS trays attached (similar to a Dell R210 with an MD1000 attached) running Openfiler, replicating between the pair, could work. Although since this is a production system, your hardware needs 24/7/4-hour support, and a pair of servers plus their DAS, warranties and support contracts might put you in the range of a smaller NetApp (like the FAS2050) or an EqualLogic or Compellent solution.
 
Each server = DC. Due to privacy/practices and the way our software is designed, each needs its own DB.
 
60 VMs on a single server is insanity to me; however, if you're sure your limitation is I/O, then ZFS can definitely help. ZFS is perfectly suitable for production use; I use it at home and at work without an issue. You didn't say how much storage or what kind of redundancy you need, though. Striped mirrored vdevs (similar to RAID 10) are very fast and good for database use. Also, creating a dedicated mirror for ZFS's ZIL will boost performance.
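
For what it's worth, that pool layout is just one command (pool and disk names below are placeholders):

Code:
# Sketch: striped mirrors (RAID10-style) plus a mirrored SSD log device for the ZIL.
# 'tank' and all the disk names are made up.
import subprocess

subprocess.run(
    ["zpool", "create", "tank",
     "mirror", "c1t0d0", "c1t1d0",
     "mirror", "c1t2d0", "c1t3d0",
     "mirror", "c1t4d0", "c1t5d0",
     "log", "mirror", "c2t0d0", "c2t1d0"],   # SSDs to absorb the databases' synchronous writes
    check=True,
)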
 
Hi, just an operational observation; no objection to any of the comments.

One upside of the original scheme is that, at short notice:

1. You can vMotion the 60 VMs in such a config from running on 2 hosts to being split across 30 physical hosts. Each VM is independent.

2. As such, in the future when you are ready, you can float the 60 VMs to various private/public cloud platforms.

3. Obviously there are some steps involved given the current arrangement. It is true that it is not so convenient currently.

4. If you are tied to a single giant database for the 60 VMs, the VMs cannot float freely, because they must find their way back to the database host.

5. When the arrangement is ready, 60 independent VMs can float freely to various platforms. Flexibility.
 