Ceph Storage with OpenNebula or OpenStack

KapsZ28

2[H]4U
Joined
May 29, 2009
Messages
2,114
The question is mostly about using Ceph: pros, cons, and recommendations. This is for a large website that currently runs on about 50 physical servers: web, database, and Elasticsearch servers. Only 5 of them are KVM hosts, and those are connected to an EqualLogic array. There are also some custom storage servers for media.

The build so far would be custom-built Supermicro servers running CentOS 7 with KVM and either OpenNebula or OpenStack, both of which support Ceph storage. This would do away with any central storage, since we would be using erasure-coded pools.
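To put rough numbers on the erasure-coding choice, here is a minimal sketch comparing usable capacity for a replicated pool and an erasure-coded pool. The 100 TB raw figure and the k=4, m=2 profile are assumptions picked purely for illustration, not values from this build. It only uses the standard ratios: a replicated pool keeps N full copies, while an erasure-coded pool splits each object into k data chunks plus m coding chunks and can survive the loss of m chunks.

```python
def usable_fraction_replicated(copies: int) -> float:
    """Fraction of raw capacity that is usable when every object is stored `copies` times."""
    return 1.0 / copies


def usable_fraction_erasure(k: int, m: int) -> float:
    """Fraction of raw capacity that is usable with k data chunks and m coding chunks."""
    return k / (k + m)


if __name__ == "__main__":
    raw_tb = 100.0  # hypothetical raw cluster capacity, not a figure from the thread

    layouts = [
        ("replicated, 3 copies", usable_fraction_replicated(3)),
        ("erasure-coded, k=4 m=2", usable_fraction_erasure(4, 2)),  # assumed profile
    ]
    for label, fraction in layouts:
        print(f"{label}: about {raw_tb * fraction:.1f} TB usable out of {raw_tb:.0f} TB raw")
```

This ignores free-space headroom and any overhead beyond the chunks themselves, so treat the output as an upper bound when sizing disks.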

Does anyone here have experience with this type of setup, and any opinions or recommendations?
 

4saken

[H]F Junkie
Joined
Sep 14, 2004
Messages
11,756
I have only played briefly with Ceph. I am currently in the middle of a RiakCS POC, but not for the reasons you are looking at Ceph. We intend to use it across multiple DCs with some medium latency between them, and we also require a good S3-compatible API, as we have a fairly substantial AWS presence as well. There are other reasons we picked RiakCS over Ceph and the other choices. A lot of it has to do with our VMware/AWS/other orchestration (Terraform, Chef, custom, etc.) in multiple locations, and the ability to set up RiakCS right on top of our Ubuntu LTS servers, instead of file as block, made it stupid simple to drop in a POC and get it going. From what I have seen, Ceph is just about as easy to manage and deploy, though.

My opinion is that this shit is fun as hell to play with, and it definitely impresses folks once they see it in action. OpenStack is on our roadmap for the next year, though things change. I'll be interested to see what you do here.
 

obrith

Limp Gawd
Joined
Jun 11, 2004
Messages
267
We're running a production Ceph + OpenNebula (KVM) cluster - nothing mission critical yet, though. It's all Supermicro, and compute and storage are separate machines.

We're very happy with Ceph. We've made a couple of pretty silly mistakes as we learned how to manage it, we've abused the crap out of it as we shifted nodes to different DCs (same campus), and we changed the drive layouts in the servers and 'live' upgraded them a few disks at a time. We have not lost any data, and Ceph recovered gracefully, and much quicker than our ZFS-based SANs do in similar situations.

The SSD write caching works well. We're eagerly awaiting good (safe) read caching. Our tests show that, somehow, a two-tier SSD/spinner setup is slower than an SSD write-cached spinner pool by a good margin.

Sequential speeds are impressive, and a moderate number of systems putting load on it doesn't seem to stress it at all. We haven't been able to get around to doing thorough high-IO random tests from a large number of clients - this is imperative for knowing its limits. Individual clients see faster disk benchmarks than on a high-end Nexenta+VMware setup (a couple of years old).

As far as OpenNebula goes - I don't hate it, but if I were doing it again I would have given OpenStack a closer look. Probably my biggest complaint is the pretty meh logging - KVM sometimes just turns off VMs if they're asking for more resources than are available on the machine. Neither KVM nor OpenNebula logs anything other than "oh, I turned that VM off for you", with no explanation.
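One thing that might be worth ruling out for the unexplained VM shutdowns is the Linux out-of-memory killer on the host. This is only a guess, nothing above confirms it as the cause. The sketch below scans kernel log text for OOM-kill lines that name a qemu process. The sample log lines and the `find_oom_kills` helper are made up for illustration; real input would come from something like the host's `dmesg` output.

```python
import re

# Matches kernel lines such as "Out of memory: Kill process 4321 (qemu-kvm) ..."
# and the newer "Out of memory: Killed process 4321 (qemu-kvm) ..." wording.
OOM_PATTERN = re.compile(r"Out of memory: Kill(?:ed)? process (\d+) \(([^)]+)\)")


def find_oom_kills(log_text, name_hint="qemu"):
    """Return (pid, process_name) pairs for OOM kills whose process name contains name_hint."""
    hits = []
    for line in log_text.splitlines():
        match = OOM_PATTERN.search(line)
        if match and name_hint in match.group(2):
            hits.append((int(match.group(1)), match.group(2)))
    return hits


# Hypothetical sample log, for illustration only.
SAMPLE_LOG = """\
[12345.678] Out of memory: Kill process 4321 (qemu-kvm) score 912 or sacrifice child
[12346.001] Out of memory: Kill process 987 (mysqld) score 300 or sacrifice child
[12350.120] usb 1-1: new high-speed USB device number 2 using xhci_hcd
"""

if __name__ == "__main__":
    for pid, name in find_oom_kills(SAMPLE_LOG):
        print(f"OOM killer terminated {name} (pid {pid})")
```

If nothing turns up on the hosts where VMs vanished, the OOM killer is probably not the culprit and the cause lies elsewhere.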
 