DIY Private Cloud

My company has been providing SaaS to clients for some time now: we take HaaS & on-demand storage from our data centre, then layer vSphere and whatever software each client requires on top.

Now, we're looking at doing a JV deal with another partner that's been doing a similar sort of thing, but we want to build vCloud infrastructure and wholesale IaaS to other resellers (and to ourselves--each of our other companies).

Before we get into bringing in partners that are more experienced in this sort of hands-on, actually-build-your-own-cloud CapEx type thing, I wanted to reach out to HardForum and get your ideas on what you actually need to do to build one.

I have a basically unlimited budget. For hardware, up-front software purchases, SPLA licensing & support contracts, I have up to $800K/year to play with.

I'm thinking we'd use TwinNode Blades in Blade Enclosures for Compute and HA Nexenta ZFS with the requisite number of drive shelves for Storage.

What do I need VMware-wise and any other software or hardware aside from the servers (obviously there'd be networking, but what)?
Any thoughts on an overall design plan?

We're just looking at working out what we need (from a high-level perspective) and then we can scale the nodes and drives to the capacity we need and work out some rough costs and a basic plan.

Thanks! :)
 
This is a pretty big question and there are a lot of different ways to do everything you described. If you don't know exactly what you want, I'd suggest you find a company that has experience with it; not everything scales up nicely. I'm not a large-scale expert, but I have a small company and an even smaller budget, so take what I say with that grain of salt.

For back-end storage, I would take a look at the HP P4000 SAN units. The HP product is relatively simple to administer, is very fast and easy to expand, supports Storage vMotion, and has some cool features (network RAID levels, block-level replication, de-dupe).

I think you can have your pick of the litter with your front-end hardware, but if you're serious about scaling, blade clusters seem to be the way to go, with InfiniBand back-ends.

I know only a little about ZFS and I think it would work okay if you have the expertise, but as you scale up you may run into issues, either with performance ceilings or available manpower for administration.

Keep in mind that if you don't already have power and HVAC infrastructure, you will need to get it or supplement what you have. On the scale I imagine this to be, that can take a fairly big chunk of the budget.

Finally, don't forget disaster recovery planning while building the system.
 
Yes, we'll be looking at bringing an expert in, but before I start dishing out $ to get advice, I need to have a basic plan I can cost to assess viability and present to our minority-share partner.

I'll put that SAN on the list to check out. I'm partial to ZFS, and I'm pretty sure I can get the redundancy and scalability that I want with Nexenta using HA & clustering (with expanded namespace).

One thing I should have said: we'll be co-locating all equipment in the same data centre as before. IIRC, we can get a 42U rack with 10kW for AU$2800/mth.
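As a sanity check against that 10kW feed, something like the sketch below is how I've been estimating how much compute fits per rack. The wattage figures are assumptions for illustration, not measured numbers:

# Rough power budget for a 42U / 10 kW rack (all wattages are assumptions, not measurements).
RACK_FEED_W = 10_000        # 10 kW per rack, as quoted by the DC
HEADROOM = 0.80             # plan to ~80% of the feed, not 100%
STORAGE_DRAW_W = 2_500      # assumed: ZFS head nodes + a few disk shelves
NETWORK_DRAW_W = 500        # assumed: IB/Ethernet switching
NODE_DRAW_W = 450           # assumed: one loaded dual-socket TwinNode node

compute_budget = RACK_FEED_W * HEADROOM - STORAGE_DRAW_W - NETWORK_DRAW_W
nodes = int(compute_budget // NODE_DRAW_W)
print(f"~{nodes} compute nodes fit in the remaining {compute_budget:,.0f} W")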

40Gbps InfiniBand, take a good long look at it.
Already planning to do it :) See below...

These are the specs that I have been playing with:

Compute Resources
- TwinNode blades: SuperMicro SBI-7226T-T2, each node (2 per chassis) with:
  - 2 x Intel Xeon E5645 hex-core 2.40GHz CPU
  - 8 x 16GB RAM (128GB total)
  - 2 x Intel 320 SSD 40GB
  - 2 x SuperMicro dual-port QDR InfiniBand card
- Enclosure: SuperMicro SBE720E-R75
  - 2 x SuperMicro QDR InfiniBand 20i/16e-port switch
  - 1 x SuperMicro Mini-CMM (for the dual InfiniBand switches)

Storage Resources
- Head nodes (there would be 2): same specs as above but in a 2U chassis, plus a few SAS HBAs
- Disk shelves: a mix of 2U 2.5" & 3U 3.5" chassis
  - A mix of 7.2K SAS 3.5", 10K SAS 2.5" and SSD 2.5" (to create Bronze, Silver & Gold tiers; rough sizing sketch below)
- Other: pair(s) of SAS switch(es) for dual paths to the dual head nodes
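To turn that shelf mix into rough usable TB per tier, I've been playing with a back-of-envelope script like this. All drive counts, drive sizes and the mirrored-vdev/free-space overheads are placeholder assumptions, not a final design:

# Back-of-envelope usable capacity per tier (all counts/sizes are placeholder assumptions).
# Assumes mirrored vdevs (50% overhead) plus ~20% kept free for ZFS performance.
tiers = {
    # tier: (number of drives, raw TB per drive)
    "Bronze (7.2K SAS 3.5\")": (48, 3.0),
    "Silver (10K SAS 2.5\")":  (48, 0.9),
    "Gold (SSD 2.5\")":        (24, 0.4),
}

MIRROR_FACTOR = 0.5   # mirrored vdevs lose half the raw capacity
FREE_SPACE = 0.8      # keep ~20% of the pool free

for tier, (drives, tb) in tiers.items():
    usable = drives * tb * MIRROR_FACTOR * FREE_SPACE
    print(f"{tier}: ~{usable:.1f} TB usable")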

I'd be half-filling the enclosures (5 blades) to meet the resource demands of a coming contract, plus however many blades I need to run the vCloud & other VMware software.

The part that I'm not really sure about is how to set up vCloud... the number of servers to dedicate to it, which VMware products I actually need, a server/network design plan for it, its VMs, how many VMs, which products go where, etc.

Thanks.
 
No offense, but if you need to go asking in an Internet forum how to properly spend $800K, you're doing it wrong.

Hobby is hobby and business is business. You're basically asking to have your job done for you.
 
Thanks for being a dick. This is exactly the sort of reply I wanted when coming to HF for SOME INITIAL THOUGHTS FOR PRELIM COSTING BEFORE I BRING IN EXPERTS COSTING ME BIG $$$. Out of individual curiosity, I'm interested in how to build a vCloud setup as much as I am from a business perspective. If you haven't got anything constructive to say, please don't bother.

SmartOS is an OpenSolaris derivative that includes the must-have cloud thingy, DTrace. It is built for clouds.
http://wiki.smartos.org/display/DOC/Why+SmartOS+-+KVM,+DTrace,+Zones+and+More
Thanks, I'll look into it. Can you get support contracts for it?
 
I'd drop the SuperMicro blades and go with something a bit more... enterprise. I do a lot of Cisco UCS and they are fantastic. I'd also caution against InfiniBand for your interconnect with vSphere; 10Gb Ethernet is far more standard these days, with far fewer driver/config issues. Cisco UCS, plus some Nexus 5500s or so for 10Gb networking to the SAN. I wouldn't do HP for that... Nexenta isn't bad.

Before you start throwing down specs and parts you need to put down your requirements. What are you trying to achieve here? What are your constraints? What's your timeline? All this needs to be done BEFORE any thought goes into gear.
 
Also, what about management? Security? Segregation? Provisioning? If you're going to do IaaS then these are very important. To be honest, $800K is FAR from an unlimited budget on something like this.
 
Thanks for that feedback. There are pros and cons to going "enterprise" on the hardware. With cloud builds, economical commodity hardware is part of the appeal, with redundancy ("lots of it") and SPLA'd software holding it all together to keep everything as affordable as possible. That said, I see your point: it's worth looking at the very final bottom line, including support and provisions for failures, before deciding whether a more "expensive" solution is actually more expensive.
Regarding InfiniBand, I totally understand and agree, but there are some nice benefits to it, including lower latency, higher bandwidth and some niceties in the way it encapsulates traffic vs 10GbE. It's definitely something to be aware of and fully evaluate.

On management, security, segregation and provisioning: yes, there's a lot to get down on paper out of our heads.
I should also mention that my OP does say $800K/year, which reflects ~$1.9M financed over 3 years.
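For anyone following along, that figure is just the standard annuity formula run backwards. At an assumed ~12% p.a. financing rate (an assumption for illustration, not our actual rate), $800K/year for 3 years prices out to roughly $1.9M up front:

# Present value of the annual payments: PV = PMT * (1 - (1 + r)**-n) / r
payment = 800_000   # $ per year
rate = 0.12         # assumed annual financing rate (illustrative only)
years = 3

present_value = payment * (1 - (1 + rate) ** -years) / rate
print(f"~${present_value:,.0f} financed today")   # ~$1.92M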
 
Any further, more specific thoughts? Especially on vCloud, as that's what I really want to know/learn about. Thanks :)
 
I don't know whether you can get support contracts. But I think SmartOS is what you are looking for; it is designed for clouds. Read this SmartOS deployment of Linux:
http://opusmagnus.wordpress.com/2012/02/14/discovering-smartos/

Background on KVM and SmartOS:
http://dtrace.org/blogs/bmc/2011/08/15/kvm-on-illumos/

Visualizing the cloud with DTrace:
http://dtrace.org/blogs/brendan/2011/10/04/visualizing-the-cloud/

SmartOS uses KVM on top of OpenSolaris. It also has Zones for very secure, lightweight virtualization. Each KVM virtual machine (for instance, a Linux guest) runs inside its own Zone: if a hacker compromises the Linux guest, he only breaks out into a Solaris Zone, and Zones are very safe. He would then still need to break into the global Solaris installation, which is an additional safety layer. And you can use the unique DTrace to monitor every VM.
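If you want a feel for the tooling, day-to-day VM management on SmartOS is done with vmadm. A trivial sketch (assuming it runs in the global zone with vmadm on the PATH) would be something like:

# Minimal sketch: list the zones/KVM guests on a SmartOS node.
# Assumes this runs in the global zone with vmadm available on the PATH.
import subprocess

def list_vms() -> str:
    # 'vmadm list' prints a one-line summary per VM (zone or KVM guest).
    result = subprocess.run(["vmadm", "list"], capture_output=True, text=True, check=True)
    return result.stdout

if __name__ == "__main__":
    print(list_vms())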

SmartOS is a new open source project. You should email them, and I suspect they are willing to provide support. Really nice guys.
 
Thanks for that info, I'll definitely look into it. That said, we do have experience with VMware and a partiality to it and the others of the big 3.
 
Does anyone have any experience with Eucalyptus, OpenStack, or some of the other SmartOS applications/overlays? Especially in comparison to vCloud?
 
I stood up vCloud on 3x 8-node ESXi5 Clusters (using HP BL460 blades) and it's been working great for the past 4-5 months.

IMO the vCloud interface isn't sufficient for selling IaaS wholesale. You basically can't customize anything about look/feel/text aside from removing options/features with roles/permissions.

Be prepared to build your own frontend w/ the API.
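To give you a feel for it, logging in against the vCD 1.5 REST API to get a session token looks roughly like the sketch below. The hostname, org and credentials are made-up placeholders, and Python requests is just what I'm using for illustration:

# Rough sketch: authenticate against the vCloud Director 1.5 REST API.
# Host, org and credentials below are placeholders, not a real environment.
import requests

VCD_HOST = "vcd.example.com"
USER, ORG, PASSWORD = "clouduser", "MyOrg", "secret"

resp = requests.post(
    f"https://{VCD_HOST}/api/sessions",
    auth=(f"{USER}@{ORG}", PASSWORD),
    headers={"Accept": "application/*+xml;version=1.5"},
    verify=False,   # many lab vCD cells run self-signed certs; sort this out for production
)
resp.raise_for_status()

# vCD returns the session token in this header; send it with every subsequent API call.
token = resp.headers["x-vcloud-authorization"]
print("Session token:", token[:12], "...")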
 
Thanks for that info :) It's a really relevant and on-topic response that is actually useful.
 
I knew I should have included a flame :)

PM me if you have any specific questions.

My general opinion on vCloud is that it's designed to just work, until it doesn't. When it has a problem, it's quite difficult to figure out what went wrong, since it's kind of like a black box that you can't see the inner workings of.

You're pretty much not supposed to touch the VMs without using VCD, and you get a popup any time you even open them up in vCenter.

The answer to a number of "how do you do XXXXXXX in vCloud?" is "You can't do that through the provided interface. You'll have to use the API."
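As a concrete example, most of those "you'll have to use the API" answers end up at the query service. With the session token from the login sketch earlier, pulling every vApp the session can see looks roughly like this (hostname and token are placeholders again):

# Rough sketch: list vApps via the vCD 1.5 query service (hostname/token are placeholders).
import requests

VCD_HOST = "vcd.example.com"
token = "PASTE-SESSION-TOKEN-HERE"   # x-vcloud-authorization value from the /api/sessions login

resp = requests.get(
    f"https://{VCD_HOST}/api/query",
    params={"type": "vApp", "pageSize": 128},
    headers={
        "Accept": "application/*+xml;version=1.5",
        "x-vcloud-authorization": token,
    },
    verify=False,
)
resp.raise_for_status()
print(resp.text)   # XML of VAppRecord elements; parse with xml.etree if you need specific fields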

That being said, *end users* really seem to like the interface and its stripped-down nature, but sysadmins find it too limited.

Oh, and if you need to make changes to sysprep.inf, you have to do it on the VCD server and rerun the script to build the sysprep package.
If you want to change unattend.xml it's a similar process, but there's no script, so you'll have to repackage it yourself using their utility. PM me if you want more info on this, because it was a huge pain in the ass to figure out at the time, just to add skiprearm=1.
 
If you don't have a lot of experience and you have budget you may want to look at some of the "off-the-shelf" virtualization solutions like:

Vblock - VMware - Cisco UCS and Nexus switching - EMC storage

or

FlexPod - VMware - Cisco UCS and Nexus switching - NetApp storage

Most of them are designed to scale but let you start with a "small" footprint. They also come already tested together and tend to include management software for the entire stack.

If you're trying to get an idea of what's involved before spending money on the experts, then reach out to the vendors directly or through a sales partner. If you're looking to spend reasonable amounts of money, they'll happily come in and discuss their approach with you, as well as do some up-front engineering to find the proper fit with their products. Go through that with a few vendors and you should start to get a pretty clear picture of what'll be involved, all without spending a dollar.
 
Thanks everyone for your replies.
We're exploring vBlock, VSPEX and FlexPod as well as full homebrew SuperMicro compute/storage setup and Cisco UCS compute + SuperMicro storage options through a VAR.
I'm currently leaning towards the last option and we should be able to get under the $1M for kick-off with multi-DC redundancy.
 
I was about to propose Vblock/VSPEX (for fear of being called a d*ck ;)).

I think it's pretty doable as DIY, but do take accountability into account. If something goes horribly wrong, the amount of time you spend on troubleshooting might exceed the time it takes to shoot at a supplier. If the private cloud is truly private and internal, your divisions will shoot you for downtime; if you run a hybrid cloud and provision for customers, the customers will shoot at you.

In complex situations where storage, compute and networking converge, it sometimes comes down to driver compatibility and software versions that make or break a configuration. Companies like HP, EMC and NetApp have huge compatibility libraries and multiple teams dedicated to keeping them up to date.

Training is an extra cost that you need to take into account: with a DIY box come self-written knowledge bases, procedures and tutorials. Don't expect a new team member to be able to understand the big picture without training. Most of the time, training is a hidden cost.

I don't want to discourage you, because a DIY private cloud sure sounds like a nice challenge, but don't lose sight of the "enterprise dance". There are reasons beyond hardware and software that justify the extra cost of, let's say, a 20TB VNX over a 20TB DIY box.
 