$15k to spend - What would you do?

KuJaX
I've been given a budget of up to $15,000 (this isn't set in stone and could potentially go a little higher) to spend on equipment for a production environment at a business that is effectively out of business during any amount of downtime.

Priorities go:
  1. UPTIME/REDUNDANCY. Limiting downtime, to the point of being almost non-existent, is the primary goal.
  2. AUTOMATION. Once fully configured, it should be able to automate backups/snapshots, etc.
  3. LOW MAINTENANCE. It is understood that there will be some maintenance (generally software related), but for the most part the equipment should hum along on its own after the initial upfront time investment.
  4. PERFORMANCE. This is the least important, but spec-wise there will be 20 employees constantly using services hosted on the equipment.
Total used disk space is less than 200GB, although I would like to create daily/weekly/monthly backups that rotate on a quarterly basis. So I'm thinking total backup space should be capped at around 20TB.
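Rough math on that 20TB cap (just a back-of-the-envelope sketch; the retention counts are assumptions, and it ignores compression, dedup and incrementals):

Code:
# Worst-case backup capacity if every copy were a full, uncompressed 200GB
FULL_SIZE_GB = 200      # roughly the data set being backed up today
DAILY_KEPT   = 7        # assumption: one week of dailies
WEEKLY_KEPT  = 13       # assumption: a quarter's worth of weeklies
MONTHLY_KEPT = 3        # assumption: a quarter's worth of monthlies

copies   = DAILY_KEPT + WEEKLY_KEPT + MONTHLY_KEPT
total_tb = FULL_SIZE_GB * copies / 1024
print(f"Worst case, all fulls, no dedup/compression: {total_tb:.1f} TB")
# prints roughly 4.5 TB, so 20TB of backup space is a very comfortable ceiling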

I'm comfortable with VMware. I've also heard that Veeam is a great product to get some backup automation. I'm not sure what versions are needed.

Do we go with Dell/HP/Lenovo for the servers (I would imagine we need two identical ones with two power supplies)?

Do we go with NAS/iSCSI/SAN, etc for the actual production run AND/OR daily backups/snapshots?

Do we go with something proprietary such as Netgear/Dell or something like FreeNAS?

What we run is one Windows Server 2013 VM and three or four Ubuntu VMs dishing out services such as DNS, file sharing and internal wiki hosting. The file sharing VM and the Windows Server 2013 services are the most critical.

What would YOU do with a $15,000 budget where uptime is the most important?
 
What equipment do you already have?

$15k is not a lot to work with if you need a complete setup.
 
DNS and AD are trivial to duplicate; neither should be virtualised. You can put the file services and Wiki on a pair of servers clustered for high availability. You'll need to buy three servers for the cluster, not two, so you have a set of spares. The clustered servers should not be on the same boxes as the DNS or AD servers. If you can suffer the bandwidth consumption, I'd go for offsite online backups, subject to security considerations.
 
Starting from scratch? You are going to need a lot more than $15k if you want redundant shared storage... Plus, if you go VMware, you are going to need the licenses too: licensing for 3-6 processors (3 hosts), vSphere licensing, and most likely vSAN licensing as well if you want redundant storage with support.

Licensing will be close to $10k for the VMware products alone.

Could you build an entire setup for $15k? Sure. Would it have 99.999% uptime and a support contract? Nope.

I've said it many times before and I'll say it again: under no circumstance should someone put their neck out at a business and build a custom setup. The day the first downtime happens will be your last day of employment.

To do it right you need $60k.
 
Offsite isn't an option.
DNS isn't anything mission critical.
AD doesn't exist

The equipment running right now is a single Dell tower server with RAID 1 running the free version of ESXi. Backups go to a NAS box, but it is a manual process. We then have a daily offsite backup of the actual file server files themselves (not the VMs).

Maybe the most economical option is having a few 100% identical machines to plug in if something happens to the main one, but the downtime from restoring the VMs is what I'm trying to solve. Maybe we set up a SAN configuration and then put all of the redundancy into the SAN?
 
You are WAY overthinking this entire setup if you don't have AD and DNS isn't mission critical.

#1 DNS - if you don't have AD and DNS isn't mission critical then you should be running it on a decent router. You aren't using it for anything a simple DNS resolver can't handle.

#2 File Sharing - if you aren't running AD then file sharing can be on anything running anything

I would be willing to bet a pair of Synology DS3615xs NAS units with 7x 6TB drives (6-disk RAID 6 and a spare) and a good SSD for cache would be more than enough. I say a pair because you could automate backups from one to the other easily. Add an access switch with a 10Gb uplink and you'll be good.

When you were talking about backups initially, you implied that it would all be on the same server. 100% bad idea there. It's a business: 7 days of backups on the server, 14 days on a separate server, 28 days off-site. Simple formula, and that's the minimum. Either way you need 2 backup targets on-site that are separate servers. It's okay if the first one is located on the main file server.

You could also set up the Synologys in HA.

Honestly, without an exact layout of what you are using ESXi (and the associated VMs) for, we can't give you a 100% accurate recommendation.

Edit: remember that the 10Gb Synology units handle iSCSI and NFS just fine too. I've used them with ESXi before. They will also snapshot iSCSI LUNs, another layer of backup if you went that route.
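To show how little there is to that NAS-to-NAS automation (DSM's own Hyper Backup / Snapshot Replication packages do this from the GUI), here's a rough pull-style sketch you could schedule on the secondary unit; the hostname, share paths and SSH key are made up:

Code:
#!/usr/bin/env python3
# NAS-to-NAS sync sketch: mirror a share from the primary Synology to the
# secondary over rsync/SSH, then log the result. Hosts/paths are examples.
import datetime
import subprocess
import sys

SRC = "backupuser@nas-primary:/volume1/fileshare/"   # primary unit (example)
DST = "/volume1/fileshare_mirror/"                   # local path on the secondary
LOG = f"/volume1/logs/sync-{datetime.date.today()}.log"

result = subprocess.run(
    ["rsync", "-aH", "--delete",
     "-e", "ssh -i /root/.ssh/sync_key",
     SRC, DST],
    capture_output=True, text=True)

with open(LOG, "w") as log:
    log.write(result.stdout)
    log.write(result.stderr)

sys.exit(result.returncode)   # non-zero exit = flag it in whatever scheduler runs this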
 
$15,000 is not going to be enough for what you need if you want to eliminate every single point of failure.

You could look at VMware High Availability, which will keep a VM ready to take over if one goes down, but that requires licensing, multiple servers, shared storage, etc.
 
Assuming you understand the problem of missing support and SLAs, and that a professional HA or failover solution is much more expensive, you can:

- use 2 VM servers for redundancy, each from $2,000 = $4,000
e.g. 2x HP/Dell/Lenovo/SuperMicro systems with at least 32GB RAM and a Xeon
- use the free ESXi

- use 2 shared storage servers for redundancy with NFS, each from $2,000 = $4,000
e.g. 2x HP/Dell/Lenovo/SuperMicro systems with at least 32GB RAM; CPU is not as critical
- use a ZFS storage appliance like FreeNAS, or my napp-it, based on an open-source Solaris fork

- prefer 4 systems that are identical apart from CPU, e.g. with a 16- or 24-bay 2.5" or 3.5" backplane (and adapters for 2.5" SSDs), e.g. professional SuperMicro systems with 10G and an LSI HBA onboard, as a custom build-to-order option from a qualified SuperMicro system house.

- add a high-performance VM storage pool ($2,000 each = $4,000)
use SSD-only storage pools, e.g. Samsung PM/SM863 or Intel; can be a RAID-Z of 960GB SSDs or a mirror of 2TB SSDs

- add storage for backup, 16-20TB usable ($1,000 each = $2,000)
add disks to both storage appliances, e.g. HGST 4-10TB disks in a RAID-Z2 or mirror
use both storage systems for alternating backups (async replication or a synchronous network mirror)

- use a 10G network (prefer mainboards with 10G onboard)
and a 10G switch for $750 (or $1,500 for redundant cabling)

= around $15,000
As an option, you can use 2 bigger systems where you virtualize the SAN part, plus a dedicated server for backup:
http://napp-it.de/doc/downloads/napp-in-one.pdf

Some build examples:
http://www.napp-it.org/doc/downloads/napp-it_build_examples.pdf

This does not include an HA or failover solution. Be careful there, as HA can complicate a setup enormously. Often it's better to allow 30 minutes of downtime with an easy manual restart of services on a backup or failover system.

For napp-it and ZFS I am working on a simple failover cluster solution, with a first preview available. At least the "realtime sync backup" and the manual or automatic "storage and NFS/SMB service failover" are working. Besides such a realtime sync solution, you can use async replication on ZFS, which keeps your storage appliances in sync even with open files, with a delay down to a few minutes.

With ZFS you get enterprise-class, high-performance storage with excellent data security and unlimited snapshot capability.
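To make the async replication concrete, this is roughly what it boils down to under the hood; the dataset, target host and snapshot names are only examples, and napp-it's own replication jobs do this with proper bookkeeping and error handling:

Code:
#!/usr/bin/env python3
# Async ZFS replication sketch: snapshot the VM dataset and send only the
# delta since the last replicated snapshot to the second storage box.
# Prune old repl-* snapshots separately.
import subprocess
import time

DATASET = "tank/vmstore"        # example source dataset
TARGET  = "root@backup-nas"     # example receiving appliance

def run(cmd):
    subprocess.run(cmd, shell=True, check=True)

def out(cmd):
    return subprocess.run(cmd, shell=True, check=True,
                          capture_output=True, text=True).stdout

# newest existing replication snapshot (if any) becomes the incremental base
snaps = out(f"zfs list -H -t snapshot -d 1 -o name -s creation {DATASET}").split()
base  = next((s for s in reversed(snaps) if "@repl-" in s), None)

curr = f"{DATASET}@repl-{int(time.time())}"
run(f"zfs snapshot {curr}")

if base:
    run(f"zfs send -i {base} {curr} | ssh {TARGET} zfs receive -F {DATASET}")
else:
    run(f"zfs send {curr} | ssh {TARGET} zfs receive -F {DATASET}")  # first run: full send

Run something like that from cron every few minutes and the second box stays only minutes behind, which is the async replication mentioned above.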
 
Try looking at the StarWind HyperConverged Appliance; that solution should 100% meet your needs.

Storage redundancy is provided by their in-house vSAN, which does HA storage and shares it via iSCSI. VM redundancy is handled via either a Hyper-V or VMware cluster; they can work with both hypervisors.

Automation is part of their solution, since the system arrives at your site preconfigured and they bundle Veeam as the backup solution, which as we all know has great backup job scheduling capabilities. You can even create a recovery scenario and, by pressing a single button, have your VMs recovered and started up in a defined sequence.

From the maintenance/support standpoint, they have a single point of contact for any issue or question you might have.

Performance, of course, depends on the drives, CPU and even RAM if you are planning on configuring any caching. Usually they quote up to a 10% performance drop compared to local storage.

As for the price, I heard you can get a node for $10K; with your requirements you will need two of those.
 
Regarding the above: if that were a real possibility, look at Cisco's StorMagic, if for nothing else than to see if you can get better pricing from them. I would also mention Nutanix, but they are super expensive.


Maybe it's just me, but it sounds like you have a very small virtualized environment that shouldn't require a crazy setup to support. I do agree that licensing may be of concern though.

A mid-grade 2-node system (for HA) with a SAN attached would probably be enough for your daily functions. Then it's just however you want to do your backups.


I know a place that used 2x HP blades with an attached SAN. These provided the same services you described in the OP and needed 100% uptime, or as close as possible. They supported 5 VMs of various sizes and duties. The key is to make sure that each blade/node is capable of running all your VMs as expected, then add at least 1 for HA (n+1). A 3-node setup would be ideal, but the third node can be added later if funds become available.

Unfortunately, I don't know what this one would cost, but StorMagic also seems like a good candidate (and it replaced the 2x HP blade/SAN setup above) if it's within budget; it sounds near identical to the StarWind option above. SvSAN with Cisco: Hyper-converged Architecture - StorMagic (middle option: UCS C-Series + StorMagic SvSAN).


Can you give some generalized info?
Number of VMs
Size of VMs (or average size at least)
Any reservations (CPU/memory) or special requirements
Spec of the Dell tower (host)

Since you are currently running a single host, free ESXi is perfect. Once you go to 2+ hosts, you'll want to look at licensing for VMware, or an alternative, to manage them together and allow HA/vMotion to work.

Hope this helps at least give some other ideas,
Outlaw
 
I think we were overthinking the solution.

Current setup:

1 ESXi host, 1 NAS

To minimize downtime, I would just double what the current setup is.

+ 2 Synology NAS units in an HA setup. With the current budget, I would just use SSDs in RAID 6 or RAID 1.

+ 2 ESXi hosts. There's no real local storage in these hosts; everything, including the VMs, is stored on the NAS. Get hosts with redundant power supplies. They should all have IPMI.

+ 10G switch

+ UPS (APC units with external battery packs).

+ Backup plan (run Cobian Backup on a Windows VM, back up to another directory on the same NAS, and sync this directory with a remote NAS; Cobian keeps everything time-stamped, etc.).

I wouldn't bother setting up the ESXi hosts in failover or HA mode, to save on license fees.

Since the VMs are stored on the NAS, if ESXi host 1 goes down, just mount the NAS iSCSI datastore on ESXi host 2, import the VMs, and there you go. Very little downtime.

The whole process takes a few commands (see the sketch at the end of this post). It's best to practice first and put all the commands into a script. It's very important to have failover in the switching; if you lose the connection between the NAS and the ESXi host, you pretty much have to reboot the host.

The chance of an ESXi host going down for reasons other than power supply failure is extremely low. Motherboards, CPUs and RAM don't go bad often. Your second host may not be used for many years. Keep the case clean, change dust filters often, etc.


Zero downtime is very expensive to set up, in both hardware and software licenses. The above setup in the worst case gets you a few minutes of downtime.
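Here's roughly what that failover script could look like, run from the standby host's shell (ESXi ships with a Python interpreter). The datastore label and .vmx paths are examples only, and it assumes the iSCSI target is already configured on that host:

Code:
# Manual failover sketch for the standby ESXi host: rescan storage, register
# the VMs from the shared datastore, then power them on. Paths are examples.
import subprocess

DATASTORE = "nas-iscsi-ds1"     # example datastore label
VMX_PATHS = [
    "/vmfs/volumes/%s/fileserver/fileserver.vmx" % DATASTORE,
    "/vmfs/volumes/%s/winapp/winapp.vmx" % DATASTORE,
]

def sh(cmd):
    # run a host command and return its trimmed output
    return subprocess.check_output(cmd, shell=True).decode().strip()

# pick up the shared iSCSI LUN / datastore on this host
sh("esxcli storage core adapter rescan --all")

for vmx in VMX_PATHS:
    vmid = sh("vim-cmd solo/registervm %s" % vmx)   # returns the new VM id
    print("registered %s as id %s" % (vmx, vmid))
    sh("vim-cmd vmsvc/power.on %s" % vmid)

On the first power-on of a moved VM, ESXi may also ask whether it was moved or copied; answer that from the client (or via vim-cmd vmsvc/message) and you're up.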
 
Nutanix has a KVM-based Community Edition which is 100% free.

StarWind has a free version as well - no capacity limitations, and production use is OK.

StorMagic is a performance hog :( 50k IOPS per node all-flash is ridiculous in 2016; my MacBook literally runs circles around their "storage"!!

I'm not sure why Cisco didn't kick StorMagic out of their channel like they did with Nutanix some time ago...

...because Cisco is in bed with Springpath SDS.

HP VSA (StoreVirtual) is another very solid option IMHO, if your management allows time-bombed software in production, of course.

They have a 1TB free version that's allowed in production as well.

 
Thank you all for the replies. It is nice to see so many diverse options presented.

The current single Dell server runs the free version of ESXi with 4 VMs, none of which take up significant memory or CPU. There are only ever 10 people or so using any given service at a time, with a maximum of 20, which would almost never happen.
1. Windows Server 2013 for a server application that only runs on Windows, which some employees access with client software. The main backup here is a combination of files and an MS SQL database.
2. File server running Samba. Again, we don't use AD, so as others have mentioned this can be run on basically anything. The VM itself has a very small footprint, and it mounts a 200GB volume for the actual files that employees use, add to, etc.
3. Simple web server for internal tools/wiki/etc. and a simple internal messaging server in the same VM.
4. DNS (BIND9) for only a handful of internal addresses; absolutely not mission critical.

As previously mentioned, the automated backup of the file server FILES THEMSELVES (not the single vmdk file) is already in place, to a local NAS box and then an off-site NAS box. ghettoVCB (Google it if you aren't familiar) is what is being run MANUALLY every few weeks to back up the actual vmdk's themselves.

Let's not go too overboard just because I said that the number 1 goal is uptime. It absolutely is, but even 30 minutes of downtime isn't going to cause a significant loss in productivity ("okay people, take an early lunch or work on something else"). It would be most ideal to strike a good balance between cost, simplicity and uptime. I do want to stress simplicity: having to call "experts" about some proprietary black box doesn't sound like fun if a hardware failure takes place and business operations are solely dependent upon that one vendor and phone call. 5-minute hold time, 50 minutes, etc.? Big variable.

In other words, $15,000 for 10 minutes of downtime is acceptable; $35,000 for 5 minutes isn't worth it; $80,000 for 5 seconds of downtime isn't worth it. There can be SOME downtime in a hardware failure/corruption event, just not hours or days. Right now, if the single Dell server goes down, there is a wimpy server with ESXi already loaded on it that can serve as a band-aid and pull from the local NAS box, but there would probably be in excess of 30 minutes of downtime spent copying the data/VMs off the NAS box onto the wimpy server's hard disk and then spinning up the VMs.

Does the above information help? $15,000 was a ballpark figure that I figured this could be accomplished with, including licensing. I am certain that they can spend a lot more, but again, it then becomes a question of "if we spend twice as much [$30k], what difference in potential downtime are we talking about?"
 
For anyone who is recommending HA for Synology: have you actually set this up (and had it work correctly)? Because it's a hot mess.
Synology support is lacking for its advanced features. I love the little boxes for the SMB market, but trying to do something like the SHA thing falls flat.
Specifically, if you get two identical units with the same drives (I'm talking the same revisions, brand new) and the same BIOS, connect them via heartbeat, AND it actually shows as 'Healthy', then count yourself lucky to begin with. Buy a damn lottery ticket even. We tried everything to keep this setup stable, but things went offline if you looked at it funny. We replaced heartbeat switches and tried direct crossover cabling, but data pools kept messing up / dropping / not updating regularly.
When we did have it running (for about a week) with ESXi using the pair as a datastore, the latency to the primary unit wasn't cool at all. I've set up QNAP / Synology NASes as VM stores in ESXi and Hyper-V lots of times, but our implementation of SHA did not work well.
After a few RMAs and lengthy conversations with the "top level tech support" over email and phone, we scrapped the idea, went with a single unit as the data source, and used the second one as a cold backup.

If I were in your shoes with $15k, I'd probably just get a new Dell/HP server for the virtualization, with ESXi if that's your thing. Over-provision the hard drives for future expansion (maybe 4x 600GB 10k SAS, RAID 5, 1.2TB volume; I'm not sure what your total data needs are). I'd put Symantec Backup Exec (or whatever your flavour is) on the Windows OS, with plugins for the other VMs, to back up the files nightly as incrementals to a backup server, plus a full Backup Exec backup on Friday nights. You then have granular recovery for everything.
Use the old server as your backup server. Have a data pool for Backup Exec to write to (buy an 8TB SATA drive?), and throw Veeam on it. It's backup: you don't need fast drives since you're limited to ~120MB/s over the network, so large SATA drives are fine. Mirror the drives if you want, or make a RAID 5, perhaps 3x 6TB WD Red Pros. Then the backup server takes an export image of the main server every 2 weeks, month, whatever you want, manually. Other than checking logs, that should be your only maintenance. Veeam Free can't be scheduled, last time I checked.
If you have a bunch of money left over when you're done, buy 10G SFP+ cards for the servers and a switch; it will cost you about $1,000 US new (give or take). Then add two 10G Ethernet ports to connect your server switch to the main network switches.

Throw a Quantum RDX 1TB system in the backup server for offsite storage and Bob's your uncle (or just run whatever you're running now). Run robocopy to copy the latest backup data to the external drive, and use Task Scheduler for that (a sketch of the job is below). My guess is that 20TB is absolutely nuts for 200GB of backups. I have about 300GB on my network and use a 4TB data pool for backups (12 weeks of daily incremental / weekly full Backup Exec jobs, plus 12 monthly Veeam backups) and I'm only at 2TB. Deduplication is fun.
Someone deleted something on the server? Two-minute recovery with Backup Exec. New ESXi server catches fire? Set up a temp workstation, import the VMs from the backup server in 30 minutes (I keep a spare system already configured with ESXi on a shelf), and run it until a new server shows up. Import the VMs, then recover files via Backup Exec from the latest backup.
I like Dell and HP onsite support; totally worth it for 4-12 hour parts replacement.
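Something like this for that scheduled copy (the paths are placeholders; register it as a daily Task Scheduler job):

Code:
# Scheduled offsite-copy sketch (Windows): mirror the newest backup set to the
# RDX / external drive with robocopy. Paths below are placeholders.
import datetime
import subprocess
import sys

SRC = r"D:\Backups\Latest"     # example: most recent backup output
SRC_DST = r"E:\OffsiteRDX"     # example: the RDX cartridge / external drive
LOG = r"D:\Logs\robocopy-%s.txt" % datetime.date.today()

rc = subprocess.run(
    ["robocopy", SRC, SRC_DST, "/MIR", "/R:1", "/W:5", "/NP", "/LOG:%s" % LOG]
).returncode

# robocopy exit codes 0-7 mean success (with varying detail); 8 and up mean failure
sys.exit(0 if rc < 8 else rc)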

There are some purists who will tar and feather me for saying to use local storage in a VM environment, but your network isn't large enough to need that many network devices running, and there's really no point IMO. I purely use Synology / QNAP boxes as backup devices now; I've run into so many issues trying to use them as VHDX repositories, and the latency bites you in the ass. We had a customer who tried to use one as their main file server, only to realize no database (Access or otherwise) should ever run on these devices - the Windows oplocks won't work correctly.
 
For the price, I would consider placing some of those services in the cloud and operating them from there. Build a tiny redundant HA pair for your local services and put most of the budget toward the cloud costs. AWS or a private cloud would be the way to go. Make sure you have 100Mbit of bandwidth to those services and connect to them using something bulletproof like an ASA 5512-X.

I talked our company out of doing a cloud deployment last year ONLY because we had over $100k of servers sitting on hand and we don't have to pay for electricity. Otherwise... cloud deployments are usually safer, more flexible, and make for a smarter long-term strategy.
 
QUOTE="Rison, post: 1042531897, member: 233900"]We replaced heartbeat switches, tried direct xover cabling but data pools kept messing up / dropping / not updating regularly.
[/QUOTE]

That may be your issue. You can't use a switch between the 2 heartbeat LAN ports (they must be the same LAN port on both units, and they must be on an unused subnet). You are not supposed to use a crossover cable between them; regular good network cable is what's called for. You also can't use Synology Hybrid RAID on the 2 NAS units; it must be old-fashioned RAID 1, 5, 6, etc.

I didn't have any issue when I had my setup.
 
I'm a little late to this thread, but this sounds like a fairly small deployment. If your servers have a small footprint, it may be of benefit to go with a cluster-in-a-box type solution. I sell these to quite a lot of SMB-type customers, and they work great for people who don't need a large cluster of servers. I would say it can scale to support 5-25 VMs based on size and need.

Supermicro SuperServer 2028R-DE2CR24L (2U version)
Supermicro SuperServer 6038R-DE2CR16L (3U version)

This is pretty impressive hardware to work with. A single 2U package contains 24 shared hard drives (or 16 in the 3U form) and two server nodes. We design them to run Windows Server 2012 R2 and use Clustered Storage Spaces plus Hyper-V to provide a complete solution in a single box. This gives you RAID-redundant data plus a high-availability Hyper-V cluster in a small package and at small scale. It comes out a little cheaper than individual servers plus a SAN because it is a single package. The biggest boon here is that you don't need any major networking hardware to support it on the backend, just the network connections for the hosts and guest VMs to reach the company network.

We can usually price out the hardware and licenses at about $10K. I like to recommend these for customers who need an on-premises solution, and to also leverage Veeam on board to run backups and replicate off-site.
You do get some harsh diminishing returns on scalability here, but a configuration with 128GB RAM and 24 CPU cores per node is pretty powerful for most offices and can come at a reasonable price. Most clusters we sell are 12 CPU cores per node and 64GB RAM.

As a note, this system relies on some pretty advanced Windows services, namely Clustered Storage Spaces. You really need to be comfortable with PowerShell and the Storage Spaces cmdlets to administer and provision the storage properly and ensure ideal performance and redundancy.

Let me know if you want more info and I can fill you in more
 
With such a small environment, why not simply NOT VIRTUALIZE, and just have 2 or 3 identical spare systems ready to boot up if need be?

So junk the one Dell server (or use it for some type of backup system) and get one machine per VM plus 2 or 3 spare machines. Then have a NAS attached, like FreeNAS, that rsyncs those Ubuntu boxes' config files, with whatever Windows backup software of choice going to the NAS as well? 5 minutes of downtime, even 30 minutes of downtime, isn't a big deal. We just can't have hours or days.
 
Your requirements just aren't going to spend that much money. You can make it, but .. why? It's not enough for serious hardware with service contracts..

Build a 20TB storage server (most of your cost is in drives) with some redundant nics and a layer 3 switch or two.

If your services are scalable, you can loadbalance with HAProxy for free.

Get a few off-lease Poweredge servers and load them up with hardware. Cheap and easy.
 
For the price, I would consider placing some of those services in the cloud and operating them from there. Build a tiny redundant HA pair for your local services and put most of the budget toward the cloud costs. AWS or a private cloud would be the way to go. Make sure you have 100Mbit of bandwidth to those services and connect to them using something bulletproof like an ASA 5512-X.

Yes.

Heck, AWS Route53 could replace his BIND VM for pennies, and everything else sounds like it could run in t2-sized instances.
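For example, publishing one of those internal records in Route 53 is only a few lines of boto3; the hosted zone ID, record name and address below are placeholders:

Code:
# Illustration only: push one internal A record into a Route 53 hosted zone.
import boto3

r53 = boto3.client("route53")
r53.change_resource_record_sets(
    HostedZoneId="Z0EXAMPLE",       # e.g. a private hosted zone for the office
    ChangeBatch={
        "Changes": [{
            "Action": "UPSERT",
            "ResourceRecordSet": {
                "Name": "wiki.internal.example.com.",
                "Type": "A",
                "TTL": 300,
                "ResourceRecords": [{"Value": "10.0.0.20"}],
            },
        }]
    },
)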
 
With such a small environment, why not simply NOT VIRTUALIZE, and just have 2 or 3 identical spare systems ready to boot up if need be?

So junk the one Dell server (or use it for some type of backup system) and get one machine per VM plus 2 or 3 spare machines. Then have a NAS attached, like FreeNAS, that rsyncs those Ubuntu boxes' config files, with whatever Windows backup software of choice going to the NAS as well? 5 minutes of downtime, even 30 minutes of downtime, isn't a big deal. We just can't have hours or days.

Yes, virtualizing is the way to go for any system, hands down. Still, you would be wasting your money on a setup that didn't provide clustering and HA capabilities. If you have several disparate physical systems, you will spend a ton of time ensuring you have copies of them accessible by at least one other host so that you can recover in the event of a failure. With HA and shared storage, this is all automatic and done in minutes. If you build it right, it will do the work for you so you can focus on other tasks.

As a note, with the cluster-in-a-box configuration, you can RAID the disks together and have the physical servers act as a file server cluster as well for NFS, iSCSI, and CIFS/SMB, and throw dedupe on top of it, without needing a massively powerful system to manage that feat. I've worked a lot with the ZFS NAS distributions, and I do think they are useful at a small scale, but ZFS really falls short when you're trying to use high-end features at scales larger than 2TB.
 