Virtualized storage server or bare metal? Pros/Cons?

brutalizer

Some people run a virtualized Solaris derivative on ESXi. Apparently it is very successful and stable, and everything works fine. ESXi is a Type 1 hypervisor: a thin virtualization layer that boots directly on the hardware and starts the other OSes, so you don't need a host OS. Solaris, Windows and the rest all run as virtualized guests.

I myself run Solaris on bare metal and use VirtualBox for a virtualized Windows guest: games, MS Office, etc. Everything works fine 99% of the time. I can play Quake 3 and older games in VirtualBox (its 3D support borrows code from Wine). This is a Type 2 hypervisor: you install the virtualizer on top of an existing OS. Examples are VMware Workstation, VirtualPC, etc. Solaris is the host OS, and Windows runs as a virtualized guest.

So, what are the pros and cons of each solution? What are your experiences? Let's discuss this a bit.



1. I am not really comfortable running virtualized. To me, it introduces another uncertainty. What happens if the virtualizer is buggy? The less code, the better. I know Solaris is well tested, but is the combination of hypervisor + Solaris as well tested? There is a reason big production server systems are usually run on bare metal. I believe Solaris on bare metal is more stable than a virtualized Solaris.

2. How good is 3D performance in ESXi? Can I play games without any problems at all? In VirtualBox, 3D performance lags, and only parts of DirectX 9 (?) are supported, I think. There is much left to do in VirtualBox.

3. Using VMware Workstation / VirtualBox is not stable at all. I had severe problems with this solution at work. My guests would crash or hang, depending on the underlying OS. I used Windows 7 as the host OS, and as we all know, Windows is not that stable. Bad idea. At work I am now going for an ESXi solution; it should be more stable than Workstation.
 
1. Stability of virtualization depends on the stability of the virtualizer, and in this regard
ESXi is more stable than any full-featured OS, needs fewer resources for itself and
delivers better performance for guests than any application-level virtualizer running on top of an OS.
Only KVM is capable of similar performance.

From my experience, virtualized servers on ESXi are similar in stability to servers on real hardware,
mostly better, because they see only a basic and always identical set of hardware.

You are often told that you can virtualize everything but gaming and storage servers;
with pass-through this is no longer true for storage. If you ask why it is not used in production,
I would say the installations are too big or too special, or there is simply no interest in doing it,
not on the Dell, HP, Cisco or NetApp side and not from VMware. They want to sell expensive and big.

2. 3D performance is weak or non-existent unless you pass through a real graphics adapter to a guest.
Not yet really stable, but next on the agenda.

3. For stability
I would use ESXi, Xen or KVM.

ESXi and Xen are Type 1 virtualizers that run directly on the hardware.
KVM virtualizes at kernel level; if your OS is stable, this is an option.
With SmartOS and OpenIndiana you can even get ZFS and KVM together.

A Type 2 virtualizer, running as an application on top of an OS, is fine if you mainly
need the base OS and its features, with guests only for testing or smaller tasks.


Regarding a virtualized storage server / All-In-One,
I see the following:

Advantages
- Less hardware that can fail, less power consumption, lower costs
- High-speed connection (up to 10 Gb) between the SAN and ESXi guests in software
(lower costs, less complexity, fewer cabling problems; see the sketch at the end of this post)
- High-priced and complex SAN networking not needed for guest performance
- The SAN is not a single point of failure for multiple ESXi hosts (each one has its own SAN)

Disadvantages
- CPU and RAM must be divided between them, so not for truly high-performance configs
- On larger installations, more complex than an HA SAN solution
- More complex updates (ESXi and the SAN are always down together)
- No ESXi snapshots of the storage OS (not a problem with ZFS)
- Fixed RAM assignment for the storage OS (required by pass-through)

Equal
- Most other problems are the same as with a dedicated ESXi + dedicated SAN solution
- If you need HA, that is independent of this choice

Others
- With an All-In-One and ZFS you can only do cold snaps (best after a shutdown or halt).
If you need hot snaps, you must take extra care (for example, do an ESXi hot snapshot first and then the ZFS snapshot).
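
To make the "high-speed connect in software" advantage above a bit more concrete, here is a minimal Python sketch of how such an All-In-One loopback datastore could be wired up: the storage VM exports a ZFS filesystem over NFS and the local ESXi host mounts it across the internal vSwitch. All names and addresses (tank/vmstore, 192.168.10.x, ssd-datastore) are placeholders, and the zfs/esxcli calls assume a Solaris-family storage VM and an ESXi 5.x host reachable over SSH; treat it as an illustration, not a recipe.

    import subprocess

    STORAGE_VM = "root@192.168.10.2"   # Solaris/OpenIndiana storage VM (placeholder address)
    ESXI_HOST  = "root@192.168.10.1"   # ESXi management address (placeholder)

    def ssh(host, cmd):
        # Run a remote command over SSH and fail loudly if it returns non-zero.
        print("[%s] %s" % (host, cmd))
        subprocess.check_call(["ssh", host, cmd])

    # 1. Share the ZFS filesystem over NFS from inside the storage VM.
    ssh(STORAGE_VM, "zfs set sharenfs=on tank/vmstore")

    # 2. Mount that export on the local ESXi host as a datastore. The traffic
    #    never leaves the internal vSwitch, so no physical SAN cabling is involved.
    ssh(ESXI_HOST, "esxcli storage nfs add --host=192.168.10.2 "
                   "--share=/tank/vmstore --volume-name=ssd-datastore")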
 
ESXi is more stable than any full-featured OS,
Agreed. But an OS on top of ESXi might be unstable. If you only run ESXi with no OS, it is stable. But no one runs only ESXi; they also run an OS on top of it.

ESXi: stable
OS: stable

ESXi + OS = Unstable?
 
Agreed. But an OS on top of ESXi might be unstable. If you only run ESXi with no OS, it is stable. But no one runs only ESXi; they also run an OS on top of it.

ESXi: stable
OS: stable

ESXi + OS = Unstable?

ESXi emulates a common standard PC with quite old but well-tested emulated hardware.
If the emulation is stable, there is no reason for the OS on top to become unstable under these conditions.

Mostly it is more stable, because the emulated hardware and the needed drivers are known to be stable.
If you use real hardware, it is more likely that some driver needed for newer hardware is not as stable.

Problems occur mostly if you try to speed up this basic hardware emulation with optimized, newer
drivers, or if an ESXi driver itself is not stable enough (the reason why ESXi does not run on every piece of hardware).
 
A virtualized OS is going to be as stable as a non-virtualized one. If it weren't, you wouldn't have such a large majority of businesses moving to virtualization for mission-critical servers.

If anything, it adds stability: if you have a troublesome piece of software, you can stick it on a VM and not have to worry about it bringing down the whole system.
 
A virtualized OS is going to be as stable as a non-virtualized one. If it weren't, you wouldn't have such a large majority of businesses moving to virtualization for mission-critical servers.
Yes, but for large server systems where stability is a must (such as the NASDAQ stock exchange), they always run bare metal. Maybe that is because such large servers need maximum performance, and virtualization adds a performance penalty? Maybe it is just a performance question, not a stability question?


Mostly it is more stable, because the emulated hardware and the needed drivers are known to be stable.
If you use real hardware, it is more likely that some driver needed for newer hardware is not as stable.

Problems occur mostly if you try to speed up this basic hardware emulation with optimized, newer
drivers, or if an ESXi driver itself is not stable enough (the reason why ESXi does not run on every piece of hardware).
So drivers could be a stability problem. Running bare metal needs a driver for each hardware vendor. But, as you point out, production servers only run on certain specific hardware; not every hardware configuration is supported. Thus, those drivers are as well tested as the ESXi drivers.



This is really strange. How can running more software be more stable than running less software? How can running a hypervisor + OS be more stable than the OS alone? Something does not add up. When you program, you want to keep things simple, with less source code. The more code, the larger the attack surface for hackers and the more room for bugs.

More code => more bugs
Less code => fewer bugs
 
Yes, but for large server systems where stability is a must (such as the NASDAQ stock exchange), they always run bare metal. Maybe that is because such large servers need maximum performance, and virtualization adds a performance penalty? Maybe it is just a performance question, not a stability question?
Could be that. Or it could be due to the disadvantages that Gea pointed out. It could also be that whatever software or devices they're using aren't allowed to run, or aren't capable of running, in a virtualized environment. Or it could very well be that their IT guys aren't interested in virtualizing or don't see a need to, as Gea pointed out.
This is really strange. How can running more software be more stable than running less software? How can running a hypervisor + OS be more stable than the OS alone? Something does not add up.
You should re-read what Gea wrote, as it answers your question:
ESXi emulates a common standard PC with quite old but well-tested emulated hardware.
If the emulation is stable, there is no reason for the OS on top to become unstable under these conditions.

Mostly it is more stable, because the emulated hardware and the needed drivers are known to be stable.
If you use real hardware, it is more likely that some driver needed for newer hardware is not as stable.

Problems occur mostly if you try to speed up this basic hardware emulation with optimized, newer
drivers, or if an ESXi driver itself is not stable enough (the reason why ESXi does not run on every piece of hardware).


Generally you try to run the hypervisor on supported hardware. That increases your chances of stability. It's akin to the Mac approach: Mac OS X only runs on a limited set of hardware, so Apple doesn't have to deal with as many driver issues as Windows or even Linux PCs do in supporting a wide variety of configurations and hardware. ESXi is similar: it's guaranteed to work on a certain selection of hardware and hardware configurations, which means fewer driver and bug issues. And then it presents virtual hardware that is, again, well supported by ESXi.
 
It's akin to the Mac approach: Mac OS X only runs on a limited set of hardware, so Apple doesn't have to deal with as many driver issues as Windows or even Linux PCs do in supporting a wide variety of configurations and hardware. ESXi is similar: it's guaranteed to work on a certain selection of hardware and hardware configurations, which means fewer driver and bug issues. And then it presents virtual hardware that is, again, well supported by ESXi.
For instance, Solaris 11 is only supported on certified Oracle/Sun hardware, precisely to rule out weird hardware and driver problems. Just as ESXi does.

I mean, the more code you have, the more bugs. And ESXi introduces another potential source of bugs. To a developer, something does not add up: how can more code be more stable than less code? That is just weird. Assume there are bugs in ESXi; now we have bugs in both ESXi and the OS. Also, maybe ESXi does some funny things which the OS does not support.

I suspect that ESXi is only certified for some OSes because the ESXi team has tested that it works well with those OSes. Other OSes have not been tested. Maybe the tests were not complete, maybe there are still bugs? Every piece of software has bugs; that is a fact. The less code, the fewer bugs.
 
The thing is, most companies have similar needs, similar applications, etc., and of course they don't run games or beta software on production machines. From the start you're talking about your PC, probably not server grade, running Solaris + VirtualBox + Quake 3; that is way outside what Solaris, VMware or any other virtualization tech is meant to do.
 
If you try a mix of Dell, HP or other servers with various disks and graphics, from five years old to the newest models, and install any server OS with some server applications on them, you must expect more trouble due to software problems than with an ESXi-based solution that always offers the same minimal base hardware.

If your server dies some years later and you have it virtualized, you can start it up on any other system.
On bare metal you need to reinstall completely, because it will not work on the spare machine even if you still have the disks.
Think also about snapshots and high-availability functions.

All together I would say that providing stable IT services without virtualization is a no-go. Virtualized servers are the current state of the art.
 
NASDAQ isn't a fair comparison. The stock markets are always striving for lower-latency transactions, and virtualization adds latency.
 
Just a couple thoughts from my very limited knowledge.

First, why people sometimes opt to run bare metal could be due to any of the reasons listed above, as well as licensing concerns, or even the fact that the application demands so many resources that it requires the complete resources of a given server (or servers), so there's no real advantage to virtualizing.

As to stability and running more code vs. less code, I think it often comes down to the usefulness of being able to isolate each instance of code. If you're comparing running several different services and applications under one single OS with running multiple VMs, each serving an individual purpose, consider what often introduces instabilities and issues in a standard install. It's not the base install of a given OS itself, but usually the combination of several third-party or concurrent services running side by side that generates a conflict.

By separating these processes into isolated environments where they have no direct OS-level interaction with each other, presumably you would have fewer issues crop up. Further, if you do have a problematic application or service, whether from an update or otherwise, the VM isolation means that only the services on that one VM are compromised, and the rest of your setup continues to function. Contrast that with installing everything on a bare-metal OS, where a single crash can bring down all processes.

Does running ESXi introduce another layer - yes. And so conceptually it is another point of failure. But arguably it is very minimalistic and well supported, such that using it as the basis for several vms has some very real advantages.
 
Since KVM runs off a kernel within an OS, it'd be Type 2, right? How does it offer performance similar to Type 1 hypervisors like ESXi?
 
A few thoughts, not responding to anyone in particular:

1) Virtualization has *very* little overhead nowadays, and a virtualized OS can get close to 99% of bare-metal performance.

2) ESX is considered rock solid in terms of stability. In addition, it supports a number of HA features and clustering to ensure VM uptime. It is used for major enterprise applications in probably 95% of Fortune 500 companies. Running a VM on ESXi is more stable than bare metal, IMO.

3) If you have a VM, you know exactly what hardware the OS thinks it has, since it's all virtual and based on the ESXi version. If you upgrade your server in the future and switch to *completely* different physical hardware, your VM will not care; it doesn't have to worry about drivers or anything else. You just adjust the number of vCPUs and the RAM and it's good to go. You can even migrate it to a new ESXi server if you need to do maintenance.
 
My .02:

For home, I think a virtualized all-in-one is fine. For work, I will use a bare-metal install for storage. Why? For me, virtualized servers are all about being able to switch hosts when hardware goes awry. This relies on shared storage, but the storage box IS the shared storage. If the host that houses it dies, you can't just bring it up on another host; you would need to move the hardware (HBAs, disks and other storage-dependent components) with it (unless we are talking about a complex setup with SAN switches, etc.). This is doable at home, but at work I would either just fail over to another SAN node or, if it's really a disaster, remount the ESXi hosts to the last hourly snap on the DR SAN.
 
Since KVM runs off a kernel within an OS, it'd be Type 2, right? How does it offer performance similar to Type 1 hypervisors like ESXi?

A Type 2 virtualizer runs on top of the OS at application level (the same level as other applications, like your browser or text editor). KVM virtualizes at kernel level.

Think of KVM as adding the functionality of a full-featured OS into ESXi.
Stability depends on the OS used, its stability, and the quality of the separation between the kernel, the base core OS functionality and the VMs.
With Illumian/OpenIndiana and KVM you can use zones to isolate the core OS from applications or from guests.
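
As a rough illustration of the zones remark, here is a minimal Python sketch of creating a zone on OpenIndiana so that an application runs isolated from the global zone that hosts the KVM guests. The zone name and path are placeholders of my own choosing, and the zonecfg/zoneadm defaults are assumed to be acceptable:

    import subprocess

    ZONE = "appzone"              # placeholder zone name
    ZONEPATH = "/zones/" + ZONE   # placeholder path for the zone root

    def run(cmd):
        # Echo and execute a shell command on the OpenIndiana host.
        print("+ " + cmd)
        subprocess.check_call(cmd, shell=True)

    # Define the zone: its root lives under ZONEPATH and it boots with the host.
    run('zonecfg -z %s "create; set zonepath=%s; set autoboot=true; commit"'
        % (ZONE, ZONEPATH))

    # Install the zone's own userland, then start it.
    run("zoneadm -z %s install" % ZONE)
    run("zoneadm -z %s boot" % ZONE)

    # Applications installed inside the zone (reachable via 'zlogin appzone')
    # are kept apart from the global zone that runs the KVM guests.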
 
My .02:

For home, I think a virtualized all-in-one is fine. For work, I will use a bare-metal install for storage. Why? For me, virtualized servers are all about being able to switch hosts when hardware goes awry. This relies on shared storage, but the storage box IS the shared storage. If the host that houses it dies, you can't just bring it up on another host; you would need to move the hardware (HBAs, disks and other storage-dependent components) with it (unless we are talking about a complex setup with SAN switches, etc.). This is doable at home, but at work I would either just fail over to another SAN node or, if it's really a disaster, remount the ESXi hosts to the last hourly snap on the DR SAN.

Yeah, I agree with this. You have to assume your SAN is always going to be there. The way to protect against failure is to have a replicated DR SAN.

The all-in-one is awesome for a single server at home, but shared storage and clustered hosts are definitely better for real use cases.
 
A Type 2 virtualizer runs on top of the OS at application level (the same level as other applications, like your browser or text editor). KVM virtualizes at kernel level.

Think of KVM as adding the functionality of a full-featured OS into ESXi.
Stability depends on the OS used, its stability, and the quality of the separation between the kernel, the base core OS functionality and the VMs.
With Illumian/OpenIndiana and KVM you can use zones to isolate the core OS from applications or from guests.

Is KVM like an LPAR? I'm not familiar with much *nix administration outside of web servers and personal stuff.
 
A Type 2 virtualizer runs on top of the OS at application level (the same level as other applications, like your browser or text editor). KVM virtualizes at kernel level.

Think of KVM as adding the functionality of a full-featured OS into ESXi.
Stability depends on the OS used, its stability, and the quality of the separation between the kernel, the base core OS functionality and the VMs.
With Illumian/OpenIndiana and KVM you can use zones to isolate the core OS from applications or from guests.

Ah, I see. That's pretty awesome. Thanks for the clarification. :)
 

3. For stability
I would use ESXi, Xen or KVM.

ESXi and Xen are Type 1 virtualizers that run directly on the hardware.
KVM virtualizes at kernel level; if your OS is stable, this is an option.
With SmartOS and OpenIndiana you can even get ZFS and KVM together.

A Type 2 virtualizer, running as an application on top of an OS, is fine if you mainly
need the base OS and its features, with guests only for testing or smaller tasks.

A more correct term is Type 1 hypervisor. The three major Type 1 hypervisors are Xen, VMware and Hyper-V. However, VMware has a rather unique architecture (unique as in different; it has its pros and cons). Hyper-V and Xen are similar in their architecture.


Advantages

- High-speed connection (up to 10 Gb) between the SAN and ESXi guests in software
(lower costs, less complexity, fewer cabling problems)

AFAIK, the emulated path in VMware (even though it displays 10 GbE) just depends on CPU power, similar to Hyper-V. I have not run any throughput tests on ESX 5 recently, but on the Windows 8 beta you can pull around 35 Gbps bi-directional going guest to guest with a Xeon X5690.
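
If anyone wants to repeat that kind of guest-to-guest measurement, a simple way is iperf between two VMs on the same vSwitch. Below is a small Python wrapper as a sketch; the guest address is a placeholder and it assumes "iperf -s" is already running inside the target guest:

    import subprocess

    # Address of the guest running "iperf -s" on the same vSwitch (placeholder).
    TARGET_GUEST = "192.168.10.20"

    # 30-second run with 4 parallel streams; the aggregate bandwidth is reported
    # at the end of iperf's output.
    result = subprocess.run(
        ["iperf", "-c", TARGET_GUEST, "-t", "30", "-P", "4"],
        capture_output=True, text=True, check=True)
    print(result.stdout)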


Disadvantages

- CPU and RAM must be divided between them, so not for truly high-performance configs
- On larger installations, more complex than an HA SAN solution
- More complex updates (ESXi and the SAN are always down together)
- No ESXi snapshots of the storage OS (not a problem with ZFS)
- Fixed RAM assignment for the storage OS (required by pass-through)

Another obvious disadvantage is overhead. Because you're in a virtualized environment, any packet has to be DMA'd twice: once into host memory and once into VM memory. This is unless you are using pass-through mode. But VMware's pass-through mode leaves much to be desired and is markedly weaker than Hyper-V's.
 
If you plan to ever run more than one host, running a separate physical box for the ZFS SAN is a no-brainer.

If you have several ESXi servers with shared storage, you need redundant storage,
or you have a massive single point of failure. You also need a fast, redundant SAN network in between.

Such a config is usual but expensive. I do not like the complexity, I do not like spending the
money, and I can accept a downtime of about 30 minutes for an ESXi-SAN pair to plug the pool into another
All-in-One and start the VMs there. I use six of these pairs, with the free NexentaCore/OpenIndiana and
the free ESXi, and I use dedicated ZFS machines for filer and backup.

It depends..
 
2) ESX is considered rock solid in terms of stability. In addition, it supports a number of HA features and clustering to ensure VM uptime. It is used for major enterprise applications in probably 95% of Fortune 500 companies. Running a VM on ESXi is more stable than bare metal, IMO.
We also use ESX for testing and such, but not in production. We run bare metal in production. But we have very high demands at my company; we chase microseconds.


3) If you have a VM, you know exactly what hardware the OS thinks it has, since it's all virtual and based on the ESXi version. If you upgrade your server in the future and switch to *completely* different physical hardware, your VM will not care; it doesn't have to worry about drivers or anything else. You just adjust the number of vCPUs and the RAM and it's good to go. You can even migrate it to a new ESXi server if you need to do maintenance.
I don't really understand. How many drivers are there? The ESXi server needs a Gigabit NIC driver AAA for the Dell server. Does the Solaris VM need another driver, BBB?

Now I move the Solaris VM to an HP server. The HP server needs a NIC driver CCC. But is the Solaris VM still using the same BBB driver, or is it another BBB' driver?

How many drivers are there? If I have a Dell server and then move to an HP server, what happens?

I mean, Solaris on bare metal uses one driver for the Gigabit NIC. Does ESXi also use one driver, or two?
ESXi -> bare metal needs one driver. Solaris VM -> ESXi needs another driver?
 
I don't really understand. How many drivers are there? The ESXi server needs a Gigabit NIC driver AAA for the Dell server. Does the Solaris VM need another driver, BBB?

Now I move the Solaris VM to an HP server. The HP server needs a NIC driver CCC. But is the Solaris VM still using the same BBB driver, or is it another BBB' driver?

How many drivers are there? If I have a Dell server and then move to an HP server, what happens?

I mean, Solaris on bare metal uses one driver for the Gigabit NIC. Does ESXi also use one driver, or two?
ESXi -> bare metal needs one driver. Solaris VM -> ESXi needs another driver?

How often do you refresh your bare metal servers? How big of a pain is it to set up a new server, install an OS on it, install all the drivers, update the firmware, plan a migration of the apps from the old box to the new, etc.? And that's just ONE physical server. What if you have dozens you have to refresh?

Now imagine you refresh only the VMware hosts and then migrate all your VMs from old hardware to new and you're done. The guest OS doesn't need to be updated, no apps need to be migrated, no IPs need to change, very little involvement is required from the teams that manage the apps. Just drag from one cluster to another and done.

What if a physical box fails? How long does it take to rebuild from backups on a new server? With VMware if a physical server fails, your VMs simply boot up on another server.

What if you want to perform maintenance on a physical box? In the bare metal world, this requires a maintenance window and shutting down the server and all its apps. In VMware it's something you do during the day while you surf Youtube and no one's apps go down.

I could go on and on, but I hope you're getting the point of why it matters that the servers only ever see one type of hardware, and how much easier that makes managing them.
 
Others
- With an All-In-One and ZFS you can only do cold snaps (best after a shutdown or halt).
If you need hot snaps, you must take extra care (for example, do an ESXi hot snapshot first and then the ZFS snapshot).

Can this be explained in more detail? I don't quite understand.

Do you mean that it's not possible to create a snapshot of a guest VM while it is running?
 
Taking an ESXi snapshot of a running VM is not 100% guaranteed to give a consistent image. The sequence you would do is:

take an ESXi snapshot, including memory, and quiesce the guest FS;
take a ZFS snapshot;
delete the ESXi snapshot.
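
As a rough sketch of that sequence in Python, driven over SSH: the host addresses, VM id and dataset name are placeholders, and it assumes the usual ESXi vim-cmd snapshot commands (snapshot.create with the memory and quiesce flags, snapshot.removeall) plus a plain zfs snapshot on the storage side. Adjust to your own setup.

    import subprocess, time

    ESXI = "root@192.168.10.1"     # ESXi host (placeholder)
    NAS  = "root@192.168.10.2"     # storage VM holding the ZFS pool (placeholder)
    VMID = "12"                    # from "vim-cmd vmsvc/getallvms" (placeholder)
    DATASET = "tank/vmstore"       # ZFS filesystem backing the NFS datastore (placeholder)

    def ssh(host, cmd):
        # Run a remote command over SSH and abort the sequence if it fails.
        print("[%s] %s" % (host, cmd))
        subprocess.check_call(["ssh", host, cmd])

    snapname = "hot-" + time.strftime("%Y%m%d-%H%M%S")

    # 1. ESXi snapshot with memory and a quiesced guest filesystem
    #    (the last two arguments are includeMemory and quiesce).
    ssh(ESXI, "vim-cmd vmsvc/snapshot.create %s %s backup 1 1" % (VMID, snapname))

    # 2. ZFS snapshot of the datastore while the guest sits in that consistent state.
    ssh(NAS, "zfs snapshot %s@%s" % (DATASET, snapname))

    # 3. Drop the ESXi snapshot again; the ZFS snapshot keeps the consistent image.
    ssh(ESXI, "vim-cmd vmsvc/snapshot.removeall %s" % VMID)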
 
How often do you refresh your bare metal servers? How big of a pain is it to set up a new server, install an OS on it, install all the drivers, update the firmware, plan a migration of the apps from the old box to the new, etc.? And that's just ONE physical server. What if you have dozens you have to refresh?

Now imagine you refresh only the VMware hosts and then migrate all your VMs from old hardware to new and you're done. The guest OS doesn't need to be updated, no apps need to be migrated, no IPs need to change, very little involvement is required from the teams that manage the apps. Just drag from one cluster to another and done.

What if a physical box fails? How long does it take to rebuild from backups on a new server? With VMware if a physical server fails, your VMs simply boot up on another server.

What if you want to perform maintenance on a physical box? In the bare metal world, this requires a maintenance window and shutting down the server and all its apps. In VMware it's something you do during the day while you surf Youtube and no one's apps go down.

I could go on and on, but I hope you're getting the point of why it matters that the servers only ever see one type of hardware, and how much easier that makes managing them.

I agree with everything you said, but the OP is talking about virtualizing the storage, though, which is a bit different from what you're talking about. For failover and any kind of HA, ESX servers need shared storage. If the storage itself is virtual and only lives on one host, and that host dies, there is nothing to fail over (unless you're using that ESX VSA thing that distributes shared storage across multiple ESX hosts).
 
I agree with everything you said, but the OP is talking about virtualizing the storage, though, which is a bit different from what you're talking about.

There might be some confusion here, or I may be mistaken myself. AFAIK, the OP isn't talking about virtualizing the storage; that's just a point some others in the thread have brought up.

I don't really understand. How many drivers are there? The ESXi server needs a Gigabit NIC driver AAA for the Dell server. Does the Solaris VM need another driver, BBB?

Now I move the Solaris VM to an HP server. The HP server needs a NIC driver CCC. But is the Solaris VM still using the same BBB driver, or is it another BBB' driver?

How many drivers are there? If I have a Dell server and then move to an HP server, what happens?

I mean, Solaris on bare metal uses one driver for the Gigabit NIC. Does ESXi also use one driver, or two?
ESXi -> bare metal needs one driver. Solaris VM -> ESXi needs another driver?

To answer the OP's latest question: I don't have any experience with ESXi, so take everything I say with a grain of salt. If you move from a Dell server to an HP server, the ESXi host may need to change its NIC driver from AAA (Dell) to CCC (HP). But to the Solaris VM, the network interface has been virtualized into a standard interface, so it will always see interface BBB, for which it will always need the BBB driver, regardless of what hardware the host (ESXi) is running on. Hope that makes sense. This is why, if you run virtualized servers, it doesn't really matter if you move the host (or hypervisor) from one set of hardware to another: the hardware presented to the VM always appears the same, which simplifies maintenance, so you can pretty much plug and play between different hardware.
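
One way to see this for yourself: the hardware a VMware guest is given is spelled out in its .vmx configuration file, not derived from the physical box. Here is a small Python sketch that prints those guest-facing settings; the path is a placeholder, and the keys shown (virtualHW.version, ethernet0.virtualDev, scsi0.virtualDev) are the commonly used ones, so take it as an illustration rather than a definitive reference:

    # Placeholder path to a VM's configuration file on an ESXi datastore.
    VMX = "/vmfs/volumes/datastore1/solaris11/solaris11.vmx"

    config = {}
    with open(VMX) as f:
        for line in f:
            if "=" in line:
                key, _, value = line.partition("=")
                config[key.strip()] = value.strip().strip('"')

    # The guest-facing hardware is defined here, independent of the physical server:
    print("virtual HW version :", config.get("virtualHW.version"))    # e.g. "8"
    print("guest NIC model    :", config.get("ethernet0.virtualDev")) # e.g. "e1000"
    print("guest SCSI model   :", config.get("scsi0.virtualDev"))     # e.g. "lsilogic"

Whatever physical NIC the ESXi host uses (and whatever driver ESXi loads for it), the guest keeps seeing the device named in its configuration.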
 
I agree with everything you said, but the OP is talking about virtualizing the storage, though, which is a bit different from what you're talking about. For failover and any kind of HA, ESX servers need shared storage. If the storage itself is virtual and only lives on one host, and that host dies, there is nothing to fail over (unless you're using that ESX VSA thing that distributes shared storage across multiple ESX hosts).

From the network's point of view, a bare-metal SAN server and a virtualized SAN server are identical: they deliver shared storage. The point is that with an All-In-One, active VMs are usually located on a "local" SAN to avoid expensive high-speed network traffic and to keep them working independently of other ESXi or SAN servers. HA or moving VMs is not affected; they work identically to a dedicated hardware SAN.

This approach of a decentralized infrastructure is an option if your installation is not big enough
for two or three centralized, dedicated and redundant SAN storage servers with a redundant high-speed SAN network.
 
How often do you refresh your bare metal servers? How big of a pain is it to set up a new server, install an OS on it, install all the drivers, update the firmware, plan a migration of the apps from the old box to the new, etc.? And that's just ONE physical server. What if you have dozens you have to refresh?
I totally agree with this. Absolutely. There is no question that VMs are easier to administer than bare metal.

But that is not the question. I am talking about stability and performance.

Performance can be much better on virtualized hardware because, for instance, SmartOS is a Solaris derivative with KVM as the hypervisor, which lets a guest use lots of RAM and 10 Gbit NICs. This is beyond what WinXP can do on its own. Thus, running WinXP virtualized on SmartOS with lots of RAM and fast NICs gives better performance.
http://www.theregister.co.uk/2011/08/15/kvm_hypervisor_ported_to_son_of_solaris/

"With I/O-bound database workloads, he says, the SmartOS KVM is five to tens times faster than bare metal Windows and Linux (meaning no virtualization), and if you're running something like the Java Virtual Machine or PHP atop an existing bare metal hypervisor and move to SmartOS, he says, you'll see ten to fifty times better performance - though he acknowledges this too will vary depending on workload."

But this does not apply to high-end setups. No one believes the NASDAQ stock exchange would run faster virtualized. This only applies to low-end servers that do not have much RAM, fast NICs, etc. In essence, it allows a low-end guest OS to utilize high-end host hardware, which makes it faster.



Regarding stability: the more code, the more bugs. This cannot be disputed. Thus, running virtualized should be more unstable.


To answer the OP's latest question: I don't have any experience with ESXi, so take everything I say with a grain of salt. If you move from a Dell server to an HP server, the ESXi host may need to change its NIC driver from AAA (Dell) to CCC (HP). But to the Solaris VM, the network interface has been virtualized into a standard interface, so it will always see interface BBB, for which it will always need the BBB driver, regardless of what hardware the host (ESXi) is running on. Hope that makes sense. This is why, if you run virtualized servers, it doesn't really matter if you move the host (or hypervisor) from one set of hardware to another: the hardware presented to the VM always appears the same, which simplifies maintenance, so you can pretty much plug and play between different hardware.
Can someone confirm this or chime in?

And yes, you are right: I am talking not only about virtualized storage, but about all use cases.
 
Regarding stability: the more code, the more bugs. This cannot be disputed. Thus, running virtualized should be more unstable.

Yes, but virtualization has been around for quite some time now and is pretty stable. I'm just a hobbyist with it, but it's used regularly in the enterprise and is getting more use every day. What are you looking for someone to say?

In my modest use at home, I've never had any bugs or crashes with virtualization, either in test systems or in the virtual WHS I've run on Hyper-V for a couple of years now to do my network backups.

Untangle apparently uses virtualization of sorts in some of its applications, and my Untangle box has been running for over five years with no stability issues whatsoever.
 
Regarding stability: the more code, the more bugs. This cannot be disputed. Thus, running virtualized should be more unstable.

This is correct if code quality is always the same.
But the few supported pieces of hardware and the needed ESXi drivers, plus the basic emulated hardware
with its default drivers, are well tested, have fewer bugs and are of higher quality than the average Windows
or Linux driver for any modern piece of hardware.

In most cases, if not all, an average virtualized server has better stability than an average hardware server.
 
I was just looking at my current home ESXi box, all desktop-class hardware (Asus mobo with a Q6600).

Uptime is currently 275 days.
 
Regarding stability: the more code, the more bugs. This cannot be disputed. Thus, running virtualized should be more unstable.

You need to look at the system as a whole. For that additional code you gain stability in other ways. If you were talking about two simple systems, one where the app runs on bare metal and another where the app runs virtualized on a single server, your argument would be correct. But that isn't the comparison folks in the enterprise make; they are comparing the app running on bare-metal hardware against the app running virtualized on a cluster of ESXi servers with shared, redundant storage.

In the case of running the app virtualized on the cluster, you now have tons of advantages for this simple app that increase overall stability and would cost a fortune to achieve on bare metal, likely requiring app development. Most of those advantages have been mentioned elsewhere in the thread, but some of them are ease of hardware maintenance, ease of upgrading, HA, DR, network redundancy, storage redundancy, ease of backup, ease of upgrade rollbacks, etc. Over the life of the app, it will require much less downtime in the virtualized environment than it would on bare metal.

And to confirm the question on the drivers: within reason, you should almost never have to upgrade a driver within a VM, regardless of underlying physical hardware upgrades (ignoring pass-through). In VMware, for example, each virtual machine's configuration file states the virtual hardware version for that VM. When the VM is started, VMware presents it with "hardware" that matches that virtual hardware version. Newer versions of the software continue to support older versions of the virtual hardware. The only downside is that sometimes you don't get access to the latest features and performance without upgrading your virtual machine's hardware version.
 
So, what are the pros and cons of each solution? What are your experiences? Let's discuss this a bit.

Yes, I did skim the posts so far, but I just wanted to add my own experiences.

I ran Solaris 11 Express on ESXi for about six months with no other VMs. The primary benefit I see in ESXi virtualization (besides just the "playground sandbox" it offers), in terms of just the storage server, is the ability to back up your ESXi VMDK and plunk it down on other ESXi-supported hardware without really having to change the core OS. I have multiple LSI-based controllers, including at the time a SAS3801E-R connected to a Chenbro SAS expander in an external case (having fully populated the primary case), on a SuperMicro X9SCM-F-O with 16GB of ECC RAM.

That said, when Solaris 11 came along, I found the upgrade didn't work so well (not related to ESXi), so I had to reinstall the OS from scratch anyway and import my zpools, which I did on a VM.

Fresh from that experience (i.e. seeing how easy it is to install Solaris 11 in the event of a complete wipeout), a couple of months later I converted to bare metal, with 500GB 2.5" boot disks in a zpool mirror. I don't even bother backing up my OS partition, since I now have scripts and documentation that would let me pop in a new boot HDD and get the server back up in probably less than 30 minutes.

Going back in time, before hindsight: what I had found was that from time to time on the ESXi platform, even with PCIe pass-through enabled for all the LSI cards, I was getting spurious errors on the external pool (yes, I know one should avoid SAS expanders, especially with SATA disks... but I'm trying to build one big server with chained arrays, partly just to have everything on one server, partly just for the challenge ;) ). It would work most of the time, but sometimes during scrubs all of the disks would show errors, and only in the pool connected to the expander. Naturally I tried different cables and even changed the cards: first the SAS expander (actually I'd had an HP SAS expander originally), then an LSI 9201-16e. While the Chenbro seemed to have fewer occurrences of errors, I still had problems.

After switching to bare metal, these errors completely disappeared. Oddly, though, I then began having random lockups, where the entire OS would seize about every 10 days or so, forcing a reset, with nothing in the syslogs.

Since then, I've disabled option ROMs on the cards and it's been 100% stable since. Maybe that would have helped in the ESXi environment; I don't know. (BTW, yet another vote for ECC RAM because I never felt the need to spend hours running RAM tests to prove a negative.)

One very minor issue under ESXi was that I never figured out how to give my VM more than 12GB of RAM (I only have 16GB, but I think this would have been true if I had 32GB or 48GB). There was probably something I was missing, but vSphere would give me errors saying I had to change some configuration if I went past 12GB, and errors of another kind if I changed that configuration. I didn't really have to worry about that on bare metal.

Finally, even though I had paid for a vCenter 4.1 license, I got fed up with VMware's licensing model for version 5. Definitely didn't have to worry about that on bare metal either. Yes, I could try other hypervisors, but that's when the fun ran out and I decided to play with other stuff. :p
 
This is correct if code quality is always the same.
But the few supported pieces of hardware and the needed ESXi drivers, plus the basic emulated hardware with its default drivers, are well tested, have fewer bugs and are of higher quality than the average Windows or Linux driver for any modern piece of hardware.
Yes, I agree that ESXi drivers should be stable and well tested.

But inside the guest OS, do I need to install other drivers? Or will the same ESXi drivers be used both by ESXi and by the guest VM? In that case it should be stable, because there is only one tested driver.

But if there is one ESXi driver and also another Windows driver, then there are two drivers. And two drivers are always more unstable than one driver.



In most cases, if not all, an average virtualized server has better stability than an average hardware server.
I am not entirely convinced of this yet. I mean, ESXi is basically the Linux kernel, and as we all know, the Linux kernel is not the most stable thing out there. There are a lot of sysadmins who consider Linux to be a toy OS, riddled with bugs and unstable. So you could kind of say that you are running Solaris on top of Linux. And running only Solaris should be more stable than running two or three OSes, no matter how well tested they are.



Argentum,
if you want to run Solaris on bare metal and use virtualized guests on top, then maybe you should consider SmartOS, which is basically the Solaris kernel acting as backend and storage server for guest OSes. It uses KVM.
 
But inside the guest OS, do I need to install other drivers? Or will the same ESXi drivers be used both by ESXi and by the guest VM? In that case it should be stable, because there is only one tested driver.

But if there is one ESXi driver and also another Windows driver, then there are two drivers. And two drivers are always more unstable than one driver.

The ESXi installation has its own drivers that it uses to communicate with the hardware. Each guest OS installation is presented with virtualized hardware, for which it has its own driver.
In VirtualBox (I don't have much experience with ESXi), for example, I have a Solaris VM. VirtualBox presents a virtualized NIC, an Intel Pro/1000 MT by default, to Solaris.

I guess in a sense you have to worry about two drivers failing, but really, as long as your host hardware is supported by ESXi, you should have nothing to worry about. The Intel Pro/1000 has been tested time and time and time again.

I understand what you mean. In reality you are introducing an extra point of failure, but honestly, even if it were possible to quantify any difference in reliability, the power and benefits of virtualization would outweigh the difference a thousand times over.
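
For the VirtualBox case specifically, the emulated NIC model is just a per-VM setting, so the Solaris guest keeps seeing the same Intel PRO/1000 no matter what the host's physical NIC is. Here is a short Python sketch using VBoxManage; the VM name is a placeholder, 82540EM is the PRO/1000 MT Desktop identifier as far as I recall, and the VM has to be powered off for modifyvm:

    import subprocess

    VM = "Solaris VM"   # placeholder VirtualBox VM name

    # Show what hardware the guest currently sees; the "NIC 1" line reports
    # the emulated chip, not the host's physical adapter.
    subprocess.check_call(["VBoxManage", "showvminfo", VM])

    # Pin NIC 1 to the Intel PRO/1000 MT Desktop model (82540EM). The Solaris
    # guest keeps using its e1000g driver regardless of the host's real NIC.
    subprocess.check_call(["VBoxManage", "modifyvm", VM, "--nictype1", "82540EM"])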
 
There are a lot of sysadmins who consider Linux to be a toy OS, riddled with bugs and unstable.

And a lot of sysadmins considered Linux to be far ahead of Windows for a very long time.

What else are you going to use? There aren't a lot of options; only Solaris and *BSD come to mind (though I guess you could include AIX, HP-UX and OS/400 for the big iron).

As you know, Linux has a lot of distributions. Many of them are toy-ish distributions aimed at the "unique" and quirky home user, with little support. Some of them, however, are stable, enterprise-class distributions (such as the one VMware builds on). Our Xerox plotters at work have Linux-based control systems (based on Red Hat, I think).

VMware would not be in the position they are in if they were distributing a toy-like OS. There are certainly, in some applications, performance and stability issues that make virtualization inappropriate, but those conditions are becoming fewer as time (and hardware horsepower) goes on. You just have to evaluate it for the specific application you are using it for.
 