ESX, Diskeeper, and VM backups

Thuleman

Supreme [H]ardness
Joined
Apr 13, 2004
Messages
5,833
Objective: Minimize disk I/O during backups (open snapshots)

Potential Issue: Diskeeper defrag
Diskeeper doesn't integrate with vCenter, so it is not aware that the VM is being backed up.
Does anyone have any experience with how much of an issue this actually is? Conceivably Diskeeper could create substantial disk I/O while the snapshot is open.

Potential Solution: Disable Diskeeper during backups
One can schedule Diskeeper to be disabled for a period of time (i.e., the backup window). This is somewhat of a rough solution, as the exact time frame needed for a backup is not known, but it's probably better to miss a few hours of automatic defrag than to clog up an open snapshot.

Thoughts?
 
VCB will still back up; the problem you'll face is threefold:
1. Snaps are gonna grow freaking MASSIVE, FAST.
2. Performance is gonna tank like mad, and you may not be able to commit the snaps unless you stall Diskeeper anyway.
3. VSS / the sync driver with DK running will never quiesce. You may never get the backup to kick off.

I'd skip it on a VM. It's gonna hammer things a lot more than you want anyway, as it's doing block-level I/O all the time, and ESX was never really designed to anticipate that kind of load. Same problem as block-level real-time virus scanners - more than a few VMs and you'll fill the SCSI queue on the host.
 
As virtualization moves into the mainstream, both the virtualization providers and the application providers will have to sit down and figure this out.

For the time being defrag is useful, and no matter what, AV will certainly be needed for eternity to come.

One work-around for DK could be to only run it for X minutes per day and stagger it across VMs so that there's not too much load on shared storage. Then schedule backups outside of DK's run window. I am sure it can all be done; dismissing DK entirely isn't the solution imho unless there's a viable alternative.
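As a rough illustration of that round-robin idea (the VM names and the 30-minute windows are hypothetical; nothing here is Diskeeper- or vCenter-specific), the staggering could be computed like this:

```python
from datetime import datetime, timedelta

def stagger_windows(vms, start, minutes_per_vm):
    """Assign each VM a non-overlapping defrag window so only one
    VM hits shared storage at a time (round-robin by list order)."""
    schedule = {}
    slot = start
    for vm in vms:
        schedule[vm] = (slot, slot + timedelta(minutes=minutes_per_vm))
        slot += timedelta(minutes=minutes_per_vm)
    return schedule

vms = ["vm-app01", "vm-app02", "vm-db01"]            # hypothetical names
sched = stagger_windows(vms, datetime(2009, 1, 5, 1, 0), 30)
for vm, (t0, t1) in sched.items():
    # prints e.g. "vm-app01 01:00 - 01:30"
    print(vm, t0.strftime("%H:%M"), "-", t1.strftime("%H:%M"))
```

The backup window would then simply be scheduled after the last slot ends.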

Disk I/O is the main performance limiter of VMs; anything that speeds I/O up is a good thing in my book.
 
;)

Defrag isn't as useful as you think in a virtualized environment - your storage isn't the monolithic device it once was; it's getting more and more spread across multiple spindles in multiple RAID groups, and sometimes even spread across arrays! An IBM SVC may present a LUN that is made up of space from a Clariion, a FAS 3050, and an old MSA! How does defrag work for that? How do you make space contiguous when you've got 15 disks across 3 different arrays from 3 different manufacturers involved in your virtual disk? :) Are you sure it would even help, since the arrays are doing it themselves as well?

You're thinking in the physical realm, not in the virtual realm. Things are different here. Storage isn't a monolithic device that you can optimize the same way, nor is memory or even CPU power! You have to look at it differently.

Disk IO is only the limiter if you have a cheap array too :)
 
Honestly, given that VMFS is optimized, and that your array is too - your vdisks are not disks with heads that have to move to find data, so it's not like a normal physical server in that sense. The disk is already spread across spindles! It's not like a physical machine!
 
I see what you are saying about fragmentation and distributed storage, although even across n spindles you ought to be able to optimize the allocated space per file per spindle. Tough to say how much that matters.

I wish there was an easy way to put some numbers to that - test out what the impact of running DK actually is, whether it slows things down or improves performance.

I know that VSS is not as much of an issue, as DK has a VSS-aware setting in there somewhere (from memory).

Speaking of AV, do you have a recommendation for Windows Server VMs running on ESX?
 
Thuleman said:
I see what you are saying about fragmentation and distributed storage, although even across n spindles you ought to be able to optimize the allocated space per file per spindle. Tough to say how much that matters.

I wish there was an easy way to put some numbers to that - test out what the impact of running DK actually is, whether it slows things down or improves performance.

I know that VSS is not as much of an issue, as DK has a VSS-aware setting in there somewhere (from memory).

Speaking of AV, do you have a recommendation for Windows Server VMs running on ESX?

If only that were actually true ;)

The problem is that you need to do the storage management and fragmentation from the host, not the VM - the VM has no idea that it's not looking at a single disk :) that kind of access is abstracted away.

Diskeeper wasn't ever built to think about that kind of storage. The technology to "defrag" distributed or even array-based storage of a vdisk isn't really out there yet. This is why I'd avoid DK on a VM - all you'll do is slam the host, and you honestly won't see more than a percent of improvement, if even that. Chances are you'll actually see lower performance, since you still only have the one SCSI queue to work with. One host's worth of hardware, multiple hosts' worth of software trying to "optimize" something that isn't really there, through that one host's hardware. ^_^

Seriously, almost every normal utility is useless in a VM - you need to optimize from the host level, not the guest. The only improvement is from aligning data disks, and that gets you, at most, 1-2%. And only on data disks.

VSS driver implementation in vmware tools is a bit buggy, to be honest. Lots of things that keep stuff from quiescing the filesystems. It's actually quite annoying.

Now, this is only true for Hyper-V and/or ESX. Workstation/Xen/Server/etc., where you've got a true host OS - then absolutely, defrag away, host and guest :)


My recommendation for AV is:
avoid Trend Micro - block level scanning is BAD for ESX vms. :)
McAfee is ~ok~, but stagger.
haven't seen problems with Norton AV or ESET.

No matter what, STAGGER THE SCANS.
 
So if I get this correctly then the only time defrag would currently make sense in ESX is if you use raw device mapping, and set DK to only defrag that drive.
 
Yes, although it depends on how the RDM is presented.

If, for instance, you present it from a RAID1 mirrored pair then sure, it'll make sense, especially if the VM gets the full lun. Even a RAID5 set. I wouldn't bother on a virtual lun though or metalun, and most things are heading that way. And I don't know if I'd bother with something that's a part of a VG.
 
I dug around a bit and found a DK-issued whitepaper. As with most whitepapers out there, they maintain their business interest throughout, and they do not address the issue of DK-induced I/O saturation, but here's how they reason the importance of local defrag in virtualized storage:

DK said:
I’ve implemented storage virtualization, so do I still need to defragment my local file system?

Storage Virtualization is commonly used in SANs. This technology
essentially abstracts “logical storage” (what the operating system
sees and uses – i.e., the file system) from physical storage (the
striped RAID sets). The key differentiator in virtual storage is that
the multiple physical storage devices (e.g., a RAID array) are
combined into one large grouping, on top of which a virtual storage
container is created.

SAN file systems (a.k.a. cluster file systems) such as VMFS from
VMware or EMC’s Celerra, are a third and different category of file
system known as shared-disk file systems and are the backbone
of storage virtualization (different from previously defined Local or
Remote file systems). An operating system defragmenter, such as
Diskeeper, only recognizes the “local” disk file systems that it
natively supports. Vendors of proprietary file systems typically
include specialized technologies to optimize performance. These
file systems are the foundation for storage virtualization.

I/O Mapping and Redirection

Storage virtualization uses metadata to properly channel I/O. Software
on a storage virtualization device (such as a SAN Switch) will translate
logical disk locations to physical disk ones.

Here is an example:
1. A storage virtualization device gets a request for a logical location
of LUN#1, LBA 32
2. It then performs a metadata lookup for that address and finds it
actually maps to LUN#4, LBA167.
3. The device then redirects the request to the actual physical location
of the data
4. Once it retrieves the data, it passes it back to the originator without
the originating requestor ever knowing that the request was completed
from a different location than what it knew.

The fact that there is not a one-to-one mapping of file system clusters
to LBAs (due to LUN virtualization) is not an issue. Logical, file system
level fragmentation causes the operating system to generate additional
I/O requests to the virtualization software. Using metadata, the software
then redirects I/O from the logical disk to its physical location.

The local disk file system (e.g., NTFS) does not know of, nor control the
physical distribution or location in a virtualized storage environment, and
as a result of fragmentation, NTFS has to make multiple requests
regardless of the physical or virtualized storage environment.

In SAN file systems, block size (the smallest addressable virtual unit) is
a configurable metric and varies based on the software used. Vmware’s
VMFS, for example supports 1MB to 8MB blocks. Logical Cluster
Numbers (LCNs) are a file system construct used to map a file in an
index table (e.g., Master File Table in NTFS) to LBAs. Disk Controllers
take those logical blocks and make the appropriate translation to a
physical location. Disk controllers do not—no matter how “smart” they
are—independently map fragmented file I/O into consecutive or linear
block requests. They cannot “pool” incoming block-based data back
into a file.

This means that regardless of the fact that the file system does not
map directly to a physical location, file system fragmentation will
create the exact same kind of phenomenon on RAID as it does on
virtualized storage (multiple RAID arrays group together).

SANs can offer extremely efficient and high-performing data storage,
but it is not the job, nor within the scope of ability for a SAN system
(hardware or software) to address file system level fragmentation.
Proprietary technologies employed by one vendor can be more
efficient at retrieving data blocks than another. Architectures can
vary as well. No matter how efficient data retrieval can be, and how
much physical disk limitations can be mitigated, the overhead on
the operating system that is retrieving the file is beyond the scope
of SAN technology and is impacted by file fragmentation.

So, to answer the question, yes local disk file defragmentation is
still necessary.

I can see what they are saying. Basically, and to simplify it, no matter how spread out your VMDK file is across physical devices, there is fragmentation within the VMDK file.

I do have some equipment available to test this out, but I don't have enough insight into how to properly set up a test environment where the performance difference can be measured. What I would need is a fragmenter: some tool that creates a consistent level of fragmentation for each test run so that the effects of defragmentation can be quantified.

I/O saturation would remain a big problem either way though. If local defrag would lead to actual and desirable performance gains, then what they need is to hook into vCenter and have some sort of round-robin defrag going on where only one VM is being defragged at a time to take it easy on the I/O of the host.
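For the "fragmenter" idea, one crude sketch is to grow two files in small interleaved chunks and then delete one, leaving scattered free gaps; whether this actually produces consistent fragmentation depends entirely on the filesystem's allocator, so treat it as a starting point, not a calibrated tool:

```python
import os

def fragment_fill(path, file_mb=64, chunk_kb=64):
    """Crudely encourage fragmentation: grow two files in alternating
    small chunks so the allocator interleaves their extents, then
    delete one to leave a 'swiss cheese' of free gaps. The actual
    fragmentation achieved depends on the filesystem's allocator."""
    a = os.path.join(path, "frag_a.bin")
    b = os.path.join(path, "frag_b.bin")
    chunk = os.urandom(chunk_kb * 1024)
    with open(a, "wb") as fa, open(b, "wb") as fb:
        for _ in range((file_mb * 1024) // chunk_kb):
            fa.write(chunk); fa.flush()   # flush to coax interleaved allocation
            fb.write(chunk); fb.flush()
    os.remove(b)                          # holes left where b's chunks were
    return a
```

Repeating this before each test run would at least make the starting condition reproducible, which is the point Thuleman raises.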
 
Fragmentation within the VMDK is meaningless though.

The point of defrag is to make consecutive writes and reads require little to no seek time. There is no way to do so from the guest OS side - you have no idea where the next consecutive block is located!

They're thinking from a host side - the guest will issue the SAME number of IO requests no matter how the vmdk is fragmented or the guest FS inside - the host is what has to deal with fragmentation for a RAID device, and DK does not run on ESX (nor should it - VMFS is not a normal filesystem). The software does NOT do the metadata lookup - the SAN does. We simply request the data, it does the mapping as to where that's located. The Host has no more clue about the storage than the guest for that matter, when used with a SAN! It's not RAID in the same sense, it's a RAID group and a LUN - the two are totally different. The next block may not be at all where you expect it to be.

They're really thinking about a host/guest model like Workstation or hyper-V, NOT ESX. Again, this makes sense there, but from our side, it does not, especially not on ESX.

They're trying to sell you something you don't need. Defrag from the array. Save money. If you need max performance, use RDMs on 1+0 sets and then defrag those with an aligned partition (and you get maybe 5% for that on extremely disk intensive programs (exchange/SQL)). Don't bother defragging inside of a VMDK though.
 
Just figured I would post some continuation of my Diskeeper saga.
They had sent me a quote to renew maintenance on the server licenses we hold, and I sent a note back saying I won't be renewing the licenses as I am all ESX now and don't need them anymore.

After some rather meaningless back and forth between me and the sales folks (them sending me white papers saying I do need it, me saying that I don't), the whole thing got escalated to upper management of Technical Support & Sales Engineering (gonna leave the exact title and name out).

DK Technical Support & Sales Engineering said:
Thank you for your information. Currently Diskeeper supports the Microsoft
Windows operating system platforms. VMware ESX uses a Linux based operating
system platform. [Thank you, Captain Obvious! -Thuleman]

With new disk storage technologies such as multiple spindled disks, RAID,
SAN, NAS, SSD, and even virtualization, you need to think beyond the
physical placement of data as the sole source of performance issues
associated with fragmentation.

Your assumption of defragmenting the virtual disk from within the virtual
guest Windows operating system of being no use or benefit is incorrect.

I recently visited a large pharmaceutical manufacturer who uses a large
number of virtual systems throughout their production environment. In their
environment I was given access to a typical virtual system (Windows 2003
Server) running on an ESX host. They had not yet defragmented or installed
Diskeeper. I installed Diskeeper, but disabled the automatic defrag
operation so I could get a true fragmentation analysis before Diskeeper had
a chance to move any files. The result was that there were 28,809 Total
Files. 6,228 of these files were fragmented into a total of 55,393
fragments. I then used a tool from http://www.winimage.com called
Readfile.exe, which measures the time and throughput to perform an open,
read to end of file, and close of a given file.

I targeted the top 10 most fragmented files from the analysis previously
taken and noted the results. I then defragmented this virtual system using
Diskeeper. Once the defragmentation was finished I again used Readfile.exe
to take another performance measurement.

ReadFile.EXE Most Fragmented Files

                      Before Diskeeper         After Diskeeper          Improvement
File (size)        Frags   Kb/Sec    msec   Frags   Kb/Sec   msec   Frags     Kb/Sec   msec
File 1  (49MB)     12,187   1,079  48,018       0   66,655    777   100.00%   98.38%  98.38%
File 2  (163MB)     2,883  12,827  13,333       0   68,911  2,481   100.00%   81.39%  81.39%
File 3  (116MB)     2,810  10,298  11,759       0   68,119  1,777   100.00%   84.88%  84.89%
File 4  (114MB)     2,489  10,891  11,018       0   68,932  1,740   100.00%   84.20%  84.21%
File 5  (81MB)      2,469   8,468  10,037       0   69,508  1,222   100.00%   87.82%  87.83%
File 6  (8,203KB)   1,299   2,015   4,166       0   56,374    148   100.00%   96.43%  96.45%
File 7  (8,151KB)   1,294   1,934   4,314       0   56,017    148   100.00%   96.55%  96.57%
File 8  (57MB)      1,115  11,489   5,240       0   67,732    888   100.00%   83.04%  83.05%
File 9  (6,953KB)   1,092   2,023   3,518       0   54,768    129   100.00%   96.31%  96.33%
File 10 (6,143KB)   1,080   1,768   3,555       0   48,386    129   100.00%   96.35%  96.37%

The data above clearly documents that there is a performance benefit in both
elapsed time and throughput for a defragmented virtual Windows system as
compared to the same exact system in its fragmented condition.

I do agree with you that in certain configurations Diskeeper may not be able
to detect the workload of the Host system or the storage device. This is
currently being looked into by our product development group, but you can
still schedule Diskeeper to run at pre-determined times to obtain the best
balance of resources to achieve the huge benefits as I've documented above.

My reply to the above was peppered with lopoetve's facts from this thread. Will advise on how this will play out.
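For anyone wanting to reproduce the open/read-to-EOF/close measurement without Readfile.exe, here is a rough Python stand-in; note that it does nothing to control for guest, host, or SAN caches, which will dominate repeat runs:

```python
import time

def read_throughput(path, bufsize=1 << 20):
    """Open, read to end of file, close; return (elapsed_ms, kb_per_sec).
    A rough stand-in for a Readfile.exe-style measurement; cache effects
    will inflate repeat runs unless caches are dropped between tests."""
    t0 = time.perf_counter()
    total = 0
    with open(path, "rb", buffering=0) as f:   # unbuffered raw reads
        while True:
            chunk = f.read(bufsize)
            if not chunk:
                break
            total += len(chunk)
    ms = (time.perf_counter() - t0) * 1000.0
    return ms, (total / 1024) / ((ms / 1000.0) or 1e-9)
```

Running it on the same file before and after a defrag pass (with caches cleared) would give numbers comparable to the table in the email.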
 
I'm posting my response here. Thule, I gave you more details in PM.

Fragmentation inside the VMDK is meaningless - the blocks are abstract, there is no point to defragmenting them.

Windows may think that a block is on opposite ends of a "disk", but it has absolutely no idea where they actually are located unless you are on a physical mode RDM - they may already be contiguous, for all you know, or they may be on 2 different SAN volumes. How does defragmenting a file inside a volume split as such have any effect?

In addition, here is the vmware policy on defragmenting:
There are 2 types of fragmentation: internal and external.

Internal fragmentation is when the file is allocated a file block by the file system, but it doesn't use the entire block. For example, the file system block size may be 1024kB, so when you make a 1kB file, you'll be allocated 1024kB by the block allocator. In this case, internal fragmentation is 1023kB.

External fragmentation is when blocks belonging to the same file are scattered all over the volume, thus increasing disk seek and rotational latencies and hurting performance.

When customers ask about fragmentation, they usually mean external fragmentation. External fragmentation is usually brought up in the context of performance, and internal fragmentation is usually brought up in the context of efficiency of space utilization.

The above concepts apply to any block abstraction: file, LUNs, memory, pages in a notebook, etc.

Is external fragmentation relevant to VMFS performance?

The answer is largely no, because:

1. Fragmentation causes performance degradation when an IO request from an application spans multiple blocks and these blocks are discontiguous. In our case, the VMFS block size is so large that most IO requests do not straddle block boundaries. So even if blocks are discontiguous, IO requests execute against locally contiguous regions.

2. Because disks are preallocated, and are really large files, the gaps (if any) are really large too. So using a gap is unlike using a gap in a small blocksize file system.

3. Disk arrays have huge caches, and most of our writes are absorbed there. It is very difficult for fragmentation to have a noticeable impact when it comes to SAN devices. Not true of local disks, but oh well...

What about internal fragmentation?

1. VMFS-3 addresses this issue by offering a sub-block allocator. Small files use sub blocks instead of file blocks (a sub-block is 1/16th the size of a file block on 1MB file blocksize volumes, and 1/128th the size of a file block on 8MB volumes).

2. Storage media is getting inexpensive, so a few kB here and there may be ignorable.

3. We don't create many small files.
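The internal-fragmentation arithmetic and the sub-block mitigation described in the policy can be put into a toy model (this is not VMFS's actual allocator; the block and ratio numbers are the ones quoted above):

```python
def internal_fragmentation(file_size, block_size):
    """Space allocated but unused: ceil-to-block allocation minus actual size."""
    blocks = -(-file_size // block_size)          # ceiling division
    return blocks * block_size - file_size

def vmfs3_allocation(file_size, file_block=1024 * 1024, sub_ratio=16):
    """Toy model of the VMFS-3 sub-block idea: a file that fits in one
    sub-block (file_block / sub_ratio - 1/16th on 1MB-block volumes,
    1/128th on 8MB-block volumes) gets a sub-block instead of a full
    file block, cutting internal fragmentation for small files."""
    sub_block = file_block // sub_ratio
    if file_size <= sub_block:
        return sub_block
    return -(-file_size // file_block) * file_block

# The 1kB-file-in-a-1024kB-block example from the policy text:
print(internal_fragmentation(1024, 1024 * 1024))  # 1047552 bytes = 1023kB wasted
print(vmfs3_allocation(1024))                     # 65536: one 64kB sub-block
```

Note that with either a 1MB or an 8MB file block, the sub-block works out to 64kB, which is why small files cost so little on VMFS-3.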


Tell them to stop trying to sell you snake oil. All their product will do is saturate the SCSI queue with meaningless commands - any performance benefit is simply due to the data now being in the SAN cache, or sheer dumb luck - you have as much chance of making data dis-contiguous as you do of improving seek time. You'd probably end up opening a ticket for performance issues when it blows the queue out of the water, too.

Ask them this one question - by what method do they determine the actual block storage location of a file when abstracted onto a SAN through the VMware virtual HAL? I'll be curious to see what they say.
The answer is that they can't - it's proprietary, and it cannot be done. Windows thinks the disk is a local volume - it has no idea what is backing it, and that's the whole point.

Now, on an RDM, sure - this will work great - windows knows that it's a san volume, and has actual control over the storage. On a VMDK? Run a scrubber from the SAN to defragment the storage - that'll get you some improvement, if you really want it, but only a bit. The only benefit from defrag you can find is from recovering sub-blocks, and all that does is recover a bit of space, if you absolutely have to have it.

I'm betting their vms were on local storage, which might benefit slightly from it, not on a SAN.
 
Ok, one other caveat:

If you're on a HOSTED product - Workstation, Server, et al. - this will work GREAT. Spectacular, even. ESX is different.
 
You know what, from looking at their product description and mission statement, I think I see what they're trying to do.

They're assuming that if Windows believes the files are contiguous, it'll send fewer I/O requests. E.g., instead of "this block here, that block there", it's "this group of blocks". Especially when dealing with sub-blocks.

The problem with this theory is still the same - while you might reduce load on the ESX IO stack, given that you're sending fewer requests, you have no idea what the actual storage is - it may take ~more~ requests and time for the SAN to actually respond, depending on where it moved the data to, not to mention latency. Plus then you also have to account for the regular IO from DK.

From this working theory - theoretically, if you have an I/O bottleneck at the ~host~ level, this could marginally improve performance - however, there is no guarantee or reasonable assumption that can be made either way for the SAN side of things. On local disks, this could offer a performance increase, given that you're dealing with far more limited bandwidth and connectivity to your storage - SANs avoid this with their massive caches and abstracted design, not to mention that you generally don't have much of a limitation at the actual HBA at that point, plus multiple paths to your storage device.

In simpler terms - you might be able to consolidate ~commands~ somewhat, but you won't actually be able to consolidate blocks and storage - and consolidating commands is of limited and minimal benefit, as per point 1 above.
 
Just to provide some more background for folks following this thread.

The guys at Diskeeper were aware of this thread and that I posted the reply of the senior engineer here. In turn I was asked to ask lopoetve to join the discussion on whether or not Diskeeper running in a guest VM on ESX would be beneficial to performance.

It should be pointed out that Diskeeper Corp. is an official VMware partner, so conceivably the companies do work together on an engineering level to make things work. In that light, it's surprising that there isn't a white paper talking about the specific benefits of defragmentation using Diskeeper in a guest VM on an ESX host. Several VMware KB articles talk about defragmentation, but as lopoetve pointed out, always in the context of hosted solutions (Workstation, Server, Fusion, etc.).

There are a couple of white papers created by Diskeeper:
Virtualization and Disk Performance (PDF, direct link)
Solving Virtualization's Pitfalls (PDF, direct link)

I was also asked to supply fragmentation analysis reports from Windows VMs which run on ESX so those can be used to specifically explain how Diskeeper can improve performance of guest VMs even though it is not aware of where the blocks are located on the physical storage. It's not trivial for me to do so, but I am looking into creating those reports.

This wouldn't be [H] if we didn't try to get to the bottom of this issue. I have also asked the Diskeeper folks to join the discussion here on this forum, as this certainly is a topic of interest to anyone running Windows Server VMs on ESX. Hopefully the DK folks will join in here so that we don't have to monkey around with emails sent between different people and content "lost in translation".
 
Yes, there is no doubt of the benefit when applied to a hosted product - it will definitely benefit both host and guest. It will also benefit RDMs, where you're passing the storage through directly and Windows is sending block-level commands straight to the SAN. When virtualized, though, through a custom, abstracted, clustered filesystem like VMFS, with a block size several orders of magnitude greater than Windows' default, on an enterprise-class storage area network, I cannot see a benefit of any significance. Especially when you consider the block differences in reads, the sheer size and forms of the cache involved, and the abstracted storage nature of ESX.

As for their results - SAN cache. They've got several gigs of the stuff, at least. Do the test, clear the cache, then do it again - or better, do it on a "defragged" file, then fragment the file, and then do it again. I bet the results surprise you.

Note: This is all my personal opinion. Nothing more.
 
It's been quiet on this issue. The logs that I was supposed to create are taking their sweet time, as they aren't really a priority. However, I just received this email as part of the regular product updates and found it worth posting, as it does seem to get to the bottom of the issue.

Aptly titled "The truth about SAN and defrag" here is what it has to say:

Diskeeper Product Email said:
The situation
A well-tuned and state-of-the-art SAN can offer extremely efficient and high-performing data storage when viewed at the block level, but fragmentation and its performance-corrupting effects occur at the file system level. No SAN handles this issue. To explain, let's look at I/O Mapping and Redirection in a SAN.

Storage virtualization uses metadata to properly channel I/O. Software on a storage virtualization device (such as a SAN Switch) will translate logical disk locations to physical disk ones. Here is an example:

1. A storage virtualization device gets a request for a logical location of LUN#1, LBA 32.
2. It then performs a metadata lookup for that address and finds it actually maps to LUN#4, LBA167.
3. The device then redirects the request to the actual physical location of the data.
4. Once it retrieves the data, it passes it back to the originator without the originating requestor ever knowing that the request was completed from a different location than what it knew.

The issue that file fragmentation (in NTFS for example) creates is that rather than forwarding one or a small number of I/O requests for a file into the SAN, the SAN gets hundreds or thousands of requests for a given file. With fragmentation the above steps are carried out unnecessarily many times over, creating what is an easily and effectively preventable performance bottleneck.

The solution
Only Diskeeper® 2009 Server Edition with InvisiTasking® technology delivers the following benefits:

* Automatically eliminates performance bottlenecks due to fragmentation
* Completely schedule free. InvisiTasking enables full-time duty with zero resource conflict
* Allows the SAN to run at maximum speeds without further fragmentation accumulation
* Significantly reduces the need for more hardware and the related upgrade costs
* Rapid return on investment in weeks, not months
* Proven to lower cost of ownership
* Decreases the amount of energy required to power hard drives

As always, YMMV so "handle with care".
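The LUN/LBA redirection example in the email boils down to a metadata lookup, which a toy sketch makes concrete (the mapping table and the backend callback are made up for illustration):

```python
# Metadata map: (logical_lun, logical_lba) -> (physical_lun, physical_lba).
# The single entry is the example from the email: LUN#1/LBA 32 -> LUN#4/LBA 167.
METADATA = {(1, 32): (4, 167)}

def redirect(lun, lba, backend_read):
    """Translate a logical address via the metadata map and serve the
    read from the physical location; the requestor never learns that
    its data came from somewhere else."""
    phys_lun, phys_lba = METADATA.get((lun, lba), (lun, lba))
    return backend_read(phys_lun, phys_lba)

# Fake backend for illustration:
data = redirect(1, 32, lambda l, b: f"block@LUN{l}:LBA{b}")
print(data)   # the caller asked for LUN1:LBA32 but is served LUN4:LBA167's data
```

This is also why the guest-side defrag argument is contested in this thread: the guest only ever sees the logical side of that table.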
 
When looking at fragmentation and how it affects disk performance, there are several things to consider. The key points are: partial block reads causing increased I/O requests, random seek time vs. consecutive seek time due to discontiguous blocks on your storage device, and increased I/O requests due to discontiguous blocks on your underlying storage.

Firstly - the issues with partial block reads are obvious. If you have a file scattered across several blocks, yet not fully utilizing each complete block, you risk a significant performance loss. Each block must be read and then combined with the other blocks, so with fragmented storage like this you have to greatly increase the number of I/O requests issued to "gather" the entire file. This is a major performance consideration on hosted products and, theoretically, could be a performance consideration on bare-metal products, ~within certain limitations and caveats~. More on this later. This I will grant Diskeeper - they will issue fewer I/O requests after their product is done than before; the question simply remains what the impact of this is in a true production environment.

Secondly - random seek vs. consecutive seek. This is also obvious, and is why defrag was invented in the first place. Random seek time is always significantly higher than simply doing a consecutive read. There is no way around this - especially in a hosted environment. Now, on a SAN, where spindle count is ~significantly~ higher, this is no longer as big a deal - random seek has latency very similar to consecutive seek - and in addition, you have 4+ GB of cache to work with for both read and write access, which obviates a great deal of your latency due to its extremely high speed (barring, of course, your SAN being close to 100% utilized). One more major consideration has to be given - you have multiple VMs per LUN. This is the de-facto ~standard~ for VM configuration on a SAN - which means that consecutive I/O requests will almost always BE random, since they're coming from multiple different virtual machines at the host level, and multiple different hosts at the SAN level! There is no way to force this to be a consecutive seek. Even matching up blocks to reduce I/O requests will not significantly change this - unless you have 1 VM per LUN on a RAID1 group, with only one host hitting the target, or an extremely limited amount of cache (the caveats from above). Hence, defragmenting an environment such as this (which is, indeed, the standard configuration) gains you little if anything, especially given the increased load it puts out.

In other words - they're right, if you don't have a normal environment. If you're sharing storage, sharing luns with multiple vms, and heaven forbid, multiple sans being virtualized, you won't see much change in seek time, if any at all.

Now we get to the underlying storage filesystem - VMFS. Fragmentation within VMFS, simply put, is not an issue - our absolutely massive block sizes (1/2/4/8 MB) guarantee we can span block boundaries of any guest OS (and SAN segment size) if we need to - but we also only do partial block reads unless more is needed. We will consolidate all reads into the block read needed, and then do that - chances are we will issue the same number of reads from a VMFS perspective no matter what happens internally (see my post 14 in this thread). Remember - ESX is ~NOT~ real-time emulation/virtualization. If multiple reads fall in a block, we read, at most, a block. Once.
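The "if multiple reads fall in a block, we read, at most, a block" point can be illustrated with a toy block-coalescing count (a simplification of what ESX actually does, for intuition only):

```python
def vmfs_block_reads(guest_requests, block=1024 * 1024):
    """Map each guest (offset, length) request to the VMFS file blocks
    it touches and count the distinct blocks - many small guest I/Os
    that land inside one 1MB block still cost a single block read."""
    touched = set()
    for offset, length in guest_requests:
        first = offset // block
        last = (offset + length - 1) // block
        touched.update(range(first, last + 1))
    return len(touched)

# 256 scattered 4kB guest reads, all inside the same 1MB region:
reqs = [(i * 4096, 4096) for i in range(256)]
print(vmfs_block_reads(reqs))   # 1 - one block read serves all of them
```

However badly the guest filesystem is fragmented within that region, the VMFS-level read count stays the same.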

Simply put - their product works exceptionally well on a hosted environment, and works ~if and only if~ you have a very, very specific configuration in a bare-metal type environment. But, there's something else to consider -

There are questionable things about doing defrags as well. For one, you ~will~ saturate the SCSI queue and load up the hosts and fabric. I can guarantee you'll see massive performance issues while this is going on (even a virus-scan update can cause performance issues in a shared environment like this!). Depending on the SAN in question, you may even have disconnects from the aborts. Plus, there are some technologies this will interfere with. For instance, Galeta - aka VMware Data Recovery. We use a changed-block tracking system for backups with the new VDR product (it's slick, btw). This means that if you're defragging the VM, almost every block will be changed - massively increasing the time required for de-dupe analysis and increasing backup time and load.

So yes, the product works - within a certain scope. The question is - do you find yourself ~in~ that scope?



edit: And yes, it'll work decently well on a physical mode RDM, where it is more like Windows accessing a single LUN with no shared requests or the like - especially if that RDM is on a single spindle or a mirrored pair. It will still hit the fabric hard though.
 
lopoetve, are there any VMware whitepapers that discuss defragmentation of guest file systems? What you're saying makes sense, but it would be a lot easier to convince management to allow me to remove PerfectDisk from our VMs if there were some documentation out there.
 
The folks at Diskeeper are aware of this thread, and I have invited them to participate in the discussion, so it's not like there's no opportunity for them to pitch in. Of course I am not overly surprised that there's no response (which makes sense from a corporate point of view), I had just hoped one of their tech folks would come on and just have an unofficial discussion about it.

lopoetve, are there any VMware whitepapers that discuss defragmentation of guest file systems? What you're saying makes sense, but it would be a lot easier to convince management to allow me to remove PerfectDisk from our VMs if there were some documentation out there.

I think the key to getting what you want is to file a support request with VMware, state your configuration, and ask whether VMware advises you to defragment your guest file systems.

With Diskeeper being a VMware Partner I can see that this could get political rather quickly if VMware would take a whitepaper-position saying that DK is worthless on ESX.

Curiously enough I found this link Diskeeper and SAN in which an HP engineer advised a customer to not defrag using DK, whereas a DK Product Manager in that thread rebutted that by saying HP has no clue how defrag really works. Fun read. ;)

I love DK, and I'll continue to use it in desktop environments. However, I am not going to renew or use my server licenses anymore. I don't need a whitepaper to make that call, I trust lopoetve, and luckily I am "the decider" so I don't need to ask anyone else whether to renew.

IMHO not paying for defrag licenses should be an easy sell to management if you need someone else to sign off on it. Just lead with the numbers of how much money you'll be saving during this recession. :)
 
If it works, I'd be happy to learn and chat with them about it.

It's awesome from a hosted perspective though. Far far better than any of the alternatives I've seen.
 
This thread was pointed out to me by somebody who thought I might be able to help clarify some things.

Disclaimer time. I work for Raxco Software, the maker of PerfectDisk - a Microsoft Certified defragmenter. Raxco is also a VMware partner, a Microsoft Gold Certified Partner and is part of the Citrix Ready program. I'm a former 5-year Microsoft MVP for Windows File Systems (which means I know more than the average person about file systems and how they work - I speak on Windows file system internals and performance).

One thing to point out is that when we talk about fragmentation/defragmentation in a Windows environment, what we are actually referring to is logical fragmentation - meaning that more than 1 logical I/O request has to be made to access a file. From the file system's perspective, if a file is fragmented, there are multiple runs listed for the file in the $MFT. If a file is contiguous, there is a single run listed for the file in the $MFT. The more runs for a file, the more logical I/O requests, the longer it takes, etc - with "longer" being a relative term.

Just as the file system doesn't know (or care) what the underlying disk technology is (IDE, SATA, RAIDx, SAN), in a virtualized environment the file system doesn't know (or care) that it is virtualized. The file system doesn't know (or care) what virtualization technology is being used (i.e. ESX or Hyper-V), what type of abstraction is being performed, etc. The file system in the guest also doesn't know the underlying host file system. What we can see/show is that just as fragmentation in non-virtualized Windows can affect performance, it can also affect performance for virtualized guests. For example, if you time how long it takes to read a heavily fragmented file in the guest compared to how long it takes to read that same file when it is contiguous, you can see that the contiguous file takes less time to read. Free space consolidation in the guest improves write performance just as it does in a non-virtualized system. Now, the performance improvements may not be as dramatic as in a non-virtualized system, but they can still be seen.
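As a rough sketch of the runs-to-I/O relationship described above - the extent tuples are made up, and real $MFT data runs are encoded quite differently on disk:

```python
def count_runs(extents):
    """Number of runs for a file given its (lcn, length) extent list --
    one logical I/O request per run to read the whole file."""
    return len(extents)

def merge_adjacent(extents):
    """What a defragmenter achieves: adjacent extents collapse into one run."""
    merged = []
    for lcn, length in sorted(extents):
        if merged and merged[-1][0] + merged[-1][1] == lcn:
            merged[-1] = (merged[-1][0], merged[-1][1] + length)
        else:
            merged.append((lcn, length))
    return merged

# Hypothetical fragmented file: 4 runs, 2 of them adjacent on disk.
frag = [(100, 8), (500, 4), (108, 8), (900, 2)]
print(count_runs(frag))                  # 4 logical I/Os
print(count_runs(merge_adjacent(frag)))  # 3 after merging (100,8)+(108,8)
```

Fewer runs means fewer logical I/O requests from NTFS's point of view - whether that translates into fewer physical reads underneath is exactly what's being debated in this thread.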

Things to think about if considering defragmenting a virtualized guest. Defragmentation occurring in a guest can translate into increased resource usage at the host - with the potential to saturate the host and/or affect performance in other guests attached to that host. Even defragmenters that only defragment when the system is "idle" can cause this to happen. Just because a guest is determined to have "idle" resources doesn't mean that the host has sufficient resources. For example, I have Guests A, B, C on Host 1 with "idle" defragmentation configured on all 3 guests. Guests A and B aren't doing much, but Guest C is resource intensive. Guests A and B may be seen as "idle" and defragmentation starts - which adds to resource usage at Host 1 - which can then affect performance in Guest C.

That is why "virtual awareness" in an application is very important - for an application running in a virtualized guest to be aware that it is running in a virtualized environment and who its physical host is. This is very important if the application is designed to improve performance - without degrading performance for other guests. PerfectDisk 10 Virtual Edition was released this past January and includes the ability to monitor not only guest system resources but also that guest's physical host system resources. So, if Guest A and Guest B are considered "idle" but Host 1 resource usage is above defined thresholds, Guests A and B will NOT defragment. This is a smarter way to ensure that defragmentation can occur at the guest level without saturating the host or affecting performance in other guests. PerfectDisk Virtual Edition supports ESX and vCenter as well as Hyper-V. Please see http://www.perfectdisk.com/news/pd-webinars for a recording of a webinar showing PerfectDisk Virtual Edition in an ESX vCenter environment - moving a guest from a busy host to an idle host, and how PerfectDisk Virtual Edition detects resource usage at the host level to determine whether it is okay to defragment the guest.
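The gating logic being described could be sketched like this - the function name, signature, and thresholds are my own illustration, not PerfectDisk's actual API or defaults:

```python
def should_defrag(guest_idle, host_cpu_pct, host_io_pct,
                  cpu_limit=60, io_limit=50):
    """Gate a guest-level defrag on *host* load, not just guest idleness.
    Thresholds are illustrative only."""
    if not guest_idle:
        return False
    # The "virtually aware" part: consult the physical host's resource usage.
    return host_cpu_pct < cpu_limit and host_io_pct < io_limit

# Guests A and B look idle, but Host 1 is busy serving Guest C:
print(should_defrag(True, host_cpu_pct=85, host_io_pct=70))  # False
# Host genuinely quiet -- defrag may proceed:
print(should_defrag(True, host_cpu_pct=20, host_io_pct=10))  # True
```

A naive "idle" defragmenter implements only the first check; the second check is what keeps Guests A and B from piling load onto a host that Guest C is already saturating.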

I hope that this helps to clarify how Windows file systems work in a virtualized environment.

- Greg/Raxco Software
Microsoft MVP 2003-2007
Windows File Systems

Disclaimer: I work for Raxco Software, the maker of PerfectDisk - a commercial defrag utility, as a systems engineer in the support department.
 
Except you're forgetting that multiple logical I/O requests are effectively combined into a single block request by ESX. See my point 2 above.

ESX is not realtime virtualization. Given the massive block size, multiple reads almost always fall inside a single block.


edit: Free space consolidation is absolutely a consideration, if that is something that matters to you, but not, imho, for performance.

edit2: Hang on, considering something I didn't think about before.
edit3: Ok, I was primarily thinking about reads. Yes, free space consolidation will improve writes - windows will spend less time looking for "free blocks" to issue a write command to. This is definitely true and something I didn't consider - if you have an extremely write heavy process, it may be something worth considering. However, given again, ESX is not realtime emulation, depending on the writes, they may fall into the same VMFS block and thus be handled far smoother than if it was physical hardware. It will be a performance improvement from the perspective of less wasted resources, but may not actually translate into an improvement on speed. IMHO, I'd be looking for things that actually improve speed, vs wasted resources, but most of the ESX hosts I see have resources to spare.
 
"except you're forgetting that multiple logical io requests are effectively combined to a single block request by esx. See my point 2 above."

From NTFS's point of view - NTFS doesn't know this or even care. NTFS is issuing its own logical requests - which is completely independent of how those requests are passed through any abstraction layers to actually reach the data in the vhd/vmdk on the host drive.

Again, no matter what ESX or any other virtualization technology is doing to ultimately access the data, it is completely independent of what is happening in the guest and how NTFS is working in the guest. What we have to separate out is how Windows/NTFS functions natively - regardless of whether Windows is running on a physical system or a virtualized system - and what the virtualization technology is actually doing with I/O requests generated by the guest. The virtualization technology may be doing optimization of requests specific to how it handles things, but that is completely independent/separate from what Windows/the file system is doing. Another way of looking at this is to see the virtualization technology as just another hard drive. Hard drive controllers do seek re-ordering, seek optimization, and caching that is completely independent of the caching performed by NTFS. Neither is aware of what the other is doing.
 
There are many places where you need to look at and think about performance. Not only at the guest level, but also at the SAN level and whatever you can do in the virtualization software. This includes formatting drives along cluster boundaries, using the proper cluster size for NTFS, having adequate cache available, etc... It also includes making sure that you do things "smarter" - in order to minimize resource usage at the guest level, which minimizes resource usage at the host level, which minimizes resource usage at the SAN level, etc... Defragmenting in the guest is just 1 piece of the puzzle.
 
Please see http://www.perfectdisk.com/news/pd-webinars for a recording of a Webinar showing PerfectDisk Virtual Edition in a ESX vCenter environment - moving a guest from an busy host to an idle host and how PerfectDisk Virtual Edition detects resource usage at the host level to determine if okay to defragment the guest.
I haven't watched the webinar, don't have java installed and it will be a cold day in hell before I do put that on my boxes. ;)

I suppose the move of a VM from a busy host to an idle host doesn't really help with I/O queues on the SAN controller though. It seems to me that the ultimate bottleneck isn't as much the host as it is the shared storage.

There's many places where you need to look at and think about performance. [...] using proper cluster size for NTFS

I'd actually pay good money for some info about optimal NTFS cluster sizes (getting a bit OT here), because all I see out there are write-ups on "I think it should be this or that" without any actual scientific method applied to back those opinions up.
 
I'd actually pay good money for some info about optimal NTFS cluster sizes (getting a bit OT here), because all I see out there are write-ups on "I think it should be this or that" without any actual scientific method applied to back those opinions up.

Guess you've never attended one of my presentations - where the advice is free :)
 
There are many places where you need to look at and think about performance. Not only at the guest level, but also at the SAN level and whatever you can do in the virtualization software. This includes formatting drives along cluster boundaries, using the proper cluster size for NTFS, having adequate cache available, etc... It also includes making sure that you do things "smarter" - in order to minimize resource usage at the guest level, which minimizes resource usage at the host level, which minimizes resource usage at the SAN level, etc... Defragmenting in the guest is just 1 piece of the puzzle.

As a developer, you should know the old adage about programs spending 90% of their time in 10% of the code - the same holds here.
The latency and access time in storage is in the fabric - optimize there, not at the host level, where you're spending a minor fraction of the time (average FC kernel latency is around 1 ms, while average FC fabric latency is around 15-20 ms per command for ESX).

I'm saying that minimizing Guest level commands via defragmenting may very well not minimize host or fabric level commands at all - improving the 1ms is relatively meaningless. Now, add in the fact that you are indeed loading down the fabric even more doing a defragment (if not even causing issues from such - I've seen plenty of times that ESX starts having to abort commands due to fabric load during a defrag), and I personally think the cost is greater than any possible benefit, except for special cases.
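Plugging the thread's own numbers (~1 ms kernel, ~15-20 ms fabric) into a quick Amdahl-style calculation shows how little headroom guest-side optimization has:

```python
def speedup(kernel_ms, fabric_ms, kernel_improvement):
    """Overall per-command speedup when only the host-kernel portion improves.
    kernel_improvement is the fraction of kernel latency eliminated (0..1)."""
    before = kernel_ms + fabric_ms
    after = kernel_ms * (1 - kernel_improvement) + fabric_ms
    return before / after

# Even halving the 1 ms kernel-side latency barely moves the total,
# because the 17.5 ms fabric latency (midpoint of 15-20 ms) dominates:
print(round(speedup(kernel_ms=1.0, fabric_ms=17.5, kernel_improvement=0.5), 3))
```

Eliminating the kernel-side millisecond entirely would still cap the overall gain at under 6% - the argument above in code form.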

Now alignment and cluster size is definitely important, as that causes the san to issue multiple reads where only one would be needed - see the thread in here with Spacehockey, where we fixed his DS.
 
I haven't watched the webinar, don't have java installed and it will be a cold day in hell before I do put that on my boxes. ;)

I suppose the move of a VM from a busy host to an idle host doesn't really help with I/O queues on the SAN controller though. It seems to me that the ultimate bottleneck isn't as much the host as it is the shared storage.
You're exactly right. There are few times I've seen that individual hosts jam their own queue - except when the SCSI queue is full at the san (especially on HDS arrays). Optimize the storage device - optimize the fabric, optimize the physical disks and controllers, and you'll be more than fine.
 
Possible catch-22. If guests (or standalone systems) are heavily fragmented, more I/O gets generated to the SAN than needs to be. So, you can saturate your SAN either way.

What can be benchmarked is that even virtualized guests see an improvement in throughput if defragmented. So, to ensure the best possible performance in a virtualized guest, defragmentation may be necessary - while taking into consideration the effect on the guest's host or the actual SAN backend. If you do find it necessary/beneficial to defragment, then you want a defragmenter that is capable of minimizing resource usage - rather than something that isn't virtually aware and is pretty much guaranteed to saturate things.
 
Standalone, yes. VMs, not necessarily - it depends on what I/O you're looking at - in terms of actual requests issued to the hosts, sure - in terms of actual reads done, not always. That depends on a great many things outside of either of our control - namely the actual layout of data on the VMFS volume determining what reads are finally issued from the host.

You're also not considering deduplication technologies (the data you're trying to move may not even exist in standalone form) - trying to defragment guests located on a dedupe store is not only pointless, but is a waste of resources not only on the CPU side (as there's nothing that is going to happen), but on the fabric (the san is going to ignore everything you tell it to do), and may even cause issues with the dedupe store if you consolidate blocks and it can no longer dedupe them.

In addition, you're not considering things like VDR or any of the other technologies that rely on changed block tracking - defragmenting a VM will triple or quadruple your backup time, as it now has to scan the entire virtual machine ~again~, and make tons of changes to the CBT list. Not a good idea, especially given that it may end up wasting significant amounts of space if it decides that all those blocks are changed enough to require another full image backup instead of only an incremental.
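A back-of-the-envelope sketch of the CBT impact - the block count and daily change rate are invented for illustration:

```python
def changed_blocks(bitmap):
    """Blocks an incremental (CBT-based) backup must copy: the dirty ones."""
    return sum(bitmap)

# Hypothetical 1000-block disk with a 2% daily change rate:
normal_day = [1 if i % 50 == 0 else 0 for i in range(1000)]
# A defrag rewrites most of the disk, so CBT marks nearly everything dirty:
after_defrag = [1] * 950 + [0] * 50

print(changed_blocks(normal_day), changed_blocks(after_defrag))  # 20 950
```

The incremental after a defrag approaches the size of a full image backup, even though no user data actually changed - which is exactly the wasted space and backup-time blowup described above.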

Defragmentation may be necessary in certain situations - I'll grant that. There are times where cpu resources will be at a premium and it may be worth it, or if you have a single system dedicated to a storage device, or if you have physical mode RDMs (100% totally worth it for those!). I just don't believe that there will be any significant performance increases on a shared storage device with multiple operating VMs on VMFS over a long period of time, given the technologies in play, and that there is a lot more to consider there. Now, I'm sure that diskeeper, especially a VMware aware version, would be a far better choice than windows defrag (much as it's a far better choice in the hosted environment), I just question the need for it most of the time in the ESX side of things.

If there's a performance improvement that is significant - show us. Get some good performance dumps where we have multiple VMs on a volume with intermediate load, and take the SAN cache out of the picture (flush it after the "un-defragmented" test) so we know it's not just stuff coming from the SAN cache. Make it a decent SAN too - something in the CLARiiON or EVA range, or a mid-range NetApp on iSCSI.

Now, one other thing to consider - I've done all this comparison on VMFS. I haven't thought of NFS.
 