Windows Server 2012 Deduplication Ratio

jsmithv

I have been playing with the recent release preview of Windows Server 2012, and the deduplication ratios are not making sense to me.

I have enabled deduplication on a volume and copied the same ISO over 3 times (~683MB each). I forced a deduplication job to run from PowerShell (Start-DedupJob i: -Type Optimization), and the results are confusing. Hoping someone can make sense of the math here, just so I understand the process a bit better.

If I copy the same file 3 times, I would expect the savings ratio to be close to 66% and the actual used space on the volume to be close to the size of a single copy (683MB). Instead, I am seeing a 41% ratio and 1.88GB of used space. However, the dedupe savings of 1.31GB seem right.
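A quick sanity check on those numbers, as a minimal Python sketch. It assumes the reported ratio is computed as savings / (used space + savings); that formula is a guess that happens to fit the figures above, not something confirmed from documentation.

```python
# Figures reported above (GB as displayed by Windows).
saved_gb = 1.31   # reported dedupe savings
used_gb = 1.88    # reported used space on the volume

# Expectation for three identical copies: two of the three are redundant.
expected_ratio = 2 / 3

# Assumed formula: savings relative to what the data would occupy without dedup.
observed_ratio = saved_gb / (used_gb + saved_gb)

# Same formula if used space were only a single copy (half of the savings).
one_copy_gb = saved_gb / 2
ratio_if_one_copy = saved_gb / (one_copy_gb + saved_gb)

print(f"expected:            {expected_ratio:.0%}")
print(f"observed:            {observed_ratio:.0%}")
print(f"if used == one copy: {ratio_if_one_copy:.0%}")
```

Under that assumption the 41% follows directly from the 1.88GB of used space, so the real question is why used space is so much larger than one copy, not how the ratio is derived.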

Has anyone played with this feature yet who can explain the discrepancy in the numbers? Note that the volume is empty except for the 3 copies of the ISO:

[Attached screenshot: Capture.PNG]
 
This thought may sound amateurish, but is there a recycle-bin-type folder somewhere that stores a cache of the deduped data until it is cleared, in case the user wants to recover it?
 
That might be the case; however, the built-in Recycle Bin is currently empty. It might just be too early, since there are probably not many people playing with these features yet. I can't seem to find anything other than blogs introducing the features and how they work. I did play with some ZFS deduping and the ratios seemed to make much more sense, so I was hoping out-of-the-box Server 2012 would too. Thanks for the reply.

Update: When looking at the properties of each file on the volume, the 'size' is 667MB, but the 'size on disk' is reported as 4.00KB for all three. That seems to indicate that the 'real' blocks of the file are stored somewhere else and the files I see are just metadata...?
 
I couldn't find the option anywhere in there. From reading online, it sounds like shadow copies have been removed. Have you seen otherwise?
 
Probably due to some overhead of NTFS + Dedup that doesn't show up when looking at file sizes but does show up when you check available disk space. For instance, I just provisioned a 240GB partition on a WS2012 box for VM storage, and even though it has never had a single file placed on it, it shows 131.8MB of used disk space; an empty 4TB volume shows 324.3MB used. That is not a large amount, it is necessary data such as the MFT, and it obviously scales well, since it less than tripled for a roughly 17-fold increase in drive space. But when you're only playing around with 600MB of data, an extra 100MB of FS overhead is going to skew your results a bit. Dedup probably adds to this overhead, since the extra 2 files can't be COMPLETELY free: they still need pointers and such in the FS to represent them, and dedup may have extra tables and such hidden on the drive that it needs to do its job.

Throw, say, 7 more copies on the drive and dedup again to see how it does; that should show how well the space savings scale.
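To put rough numbers on that, here is a small Python sketch. The growth factors come straight from the figures quoted above (treating 4TB as 4096GB, which is an assumption). The `savings_ratio` helper is a hypothetical toy model, not how Windows reports it: it assumes used space is one copy plus a fixed 100MB of overhead, and the ratio is savings / (used + savings).

```python
# Empty-volume figures quoted above.
small_volume_gb, small_overhead_mb = 240, 131.8
large_volume_gb, large_overhead_mb = 4 * 1024, 324.3  # assumes 4TB = 4096GB

size_growth = large_volume_gb / small_volume_gb          # how much bigger the volume is
overhead_growth = large_overhead_mb / small_overhead_mb  # how much bigger the overhead is

def savings_ratio(copies, copy_mb=683.0, overhead_mb=100.0):
    """Toy model: one stored copy plus fixed overhead; the rest is savings."""
    saved = (copies - 1) * copy_mb
    used = copy_mb + overhead_mb
    return saved / (used + saved)

print(f"volume size grew {size_growth:.1f}x, overhead grew {overhead_growth:.1f}x")
for copies in (3, 10):
    print(f"{copies} copies: {savings_ratio(copies):.1%} savings in this model")
```

In this model a fixed overhead drags the ratio down noticeably at 3 copies and much less at 10, which is why adding more copies is a useful test.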
 
I just started playing with this feature today. Here is a good article about it: http://blogs.technet.com/b/filecab/...ata-deduplication-in-windows-server-2012.aspx


I used the DDPEval.exe tool to run against a few of our 500GB disks in production. The numbers look promising:

SACFPS200 sample disk (Group Shares): 68% savings in disk space
Evaluated folder: \\sacfps200\i$
Evaluated folder size: 488.00 GB
Files in evaluated folder: 790192

Processed files: 254440
Processed files size: 483.40 GB
Optimized files size: 152.33 GB
Space savings: 329.07 GB
Space savings percent: 68


CERFPS100 sample disk (Homedirs): 51% savings in disk space
Evaluated folder: \\sacfps300\i$
Evaluated folder size: 395.03 GB
Files in evaluated folder: 902857

Processed files: 444542
Processed files size: 391.16 GB
Optimized files size: 190.25 GB
Space savings: 200.91 GB
Space savings percent: 51
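The percentages above can be reproduced with a short Python check. It assumes the tool computes the percentage as space savings divided by processed file size, which matches both samples but is not confirmed here. Note that for the first sample, processed minus optimized comes to 331.07 GB rather than the reported 329.07 GB; the figures are used as reported.

```python
# (processed GB, optimized GB, reported savings GB, reported percent) from the output above.
samples = {
    "Group Shares": (483.40, 152.33, 329.07, 68),
    "Homedirs": (391.16, 190.25, 200.91, 51),
}

def savings_percent(processed_gb, savings_gb):
    """Assumed formula: savings as a whole-number percentage of processed size."""
    return round(100 * savings_gb / processed_gb)

for name, (processed, optimized, savings, reported) in samples.items():
    computed = savings_percent(processed, savings)
    difference = round(processed - optimized, 2)
    print(f"{name}: computed {computed}%, reported {reported}%, "
          f"processed - optimized = {difference} GB")
```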
 