BridgeSTOR Dedupe card

feffrey

Has anyone ever used a BridgeSTOR Dedupe Card?
We are looking at DPM for backups at work; however, it is a storage hog.
Are there other hardware or software deduplication systems that anyone would recommend?
 
ZFS has deduplication, so you could try the free OpenIndiana OS. But dedup has several caveats: you need a lot of RAM, on the order of 1GB of RAM for every TB you want to dedupe, and destroying a deduped snapshot can take days. It would probably be easier to just buy more disks; they are cheap and safe.
 
I really don't get where this magical 1GB per 1TB for dedup comes from.

Dedup needs 320 bytes per block. At a 128K block size, the largest block size currently available, 1TB of data is roughly 8 million blocks, so you would need about 2.5GB of RAM just to hold the dedup table.

On top of that, the dedup table can only use up to 1/4 of the ARC, which means you actually need about 10GB of RAM per 1TB of disk space, and that is assuming you only store large streaming files.

And I believe the requester knows about this, which is why he is asking about a totally different dedup solution.
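To put concrete numbers on that, here is a rough back-of-the-envelope sketch in Python (the ~320-byte DDT entry size and the 1/4-of-ARC metadata limit are the commonly cited ZFS figures; real-world numbers vary with block size and pool layout):

```python
# Rough ZFS dedup table (DDT) RAM estimate - illustrative only.
TIB = 1024 ** 4

def ddt_ram_bytes(data_bytes, block_size=128 * 1024, entry_size=320, arc_fraction=0.25):
    """Estimate the ARC needed to keep the whole DDT in RAM.

    entry_size: ~320 bytes per DDT entry (commonly cited figure)
    arc_fraction: metadata such as the DDT is limited to ~1/4 of the ARC by default
    """
    blocks = data_bytes / block_size          # ~8 million blocks per TiB at 128K
    ddt_size = blocks * entry_size            # ~2.5 GiB per TiB of unique data
    arc_needed = ddt_size / arc_fraction      # ~10 GiB of ARC per TiB
    return ddt_size, arc_needed

ddt, arc = ddt_ram_bytes(1 * TIB)
print(f"DDT: {ddt / 1024**3:.1f} GiB, ARC needed: {arc / 1024**3:.1f} GiB per TiB")
# -> DDT: 2.5 GiB, ARC needed: 10.0 GiB per TiB
```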
 
That's per TB of unique data, I think - so it would only take 2.5GB if you achieved no deduplication.
 
So 1TB of disk means I would waste over half the disk, as only 40% of it could hold unique data, just to stay within that 1GB limit? That doesn't make sense.

Or inodes are extremely expensive.
 
I talked to BridgeSTOR today and found out some interesting things about it. They want 1.5GB of RAM per TB of physical disk space just for the hash table.
With 35TB of hard drives in the JBOD attached to our backup server, that ends up being over 50GB of RAM, and that is a tad excessive for just one system. At that point an SSD hash table would be a cheaper option.
ZFS sounds like it would be the same thing: it needs boatloads of RAM, or you go SSD.
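For what it's worth, the figure quoted above is just straight multiplication:

```python
# RAM needed for the hash table at the vendor-quoted 1.5 GB per TB of physical disk.
ram_per_tb_gb = 1.5      # GB of RAM per TB, figure quoted by BridgeSTOR
jbod_tb = 35             # TB of hard drives in the JBOD behind the backup server
print(f"Hash table RAM needed: {ram_per_tb_gb * jbod_tb:.1f} GB")   # -> 52.5 GB
```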
 
So 1TB of disk means I would waste over half the disk, as only 40% of it could hold unique data, just to stay within that 1GB limit? That doesn't make sense.

Or inodes are extremely expensive.

I expect the "1Gig/1TB data" figure is someone ballparking a ~1:2.5 dedupe ratio - so the "1TB" is only taking up 400GB of space - the "1TB of data" is "1TB of undeduplicated data, or 400GB of unique data".

Not saying it's right or correct - but at least it get's people considering the issue and in the right order of magnitude :)

@feffrey
IMO dedup only makes sense if you are looking at things where you might get 30x+ dedup ratios. If you have a VM farm where 95% of the base image is the same - but have 50+ images. Backups possibly as well, though most backup systems now have a post-process dedup (non-live) step so you wouldn't gain much of anything. When talking 2-3-4x dedup ratios you are generally better off buying more disk space.
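To illustrate that trade-off, here is a quick sketch of how much physical space 1TB of logical data occupies at a few dedup ratios (overhead for metadata and the dedup table is ignored; purely illustrative):

```python
# Physical footprint of 1 TB of logical data at various dedup ratios - illustrative only.
logical_gb = 1000.0
for ratio in (1.0, 2.5, 4.0, 30.0):
    physical_gb = logical_gb / ratio
    saved_pct = (1 - physical_gb / logical_gb) * 100
    print(f"{ratio:4.1f}x dedup: {physical_gb:6.0f} GB on disk ({saved_pct:3.0f}% saved)")
# 2.5x leaves ~400 GB on disk; 30x (e.g. a large farm of near-identical VMs) leaves ~33 GB.
```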
 
I am using ZFS snapshots to save space. I have one master VM and take a snapshot of it; every new user gets a clone of that snapshot, where only their changes are saved, so every user gets a separate filesystem. It is a kind of dedup.
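The workflow described above boils down to one snapshot plus one clone per user; a minimal sketch (the pool and dataset names are made up for illustration):

```python
# Minimal sketch of the clone-per-user workflow described above.
# The dataset names (tank/vm/master, tank/vm/<user>) are hypothetical examples.
import subprocess

def provision_user_vm(user: str, master: str = "tank/vm/master", snap_name: str = "base") -> None:
    """Clone a golden VM dataset for one user; only that user's changes consume new space."""
    snapshot = f"{master}@{snap_name}"
    # Take the base snapshot once; if it already exists the command fails harmlessly.
    subprocess.run(["zfs", "snapshot", snapshot], check=False)
    # The clone initially shares every block with the master snapshot.
    subprocess.run(["zfs", "clone", snapshot, f"tank/vm/{user}"], check=True)

provision_user_vm("alice")
```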
 
HammerFS has a very efficient dedup implementation. You dedup after you've written the data, so the performance is still excellent!
 
@feffrey
IMO dedup only makes sense if you are looking at workloads where you might get 30x+ dedup ratios - say, a VM farm with 50+ images where 95% of the base image is the same. Backups possibly as well, though most backup systems now have a post-process (non-live) dedup step, so you wouldn't gain much of anything. When you're talking 2-4x dedup ratios you are generally better off buying more disk space.

Replacing the drives is sounding like the better alternative at this point. With Arcserve we have 60 days of full backups on everything. About the only good thing about Arcserve is that the built-in dedup works.
If we go with DPM, it looks like I am either going to need to replace drives or go with a shorter retention period.
In an enterprise environment, what is a good / average retention period for backups? I have had to restore from a backup that was 30 days old, but nothing at 60 yet. (You would think a user would notice that half of their emails were gone within a day or so, not realize it for the first time a month later.)
 
HammerFS has a very efficient dedup implementation. You dedup after you've written the data, so the performance is still excellent!
Interesting. How much RAM does dedup require? Does it detect corrupted data?

(The issue with ZFS is not performance - the performance is good. The RAM requirements are a bit hefty, though.)
 
I've used Starwind dedupe. If you give it RAM and CPU, it performs very well. I'm not sure if the StarWind free edition has been upgraded from 5.7 to 5.8 like the paid-for version, but in 5.8 dedupe performance is better than in 5.7 and it's no longer classed as experimental. The only reason I haven't put it into production is that it's not yet supported for HA targets (Starwind supports synchronous replication between two arrays, which, apart from availability, is useful if you need to take a node down for Windows updates or physical maintenance).

Starwind does the deduplication as data is written. It tries to keep the hashtable in RAM, or at least partially cached, and you can also use system RAM as a write-back cache. It's also possible to keep the hashtable in a separate location (e.g. an SSD) from the actual data.
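As a general illustration of what that hashtable is doing, here is a toy in-memory sketch of block-level inline dedup (this is not Starwind's actual implementation; the 4KB block size and SHA-256 hashing are assumptions for the example):

```python
# Toy illustration of hash-table-based inline dedup - not Starwind's actual code.
import hashlib

BLOCK_SIZE = 4096   # assumed block size for the example
hash_table = {}     # block hash -> index of the stored unique block
block_store = []    # unique blocks actually written to "disk"

def write_block(data: bytes) -> int:
    """Store a block, reusing an existing copy if an identical block was seen before."""
    digest = hashlib.sha256(data).digest()
    if digest in hash_table:            # duplicate: just return a reference
        return hash_table[digest]
    block_store.append(data)            # new unique block: store it and index its hash
    hash_table[digest] = len(block_store) - 1
    return hash_table[digest]

# Writing 100 identical blocks stores the data only once.
refs = [write_block(b"\x00" * BLOCK_SIZE) for _ in range(100)]
print(f"{len(block_store)} unique block(s) stored for {len(refs)} writes")
```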

Once they have HA & dedupe working together (I actually lashed up my own DIY version of this, with each HA node storing its data on another deduped node, but it meant too much orchestration hassle when shutting nodes down), I will be seriously looking at running it with LSI CacheCade so that I can have hundreds of similar VMs benefit from CacheCade.

Obviously the effectiveness of any dedupe is going to be driven by how much duplication there actually is in your data. At least Starwind gives you a free way to test that out.

The main issue, apart from the lack of HA, is that if you want to shrink the data (e.g. you've just deleted loads of backups) you need to do it with the target offline, by copying the data from one dedupe target to another. So you would need another server with enough capacity, or at least some spare disk connected to the same Starwind server.
 
Using it with Veeam Backup v6 now with 4KB blocks - it works very well.

General rule of thumb: you need to turn OFF both compression and deduplication at the client if you want to use dedupe on the target.

Starwind has a free version of their iSCSI SAN that does dedup. Has anyone used it before?
http://www.starwindsoftware.com/starwind-free
I tried to set up DPM with just 2 of our Hyper-V servers and it wanted 90% of the entire backup JBOD just for 5 days of backups. :eek:
 
NetApp does the same, but it's only OK for a so-called pulsating load.

OK for a backup server, but not usable as primary storage for virtualization.

Microsoft has offline dedupe in Windows 8, but they cheat on metadata size )))

 