Storage strategies

Hey guys,

I've always wondered - are there smarter ways to store/share your data than just throwing it on whatever storage unit you have, regardless of age, access frequency or priority?

While I have played around a lot with different physical layouts/filesystems and such (RAID/ZFS etc.), I still haven't found a way to store data with a little more brain behind it...

I mean, of course I can create several arrays on a RAID controller with different drives (slower/large vs. faster/small) and distribute my data according to my own estimation; same with ZFS, together with caching (ZIL/L2ARC) to ramp up speed - but that's all manual.
What I'm looking for is a smart storage system (I think these exist in the enterprise $$$ market, but I'm after the SOHO/home market here) that would dynamically move data between different array types based on things like:
- Last access time
- Access count
- Custom priority

And who knows what else...

So it would move the pics/movies and stuff I haven't looked at for ages onto an old/slow storage unit and keep the daily/recent/important stuff on the fast track...

Is there a system/software like that?

Thx,
Thomas
 
I have never heard of something like this on a large scale. Sounds like an uber extension of having something in the disk cache vs. the rest of the volume. But I know of nothing like an intelligent setup of multiple RAID arrays that keeps frequently-accessed data on, say, a RAID 0 array and everything else on a different array. Sounds a bit complicated.

Easiest thing I can think of is that you have EVERYTHING on a high-speed RAID 0 array, and then all of THAT data is copied (via a controller or backup program) to a redundant array of whatever RAID level is appropriate. But of course that is an inefficient use of disks.
 
I think I read about something like that from one of the big storage vendors, EMC I believe... but it's been a while.

I think it was moving data from 15k to 7,200 RPM nearline drives... of course nowadays one could use SSDs for the fast tier...

And it sure would be possible to have everything on a fast array, but that's actually not needed...
I guess everyone has tons of data they don't need every day or even every week...

From a technical point of view it shouldn't even be that difficult... it's mostly a matter of keeping metadata on the stored items, plus the arrays themselves.
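Just to make that concrete, here is a naive sketch of what I have in mind, in Python. Purely illustrative - the mount points, the 30-day threshold and the ".keep-fast" pin marker are all made up:

Code:
#!/usr/bin/env python3
"""Naive tiering sketch: demote files from a fast mount to a slow mount
based on last access time, unless the folder carries a 'pin' marker.
Paths, threshold and marker name are made-up examples; relies on atime
being enabled on the fast mount."""

import os
import shutil
import time

FAST = "/mnt/fast"          # e.g. SSD or 15k RPM array (example path)
SLOW = "/mnt/slow"          # e.g. 7200 RPM bulk array (example path)
COLD_AFTER = 30 * 86400     # demote after 30 days without access
PIN_MARKER = ".keep-fast"   # drop this file into a folder to keep its files on the fast tier

def demote_cold_files():
    """Move files that have not been accessed recently from FAST to SLOW."""
    now = time.time()
    for root, dirs, files in os.walk(FAST):
        if PIN_MARKER in files:              # custom priority overrides access time
            continue
        for name in files:
            src = os.path.join(root, name)
            if now - os.stat(src).st_atime > COLD_AFTER:
                dst = os.path.join(SLOW, os.path.relpath(src, FAST))
                os.makedirs(os.path.dirname(dst), exist_ok=True)
                shutil.move(src, dst)        # tiering = move, not copy

if __name__ == "__main__":
    demote_cold_files()

Run something like that from cron once a day, add a matching "promote" pass for recently used files, and you have the poor man's version of what I mean.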
 
Storage tiering. Getting to be more common in enterprise arrays. Yes, EMC does it. Will move data up/down as needed from SSD to SAS to NL-SAS (SATA) automatically. I'm not aware of anything for SMB/Home use that does it, though.
 
I mean, of course I can create several arrays on a RAID controller with different drives (slower/large vs. faster/small) and distribute my data according to my own estimation; same with ZFS, together with caching (ZIL/L2ARC) to ramp up speed - but that's all manual.
What I'm looking for is a smart storage system (I think these exist in the enterprise $$$ market, but I'm after the SOHO/home market here) that would dynamically move data between different array types based on things like:
- Last access time
- Access count
- Custom priority
I don't understand what you mean by "ZFS caching is manual". ZFS automatically moves your hot data up the hierarchy, into ARC (RAM) or L2ARC (SSD), or lets it stay down on the disks, depending on usage. There is no need to specify which level should keep the data; it is all done automatically:
http://dtrace.org/blogs/brendan/2008/07/22/zfs-l2arc/
"...What’s important is that ZFS can make intelligent use of fast storage technology, in different roles to maximize their benefit..."

But maybe this was not what you asked for? I'm not sure I understand your question.
 
Yes, I know that ZFS (Sun/Oracle) claims to be hybrid layered storage, but I still see L2ARC/ZIL more as a cache than as a storage tier.

It's a matter of interpretation, sure ;)
What I was looking for is having different layers of (permanent, physical) storage available; in particular using the different drive speeds out there, i.e. 15k SAS drives as a second layer after SSD (or third after memory), then 7,200 RPM SAS/SATA as (more) permanent storage, and maybe even tape/remote transfer as a final bottom layer...

ZFS with an additional slow layer would be great - but that can't be automated today - that's why I said I'd have to do it manually across two or more different volumes.
 
Ah, ok now I understand. You mean that some data should be stored on the SSDs, and other data on the disks, and maybe some data on tape - and the system should move the data around automatically.

In that case, ZFS is not the solution, because it keeps all data on the disks; after the server has been running for a few hours, ZFS starts copying hot data to the different levels - RAM or SSD - or leaves it on disk. But the data is not moved around, it is only copied. The original data is always still on the disks, so if the SSDs crash you still have your data on the raided disks.

So why can't you use a cached solution like ZFS, where data is copied to the different levels? If you prefer to move the data instead of copying it, then every level needs its own redundancy: raided SSDs, raided disks, etc., because if one level fails you lose all the data on that level. With ZFS it does not really matter if a cache level fails, because the raided disks hold the master data, and raided disks give quite good protection.
 
Well, at some point it's a cost issue; I won't be able to store tons of data on SSDs, so I'd like to use the next lower level (high-speed disks). But those will fill up with secondary data at some point too, so I'd want a tertiary level (slow disks) or even a fourth level (tape) to store data.

The second issue is that ZFS *only* looks at access patterns (in its automation), i.e. whatever has been used often or recently sits on the fast devices; older stuff stays on the main volume.
Now what I'd like is a way to prioritize specific data - I know I can move it to a high-speed volume and leave the rest on slower media, but that's manual intervention.
Of course I'd need some manual steps there, since the system can't know which data is important to me; but it's cumbersome to create specific filesystem layouts for that across different volumes.

For example, I have VMs which I use quite often, so these need to be on fast media;
then I have some which are quite large but which I don't use often. I'd like these to be on fast media too, because given their size I don't want to wait for them on a slow medium.

The larger the speed difference between the storage levels, the worse it gets ;)
 
Now what I'd like is a way to prioritize specific data - I know I can move it to a high-speed volume and leave the rest on slower media, but that's manual intervention.

No system is going to be able to read your mind.

You can either set up things in a manual fashion or you can tolerate what the system does.

Most people evolve to tolerating what the system does.
 
I did not read the thread, just the OP.

But Windows Server 2012 R2 Storage Spaces now comes with smart storage tiering.

It will automatically tier your storage for you based on usage patterns, and you can also tier manually.
 
@GoergeHR: I know that :)
The problem is that there's no tool yet that lets me override the system-defined/derived priorities - or a system with a concept of user-defined priorities (I believe).

I mean, I could set up a cron job to copy the same 10 GB of data every 4 hours to mark it as highly active and keep it in the ZFS cache, but that's not really how I imagine it ;)
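For reference, that kind of crude hack would boil down to something like the sketch below - purely illustrative, the dataset path is made up, and it just re-reads the files (which is enough to keep them warm) instead of copying them:

Code:
#!/usr/bin/env python3
"""Crude ZFS cache 'warmer': periodically re-read a directory tree so the
ARC/L2ARC keeps treating it as hot. The path is a made-up example; run it
from cron every few hours."""

import os

HOT_DATA = "/tank/vms/important"   # example dataset to keep warm

def warm(path, chunk=1 << 20):
    """Read every file under 'path' in 1 MiB chunks and discard the data."""
    for root, dirs, files in os.walk(path):
        for name in files:
            with open(os.path.join(root, name), "rb") as f:
                while f.read(chunk):       # the read itself is what warms the cache
                    pass

if __name__ == "__main__":
    warm(HOT_DATA)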
 
@GoergeHR: I know that :)
The problem is that there's no tool yet that lets me override the system-defined/derived priorities - or a system with a concept of user-defined priorities (I believe).

I mean, I could set up a cron job to copy the same 10 GB of data every 4 hours to mark it as highly active and keep it in the ZFS cache, but that's not really how I imagine it ;)

Storage tiering is a very rare feature of high-end, ultra-high-price storage systems, and I expect it to be an end-of-life feature now that TB-class SSDs are quite cheap.

If you consider ZFS, you can add enough RAM to deliver >95% of all reads from RAM (the ARC cache). Writes of the last ~5 s are cached as well (in RAM, or for sync writes with the help of a fast ZIL device). If you cannot afford or use enough RAM for caching, you can add a second-level ARC cache (L2ARC) on SSDs. Newer systems even LZ4-compress the L2ARC to hold more data.

Tiering nowadays is not SSD tiering but RAM tiering.
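As a sanity check on that ">95% of reads from RAM" figure, you can look at your own ARC hit rate. A minimal sketch, assuming ZFS on Linux, which exposes the ARC counters under /proc/spl/kstat/zfs/arcstats (on Solaris-type systems you would query kstat instead):

Code:
#!/usr/bin/env python3
"""Print the ZFS ARC read hit rate. Assumes ZFS on Linux, where the ARC
counters live in /proc/spl/kstat/zfs/arcstats as name/type/value rows."""

ARCSTATS = "/proc/spl/kstat/zfs/arcstats"

def arc_hit_rate():
    stats = {}
    with open(ARCSTATS) as f:
        for line in f.readlines()[2:]:        # skip the two header lines
            fields = line.split()
            if len(fields) == 3:
                stats[fields[0]] = int(fields[2])
    hits, misses = stats["hits"], stats["misses"]
    return 100.0 * hits / (hits + misses)

if __name__ == "__main__":
    print("ARC hit rate: %.1f%%" % arc_hit_rate())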
 
No system is going to be able to read your mind.

You can either set up things in a manual fashion or you can tolerate what the system does.

Most people evolve to tolerating what the system does.

To be fair, that isn't entirely true. You can pin stuff manually to the SSD tier and then let the system decide what else resides there - see that MS link I mentioned for more details.

I've no idea if that is possible on enterprise systems, but I'm putting in an EMC VMAX with FAST VP in the near future, so I guess I'll know soon enough.
 
Storage tiering is a very rare feature of high-end, ultra-high-price storage systems, and I expect it to be an end-of-life feature now that TB-class SSDs are quite cheap.

If you consider ZFS, you can add enough RAM to deliver >95% of all reads from RAM (the ARC cache). Writes of the last ~5 s are cached as well (in RAM, or for sync writes with the help of a fast ZIL device). If you cannot afford or use enough RAM for caching, you can add a second-level ARC cache (L2ARC) on SSDs. Newer systems even LZ4-compress the L2ARC to hold more data.

Tiering nowadays is not SSD tiering but RAM tiering.

That's caching, not tiering. Caching makes a copy, tiering moves the file. Both have their advantages and disadvantages.

I'm not sure I agree with the last statement either. All the high-end vendors seem to be punting SSD tiering at the moment. I could be wrong though - I'm pretty new to this SAN game :)
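In toy terms (hypothetical paths), the difference is copy vs. move:

Code:
import shutil

# Caching: the fast device gets a COPY; the master copy stays on the
# slow, raided disks, so losing the cache device loses nothing.
shutil.copy2("/slow/projects/report.dat", "/fast/cache/report.dat")

# Tiering: the file is MOVED; the fast tier now holds the only copy,
# so that tier needs its own redundancy.
shutil.move("/slow/projects/active-vm.img", "/fast/tier0/active-vm.img")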
 
That's caching, not tiering. Caching makes a copy, tiering moves the file. Both have their advantages and disadvantages.

I'm not sure I agree with the last statement either. All the high-end vendors seem to be punting SSD tiering at the moment. I could be wrong though - I'm pretty new to this SAN game :)

You don't use tiering because you want tiering; you use it because you want SSD performance for active data while keeping the bulk of your data on slow, cheap disk storage.
In many use cases you can achieve the same result with cheap ZFS caching instead of expensive tiering on some high-end box.
 
This strategy of tiering may work in large-scale deployments, where, say, archival data is stored in ultra-high-density 45+ drive 5,400 RPM "Backblaze"-style pods and moved up to a box running 7,200 RPM+ drives on a less saturated backplane.

But for small-scale home use it's totally pointless (and home use is how I read the point of this thread).

A 24-drive box of 4 TB eco drives that will spit out 1 GB/s all day long is easy to do. That's something like 75 TB after parity and FS overhead.

Sure, if you're trying to aggregate PBs of data from idle disks to nearline it makes sense, but who really needs this for home use?
 
You don't use tiering because you want tiering; you use it because you want SSD performance for active data while keeping the bulk of your data on slow, cheap disk storage.
In many use cases you can achieve the same result with cheap ZFS caching instead of expensive tiering on some high-end box.

That's the end goal, but I assume that since vendors are doing tiering (in some cases in addition to caching) there's some benefit, e.g. with tiering you can manage where the persistent data lives more easily than with caching.

I'm sure that's the case, but as per my example above, Windows Server 2012 R2 now offers tiering and that isn't expensive.
 
Sure, if you're trying to aggregate PBs of data from idle disks to nearline it makes sense, but who really needs this for home use?

It's nice to make things go faster :) Let's face it, if we all bought only what we actually needed, we probably wouldn't be on this site!
 
This strategy of tiering may work in large-scale deployments, where, say, archival data is stored in ultra-high-density 45+ drive 5,400 RPM "Backblaze"-style pods and moved up to a box running 7,200 RPM+ drives on a less saturated backplane.

But for small-scale home use it's totally pointless (and home use is how I read the point of this thread).

A 24-drive box of 4 TB eco drives that will spit out 1 GB/s all day long is easy to do. That's something like 75 TB after parity and FS overhead.

Sure, if you're trying to aggregate PBs of data from idle disks to nearline it makes sense, but who really needs this for home use?

I don't think it's useless for home use. While ZFS can cache as much data on SSDs as you can afford, that doesn't help when you want to write to the fileserver. In my setup I have ZFS with several SSDs and enough RAM to cache everything I use in a typical week. But when I want to write a 1 KB file, the whole array spins up. When downloading 20 GB at 50 KB/s, the whole array has to stay awake.
Some people use a single disk as a work/download share and move data to a bigger array once it's no longer frequently used. That's not very user-friendly (especially with not-so-tech-minded spouses or kids), and you have to keep track of that folder manually.
With tiering you can work on your main share without the whole thing being powered up, as long as the first tier is accessible and has enough free space. Apple has applied this technique under its Fusion Drive buzzword, and it's very effective.
 
When we talk about "intelligent" storage at home, we should also avoid a single point of failure; the best NAS/SAN is useless otherwise. That's why I'm taking a close look at VMware VSA and Storage Spaces. But I can't figure the latter out, especially with local storage and HA. Perhaps someone can help me with that?
 
Looking at data tiering (as opposed to caching), what I've found so far is that a decent number of people want it, and almost every one of them who bought something that actually has it (and there are a number of higher-end arrays with this functionality) ended up disliking or regretting it.

It's astronomically difficult to actually tier 'intelligently', where 'intelligently' equals 'how the user(s) want it, at least the majority of the time'. Most tiering solutions are lucky if they can reach 'how SOME of the user(s) want it, LESS than half the time'.

Stick to things that can cache hot/warm data on faster media. ZFS is especially good here because everything goes into the 'hot' cache by default and then ages into the warm tier, as opposed to some other systems where promoting data into the hot/warm cache is a background process that is sometimes too late to actually help the workload.

As for 'the best NAS/SAN' being useless due to SPOF, bear in mind 'SAN' actually stands for 'Storage Area Network', and in general refers to devices or software that may (indeed, should) have failover & HA capabilities. Admittedly, you'll be hard-pressed to get seamless, unseen failover in a home environment. Sometimes you have to accept that with a sub-$1000 budget, you can't have a Ferrari. :)
 