Windows 8 Preview / I am playing with Storage Spaces

Silenus · Limp Gawd · Joined: Jan 6, 2011 · Messages: 177
**** EDIT: For an update with a native install and speed testing of Storage Spaces on local disks, see my post #21 ****

I am playing around this morning with the new Windows 8 Consumer Preview and the new Storage Spaces setup. I installed it in VirtualBox and gave it some iSCSI disks from my OI NAS box. Here's a quick run-through.



Disk Management showing the drives. I gave it 2 x 20GB drives and 2 x 40GB drives, just for the sake of testing with different-sized drives.







First, creating a storage POOL. It automatically found and selected the unallocated iSCSI volumes.






Now I'm prompted to create a storage SPACE within the pool. I choose a two-way mirror here and give it a logical size of 60GB. After hitting the Create button it formats the drives as needed.







Now I create a second storage space on the pool; this time I select a three-way mirror and give it a 2TB logical size. Just for poops and giggles.
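That 2TB logical size on only ~120GB of actual physical disk works because spaces are thin provisioned: physical capacity is only allocated as regions are actually written. A toy Python model of the idea (the 256MB slab size and all names here are made up for illustration, not Storage Spaces' real allocation unit):

```python
# Toy model of thin provisioning: a huge logical size, but physical
# slabs are only allocated when a logical region is first written.
SLAB = 256 * 1024 * 1024  # assumed 256 MB slab size, purely illustrative

class ThinSpace:
    def __init__(self, logical_size, physical_size):
        self.logical_size = logical_size
        self.physical_size = physical_size
        self.slabs = {}  # logical slab index -> allocated

    def write(self, offset, length):
        if offset + length > self.logical_size:
            raise ValueError("beyond logical size")
        first, last = offset // SLAB, (offset + length - 1) // SLAB
        for i in range(first, last + 1):
            self.slabs[i] = True          # allocate on first touch
        if self.allocated() > self.physical_size:
            raise RuntimeError("pool out of physical capacity")

    def allocated(self):
        return len(self.slabs) * SLAB

# 2 TB logical space backed by only 120 GB of physical disk
space = ThinSpace(logical_size=2 * 1024**4, physical_size=120 * 1024**3)
space.write(0, 10 * 1024**3)          # writing 10 GB near the start...
print(space.allocated() // 1024**2)   # ...allocates only the touched slabs (10240 MB)
```

The logical size is just a promise; you only hit trouble when actual writes exceed the pool's real capacity.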





Storage spaces complete. Screen showing an overview of the pool and its usage, the two storage spaces, and the physical disks.





Showing new volumes in Computer





New file copy demo and disk tools ribbon


 
Thanks for posting the info. I plan to play around with Windows 8 a bit after work.
 
Will be interesting to see how this all performs with iSCSI when Windows Server 8 comes out...
 
Aye. I am also very curious about the upcoming ReFS. I'd like to see that come to Windows 8 in addition to Server 8, but I somehow doubt that will happen. Or perhaps they will add or allow conversion to it later for Windows 8.
 
Thanks for the post! Very slick, I can't wait to get home and try it out :)
 
Ok, here are a few more... showing the new Task Manager:


Processes tab




Performance tab showing logical cores and kernel times:




Performance tab showing disk activity and details:




New App History tab:




Users tab:




Details tab:




Services tab:

 
This is dreamy.

I'm imagining you could serve up single disks over iSCSI, then make one big pool out of all of them with the redundancy features.
 
Thanks for the info. I currently use WHS and have been very interested in upgrading to Windows 8 and Storage Spaces.
 
When you use Storage Spaces, does it delete all the data previously on the HDD? Also, did you only try iSCSI? Because ideally I would want to use 4TB HDDs.
 
Hey thanks for taking the time to dive into this. I've been working all day and have had no time to setup anything. I gotta say this actually looks pretty darn impressive for windows in regards to storage.
 
When you use Storage Spaces, does it delete all the data previously on the HDD?

Unfortunately yes. http://blogs.msdn.com/cfs-file.ashx....12_2D00_Adding_2D00_drives_5F00_51BA6F7B.png

It would be nice if it were smart enough to let you say:
"Here are all my drives, and here are some empty ones. Please spend the next few days working out the parity between all of these. Thanks, I'll let you get to work!"

I don't have enough empty drives to use as a buffer to convert my existing data to a storage space! Buying and returning a bunch of externals would be lousy.
 
Your post made me try it on my new HP MicroServer that just came in today. I have Windows 8 installed, put in 8GB of RAM, and am moving the OS drive to the CD bay; then I'll pop in my four 2TB drives and see how the parity part works.
 
Unfortunately yes. http://blogs.msdn.com/cfs-file.ashx....12_2D00_Adding_2D00_drives_5F00_51BA6F7B.png

It would be nice if it were smart enough to let you say:
"Here are all my drives, and here are some empty ones. Please spend the next few days working out the parity between all of these. Thanks, I'll let you get to work!"

I don't have enough empty drives to use as a buffer to convert my existing data to a storage space! Buying and returning a bunch of externals would be lousy.

I know I'm asking a loaded question here... but does your issue mean you don't have a backup of your data?
 
This is really looking like it might be viable to build a win 8 machine to replace a WHS v1.
 
I know I'm asking a loaded question here... but does your issue mean you don't have a backup of your data?
Yep :p
If a drive fails I lose whatever is on it. I upgrade one drive's size about twice a year, and I've been fortunate enough with this cycling procedure not to have any failures. I keep ~8TB of data with this method, and it is a lot of manual work.

It seems like if I start with two 3TB drives and start emptying the others into the space, I can add the newly emptied drives to the pool as I go. I just need one more 3TB drive to try this.

I got excited about this back when I read this article: http://blogs.msdn.com/b/b8/archive/...rage-for-scale-resiliency-and-efficiency.aspx
I'm rereading it to understand exactly what will be going on with my data. It sounds like if I make two parity spaces and organize my data across them, I should have protection against two drive failures (one per space).
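For reference, the parity layout is RAID-5-style XOR, so any single missing member of a stripe can be rebuilt from the remaining members, but two losses in the same stripe are fatal. A minimal Python illustration (a toy, not Storage Spaces' actual on-disk format):

```python
# RAID-5-style parity in miniature: parity = XOR of the data members,
# so any ONE lost member of a stripe can be rebuilt from the others.
def xor_blocks(blocks):
    out = bytearray(len(blocks[0]))
    for b in blocks:
        for i, byte in enumerate(b):
            out[i] ^= byte
    return bytes(out)

data = [b"AAAA", b"BBBB", b"CCCC"]      # three data members of one stripe
parity = xor_blocks(data)

# lose one data member, rebuild it from the survivors plus parity
lost = data[1]
rebuilt = xor_blocks([data[0], data[2], parity])
print(rebuilt == lost)  # True

# Two members gone from the SAME stripe cannot be rebuilt; that is why
# one parity space only survives one failed drive. Two separate parity
# spaces each carry their own parity, so each can absorb one failure.
```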
 
So far I have been disappointed with the write performance of the parity setup with 4 drives. I was seeing ~35MB/s vs. 90-100MB/s with a drive set up as a single. That is a pretty steep penalty, although similar to what I see with my WD Sentinel in RAID-5. I guess I am going to stick with FlexRAID snapshot RAID, since the number of files that change is very small. My data is mostly movies/music/pictures.

edit:
Did another few tests with 10-15GB of mixed files; transfers start out at 100+MB/s, quickly fall to 40MB/s, then taper off to 20MB/s.
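For what it's worth, that tapering is consistent with the classic RAID-5 small-write penalty: once caches fill, each write costs a read of old data and old parity plus writes of new data and new parity, roughly a 4x I/O multiplier. A rough back-of-envelope in Python using the single-drive number reported in this thread:

```python
# Back-of-envelope for the RAID-5 small-write penalty: each small write
# costs read(old data) + read(old parity) + write(data) + write(parity),
# i.e. roughly 4 disk I/Os per logical write.
single_disk_mb_s = 90          # observed single-drive write speed in thread
penalty_ios = 4                # read-modify-write I/O multiplier
estimate = single_disk_mb_s / penalty_ios
print(f"naive parity write estimate: ~{estimate:.0f} MB/s")
# Roughly in the ballpark of the ~20-35 MB/s people report here.
# Full-stripe sequential writes can avoid the penalty, which may explain
# transfers starting at 100+ MB/s (cache / full stripes) before tapering.
```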
 
The big problem with ReFS, as I see it, is that ReFS only tries to protect against metadata corruption. The data itself might be corrupted, even after a "chkdsk/fsck" (or whatever it is called). Other than that, I think ReFS is a step forward in the right direction.
 
ReFS does have file-level checksumming and COW, where it never directly overwrites file data, but you need to set an attribute on the pool to get it. I'm not sure if the GUI exposes this.

The only problem is that parity spaces (aka RAID-5) don't have the fancy intelligent autocorrection based on checksums that mirrors (2- or 3-way) have.
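The difference is easy to show in miniature: with checksummed mirror copies you can tell which copy is good and repair from the other, while bare parity can only tell you a stripe is inconsistent, not which member is wrong. A toy Python sketch (zlib.crc32 stands in for whatever checksum the filesystem really uses):

```python
import zlib

# Mirror + checksum: two copies of a block, each covered by a stored
# checksum. If one copy rots, the checksum identifies the bad copy, so
# the good copy can silently repair it.
good = b"important file data"
checksum = zlib.crc32(good)

copy_a = good
copy_b = b"importGnt file data"   # simulated bit rot on the second copy

# pick whichever copy still matches its checksum -- self-healing read
repaired = copy_a if zlib.crc32(copy_a) == checksum else copy_b
print(repaired == good)  # True: the mirror self-heals

# Bare parity: a parity mismatch after silent corruption tells you the
# stripe is inconsistent, but parity alone cannot say WHICH member is
# wrong -- detection without location, hence no autocorrection.
```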
 
So far I have been disappointed with the write performance on the parity setup with 4 drives.
That's pretty bad. What about read performance? What interface are the drives on? Once the system has had time to settle everything in, it seems like it should match the hardware (assuming their performance claims were given reasonably). I don't understand why software RAID performance is poor when it doesn't have to be. What is hardware doing that quad-core processors can't? MS is saying parity is in general better for bulk storage, not for space used for frequent reads/writes.

Their inclusion of USB 2.0 drives in this should come with a huge disclaimer: it will be slow as a snail. Software parity/mirroring across USB drives on one computer will be incredibly slow. It almost shouldn't have been an option, just to save people from themselves.
 
Aye. I am also very curious about the upcoming ReFS. I'd like to see that come to Windows 8 in addition to Server 8, but I somehow doubt that will happen. Or perhaps they will add or allow conversion to it later for Windows 8.

It's supposed to be coming to Win 8 in SP1.
 
That's pretty bad. What about read performance? What interface are the drives on? Once the system has had time to settle everything in, it seems like it should match the hardware (assuming their performance claims were given reasonably). I don't understand why software RAID performance is poor when it doesn't have to be. What is hardware doing that quad-core processors can't? MS is saying parity is in general better for bulk storage, not for space used for frequent reads/writes.

Their inclusion of USB 2.0 drives in this should come with a huge disclaimer: it will be slow as a snail. Software parity/mirroring across USB drives on one computer will be incredibly slow. It almost shouldn't have been an option, just to save people from themselves.

The drives are 2TB WD SATA Greens (2 EARX and 2 EARS). It is an HP N40L, so it's a lightweight AMD processor, but I didn't see it even spike over 20%. Even copying from the 250GB 7200.12 boot drive it was slow (taking the network out of the equation). With the disks set up on their own I got 90+MB/s. I might grab four 15K 300GB SAS drives from work and try with those too.
 
I thought I would post my findings here also.

Parity test

The Pool
5QxFk.png


Write speed to pool
TPDLQ.png


Write usage
B9BqS.png


Read speed from pool
rlpZl.png


The usage was about the same; I tried both a local disk and a gigabit network location (with a RAID-5 array that will max out gigabit) as the source.

I think the write speed is too low to be usable for me.
 
Been interested in this since I saw the article on the MSDN blog, but these write performances make me weep.
 
ReFS does have file-level checksumming and COW, where it never directly overwrites file data,
In response to my question about ReFS only checksumming metadata, but not the data itself (and thus the data might still be corrupted after a "chkdsk/fsck"), you posted this link:
http://blogs.msdn.com/b/b8/archive/...-generation-file-system-for-windows-refs.aspx
"Metadata integrity with checksums
Integrity streams providing optional user data integrity"

That is the link I referred to when I said I've read it. It seems that I am correct: only the metadata is checksummed. The data itself is not checksummed, so your data might be corrupt.

The question is: why is only the metadata checksummed? Why is the data not checksummed? I think it is because correctly checksumming everything in a watertight way is very hard to do. MS does not have the know-how to do that, so they only enable checksums on the metadata. Start with small steps.
 
In response to my question about ReFS only checksumming metadata, but not the data itself (and thus the data might still be corrupted after a "chkdsk/fsck"), you posted this link:
http://blogs.msdn.com/b/b8/archive/...-generation-file-system-for-windows-refs.aspx
"Metadata integrity with checksums
Integrity streams providing optional user data integrity"

That is the link I referred to when I said I've read it. It seems that I am correct: only the metadata is checksummed. The data itself is not checksummed, so your data might be corrupt.

The question is: why is only the metadata checksummed? Why is the data not checksummed? I think it is because correctly checksumming everything in a watertight way is very hard to do. MS does not have the know-how to do that, so they only enable checksums on the metadata. Start with small steps.

Same article:

The key features of ReFS are as follows (note that some of these features are provided in conjunction with Storage Spaces).

  • Metadata integrity with checksums
  • Integrity streams providing optional user data integrity

The "optional" integrity streams are the important bit: they are enabled by default on mirror volumes, and can be enabled at format time for parity volumes. While not very explicit, the article claims, "Integrity streams protect file content against all forms of data corruption."
 
Ok ladies and gents, an update to my original post with some speed testing of my own. This setup is now a native install (not a VM), with 3 identical disks for the storage pool.

System is:
- Stock Core 2 Quad Q6600 on 975X chipset
- Intel ICH7R southbridge for the drives.
- Data disks are 3 identical 500GB Caviar Black (WD5002AALX) for the storage pool.


Control test. This is a plain single data drive, formatted normally (no Storage Spaces used).





Storage Space, NO RESILIENCY (RAID 0 essentially)





Storage Space, 2-WAY Mirror




Storage Space, 3-WAY Mirror




Storage Space, PARITY. Ok, this is the one you've been waiting for. Unfortunately it shows the same heavy write penalty others seem to be finding.




During the write test showing disk activity and CPU usage (trivial)




The same PARITY setup being written to from a network location via file copy. Basically the same results, but higher CPU usage due to the network activity.
 
Thanks for the testing. Hopefully MS can work on the write speeds, because as-is it's just too slow.
 
Interesting, I read the opposite. Do you have links that confirm this?
Yup, from the MSDN blog. If you look for "integrity streams", you'll find a lot of info.

In addition, we have added an option where the contents of a file are check-summed as well. When this option, known as “integrity streams,” is enabled, ReFS always writes the file changes to a location different from the original one. This allocate-on-write technique ensures that pre-existing data is not lost due to the new write. The checksum update is done atomically with the data write, so that if power is lost during the write, we always have a consistently verifiable version of the file available whereby corruptions can be detected authoritatively.

At the most basic level, integrity is an attribute of a file (FILE_ATTRIBUTE_INTEGRITY_STREAM). It is also an attribute of a directory. When present in a directory, it is inherited by all files and directories created inside the directory. For convenience, you can use the “format” command to specify this for the root directory of a volume at format time. Setting it on the root ensures that it propagates by default to every file and directory on the volume. For example:
Code:
D:\>format /fs:refs /q /i:enable <volume>
D:\>format /fs:refs /q /i:disable <volume>
By default, when the /i switch is not specified, the behavior that the system chooses depends on whether the volume resides on a mirrored space. On a mirrored space, integrity is enabled because we expect the benefits to significantly outweigh the costs. Applications can always override this programmatically for individual files.
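The allocate-on-write scheme in that quote can be sketched in a few lines: new data always goes to a fresh location, and the (pointer, checksum) pair only flips after the write completes, so a crash mid-update leaves the old, verifiable version intact. A toy Python model (illustrative only; ReFS's real on-disk structures are not public):

```python
import zlib

# Toy allocate-on-write: a "file" is a pointer to an immutable block plus
# its checksum. Updates write a NEW block first, then swap the
# (pointer, checksum) pair -- the old version stays valid until the swap.
storage = {}          # block id -> bytes (never overwritten in place)
next_id = 0

def put_block(data):
    global next_id
    storage[next_id] = data
    next_id += 1
    return next_id - 1

class File:
    def __init__(self, data):
        self.block, self.crc = put_block(data), zlib.crc32(data)

    def update(self, data):
        new_block = put_block(data)     # 1. write data to a fresh location
        # 2. swap pointer + checksum together (atomic in the real thing)
        self.block, self.crc = new_block, zlib.crc32(data)

    def read(self):
        data = storage[self.block]
        assert zlib.crc32(data) == self.crc, "corruption detected"
        return data

f = File(b"version 1")
f.update(b"version 2")
print(f.read())  # b'version 2'; b'version 1' still exists untouched
```

If power is lost before step 2, the pointer still references the old block, which still verifies against its old checksum.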

Thanks for the testing. Hopefully MS can work on the write speeds, because as-is it's just too slow.
It may be the fact that it is journaling the changes before writing them to the parity space's data store, to guard against the RAID-5 write hole. You can apparently designate an arbitrary drive for the journal to live on, so I suspect dumping it on an SSD should dramatically improve performance.

linky on the info:
Parity spaces include a journal to ensure data integrity regardless of write size and in the presence of unexpected power loss. Stay tuned for more information on the work we have done with Windows file systems - this work builds on Storage Spaces. Parity spaces do use some memory caching to improve performance. Storage Spaces do not use a SSD as a write buffer although SSDs can be used to back the journal thereby helping performance for parity spaces
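The write hole that the journal guards against is that data and parity can't be updated atomically; a crash between the two updates leaves the stripe silently inconsistent. Logging the intent first makes the update replayable. A toy Python sketch of the idea (the real journal format is undocumented):

```python
# Toy journaled parity update: log the intended (data, parity) pair to a
# journal before touching the stripe. After a crash, replaying the journal
# re-applies any half-finished update, closing the RAID-5 "write hole".
journal = []          # pretend this lives on a dedicated (ideally SSD) drive
stripe = {"data": b"old", "parity": b"oldp"}

def write_with_journal(data, parity, crash_after_data=False):
    journal.append(("intent", data, parity))   # 1. durable intent record
    stripe["data"] = data                      # 2. update data member
    if crash_after_data:
        return                                 # simulated power loss here
    stripe["parity"] = parity                  # 3. update parity member
    journal.pop()                              # 4. retire the journal entry

def replay_journal():
    # on reboot: re-apply any intent that never got retired
    while journal:
        _, data, parity = journal.pop()
        stripe["data"], stripe["parity"] = data, parity

write_with_journal(b"new", b"newp", crash_after_data=True)
# the crash left data=new but parity=oldp -- that's the write hole
replay_journal()
print(stripe)  # both members consistent again
```

The extra journal write per update is also a plausible reason parity writes are slow, and why backing the journal with an SSD should help.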
 
Yeah, those write speeds are pretty terrible. Hopefully they can fix it in a patch, and hopefully before RTM.
 
Bleh, such bad write performance for parity mode. Not to mention the data is striped, so all data is lost on a multiple-drive failure. I would switch to FlexRAID for storage pooling, but it's turning into a paid product, and it doesn't seem stable or well-supported enough to warrant paying for it.
 
Bleh, such bad write performance for parity mode. Not to mention the data is striped, so all data is lost on a multiple-drive failure. I would switch to FlexRAID for storage pooling, but it's turning into a paid product, and it doesn't seem stable or well-supported enough to warrant paying for it.

I'm not 100% on this, but I believe you can set as many parity drives as you want
 
I'm not 100% on this, but I believe you can set as many parity drives as you want

From a reply to a commenter in this article: http://blogs.msdn.com/b/b8/archive/2012/01/05/virtualizing-storage-for-scale-resiliency-and-efficiency.aspx


with Windows 8, parity spaces tolerate failure of any single physical disk backing the specific parity space

So it sounds like 1 parity drive per space. You could create multiple spaces to get multiple parity drives, but each space would still only tolerate 1 drive failure before complete data loss.
 
Unfortunately there is no option for more than one parity drive. The developer preview of the server version does not have this either.
 
There is no single "parity" drive. The chunk of data protected by a parity block can span 8 members (that is, 7 slices of data + 1 slice of parity), so if you have enough drives it is possible for different sets of data+parity to land on distinctly different drives.

In that case you can survive more than one drive dying, and if you are using ReFS, only the damaged files go away (which it can log) rather than the entire volume. Obviously, having two sets of parity protecting the same block of data would be better.
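The rotation idea above can be sketched quickly: with more drives in the pool than the stripe width, different stripes land on different member sets, so two failed drives only destroy the stripes that happened to include both. A toy Python illustration (the 10-drive pool and 4-wide stripes are arbitrary choices, not Storage Spaces' real geometry):

```python
from itertools import cycle

# Toy "no dedicated parity drive": each stripe picks a rotating set of
# members from a larger pool, so different stripes live on different
# drives. Two failed drives then destroy only the stripes whose member
# set includes BOTH; every other stripe lost at most one member and is
# still recoverable via parity.
drives = list(range(10))      # assumed 10-drive pool, purely illustrative
width = 4                     # 3 data + 1 parity per stripe (toy, not 7+1)

stripes = []
rot = cycle(drives)
for _ in range(10):
    members = {next(rot) for _ in range(width)}
    stripes.append(members)

failed = {0, 1}
lost = [s for s in stripes if len(s & failed) > 1]       # 2+ members gone
survivable = [s for s in stripes if len(s & failed) <= 1]
print(f"{len(lost)} stripes lost, {len(survivable)} recoverable")
```

So two failures are partial rather than total loss, matching the point that with ReFS only the damaged files go away instead of the whole volume.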
 
Yup, from the MSDN blog. If you look for "integrity streams", you'll find a lot of info.

It may be the fact that it is journaling the changes before writing them to the parity space's data store, to guard against the RAID-5 write hole. You can apparently designate an arbitrary drive for the journal to live on, so I suspect dumping it on an SSD should dramatically improve performance.

linky on the info:
This is really weird. MS talks a lot about data integrity, but only the metadata is checksummed by default. Why would MS talk about data integrity and not provide it? There is something fishy here.

CERN did a study and concluded that "checksumming is not necessarily enough" to provide data integrity. I mean, every hard drive already has lots of checksums, but hard drives can still hold corrupted data. There are checksums in many places, yet the data might still be corrupt. So why would MS succeed in providing data integrity with ReFS? Adding checksums does not suffice, according to CERN.

On the other hand, there is research showing that ZFS does provide data integrity. Until there is research on ReFS, I will not trust it. Even worse, the average ReFS user will not tinker with ReFS and will not turn on integrity streams for the data itself, so the data might still be corrupted. What purpose does it serve to have checksums only on the metadata? ReFS gives only some basic protection.
 
If you pull a drive out of the storage pool can you see the contents on it as a single drive in a different system?
 
If you pull a drive out of the storage pool can you see the contents on it as a single drive in a different system?
No. It's a new thin-provisioned volume format which natively does striping. The difference is that mirrored spaces keep at least 2 copies of each strip somewhere, while parity spaces keep a parity chunk somewhere.
 