Faster, Higher, Stronger : powerful storage

ST3F

Limp Gawd
Joined
Oct 19, 2011
Messages
181
Hi there.

I've been reading for about 14 months, mostly posts about data storage systems.

The investment is quite big :
  • need about 32 TB @ very high speed
  • very secure
  • 24/7 for 5 years
  • expandable
  • file sharing (SMB and/or AFP)
  • send an email when a file in a specific folder is ready to download
  • 1x dual-port 10 GbE fiber card or 2x single-port 10 GbE fiber cards
  • 1 workstation on dual 10 GbE fiber SR (Apple Mac Pro) connected directly to the storage, sequential access (writes 2 files of ~500 GB each over 6 hours @ 120 Mb/s ... 240 Mb/s from this workstation to the storage)
  • 8 workstations on Gigabit (Apple iMac + Mac Pro), random access (reading files while they are being written plus some other files @ 120 Mb/s & 50 Mb/s ; writing 120 Mb/s files)... thanks to the "prefetch" capability :)
  • 1 switch with 24x Gigabit + dual 10 GbE fiber SR

Use : editing
=> Photos (JPEG, RAW),
=> Videos (MOV / MXF HD compressed files, MP4 SD compressed files),
=> Audio (WAV, MP3)

I was thinking about ZFS with mirrored vdevs, as described here :
=> http://www.nex7.com/node/1
=> http://www.solarisinternals.com/wik...Configuration_Example_.28x4500_with_mirror.29

My draft :

- 4U 24 Bay (Storage + system)

  • Rackmount : 4U 24 bay like Norco 4224
  • Motherboard : Supermicro X9DRL-iF
  • CPU : Intel Xeon E5-2630
  • Memory : 16 GB ECC PC1333
  • PSU : 600w redundant
  • Controller : 2x LSI SAS 9207-8i
  • Expander : HP or Chenbro
  • Storage : 20x 3 TB drives
  • OS : OpenIndiana or Solaris 11 on 2x Intel X25-V mirrored
  • ZIL : 2x Intel Solid-State Drive 311 mirrored
  • L2ARC : 2x Intel Solid-State Drive 520 striped

- 4U 24 bay JBOD (Storage) + 3U 8x 3,5" (System)

  • Rackmount 4 U Storage : JBOD 24 bay
  • Rackmount 3 U System motherboard & external SAS Controller : 8x 3,5" with LSI 2308 chip
  • Storage : 24x 3 TB drives
  • OS : OpenIndiana or Solaris 11 on 2x Intel X25-V mirrored (in the 3U)
  • ZIL : 2x Intel Solid-State Drive 311 mirrored (in the 3U)
  • L2ARC : 2x Intel Solid-State Drive 520 striped (in the 3U)

Some questions :
  1. Hard drives: which ones? (no 3 TB WD RE4 :( ... and no RE5 series announcement yet as of 06.25.2012)
  2. How much bandwidth do you think this kind of system can reach?
  3. Any reference for a 1U system with an LSI 2308 or LSI 2008 with dual external SAS?
  4. Any reference for a 24-bay JBOD SAS/SATA enclosure?
  5. Expander: HP or Chenbro?

Cheers.

St3F
 
I think you mean you need 32 TB instead of 32 gig.

You probably want more memory, unless the files you normally edit fit nicely in 10 gigs. More RAM is always better than spending money on L2ARC.

You sure you want to use the Intel 311s for ZIL? Those drives are very slow; max write speed to your system will be gigabit network speeds.

You didn't say what size Intel 520s you planned on using, but they will eat gigs of your RAM, making that 16 gigs worth even less. If you're looking at 4x 256 gig models, that is probably going to be 3 gigs of RAM used for this alone, since you're mainly storing large files.
 
Hi Patrickdk.

Thank you for taking the time to reply.

I think you mean you need 32 TB instead of 32 gig.
Indeed ; corrected, thx
You probably want more memory, unless the files you normally edit fit nicely in 10 gigs. More RAM is always better than spending money on L2ARC.
Ok
You sure you want to use the Intel 311s for ZIL? Those drives are very slow; max write speed to your system will be gigabit network speeds.
Nope, that's just my draft: I read that SLC is better for ZIL and MLC for L2ARC.
There is 1 workstation which writes @ 2x 120 Mb/s (connected @ 10 GbE to the storage) + 8 workstations browsing and doing R/W over gigabit at the same time.
... what do you propose ?

You didn't say what size Intel 520s you planned on using, but they will eat gigs of your RAM, making that 16 gigs worth even less. If you're looking at 4x 256 gig models, that is probably going to be 3 gigs of RAM used for this alone, since you're mainly storing large files.
L2ARC : MLC Intel 520 60 GB x2 (striped, so 120 GB)

It seems an older ZFS system with older SSDs (two 160 GB Intel X25-M striped for L2ARC and 32 GB Intel X25-E drives for ZIL) could reach 1200 MB/s in sequential read...

More ideas to fulfill my quest ? :)

Cheers.

St3F
 
Also I should note that if the load is SMB on Solaris 11 then you won't use the ZIL at all. NFS uses ZIL heavily.

I'm not sure about AFP
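One way to see whether the ZIL would even be exercised is to watch per-vdev I/O during the real workload; a minimal sketch (pool name assumed):

Code:
# pool name "tank" is an assumption; a busy "logs" line during the workload
# means sync writes are actually hitting the ZIL device
zpool iostat -v tank 5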
 
I would use the Intel (LSI) SAS expander instead. It has the benefit of not requiring a PCIe slot to function
http://www.newegg.com/Product/Product.aspx?Item=N82E16816117207
Thank you.
If you do 12 striped mirrors you should easily be able to saturate your network, both the 10g and 1g
What about security ?
... I read :
Consider creating 24 2-device mirrors. This configuration reduces the disk capacity by 1/2, but up to 24 disks or 1 disk in each mirror could be lost without a failure.
Is it more secure than creating 4 vdevs of 6x 3 TB in RAID-Z2? And in that case, does it still saturate 10g?

Cheers.

St3F
 
Yes, if your writes use the ZIL, they will be limited to a single drive, based on latency, at queue depth 1.

The 311 drive will get 90-100 MB/sec, so you'll be limited to that, not 2x.

If your writes don't hit the ZIL (aren't sync writes), then log devices are completely optional. They will only be used for sync writes.

Your older X25-Es have a higher streaming write speed of 170 MB/sec.

I'm not sure you even need any. If you find your writes are too slow, you can easily add something later.
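For reference, a mirrored log device can indeed be added to a live pool later if sync writes turn out to be slow (pool and device names assumed):

Code:
# adds a mirrored SLOG to an existing pool without downtime; names are illustrative
zpool add tank log mirror c2t0d0 c2t1d0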
 
Yes, if your writes use the ZIL, they will be limited to a single drive, based on latency, at queue depth 1.

The 311 drive will get 90-100 MB/sec, so you'll be limited to that, not 2x.
What better solution can you propose ?
If your writes don't hit the ZIL (aren't sync writes), then log devices are completely optional. They will only be used for sync writes.
And could a ZIL help during a scrub, so that performance doesn't drop while the system is being used?

Cheers.

St3F
 
A few questions.. Who will be administering the system? Are you looking for single sign-on, and if so, what OS is your existing file and/or mail server running? Are you looking for single-point permissions administration? You only list 9 machines hitting this box; what is the average size of the files you are working on? How much data will you be moving to the 32 TB volume? How much has your storage need increased in the past 18 months? What are your thoughts on how you will be backing all this data up? Another array, LTO, etc? Will it generally just be used 9-5 M-F (asking because of your backup window availability)? Do you have any apps that are AFP-specific? Are all your client machines Macs running OS X at least 10.5? Do you have any PC clients?

Next, you say you have one workstation writing about 1 TB/day to the array. How long do you need to maintain this data? (e.g. if you need to keep it for 1 year, you would need ~200 TB of storage just for that machine, based on the average number of workdays in a year.) I can make more specific suggestions once I get more info.
 
A ZIL drive will have no effect on speed whether you're running a scrub or not; the ZIL is used the same way during a scrub as at any other time. Sync writes will be fast with a ZIL, and async writes are fast anyway because you don't wait for a confirmation.

The question isn't really about a better solution. Anything is better if you're willing to pay more :)
I'm just not sure it's needed at all, so why bother.

The only better solutions I know of would be ZeusRAM and DDRdrive.

I keep thinking about MLC drives that have high write speeds, but most of them only seem to reach that write speed at higher queue depths, which a ZIL will never see.

I just feel that installing a ZeusRAM or DDRdrive, when you're rarely going to use it, from how it sounds like you are planning right now, seems like a waste, and will only get in the way of the real write performance you could have.
 
Hi mwroobel.

Thank you for having a look.

Who will be administering the system?
My team.

Are you looking for single sign on and if so what OS is your existing file and/or mail server running?
There is no shared storage ; there is no mail server.
4 automatic email notifications needed (for 4 different folders)
4 FTP accounts on specific folders
At this time, every single workstation has its own local storage, shared over Gigabit

Are you looking for single point permissions administration?
No:
- 1 account for files to delete / to move
- 1 account for files to move
- 1 cron task to delete one-month-old files in the /archive folder
- 1 cron task to delete four-month-old files in the /ingest and /export folders
- 1 cron task to delete six-month-old files in the /edit, /project, /photos, /web, /audio and /restore folders
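A minimal sketch of what those cron tasks could look like (plain crontab entries; the /tank mount point is an assumption, not from this thread):

Code:
# delete files older than ~1 month from /archive, nightly
0 3 * * * find /tank/archive -type f -mtime +30 -exec rm {} \;
# delete files older than ~4 months from /ingest and /export
15 3 * * * find /tank/ingest /tank/export -type f -mtime +120 -exec rm {} \;
# delete files older than ~6 months from the editing shares
30 3 * * * find /tank/edit /tank/project /tank/photos /tank/web /tank/audio /tank/restore -type f -mtime +180 -exec rm {} \;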


You only list 9 machines hitting this box, what are the average size of the files you are working on? How much data will you be moving to the 32TB volume?
- One of these workstations only writes (to the /ingest folder):
... 2 files (120 Mb/s each) at the same time for 6 hours (320 GB each)
... then 2 new files are created.
= let's say 1.4 TB per day, for 5 days... four times per year

- Five of these read & write:
... read a file while it is being recorded by the workstation mentioned above into the /ingest folder
... transfer / rip / write files from hard drives or flash cards containing video files encoded @ 50 Mb/s to the /edit folder (~4 TB four times per year)
... can read 4 streams at the same time, each @ 120 Mb/s, from the /ingest folder
... export / write video files (bitrate 120 Mb/s), from 1 minute to 52 minutes long, to the /archive folder
... export / write multiple video files (bitrate 10 Mb/s), from 1 minute to 52 minutes long, to the /export folder

- One of these reads & writes, editing JPEG and/or RAW photo files

- One of these reads & writes, editing audio files

- Access by FTP from the backup server (archive) to our server, over Gigabit, to deliver 120 Mb/s video files to the /restore folder

How much has your storage need increased in the past 18 months?
That is exactly what the current setup lacks !!
No way to know: there are external hard drives wandering around everywhere :eek:

What are your thoughts on how you will be backing all this data up? Another array, LTO, etc?
What we need to back up is what we send into the /archive folder.
As soon as a file is transferred, an email alert goes to an archivist, who downloads the file from this storage to theirs... or the file could automatically be transferred to their storage (by FTP).
My team doesn't manage the backup/archive storage or the work done on the backup side.
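A minimal sketch of the email-alert piece, as a cron-driven shell script (folder path, recipient address and state file are assumptions, not from this thread):

Code:
#!/bin/sh
# hypothetical watcher, run from cron every few minutes:
# mails an alert for each new file that appears in the archive folder
WATCH=/tank/archive
SEEN=/var/tmp/archive.seen
touch "$SEEN"
find "$WATCH" -type f | while read f; do
  fgrep -x "$f" "$SEEN" >/dev/null && continue    # skip files already announced
  echo "File ready for download: $f" | mailx -s "Archive file ready" archivist@example.org
  echo "$f" >> "$SEEN"
done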


Do you have any apps that are AFP specific? Are all your client machines Macs running OSX at least 10.5?
The software we use works over AFP and CIFS / SMB.

Do you have any PC clients?
No PCs ... BUT we can boot these Macs into Windows, using Win 7 Pro installed via Boot Camp on another partition of the system hard drive.

Next, you say you have one workstation writing about 1 TB/day to the array. How long do you need to maintain this data? (e.g. if you need to keep it for 1 year, you would need ~200 TB of storage just for that machine, based on the average number of workdays in a year.)
Let's say 1.4 TB/day for 5 days... four times per year.
We need to keep these files for four months; if they're accidentally deleted, we ask the archive crew to restore and transfer them to our storage.

I can make more specific suggestions once I get more info.
That will be awesome !!

Cheers.

St3F
 
The question isn't really about a better solution. Anything is better if you're willing to pay more :)
I'm just not sure it's needed at all, so why bother.
About 7000 $/€
The only better solutions I know of would be ZeusRAM and DDRdrive.

I just feel that installing a ZeusRAM or DDRdrive, when you're rarely going to use it, from how it sounds like you are planning right now, seems like a waste, and will only get in the way of the real write performance you could have.
Ok... I'll have a look at the ZeusRAM in case of need.

Thank you.

St3F
 
Another few questions.. Is this surveillance video and/or satellite pictures (there are specific tweaks you can make for those particular workflows)? I expect you are going to be installing a *nix of some kind; which is your team most familiar/comfortable with? Do you have any scripts which are host-specific? Are you looking for 24/7 onsite service? Will the box be hosted in your offices or at a colo? Is it in the US? Are you running straight FTP or SFTP? What kind of firewall is going to be sitting in front of this box that your external clients have to go through? You mentioned $7K; how flexible is that? Do you foresee that as your total expected capital expenditure (e.g. have you also budgeted for spare equipment such as drives and controllers in this price, or are you expensing them separately)?
 
only a few thoughts:

- with SMB and netatalk, you do not need a log device, and then these slow Intels are quite useless
- you need more RAM, think 64 GB+ (better 128 GB+) for high speed (deliver most reads from RAM)
- you do not need an L2ARC with large files; think about more RAM instead, like 192 GB-256 GB
- you do not need too much CPU power, but should look at socket 2011 chipsets due to the memory needs
- I would try to avoid an expander up to 36 drives (or the max number of bays in a single case, e.g. http://www.supermicro.com/products/chassis/4U/847/SC847A-R1400LP.cfm)
- when you need an expander, use LSI SAS2-based ones (Intel http://www.intel.com/content/www/us/en/servers/raid-expander-res2cv-brief.html)
- avoid mixing netatalk and SMB sharing (may change with netatalk 3); easiest with full ACL support is CIFS
- use LSI 9211 or 9200 (external) HBAs with IT firmware (or the cheap IBM M1015 or the 16-channel 9201);
the 9207 may work but is quite new.
- use Intel NICs / 10 GbE (check compatibility with Solaris/OSX)

- HP builds nice and quite cheap 10 GbE switches, e.g. the HP 2910 (4x 10 GbE, 24 or 48x 1 Gb)
optionally attach the 10 GbE Mac Pros directly without a switch
- use SuperMicro boxes/mainboards
- disks: Seagate or Hitachi, use 512B-sector ones whenever possible; 24/7 SATA is OK especially without an expander; use 2-3 hot-spare disks
- performance scales with the number of vdevs -> use as many mirrors (raid-1) as possible

- think about a second backup/failover system (different physical location)
 
The only better solutions I know of would be ZeusRAM and DDRdrive.

I keep thinking about MLC drives that have high write speeds, but most of them only seem to reach that write speed at higher queue depths, which a ZIL will never see.

I just feel that installing a ZeusRAM or DDRdrive, when you're rarely going to use it, from how it sounds like you are planning right now, seems like a waste, and will only get in the way of the real write performance you could have.
Very expensive !!
In this forum, with a 5-year warranty, the Plextor M3 Pro is often mentioned for the job.

http://www.dataonstorage.com/

those jbods are IMO the best COTS you can buy. i just stood up a monster system over the weekend using 10 of the dns-1600s. they're extremely fast.
Ouch : :eek:
- DNS-1660D, Dual SAS I/O Controller, 60 bay 3.5" = $10,995.00 ...
- DNS-1600D, Dual SAS I/O Controller, 24 bay 3.5" = $4,795.00
... a far cry from Norco prices !!

Another few questions.. Is this surveillance video and/or satellite pictures (there are specific tweaks you can make for those particular workflows?)
It's for video conference recording (2 rooms at the same time) / video editing / media / communication

I expect you are going to be installing a *nix of some kind, which is your team most familiar with/comfortable with.
In other workflows, for other needs, we have worked with XFS on SuSE up to 8 TB for production, and with mdadm on Debian for temporary storage.

Do you have any scripts which are host specific?
Are you looking for 24/7 onsite service? Will the box be hosted in your offices or at a colo? Is it in the US?
No scripts right now.
Yes: 24/7 onsite service for 5 years.
We need 3 of them: one (~30 TB) will be installed where the meetings take place, another (~30 TB) in our office, and the last one (~16 TB) for when we work with a mobile storage solution, e.g. for festivals.
No, it is not in the US.

Are you running straight ftp or sftp? What kind of firewall is going to be sitting in front of this box that your external clients have to go through?
FTP from & to internal server owned by each service.
No external client.

You mentioned $7K, how flexible is that? Do you foresee that as your total expected capital expenditure (eg have you also budgeted for spare equipment (drives, controllers etc in this price or are you expensing them separately?)
Flexible: it is.
Spare equipment is budgeted (4 spare drives); controllers, PSU and motherboard are not included in this price range.

only a few thoughts:

- with SMB and netatalk, you do not need a log device, and then these slow Intels are quite useless
- you need more RAM, think 64 GB+ (better 128 GB+) for high speed
- you do not need an L2ARC with large files; think about more RAM instead, like 192 GB-256 GB
- you do not need too much CPU power, but should look at socket 2011 chipsets due to the memory needs
OK
- I would try to avoid an expander up to 30 drives (or the max number of bays in a single case)
- when you need an expander, use LSI SAS2-based ones (Intel http://www.intel.com/content/www/us/en/servers/raid-expander-res2cv-brief.html)
We follow the specs from Oracle Solaris for ZFS in a mirrored configuration:
- 2x controllers => LSI 2008, with 10x 3 TB on each controller
- 2x expanders (so Intel?), with 10x 3 TB on each expander

-- avoid mixing netatalk and SMB sharing (may change with netatalk 3); easiest with full ACL support is CIFS
Ok; I will study this.

-- use LSI 9211 or 9200 (external) HBAs with IT firmware (or the cheap IBM M1015 or the 16-channel 9201);
the 9207 may work but is quite new.
- use Intel NICs / 10 GbE (check compatibility with Solaris/OSX)
We own several IBM M1015s...
Only Small Tree provides 10 GbE NIC drivers for OS X :mad:
- HP builds nice and quite cheap 10 GBe switches ex HP 2910
- use SuperMicro boxes/mainboards
- disks: seagate or hitachi, use 512B ones whenever possible
Ok

- performance scales with number of vdevs -> use as much raid-1 as possible
Alright... something like:
Code:
# zpool status
  pool: mpool
 state: ONLINE
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        mpool       ONLINE       0     0     0
          mirror    ONLINE       0     0     0
            c0t1d0  ONLINE       0     0     0
            c1t1d0  ONLINE       0     0     0         
          mirror    ONLINE       0     0     0
            c0t2d0  ONLINE       0     0     0
            c1t2d0  ONLINE       0     0     0
          mirror    ONLINE       0     0     0
            c0t3d0  ONLINE       0     0     0
            c1t3d0  ONLINE       0     0     0
          mirror    ONLINE       0     0     0
            c0t4d0  ONLINE       0     0     0
            c1t4d0  ONLINE       0     0     0
          mirror    ONLINE       0     0     0
            c0t5d0  ONLINE       0     0     0
            c1t5d0  ONLINE       0     0     0
          mirror    ONLINE       0     0     0
            c5t1d0  ONLINE       0     0     0
            c6t1d0  ONLINE       0     0     0         
          mirror    ONLINE       0     0     0
            c5t2d0  ONLINE       0     0     0
            c6t2d0  ONLINE       0     0     0
          mirror    ONLINE       0     0     0
            c5t3d0  ONLINE       0     0     0
            c6t3d0  ONLINE       0     0     0
          mirror    ONLINE       0     0     0
            c5t4d0  ONLINE       0     0     0
            c6t4d0  ONLINE       0     0     0
          mirror    ONLINE       0     0     0
            c5t5d0  ONLINE       0     0     0
            c6t5d0  ONLINE       0     0     0
         spares
          c0t0d0    AVAIL   
          c1t0d0    AVAIL   
          c5t0d0    AVAIL   
          c6t0d0    AVAIL   

errors: No known data errors
??
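(For reference, a minimal sketch of the create command that would give roughly that layout; pool name and device names are just the ones used above:)

Code:
# ten 2-way mirrors spread across the two controllers, plus four hot spares
zpool create mpool \
  mirror c0t1d0 c1t1d0 \
  mirror c0t2d0 c1t2d0 \
  mirror c0t3d0 c1t3d0 \
  mirror c0t4d0 c1t4d0 \
  mirror c0t5d0 c1t5d0 \
  mirror c5t1d0 c6t1d0 \
  mirror c5t2d0 c6t2d0 \
  mirror c5t3d0 c6t3d0 \
  mirror c5t4d0 c6t4d0 \
  mirror c5t5d0 c6t5d0 \
  spare c0t0d0 c1t0d0 c5t0d0 c6t0d0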

Cheers.

St3F
 
Ouch : :eek:
- DNS-1660D, Dual SAS I/O Controller, 60 bay 3.5" = $10,995.00 ...
- DNS-1600D, Dual SAS I/O Controller, 24 bay 3,5" = $4,795.00
... far price away from Norco !!
they're all dual controller. the 1600 is dual LSI the 1660 is PMC Sierra but isn't really a high performance JBOD.

they're the best JBODs you can get though. just sayin.
 
We follow specs from Oracle Solaris for ZFS in mirrored configuration;
- 2x controller =/> LSI 2008 with 10x 3 TB each controller
- 2x expander (so Intel ?) with 10x 3 TB each controller

With expanders, you can add hundreds of disks to your pool, but it adds complexity and
potential firmware problems. Use them when needed, avoid them when possible.

About your large multiple raid-1 (mirror) setup:
how important are your data?
how current is your backup?

If your data is really valuable, avoid single-parity raid-1
(I had a double failure in the same raid-1 a few months ago and was glad of a quite current backup)

For my really valuable data (AD directory user database, mail) I use only 3-way mirrors
and accept the premium, and like the better read performance.
 
About your large multiple raid-1 (mirror) setup:
Why not ... what should I do ?
how important are your data?
They are !
When a meeting is being recorded, the same content also goes to video tape, so there is a first backup in real time.
how current is your backup?
Every day during the meeting sessions: 5 days per week, 4 weeks per year.
Files in the /archive folder are downloaded to another server which is not our responsibility anymore; the same for the /web folder.

If your data is really valuable, avoid single-parity raid-1
(I had a double failure in the same raid-1 a few months ago and was glad of a quite current backup)
I'm listening: how do I do that?

For my really valuable data (AD directory user database, mail) I use only 3-way mirrors and accept the premium, and like the better read performance.
I need ~30 TB and have 20 or 24 drive bays available... how could that be possible?

Cheers.

St3F
 
you're doing video; if i were you i would not do mirrors as gea suggests. you want raidz2 or raidz3 for that. it doesn't seem you care about random small IO, you're more concerned with large sustained sequential. raidz2/3 do sequential very well.
 
you're doing video; if i were you i would not do mirrors as gea suggests. you want raidz2 or raidz3 for that. it doesn't seem you care about random small IO, you're more concerned with large sustained sequential. raidz2/3 do sequential very well.

Yes, this question must be answered first.
How is your workload?

1. large sequential file transfers, mostly single user (backup/restore of locally edited files), or mostly reads with enough RAM
2. server-based editing of small or large files with concurrent multiple users


with the first option in mind, you can build (30 TB, max 24 disks, each 3 TB)

1. max space
4 x raid-z2, each built from 5 disks = 20 disks, usable capacity 36 TB
I/O performance is like 4 disks read and write
any two disks can fail without problems, medium rebuild time on failures

add a minimum of 2 hot-spare disks

with max 20 disks:
3 x raid-z2, each built from 6 disks = 18 disks, usable capacity 36 TB
add 2 hot spares, I/O like 3 disks


if your workload needs better I/O values (option 2), you may:

2. max performance
10 x raid-1 mirrors = 20 disks, usable capacity 30 TB
I/O performance is like 20 disks (read) and 10 disks (write) on balanced pools
a two-disk failure in the same vdev will result in a lost pool; short rebuild time on failures
(I have had such a failure, but it should not happen too often)

add a minimum of 2 hot-spare disks

3. max security
8 x raid-1 (3-way mirror) = 24 disks, usable capacity 24 TB
I/O performance is like 24 disks (read) and 8 disks (write) on balanced pools
any two disks can fail without problems, short rebuild time on failures

add a minimum of 2 cold-spare disks (replace in case of failure)


Pure sequential performance is sufficient in any case.
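As a concrete sketch of option 1 in a 20-bay chassis (three 6-disk raid-z2 vdevs plus two hot spares; pool and device names are assumptions):

Code:
# 3 x 6-disk raid-z2 = 18 data/parity disks + 2 spares; controller/device names illustrative
zpool create vpool \
  raidz2 c0t0d0 c0t1d0 c0t2d0 c0t3d0 c0t4d0 c0t5d0 \
  raidz2 c1t0d0 c1t1d0 c1t2d0 c1t3d0 c1t4d0 c1t5d0 \
  raidz2 c2t0d0 c2t1d0 c2t2d0 c2t3d0 c2t4d0 c2t5d0 \
  spare c3t0d0 c3t1d0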
 
you're doing video, if i were you i would not do mirrors as gea suggests. you want raidz2 or raidz3 for that. it doesn't seem you care about random small IO you're more concerned with large sustained sequential. raidz2/3 do sequential very well.
Indeed: we manage files from 60 MB to 600 GB ...

Our workload :
- 9 workstations working with large sequential files.

First I was thinking about solution #1 ... but I need ~1600 MB/s sustained bandwidth
Then, as I read everywhere, the best performance would come from mirrors, solution #2

You said :
4 x raid-z2, each build from 5 disks = 20 disks, usable capacity 36 TB
I/O performance is like 4 disks read and write
... so could I/O be improved with an SSD cache?
... what data rate could I reach, equipped with 3 TB 7200 rpm hard drives?

Cheers.

St3F.
 
First I was thinking about solution #1 ... but I need ~1600 MB/s sustained bandwidth

1600 MB/s = 12,800 Mb/s (1.6 GB/s). Even if you achieved maximum theoretical performance over 10 GbE (you won't), you would only achieve 10,000 Mb/s (1.25 GB/s) and have nothing left over for anything else. If you see this as your need now (and it will probably increase over your expected 5-year lifecycle), you may want to look at either 40 GbE or FDR IB as your transport hardware. Regardless of how fast an array you create, the lack of bandwidth for your needs will be a choke point.
 
1600MB/s = 12,800Mb/s (1.6GB/s). Even if you achieved maximum theoretical performance over 10GbE (you wont) you would only achieve 10,000Mb/s (1.25GB/s) and have nothing left over for anything else
Let's say: internal read performance of ~1600 MB/s ;)
.... like this kind of product: http://www.resource.holdan.eu/specs/Sonnet_Fusion_RX1600_Vfibre.pdf which is dedicated to video editing (with troubles: http://forums.creativecow.net/thread/303/129)

I think ZFS for video editing, editing while capturing ... is the best way to go, instead of traditional RAID 6.
Using the prefetch capability could be really great for video editing, even more so for edit-while-capture / transfer as described.

In my mind: video streams are large files that are already compressed.
= compression OFF in ZFS, so that it consumes less RAM & CPU
= deduplication OFF
= snapshots OFF in all folders with video files // ON for pictures & projects (from Adobe Premiere and Final Cut Pro 7)
// -> what about atime = OFF for this case (video editing), to avoid useless writes and gain data rate?
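As a sketch, those settings map onto per-dataset ZFS properties roughly like this (pool/dataset names are assumptions; the snapshot on/off part assumes the auto-snapshot SMF service is used):

Code:
zfs set compression=off tank/ingest               # streams are already codec-compressed
zfs set dedup=off tank                            # pool-wide, inherited by child datasets
zfs set atime=off tank                            # avoid access-time writes on every read
zfs set com.sun:auto-snapshot=false tank/ingest   # no automatic snapshots on video datasets
zfs set com.sun:auto-snapshot=true tank/photos    # keep them for stills and project files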

+ have you ever heard of the 2 GB limitation on file writes using the AFP protocol?
cf :
... http://forums.creativecow.net/thread/3/925843
... http://support.apple.com/kb/TA21611?viewlocale=en_US !! oO

// NFS works with OS X :
- http://www.techrepublic.com/blog/mac/mounting-nfs-volumes-in-os-x/430
- http://ivanvillareal.com/osx/nfs-mac-osx-lion/
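A minimal sketch of the OS X client side of such an NFS mount (server name and export path are assumptions; the resvport option is the usual requirement when the server insists on privileged source ports):

Code:
sudo mkdir -p /Volumes/edit
sudo mount -t nfs -o resvport,vers=3 server:/tank/edit /Volumes/edit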

Cheers.

St3F
 
i would/do run compression on. lzjb (default) compression is basically free on modern xeon CPUs.

zfs arc/l2arc is quite good, just make sure to tweak the arc eviction settings if you have a lot of memory. you can get into situations where the system dumps 50GB of cached data and the CPUs spike to 100% for 10s or so. google arc_c_min or just go to zfsbuild.com and the second or third blog post talks about what happens.

large memory systems need some tweaking but the performance at the ram/arc layer is amazing.
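For what it's worth, the arc_c_min floor (and the matching ceiling) is usually pinned on Solaris/OpenIndiana via /etc/system tunables; a sketch with purely illustrative values for a machine with roughly 48 GB of RAM:

Code:
* /etc/system (Solaris/OpenIndiana); values are illustrative only
* zfs_arc_max caps the ARC, zfs_arc_min sets the floor so the cache is not dumped wholesale
set zfs:zfs_arc_max=0xA00000000
set zfs:zfs_arc_min=0x800000000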
 
i would/do run compression on. lzjb (default) compression is basically free on modern xeon CPUs.
In my mind, as video files are already compressed (by their codecs), I wonder whether it is necessary to enable lzjb compression: if you try to zip or rar a video file (mp4, mkv, mov ...), you get roughly the same file size!
zfs arc/l2arc is quite good, just make sure to tweak the arc eviction settings if you have a lot of memory. you can get into situations where the system dumps 50GB of cached data and the CPUs spike to 100% for 10s or so. google arc_c_min or just go to zfsbuild.com and the second or third blog post talks about what happens.
Ok, I will study this tweak.

- HP builds nice and quite cheap 10 GBe switches ex HP 2910 (4x 10GBe, 24 or 48 x 1Gb)
optionally attach 10 GBe MacPros directly without switch
What about Zyxel switches?

The XGS-4528F has been proposed by one of our resellers....
=> http://us.zyxel.com/Products/detail...yGroupNo=F902E07D-0F43-4864-B967-FE83999FC7B1

Datasheet : http://us.zyxel.com/Products/detail...yGroupNo=F902E07D-0F43-4864-B967-FE83999FC7B1

Cheers.

St3F
 
24 ports + 2x / 4x 10 GbE ports: price of HP 2910al <=> Zyxel XGS-4528F quite similar, about 2000 € without XFP / SFP

PowerConnect 8024:
- why is it so expensive? (14 k€) => up to 24x 10 GbE ports + 4 combo

Cheers.

St3f
 
24 ports + 2x / 4x 10GbE ports : price HP 2910al <=> Zyxel XGS-4528F quite similar, about 2000 € without XFP / SFP

powerconnect 8024 :
- why is it so expensive ? (14 k€) => up to 24x 10 GbE port + 4 combo

Cheers.

St3f

There is a huge difference in price between a 24 Port GigE switch with 2/4 10GbE ports and a 24 port 10GbE wirespeed switch. That is the reality of the market now.
 
1. max space
4 x raid-z2, each built from 5 disks = 20 disks, usable capacity 36 TB
I/O performance is like 4 disks read and write
any two disks can fail without problems, medium rebuild time on failures

add a minimum of 2 hot-spare disks
with 4K hard drives, shouldn't these be "even" numbers?
=> 1. so should it be 5x raid-z2, each built from 4 disks?

2. max performance
10 x raid-1 mirrors = 20 disks, usable capacity 30 TB
I/O performance is like 20 disks (read) and 10 disks (write) on balanced pools
a two-disk failure in the same vdev will result in a lost pool; short rebuild time on failures
(I have had such a failure, but it should not happen too often)

add a minimum of 2 hot-spare disks.
20 disks = 4 free bays.
I could use 4 spare disks.
=> 2. If 1 to 4 drive(s) fail(s), does a spare disk automatically replace the failed drive in the vdev?

... 3. What do you think about this test:
"ZFS Performance - RAIDZ vs RAID 10" => http://forum.setiaddicted.com/viewtopic.php?p=99747#p99747

... 4. What do you think about this tweak for multimedia files?
"Tweak ZFS for Media Files server" => http://forum.setiaddicted.com/viewtopic.php?p=99723#p99723

Cheers.

St3f
 
i wouldn't use raidz2 with 4-disk sets over mirrors. you lose the same amount of space and your performance isn't any better.

you do gain reliability but it isn't that much.
 
i wouldn't use raidz2 with 4-disk sets over mirrors. you lose the same amount of space and your performance isn't any better.

you do gain reliability but it isn't that much.
Ok, but you said:
"you're doing video, if i were you i would not do mirrors as gea suggests. you want raidz2 or raidz3 for that. it doesn't seem you care about random small IO, you're more concerned with large sustained sequential. raidz2/3 do sequential very well."

... what about 4K & ashift?

sub.mesa wrote:
As I understand it, the performance issue with 4K disks isn't just partition alignment, but also an issue with RAID-Z's variable stripe size.
RAID-Z basically works to spread the 128KiB recordsize upon its data disks. That would lead to a formula like:
128KiB / (nr_of_drives - parity_drives) = maximum (default) variable stripe size
Let's do some examples:
3-disk RAID-Z = 128KiB / 2 = 64KiB = good
4-disk RAID-Z = 128KiB / 3 = ~43KiB = BAD!
5-disk RAID-Z = 128KiB / 4 = 32KiB = good
9-disk RAID-Z = 128KiB / 8 = 16KiB = good
4-disk RAID-Z2 = 128KiB / 2 = 64KiB = good
5-disk RAID-Z2 = 128KiB / 3 = ~43KiB = BAD!
6-disk RAID-Z2 = 128KiB / 4 = 32KiB = good
10-disk RAID-Z2 = 128KiB / 8 = 16KiB = good

What could be the best option?
Which hard drive can you recommend? (3 TB Seagate Constellation ES in SAS are a bit too expensive, and only a 3-year warranty :( )

Cheers.

St3f

PS: test of WD RE4, Hitachi 7K3000, Seagate Constellation ES:
... test configuration: http://blog.tsunanet.net/2011/08/hitachi-7k3000-vs-wd-re4-vs-seagate.html
... graphs: http://tsunanet.net/~tsuna/benchmarks/7K3000-RE4-ConstellationES/sysbench.html
 
for raidz2 your best route is either 6- or 10-disk sets, since you're apparently going to be maxing out each individual drive's ability to sequentially transfer data.

4-disk raidz2 can be done, and the use case for that is decent performance (where performance is random iops across many 4-disk sets vs 6-disk sets) + reliability, but you're basically losing 50% of the drive space to parity. with mirrors you lose at least 50% usable space as well but gain random IOPS performance on reads.

as for drives, i use the constellation es.2 drives, they're great.
 
Ok, but you said:
"you're doing video, if i were you i would not do mirrors as gea suggests. you want raidz2 or raidz3 for that. it doesn't seem you care about random small IO, you're more concerned with large sustained sequential. raidz2/3 do sequential very well."

... what about 4k & ashift ?


I think what he's getting at is that if you have a certain number of drives you'll likely get better sequential IO performance from them by configuring them as RAIDZ(2) rather than mirrors. Once you go as low as 4 drives though, RAIDZ2 probably doesn't make a great deal of sense, other than the fact that it can survive any double disk failure whereas a 4 disk mirror set can only survive half of the possible double disk failure scenarios (which may be a big deal for some).

For instance if you had 10 drives, then for bandwidth you'd probably get a little better sequential performance configuring them in an 8+2 raidz2 set rather than a 5+5 mirror set. You'd get n*8 capacity too as opposed to n*5 of the mirror set.
However it may be a closer call if you configured as 2x 3+2 raidz2 (n*6 capacity)
The mirror set would have better small random IO performance though!
It's all a tradeoff in the end though :)

As for ashift etc, AIUI if the drive reports physical sector size as 4k, then ZFS should deal with it with no intervention - the question isn't just whether the drive reports it though, it's also whether the OS accepts what it reports.
However, correct reporting or not, if you create your pool and you end up with ashift=9, then it's my understanding that if you want proper 4k operation, you may need to destroy it and recreate it using the gnop trick or a modified zpool binary which can set ashift=12!
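For reference, the gnop trick mentioned above is a FreeBSD procedure; a minimal sketch with assumed device names (on OpenIndiana/Solaris the usual route is instead a patched zpool binary or sd.conf overrides):

Code:
gnop create -S 4096 da0            # expose da0 as a 4K-sector device: da0.nop
zpool create tank mirror da0.nop da1
zpool export tank
gnop destroy da0.nop               # the pool keeps ashift=12 without the gnop layer
zpool import tank
zdb -C tank | grep ashift          # should report ashift: 12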
 
with 4K hard drives, shouldn't these be "even" numbers?
=> 1. so should it be 5x raid-z2, each built from 4 disks?


20 disks = 4 free bays.
I could use 4 spare disks.
=> 2. If 1 to 4 drive(s) fail(s), does a spare disk automatically replace the failed drive in the vdev?

... 3. What do you think about this test:
"ZFS Performance - RAIDZ vs RAID 10" => http://forum.setiaddicted.com/viewtopic.php?p=99747#p99747

... 4. What do you think about this tweak for multimedia files?
"Tweak ZFS for Media Files server" => http://forum.setiaddicted.com/viewtopic.php?p=99723#p99723

Cheers.

St3f

There is never one best method, only optimizations that are better for one workload or another, or
for price, performance or space.

Example:
If you use z2 vdevs with the golden numbers of data disks (4, 8, 16, ...), all reads and writes can use all disks, so you have the
best possible sequential performance. If you add 2 disks, you get more capacity without the performance advantage of
the two extra disks, because reads and writes are not spread over all disks.

If you do not look at sequential performance (read/write of a single datastream on contiguous sectors of the disk) but look at I/O
(fragmented disks, concurrent reads/writes), you must look at the number of vdevs, because I/O scales with the number of vdevs.
The best sequential values are useless regarding I/O. As many vdevs as possible is the answer here, even without the golden numbers.

I would optimize the disk config for sequential performance on single-user or backup machines,
and for best I/O for multi-user, database/web or server-based editing use.

I would not tweak settings unless you have a very special workload. The defaults are quite good.

If you have hot spares, they are used automatically on a disk failure.
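As a sketch, adding the spares and enabling slot auto-replacement looks like this (pool and device names assumed; the spare itself is pulled in by the fault-management agent when a disk faults):

Code:
zpool add mpool spare c0t0d0 c1t0d0
zpool set autoreplace=on mpool     # also auto-use a new disk inserted into the same slot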

Most important "tweaking":
use RAM, RAM, RAM and as much spindles/vdevs as possible

About disks: This is the most important thing, even more important than disk layout
(Its like a car where the engine is most important, not the wheels or the spoiler)

use fastest disks, you can get, best would be 2,5" 10k disks, but the max size is 1 TB (WD VelociRaptor) but they have
halv the seek rate of 3,5" disks, otherwise 7200U/m 3,5" disks, If you can avoid 4k disks, you should.

You may replace your 3,5" case and 20/24 disks with 2,5" cases (SuperMicro has 2,5" cases from 24 up to 72 disks)
to have more options for a more powerful storage due to more spindels. (faster, higher, stronger..)

Last thing: You may just try a more space optimized pool to see if its fast enough. If not, use a more performance I/O optimized pool.
It would be good, if you have the option to add more disks/vdevs to increase performance.
 
Thank you, all of you for your precious time !
It's very helpful. I enjoy learning all this stuff ^^ :)

for raidz2 youre best route is either 6 or 10 disk sets since you're apparently going to be maxing out each individual drive's ability to sequentially transfer data.
Ok

as for drives, i use the constellation es.2 drives, they're great.
SAS or SATA interface?
... ST33000650NS ?
.... ST33000650SS ?
.... ST33000641SS ?
The thing:
==>> Seagate Enterprise / Constellation (ES, ES.2) = 5 years (only 2 years of warranty for those manufactured from 31.12.2011 to 30.06.2012)
..... vs 5 years with the WD RE4 (ok, only 2 TB ... :/ )

If you do not look at sequential performance (read/write of a single datastream on contiguous sectors of the disk) but look at I/O
(fragmented disks, concurrent reads/writes), you must look at the number of vdevs, because I/O scales with the number of vdevs.
The best sequential values are useless regarding I/O. As many vdevs as possible is the answer here, even without the golden numbers.

I would optimize the disk config for sequential performance on single-user or backup machines,
and for best I/O for multi-user, database/web or server-based editing use.
There will be multi-user utilization on sequential files: at the same time, from 9am to 1pm
- One workstation records 2 big video files, 4 hours long each (@ 120 Mb/s = 210 GB each), at the same time = 240 Mb/s = 30 MB/s, to the /Ingest folder
- Five workstations will read one of these recording streams, not at the same position in the file (cf. prefetch: http://upload.setiaddicted.com/fichiers_upload/be5e3078dfed4bc7165f9d4e88c467618ed3e770.JPG)
// if one (or two or three ...) of these five workstations is not reading one of these big video files, it will transfer / record files of between 20 MB and 4 GB to ./Rushes_ENG
- One workstation will transfer, edit and share pictures with Photoshop
- One workstation will record and edit sound

.... still confused about which ZFS configuration to adopt.

I'm modifying some pieces of the configuration:
  • Motherboard : Supermicro X8DAH+ (already own two)
  • CPU : 2x 5506 (or X5550 ?) (already own this pair)
  • RAM : 48 GB ECC REG PC1333 (I already own 16x 4 GB)
  • Graphic card : PNY NVIDIA Quadro NVS 295
  • Case for System : Supermicro SC825TQ-R720LPB
  • Case for Storage : 24x 3,5" with expander Chenbro CK23601 or UEK12803
  • HBA : LSI SAS 9212-4i4e ... x1 or x2 ?

Cheers.

St3F
 
I use all SAS drives, even SAS SSDs. they're a bit more but if you're building enterprise grade stuff use enterprise grade parts.

for you though SATA is probably acceptable, since it sounds like you're mainly concerned with sustained throughput and not concerned much with dual path or HA.

what i would do is explain to management your routes forward, explain that you may need time to try different disk configurations, and then use whatever works best for your workload.

as for your parts list ... why the graphics card? pretty sure the mobo has one.

i'm not terribly familiar with the chenbro line; if i were you i would email support and ask them what you need. my gut says that second link is what you'll need.

as for the controller, i would use the lsi 9205-8e. that gives you 48 gbps worth of throughput and can handle 600k iops (i've actually seen it do 600k :)). the 9205 is well supported too.

http://www.supermicro.com/products/motherboard/QPI/5500/X8DTH-6F.cfm

that gives you an onboard SAS port so you can power the drive bays in that chassis. with those you can either add some SSDs for zil/l2arc if you find you need them or even add a few more data drives if you want.
 
for you though SATA is probably acceptable, since it sounds like you're mainly concerned with sustained throughput and not concerned much with dual path or HA.
Indeed: most video editing storage solutions are based on 16x 3.5" SATA in RAID 6 (mostly on ATTO RAID cards)
=> such as this kind of product: http://www.holdan.co.uk/Sonnet/Storage+Solutions/Fusion+RX1600+Vfibre#tab=description
... which can provide up to 1600 MB/s read / 1200 MB/s write; IOPS are mentioned nowhere :eek:

as for your parts list ... why the graphics card? pretty sure the mobo has one.
Nope: the Supermicro MBD-X8DAH+-F-O doesn't have one.

as for the controller, i would use the lsi 9205-8E. that gives you 48gbps worth of throughput and can handle 600k iops (i've actually seen it do 600k :)). 9205 is well supported too.
Ok, I'll study this.

something like this is what i would do.

http://www.supermicro.com/products/motherboard/QPI/5500/X8DTH-6F.cfm

that gives you an onboard SAS port so you can power the drive bays in that chassis. with those you can either add some SSDs for zil/l2arc if you find you need them or even add a few more data drives if you want.
Indeed, that could work, but I already have the X8DAH+ and 2x Xeon 55xx...
Everything is possible, I just have to solve the price equation ;)
=> with your solution: 1x motherboard + 1x LSI HBA with an external connector
=> with mine: 1x graphics card + 1x or 2x LSI HBAs, one with an external connector + an SM SC825TQ-R720LPB case

What about CPU: single or dual? Would Xeon 5504s or 5506s be ok?

Cheers.

St3F
 
uhm, might want to look at the 9201-16i too. then use two brackets to run 2 of the internal ports out through external brackets. some of the zfs-related threads on this board have links to these brackets, or just google "external sas bracket".

that gets you down to a single card; however, the 4i-4e cards might be cheaper overall and will likely be faster since they will use two pci-e slots, although pci-e x8 has enough bandwidth regardless.

your CPUs should be fine; if you don't have a need to run only one, then just slap them both in. i have a dell 1950 with a u320 scsi card running two older powervaults full of old 10k scsi drives, and it only has 2 quad 1.86 xeons ... it barely ever hits more than 50% cpu running one pool with 2x 7-disk raidz3 and one pool of 7 mirror sets, with system-wide compression. it's a solid albeit old backup repository, and unfortunately the mirrored pool is still production ... le sigh, need more time to fix things that aren't broken, or new business.
 