27TB ZFS with Dell 9th-gen hardware

Thought some of you folks might enjoy this. It's my first storage build, and I think it has gone well so far...

Dell 2950 Gen 2
Xeon quad-core 3.0GHz
8GB RAM
PERC 5/i
- 2 x 500GB WD Black in RAID-1
- 6 x 1TB WD Blue in RAID-10
PERC H200 (flashed to IT mode)
- SAS cable to the MD1000 below

Dell MD1000
- 4 x 4TB WD Red drives (WD40EFRX)
- 10 x 1TB enterprise drives (a few WD Black WD1003FBYX, the rest Seagate Constellation)

The ZFS pool is 7 x 2-disk mirrors plus a one-disk hot spare; stats are at the bottom...
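In case it helps anyone picture the layout, here's roughly what creating a pool like that looks like. This is just a sketch, not my actual command; the device names are placeholders rather than my real /dev/disk/by-id paths, and ashift=12 is just a common choice, not something specific to my setup.

# Sketch only: 7 two-way mirrors plus a hot spare (disk names are placeholders)
# ashift=12 forces 4K alignment so larger 4K-sector drives drop in cleanly later
zpool create -o ashift=12 matrix \
  mirror ata-DISK01 ata-DISK02 \
  mirror ata-DISK03 ata-DISK04 \
  mirror ata-DISK05 ata-DISK06 \
  mirror ata-DISK07 ata-DISK08 \
  mirror ata-DISK09 ata-DISK10 \
  mirror ata-DISK11 ata-DISK12 \
  mirror ata-DISK13 ata-DISK14 \
  spare ata-DISK15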

A few thoughts:

- Yes, I'm adding more RAM :)
- Yes, it's too full right now, have bigger disks on the way.
- I built it with 1TB drives and have been swapping them out in pairs for 4TB disks. That'll be a max of 60TB + spare (18TB usable in my setup)... I'm reasonably sure I could use bigger disks yet.
- Yes, the MD1000 has a 3Gbps backplane, but that really doesn't matter much given that it's a gigabit NAS
- Max transfer speed is pretty good, roughly 120MB/s read/write over the network (i.e. maxing out the gigabit connection)
- It's built for big storage, good redundancy, speed is not the priority
- The vast majority of what I have on here is media for Plex. It has been pretty rockin' so far. Other uses include photo storage for the wife (raw files are big) and Time Machine backup on the 2950's internal RAID-10.
- It was a major bitch to make this thing quiet enough to live with. I did the 2950 firmware hack to lower the minimum fan threshold, swapped slower fans into the 2950, and tricked the MD1000 PSUs into accepting quieter fans as well. After all that, it's quiet enough to sit in my theater room without being noticeable. I have a few good articles on that if anyone wants to try it themselves. If you don't need to worry about noise, just freaking leave it alone.
- No concerns or issues with heat; system utilization is pretty low, so the fans never have to adjust speed to keep the system cool. But they could. They just don't.
- I consulted a lot of other forums and posts trying to figure out whether this would end up working; a few of the things I heard were...
-- "The MD1000 doesn't support non-Dell disks!" -- after these units went out of their support period, the later firmware updates lifted this restriction. I didn't run into it at all.
-- "The MD1000 will be slow, it has a 3gbps backplane" -- not an issue at all for a NAS on a 1 gig network.
-- "The MD1000 won't support disks bigger than 2gb" -- I used a Dell H200 exactly as shown in this guy's post: A Cheaper M1015 - the Dell H200 and a HOWTO Flash - Overclockers Australia Forums and was able to use 4tb drives. How big can it go? 6gb? 10gb? It would be interesting to find out.
-- "It will be super loud!" -- It was. The 2950 has 6 jet-engine fans, and for some reason, the MD1000 has 4. But with a little work, you can swap in quieter consumer fans and trick the firmware to be ok with them.
-- "The H200 and the MD1000 won't connect! I suppose you could use an adapter-type SAS cable but that would be unreliable" -- Works totally fine, just need 8470 to 8088 type cable.
-- "You won't need interposers, it can use SAS or SATA disks" -- Actually in this case you do need interposers, otherwise the disks won't be recognized by the MD. This was actually a pretty big bummer because of the cost. I found sleds with interposers together on ebay, it was much cheaper that way. But you could literally spend more on sleds and interposers than the MD1000 unit itself.
- One bonus here is that everything *else* for these Dell systems is absolutely dirt cheap because there are so many companies phasing them out of their data centers, the market is flooded. Extra power supply? 9.99. Another quad-core 3ghz CPU? 45 bucks. 32 gigs of RAM? 100 bucks.
- Looks like about I get about 45% formatted vs raw capacity. i.e. 26tb x 45% = 11.7tb (not counting hot spare)
- Unfortunately I just get disk activity lights, but no status (green) or error (amber) indicators on the MD1000. Ubuntu's ledmon doesn't help here. It can be worked around, I just made a good list of disk serial numbers and drive bay numbers. It's also possible to issue a command to run constant activity on one physical disk to ID it by the activity light. Small sacrifice I suppose.
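Something like this does the job for that last trick: an uncached read loop against the one disk you're hunting for. The device path is just one of mine as an example; substitute whatever serial you're after from your list.

# Hammer reads on a single disk until Ctrl-C; the bay with the solid activity light is the one.
# iflag=direct bypasses the page cache so the disk actually stays busy.
dd if=/dev/disk/by-id/ata-WDC_WD40EFRX-68WT0N0_WD-WCC4E4CLFRC6 of=/dev/null bs=1M iflag=direct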

Anyway, here's what it looks like so far:

root@lightnmagic:~# zpool list
NAME SIZE ALLOC FREE EXPANDSZ FRAG CAP DEDUP HEALTH ALTROOT
matrix 11.8T 10.4T 1.39T - 52% 88% 1.00x ONLINE -

root@lightnmagic:~# zpool status
pool: matrix
state: ONLINE
scan: scrub canceled on Thu Apr 21 11:34:03 2016
config:

NAME STATE READ WRITE CKSUM
matrix ONLINE 0 0 0
  mirror-0 ONLINE 0 0 0
    ata-WDC_WD40EFRX-68WT0N0_WD-WCC4E4CLFRC6 ONLINE 0 0 0
    ata-WDC_WD40EFRX-68WT0N0_WD-WCC4E4KY2KX4 ONLINE 0 0 0
  mirror-1 ONLINE 0 0 0
    ata-WDC_WD40EFRX-68WT0N0_WD-WCC4E0NDD8Z7 ONLINE 0 0 0
    ata-WDC_WD40EFRX-68WT0N0_WD-WCC4E0SERCFE ONLINE 0 0 0
  mirror-2 ONLINE 0 0 0
    ata-ST1000NM0011_Z1N43MM8 ONLINE 0 0 0
    ata-ST1000NM0011_Z1N45BY3 ONLINE 0 0 0
  mirror-3 ONLINE 0 0 0
    ata-ST1000NM0011_Z1N45K3A ONLINE 0 0 0
    ata-ST1000NM0011_Z1N45JXT ONLINE 0 0 0
  mirror-4 ONLINE 0 0 0
    ata-WDC_WD1003FBYX-01Y7B1_WD-WMAW30214559 ONLINE 0 0 0
    ata-WDC_WD1003FBYX-01Y7B1_WD-WCAW36618780 ONLINE 0 0 0
  mirror-5 ONLINE 0 0 0
    ata-MB1000EBNCF_WCAW32452173 ONLINE 0 0 0
    ata-MB1000EBNCF_WCAW32420922 ONLINE 0 0 0
  mirror-6 ONLINE 0 0 0
    ata-MB1000EBNCF_WCAW32447247 ONLINE 0 0 0
    ata-MB1000EBNCF_WCAW32451905 ONLINE 0 0 0
spares
  ata-MB1000EBNCF_WCAW30204232 AVAIL

errors: No known data errors

root@lightnmagic:~# fdisk -l | grep "Disk /dev"
Disk /dev/sdb: 2498.9 GB, 2498865659904 bytes
Disk /dev/sda: 499.6 GB, 499558383616 bytes
Disk /dev/sdn: 4000.8 GB, 4000787030016 bytes
Disk /dev/sdo: 4000.8 GB, 4000787030016 bytes
Disk /dev/sdd: 4000.8 GB, 4000787030016 bytes
Disk /dev/sdc: 4000.8 GB, 4000787030016 bytes
Disk /dev/sde: 1000.2 GB, 1000204886016 bytes
Disk /dev/sdf: 1000.2 GB, 1000204886016 bytes
Disk /dev/sdg: 1000.2 GB, 1000204886016 bytes
Disk /dev/sdm: 1000.2 GB, 1000204886016 bytes
Disk /dev/sdl: 1000.2 GB, 1000204886016 bytes
Disk /dev/sdk: 1000.2 GB, 1000204886016 bytes
Disk /dev/sdj: 1000.2 GB, 1000204886016 bytes
Disk /dev/sdi: 1000.2 GB, 1000204886016 bytes
Disk /dev/sdh: 1000.2 GB, 1000204886016 bytes
Disk /dev/sdq: 1000.2 GB, 1000204886016 bytes
Disk /dev/sdp: 1000.2 GB, 1000204886016 bytes




I wanted to bring something up and see if others have comments on it... if I were to do this all over again, I'd be pretty tempted by one of those 24-bay Supermicro boxes, because I'm reasonably sure it would end up quieter and a little simpler to rack up. But I'm not sure it would necessarily be cheaper... I'm wondering how much people have spent on a Supermicro box as opposed to something like this, if you're reasonably patient on eBay:

- Dell 2950 ... I've seen them as low as $75, probably plus shipping
- Dell MD1000 ... again as low as $75, plus shipping
- Dell H200 ... $60
- Rails for both ... $35 and $40, might come with the box
- Tripp Lite S520-1M SAS cable, new from Newegg ... $56, cheaper for a used equivalent
- Sleds + interposers (required!) ... $9-15 each, depending on what your MD1000 came with
- Disks ... WD Red 4TB is about $37/TB from Newegg

So call it $400-450 plus shipping for the base system, plus disks. Enticing? Total crap? What do you think?

Man, that was a long post.

+pics
 
Isn't the 2950 a DDR2 server?

ZFS will perform better with DDR3. Something I read; I am being a parrot. I could look it up on the FreeNAS forum if I had the time, but it was stated with some serious emphasis.
 
It is. I've read that for ZFS you want to make sure you have ECC RAM rather than, say, a single stick of consumer RAM (i.e. a glitch in RAM can cause ZFS to perceive or write checksum errors)... but I haven't given it much thought. The performance bottleneck is still the network connection. I've thought about doing interface bonding to add a few more lanes to the highway (rough sketch below), but haven't put the time into it. I've got 8 more TB coming on Monday :)
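If I ever get around to the bonding experiment, my understanding is it looks roughly like this with iproute2. The interface names and address are made up for the example, the switch has to speak 802.3ad, and a single client stream still tops out at gigabit; bonding mainly helps with multiple simultaneous streams.

# Quick-and-dirty LACP bond with iproute2 (not persistent across reboots)
# eno1/eno2 and the address are placeholders; swap in your own NICs and subnet
modprobe bonding
ip link add bond0 type bond mode 802.3ad
ip link set eno1 down && ip link set eno1 master bond0
ip link set eno2 down && ip link set eno2 master bond0
ip link set bond0 up
ip addr add 192.168.1.50/24 dev bond0

To make it stick you'd put the equivalent in /etc/network/interfaces, but that's enough to see whether it behaves.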
 
It is. I've read that for ZFS you want to make sure you have ECC RAM rather than, say, a single stick of consumer RAM (i.e. a glitch in RAM can cause ZFS to perceive or write checksum errors)
Actually, faulty RAM DIMMs do not trash your ZFS data:
Will ZFS and non-ECC RAM kill your data? | JRS Systems: the blog

"...But what if your evil [faulty] RAM flips a bit in the second data copy? Since it doesn’t match the checksum either, ZFS doesn’t overwrite anything. It logs an unrecoverable data error for that block, and leaves both copies untouched on disk. No data has been corrupted..."
 
Actually, faulty RAM DIMMs do not trash your ZFS data:
Will ZFS and non-ECC RAM kill your data? | JRS Systems: the blog

"...But what if your evil [faulty] RAM flips a bit in the second data copy? Since it doesn’t match the checksum either, ZFS doesn’t overwrite anything. It logs an unrecoverable data error for that block, and leaves both copies untouched on disk. No data has been corrupted..."

Nice. Less to worry about, then, it seems.

I think I've done well so far, except for one consequence of growing the pool by growing mirror vdevs one by one: data isn't terribly well distributed across the vdevs. It isn't necessarily causing performance issues for me at this point, but I'm in the middle of moving/recopying data onto a spare 4TB drive just to try to shake things into better alignment. It actually seems to be working, which is good. (A rough sketch of the replace/autoexpand dance is below.)
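For anyone following along, growing one mirror in place goes roughly like this. The disk names are placeholders, and the extra space only shows up once both sides of the mirror are the bigger size.

# Let the pool grow on its own once every disk in a vdev is bigger
zpool set autoexpand=on matrix

# Swap each side of the mirror for a bigger disk, letting the resilver finish in between
zpool replace matrix ata-OLD_1TB_A ata-NEW_4TB_A
zpool status matrix    # wait for the resilver to complete
zpool replace matrix ata-OLD_1TB_B ata-NEW_4TB_B
zpool status matrix    # wait again

# If the new space doesn't appear, expand the replaced disks explicitly
zpool online -e matrix ata-NEW_4TB_A ata-NEW_4TB_B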
 
Nice job! I'm also working on upgrading my storage and just got one of these beasts (an MD1000), and I would really like to get some quieter fans. Can you share which fans you used and that "dummy device to simulate a tach signal to keep the MD1000 from freaking out"? Thanks so much if you can point me in the right direction.
 
Actually, faulty RAM DIMMs do not trash your ZFS data:
Will ZFS and non-ECC RAM kill your data? | JRS Systems: the blog

"...But what if your evil [faulty] RAM flips a bit in the second data copy? Since it doesn’t match the checksum either, ZFS doesn’t overwrite anything. It logs an unrecoverable data error for that block, and leaves both copies untouched on disk. No data has been corrupted..."

Yeah, that whole thing is BS. ZFS without ECC = plain idiotic. ZFS does a shit ton of data storage in memory using complex data structures; not using ECC memory is a quick way to fubar everything. Quite honestly, given how cheap ECC-capable CPUs and RAM are, if you are doing a storage server without ECC, you shouldn't be. Actual real-world RAID cards have ECC on everything; ZFS is offloading all that work, all that caching, etc. into system memory, and it damn well needs ECC. There are nearly infinite ways that memory errors can fubar a ZFS array.
 
Yeah, that whole thing is BS. ZFS without ECC = plain idiotic. ZFS does a shit ton of data storage in memory using complex data structures; not using ECC memory is a quick way to fubar everything. Quite honestly, given how cheap ECC-capable CPUs and RAM are, if you are doing a storage server without ECC, you shouldn't be. Actual real-world RAID cards have ECC on everything; ZFS is offloading all that work, all that caching, etc. into system memory, and it damn well needs ECC. There are nearly infinite ways that memory errors can fubar a ZFS array.

Yep, this, exactly this. I always tell people: it's not strictly a requirement, but it is highly advised, and upon failure help will not be forthcoming. It's like Chris Rock once said, "you can drive a car with your feet if you want to, but that doesn't make it a good fucking idea!"
 