ZFS Damaging WD20EADS Drives?

axan

OK, this might seem kinda silly, but I'm starting to suspect that ZFS is somehow causing bad blocks on my WD HDDs.


About 3 months ago I created a pool of 2x 8-drive raidz2 vdevs using WD20EADS HDDs. All drives were tested prior to this and none had any problems. About a month ago I started to get zpool errors: thousands of checksum errors plus some read and write errors. All drives were showing as degraded, and finally the whole pool collapsed.

After that I tested each drive again and 12 out of 17 had unrecoverable bad sectors. I did an RMA with WD and received the new drives.

Same story again: I tested the first batch of 8 drives, running WinDLG and HD Tune Pro surface scans, plus HD Tune Pro random erase + verify. All drives tested OK.

Created a new raidz2 pool out of those 8 drives and copied 2TB of data to it.
This time it only took a few hours before my drives started showing as degraded with lots of checksum errors.

I rebooted into Windows and ran the HD Tune Pro surface test. It's still running, but already about half of those drives have bad blocks.

I'm pretty pissed and confused; the drives work fine in Windows but fail in Nexenta and Solaris Express 11.

Here's the hardware.
Supermicro X8DTH-6F
Intel Xeon E5620
16GB RAM
SM LSI 2008 HBAs flashed with the newest LSI IT firmware.
 
Interesting, but I fail to see how ZFS could actually cause these bad sectors.

Did you check the SMART data on your drives? If you're seeing SMART attributes like Current Pending Sector going up, then this really is an internal issue with the HDDs; I can't imagine how ZFS could be directly responsible through some weird bug or something.

More likely, it's the WD20EADS itself: it stores 2TB on very high-density platters while still using only about 40 bytes of ECC per 512-byte sector, which is not enough to suppress the "amnesia" issue. The HDD sometimes can't recover a sector because the ECC data is insufficient to correct the read errors, so the data is effectively forgotten and you get a pending/unreadable sector.
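
To put a rough number on that (my own back-of-the-envelope, assuming the usual ~1-in-10^14 unrecoverable read error spec that consumer drives in this class are rated for):

Code:
# Back-of-the-envelope: expected unrecoverable read errors (UREs) when
# reading one full 2TB drive end to end, assuming the common consumer-drive
# spec of at most 1 unrecoverable error per 1e14 bits read.
capacity_bits = 2e12 * 8      # 2 TB expressed in bits
ure_per_bit   = 1e-14         # assumed spec'd URE rate

print(capacity_bits * ure_per_bit)   # ~0.16 expected UREs per full-drive read

Scale that across 16 drives and weekly scrubs, and the occasional pending sector over a few months wouldn't be shocking.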

Regular scrubbing would be recommended. Also check your SMART output for UDMA CRC Error Count (which indicates cabling errors) or other unexpected values.
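
If you want to keep an eye on those counters across the whole shelf without clicking through a GUI per drive, something like this works. It's a rough sketch only, assuming smartmontools is installed; the device paths are placeholders you'd swap for your own:

Code:
import subprocess

# SMART attributes worth watching for this failure pattern,
# using the names smartctl typically reports.
WATCH = ("Reallocated_Sector_Ct", "Current_Pending_Sector",
         "Offline_Uncorrectable", "UDMA_CRC_Error_Count")

# Placeholder device list -- replace with your actual disk paths.
DEVICES = ["/dev/rdsk/c2t0d0", "/dev/rdsk/c2t1d0"]

for dev in DEVICES:
    out = subprocess.check_output(["smartctl", "-A", dev]).decode()
    print("=== %s ===" % dev)
    for line in out.splitlines():
        if any(attr in line for attr in WATCH):
            print(line)

Attribute names vary a bit between drive firmwares, so treat the WATCH list as a starting point.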
 
Like I said, all drives were tested prior to creating the pool and all SMART values were fine. Now some drives show pending sector counts higher than 0.
Oh, and my previous pool, which survived for 2 months, was scrubbed weekly with no problems until the failures started.
I would chalk it up to coincidence if it were a few drives or a one-time occurrence, but now it's 16+ drives in about 3 months' time. And like I said, the newest batch that just came from WD was tested, and the pool only survived a few hours before massive errors.

The same hardware was used to test the drives: same controller, same cables. All drives tested OK.

I'm going to create a new pool out of Hitachi 2TB drives and see what happens.
 
Well, there's a reason the industry isn't producing any non-4KiB-sector drives in 2011 and later. The BER problems of 512-byte-sector HDDs got out of hand, and it's like a dirty little secret of the industry, I guess.

I would not completely discount the possibility of some other issue causing your problems, but I still think BER is the most likely cause, as it can be expected on such high-capacity disks that have too little ECC relative to the areal data density.

By the way, if you see errors attributed to writes, then this must be something other than BER. Perhaps post some SMART data from your most affected disks? I know you already looked at most of it, but it can't hurt to have another pair of eyes on it, I guess.
 
Here's the SMART data from the 2 drives with the most bad sectors:
[Attached SMART screenshots: Drive3.png, Drive4.png]
 
I see a large "Unsafe Shutdown Count" in your output; this may potentially be a cause of the issue, though that's just a guess. It kind of looks like all the shutdowns were unclean. I wonder how that could be; perhaps the server is powering down too early? Might be worth investigating.
 
I would also test/check the other components in your system: memory, cables, power supply, HBA, etc. The chance of software (even at the OS level) causing this is slim.
 
This batch was run in a VM on a passed-through controller, so they never really power down. Also, I looked at the other drives on that controller and their unsafe shutdown values vary from 4 to ~378; most are below 20.
 
Yes, one drive has 378 unsafe shutdowns against 379 power-ups, and the other drive has 67 unsafe shutdowns and 66 power-ups. I guess you did not power up one drive 300 additional times? Actually, an unsafe shutdown is the only thing that will cause pending sectors on an otherwise flawless drive, and I don't think a controller reset will register as an unsafe shutdown. Even the spin-up count equals the power-up count, which suggests that the drives really did lose power that often.

This looks like either a faulty or overloaded power supply or a loose power connection.
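
If anyone wants to script that sanity check across every disk on the controller, here's a rough sketch. It assumes smartmontools again, and that the raw value of attribute 192 (usually reported by smartctl as Power-Off_Retract_Count, which I take to be what HD Tune shows as unsafe shutdowns) is compared against Power_Cycle_Count; the device paths are placeholders:

Code:
import subprocess

# Placeholder device list -- replace with your actual disk paths.
DEVICES = ["/dev/rdsk/c2t0d0", "/dev/rdsk/c2t1d0"]

def raw_value(smart_text, attr_name):
    # Pull the raw value (last column of smartctl -A output) for a named attribute.
    for line in smart_text.splitlines():
        if attr_name in line:
            return int(line.split()[-1])
    return None

for dev in DEVICES:
    out = subprocess.check_output(["smartctl", "-A", dev]).decode()
    retracts = raw_value(out, "Power-Off_Retract_Count")
    cycles   = raw_value(out, "Power_Cycle_Count")
    suspicious = retracts is not None and cycles is not None and retracts >= cycles - 1
    tag = "  <-- nearly every shutdown was unclean" if suspicious else ""
    print("%s: %s unsafe shutdowns / %s power cycles%s" % (dev, retracts, cycles, tag))

The attribute names and the "nearly every shutdown" threshold are assumptions; adjust them to whatever your drives actually report.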
 
@OP

I suspect one of the following:

a) bad/loose cables. You'd be surprised how often this can happen. The best thing to do is what I did: replace all the el cheapo SATA cables with latched ones. Latched cables are notably more expensive, but worth it IMO.

b) bad controller. Disk controllers can go bad; I had it happen to the integrated Promise TX2 on an ASUS PC-DL Deluxe and lost 500GB of (unimportant) data.

c) bad luck with the drives. How were they shipped? Was the packaging adequate?
 
Hey omni, I think you are right about this being a power issue. I was using a Norco 7-way splitter to power the backplanes. I just opened the case and checked the cables: two 5V pins were loose, as well as one of the ground pins on the feed. I got rid of that POS and ran Molex connections straight from the PSU, using 2 different feeds per backplane. Hopefully that will solve the problem; I'll find out soon.
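
For anyone following along, the quickest way to find out is to kick off a scrub right away and watch the error counters. A minimal sketch, assuming the pool is named "tank" (substitute your own) and the zpool command is on the path:

Code:
import subprocess, time

POOL = "tank"   # placeholder pool name -- substitute your own

# Start a scrub, then poll zpool status so any new read/write/checksum
# errors show up as soon as the scrub touches the affected regions.
subprocess.check_call(["zpool", "scrub", POOL])

while True:
    status = subprocess.check_output(["zpool", "status", "-v", POOL]).decode()
    print(status)
    if "scrub in progress" not in status:
        break                # scrub finished; the last status shows the final error counts
    time.sleep(300)          # re-check every 5 minutes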
 