
SAS expander trouble

Sjakie

n00b
Joined
Dec 14, 2011
Messages
40
Hiya. For the past few weeks I've been scratching my head over this one. I have an LSI 1068E with the P21 IT firmware on it, and a SASx28 expander connected to a bunch of WD20EARX and EARS drives.
All disks hooked up directly to the controller work fine under heavy load. With all drives in the chassis behind the expander, I get a ton of kernel messages under heavier workloads like a RAID rebuild (mdadm).

An example from a read benchmark (behavior during a rebuild is more or less the same):
[image: vywggy.png]


At the end one disk is diagnosed with a bad sector. That's another rather odd problem: disks pick up pending sectors by the dozen when they are put to work. If you let badblocks scan overnight the pending sectors are cleared, but only if the disk is hooked up to a SATA port or directly to the controller.
Another symptom is the slow rebuild speed, dropping to a whopping 500 kB/s, probably because of the many resets triggered by the hundreds of reads and writes.
Again, the disks seem to be fine. I've scanned them a few times already and they only get pending sectors when hooked up to the SAS expander. The sectors never get reallocated either (which is quite a relief).

The error in dmesg that jumps out time after time is 'SubCode(0x2000)', but a few others show up every now and then. There are a few threads on mailing lists about what looks like the same issue, but I haven't spotted a real solution yet.
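For anyone chasing the same thing, a rough way to see which SubCode values dominate is to tally them from saved dmesg output. This is only a sketch: the helper name `count_subcodes` is made up for illustration, and the only part of the log format it assumes is the `SubCode(0x....)` token quoted above.

```python
import re
from collections import Counter

# Only the 'SubCode(0x....)' token is assumed; the rest of each
# kernel log line is ignored.
SUBCODE_RE = re.compile(r"SubCode\(0x([0-9a-fA-F]+)\)")


def count_subcodes(log_text):
    """Count how often each SubCode value appears in the given log text.

    Returns a Counter keyed by the code as a lowercase '0x....' string.
    """
    return Counter("0x" + code.lower() for code in SUBCODE_RE.findall(log_text))
```

Something like `count_subcodes(open("dmesg.txt").read()).most_common()` (with `dmesg.txt` being wherever you saved the output) then lists the codes from most to least frequent, which makes it easier to compare a run with 2 disks against a full chassis.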

--

So, one thing I've already tried is jumpering all EARX drives to 3 Gb/s instead of 6 Gb/s. I also tried it with only 2 disks in: the number of errors dropped significantly, but they were still present.
With LSIutil I've been running some diagnostics, but didn't come up with anything. All data lines seem active and it reports no errors.

I'm going to patch my current kernel (3.2, Wheezy) with the latest mpt fusion driver (4.28) and see if it fixes anything. In its current form this is unworkable, and I really can't have disks lying around anymore; that's why I got that damned chassis ;)

Is there anyone with thoughts on this matter?
 
Turned out, knock on wood, that the head parking of the Green drives was the culprit after all. I had the idle timer set well beyond the default 8 seconds, but only turning it off completely (-d) with the open-source idle3 tool fixed most of the errors. Seems stable now.
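If you want to check whether head parking is hitting your own drives, watching the Load_Cycle_Count and Current_Pending_Sector attributes from `smartctl -A` before and after a heavy run should show it. A small sketch for pulling those raw values out of saved output; the function name `smart_raw_values` is hypothetical, and it assumes the usual ten-column attribute table that smartctl prints for ATA drives.

```python
def smart_raw_values(smartctl_output,
                     names=("Load_Cycle_Count", "Current_Pending_Sector")):
    """Extract raw values for the named attributes from `smartctl -A` text.

    Assumes the standard ATA attribute table: ID, name, flag, value,
    worst, thresh, type, updated, when_failed, raw value.
    """
    values = {}
    for line in smartctl_output.splitlines():
        parts = line.split()
        # Attribute rows start with a numeric ID and have at least 10 columns.
        if len(parts) >= 10 and parts[0].isdigit() and parts[1] in names:
            try:
                values[parts[1]] = int(parts[9])
            except ValueError:
                # Raw value is not a plain integer; skip rather than guess.
                pass
    return values
```

Comparing the result from two snapshots taken a few hours apart gives a quick idea of how fast the load cycle count is climbing.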
 