ARGH SAS Expanders

bexamous ([H]ard|Gawd, joined Dec 12, 2005, 1,670 messages)
I have an onboard LSI 1068E; by itself it works GREAT. 8 drives = no problem. I then bought a Chenbro expander, the CK12804, and argh, is this unstable.

I get these annoying messages in syslog:
[ 862.673187] sd 6:0:9:0: [sdk] Add. Sense: ATA pass through information available
[ 862.706747] sd 6:0:6:0: [sdh] Sense Key : Recovered Error [current] [descriptor]
[ 862.706751] Descriptor sense data with sense descriptors (in hex):
[ 862.706753] 72 01 00 1d 00 00 00 0e 09 0c 00 00 00 00 00 00
[ 862.706761] 00 4f 00 c2 00 50
[ 862.706765] sd 6:0:6:0: [sdh] Add. Sense: ATA pass through information available
[ 862.720379] sd 6:0:4:0: [sdf] Sense Key : Recovered Error [current] [descriptor]
[ 862.720383] Descriptor sense data with sense descriptors (in hex):
[ 862.720384] 72 01 00 1d 00 00 00 0e 09 0c 00 00 00 00 00 00
[ 862.720392] 00 4f 00 c2 00 50


And eventually everything connected to the expander just disappears. Is the HP expander better? I went with the Chenbro because I didn't want to waste a PCIe slot just for power. What the heck do people use for a controller/expander on Linux/BSD?

I'm just annoyed with this BS and want to buy something that'll work. I'm using Linux now, but eventually I'm going to move to BSD w/ZFS.
 
Supermicro board? If so, I can't get the HP SAS Expander to work with my Supermicro X8ST3-F-O's onboard 1068E. Knowing that may save you the hassle of trying it.
 

What issues was it giving you?
 
No, it's a Tyan board. I ordered an HP SAS expander last night just 'cause I was annoyed with everything. I'll see if it is any different. I do have 3 PCIe slots and an onboard 1068E, so I should just ditch expanders and buy a couple of Intel SAS cards w/1068E chips off eBay.
 
Okay, so here is a big WTF: the kernel's mptsas driver is really out of date; basically no one maintains it. The in-kernel version is at like 3.0.12.0 or something, while LSI's latest is 4.18.0.0... ???? Who knows what LSI has fixed.

So anyway, with a little messing around I have Ubuntu 10.04 running kernel 2.6.33, and I got DKMS to build the 4.18.0.0 driver. I'll see how this goes.

bexamous@nine:~/bin$ cat /proc/mpt/version
mptlinux-4.18.00.00
Fusion MPT base driver
Fusion MPT SAS host driver

YAY.

I'll be super pissed if this fixes all my problems, since I now have an HP SAS Expander coming :(.
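With both the stock kernel driver and a DKMS-built LSI driver potentially on the box, it is worth confirming which one actually loaded. A minimal Python sketch, assuming the first line of /proc/mpt/version has the `mptlinux-X.Y...` form shown above; the 4.18 threshold is just the version from this post, not an official minimum, and the helper names are hypothetical.

```python
from pathlib import Path
import re

# Matches the first line of /proc/mpt/version, e.g. "mptlinux-4.18.00.00".
VERSION_RE = re.compile(r"mptlinux-(\d+(?:\.\d+)*)")


def parse_mpt_version(text):
    """Return the mptlinux version as a tuple of ints, or None if not found."""
    match = VERSION_RE.search(text)
    if match is None:
        return None
    return tuple(int(part) for part in match.group(1).split("."))


def is_at_least(version, minimum):
    """Compare version tuples numerically, so (4, 18) > (4, 9) > (3, 4, 12)."""
    return version is not None and version >= minimum


if __name__ == "__main__":
    path = Path("/proc/mpt/version")
    if path.exists():
        ver = parse_mpt_version(path.read_text())
        print("mptlinux version:", ver)
        print("at least 4.18:", is_at_least(ver, (4, 18)))
    else:
        print("/proc/mpt/version not found (Fusion MPT driver not loaded?)")
```

Comparing integer tuples rather than strings matters: a string compare would rank "4.9" above "4.18".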
 
I'm not sure whether updating the driver made things better or worse. The driver update alone was not some magic fix, though.

Good news is I might have a 'working' setup.

[ 793.872522] sd 6:0:6:0: [sdh] Add. Sense: ATA pass through information available
[ 793.877561] sd 6:0:6:0: [sdh] Sense Key : Recovered Error [current] [descriptor]
[ 793.877565] Descriptor sense data with sense descriptors (in hex):
[ 793.877567] 72 01 00 1d 00 00 00 0e 09 0c 00 00 00 00 00 00
[ 793.877576] 00 00 00 00 00 00

Apparently these errors alone do not mean a problem has occurred. For some reason, when a SATA drive returns SMART info through a SAS controller, this stuff gets printed to syslog. This looks like something that is being fixed.

But for me, reading SMART info definitely also seemed to play a part in causing drives to disappear and whatnot.
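Decoding the hex dump in those messages by hand supports the SMART explanation. It is SCSI descriptor-format sense data: response code 0x72 (current, descriptor format), sense key 0x1 (RECOVERED ERROR), ASC/ASCQ 0x00/0x1D ("ATA pass through information available"), then a type 0x09 descriptor, which is the SAT "ATA Status Return" descriptor carrying the drive's ATA registers. In the excerpt from the first post, LBA mid/high come back as 0x4F/0xC2 (the SMART signature) with ATA status 0x50 and no ERR bit, which looks like a normal reply to a SMART command. A rough Python sketch of that decoding, based on a reading of the SPC/SAT layouts; the lookup tables are deliberately partial, and the SMART interpretation only applies if the command really was SMART RETURN STATUS.

```python
# Partial lookup tables: a few common codes only, not the full SPC lists.
SENSE_KEYS = {0x0: "NO SENSE", 0x1: "RECOVERED ERROR", 0x2: "NOT READY",
              0x3: "MEDIUM ERROR", 0x4: "HARDWARE ERROR", 0x5: "ILLEGAL REQUEST",
              0x6: "UNIT ATTENTION", 0xB: "ABORTED COMMAND"}
ASC_NAMES = {(0x00, 0x1D): "ATA PASS THROUGH INFORMATION AVAILABLE"}
ATA_STATUS_RETURN = 0x09  # SAT "ATA Status Return" sense descriptor type


def decode_sense(hex_text):
    """Decode descriptor-format sense data given as space-separated hex bytes."""
    data = bytes.fromhex(hex_text)
    if len(data) < 8 or (data[0] & 0x7F) not in (0x72, 0x73):
        raise ValueError("not descriptor-format sense data")
    key, asc, ascq = data[1] & 0x0F, data[2], data[3]
    result = {
        "sense_key": SENSE_KEYS.get(key, f"0x{key:x}"),
        "asc_ascq": ASC_NAMES.get((asc, ascq), f"0x{asc:02x}/0x{ascq:02x}"),
    }
    # Byte 7 is the additional sense length; descriptors start at byte 8.
    end = min(len(data), 8 + data[7])
    pos = 8
    while pos + 2 <= end:
        dtype, dlen = data[pos], data[pos + 1]
        desc = data[pos:pos + 2 + dlen]
        if dtype == ATA_STATUS_RETURN and len(desc) >= 14:
            result["ata"] = {
                "error": desc[3],
                "lba_mid": desc[9],     # LBA mid, bits 7:0
                "lba_high": desc[11],   # LBA high, bits 7:0
                "status": desc[13],
                "err_bit": bool(desc[13] & 0x01),  # ATA status ERR bit
            }
        pos += 2 + dlen
    return result


def smart_return_status(ata):
    """Read LBA mid/high as a SMART RETURN STATUS reply (meaningless for other commands)."""
    pair = (ata["lba_mid"], ata["lba_high"])
    if pair == (0x4F, 0xC2):
        return "no threshold exceeded"
    if pair == (0xF4, 0x2C):
        return "threshold exceeded"
    return "not a SMART RETURN STATUS signature"


if __name__ == "__main__":
    sample = "72 01 00 1d 00 00 00 0e 09 0c 00 00 00 00 00 00 00 4f 00 c2 00 50"
    decoded = decode_sense(sample)
    print(decoded)
    print(smart_return_status(decoded["ata"]))
```

The all-zero register values in the second excerpt decode to the same sense key and ASC/ASCQ, just without the SMART signature.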


Anyway, there are two changes I've made that make things MUCH MUCH better, if not 'fix' the problems I've been having:

1. In the LSI BIOS setup there were some timeout values that defaulted to 10 10 10 10, I think; I changed them to 0 0 0 0. I'm unsure why, but I saw it recommended somewhere. The default values seem to vary depending on which 1068E product you have.

2. On the Chenbro SAS Expander there is a jumper labeled INT / EXT. EXT seems to be way better. I don't understand why, but at this point I don't care.

So with these two changes, and perhaps the updated driver, I'm almost feeling optimistic about things working now.

I've been streaming 500+ MB/sec from the card for the last few hours without issue, while randomly using hddtemp to get SMART info. Reading SMART info still seems to hit some bugs, but rather than things blowing up, the system ends up recovering. If not using SMART data minimizes the risk, that's fine with me. I've never used it much, so I just uninstalled hddtemp/smartctl to be safe(r).

I have my fingers crossed. Still, if someone has an HBA + expanders that work perfectly under Linux, I'd like to know what your setup is.
 
2. On the Chenbro SAS Expander there is a jumper labeled INT / EXT. EXT seems to be way better.

That is interesting.

What kind of ports does the Chenbro expander have? All SFF-8087? Or are there some SFF-8088?
 
This expander has no external ports, so I'm not sure why that jumper is labeled INT/EXT.

BTW, I ran into something weird. I got lsiutil (search Google for it; it's somewhere on LSI's site). This utility does a bajillion things, but two interesting ones are:

1. 'Display attached devices'
Code:
Main menu, select an option:  [1-99 or e/p/w or 0 to quit] 16

SAS1068E's links are 3.0 G, 3.0 G, 3.0 G, 3.0 G, down, down, down, down

 B___T     SASAddress     PhyNum  Handle  Parent  Type
        500e081010082bf0           0001           SAS Initiator
        500e081010082bf1           0002           SAS Initiator
        500e081010082bf2           0003           SAS Initiator
        500e081010082bf3           0004           SAS Initiator
        500e081010082bf4           0005           SAS Initiator
        500e081010082bf5           0006           SAS Initiator
        500e081010082bf6           0007           SAS Initiator
        500e081010082bf7           0008           SAS Initiator
        5001c4500009af00     0     0009    0001   Edge Expander
 0   5  5001c4500009af04     4     000a    0009   SATA Target
 0   6  5001c4500009af05     5     000b    0009   SATA Target
 0   7  5001c4500009af06     6     000c    0009   SATA Target
 0   8  5001c4500009af07     7     000d    0009   SATA Target
 0   0  5001c4500009af08     8     000e    0009   SATA Target
 0   1  5001c4500009af09     9     000f    0009   SATA Target
 0   2  5001c4500009af0a    10     0010    0009   SATA Target
 0   3  5001c4500009af0b    11     0011    0009   SATA Target
 0   9  5001c4500009af10    16     0012    0009   SATA Target
 0  10  5001c4500009af11    17     0013    0009   SATA Target
 0  11  5001c4500009af12    18     0014    0009   SATA Target
 0  12  5001c4500009af13    19     0015    0009   SATA Target
 0   4  5001c4500009af15    21     0016    0009   SATA Target
 0  13  5001c4500009af3d    28     0017    0009   SAS Initiator and Target

Type      NumPhys    PhyNum  Handle     PhyNum  Handle  Port  Speed
Adapter      8          0     0001  -->    0     0009     0    3.0
                        1     0001  -->    1     0009     0    3.0
                        2     0001  -->    2     0009     0    3.0
                        3     0001  -->    3     0009     0    3.0

Expander    30          0     0009  -->    0     0001     0    3.0
                        1     0009  -->    1     0001     0    3.0
                        2     0009  -->    2     0001     0    3.0
                        3     0009  -->    3     0001     0    3.0
                        4     0009  -->    0     000a     0    3.0
                        5     0009  -->    0     000b     0    3.0
                        6     0009  -->    0     000c     0    3.0
                        7     0009  -->    0     000d     0    3.0
                        8     0009  -->    0     000e     0    3.0
                        9     0009  -->    0     000f     0    3.0
                       10     0009  -->    0     0010     0    3.0
                       11     0009  -->    0     0011     0    3.0
                       16     0009  -->    0     0012     0    3.0
                       17     0009  -->    0     0013     0    3.0
                       18     0009  -->    0     0014     0    3.0
                       19     0009  -->    0     0015     0    3.0
                       21     0009  -->    0     0016     0    3.0
                       28     0009  -->    0     0017     0    3.0

Enclosure Handle   Slots       SASAddress       B___T (SEP)
           0001      8      500e081010082bf0
           0002      1      5001c4500009af00    0  13

Here's the interesting part: everything seemed to be working fine, but when I checked this utility, the adapter listed only 3 links. There is a mini-SAS cable between the adapter and my expander, which should give 4 links. Everything worked normally, but I could not get 4 links to show up. I then replaced the SAS cable, and now I get 4 links. Apparently the first cable was bad, and if I hadn't looked at the lsiutil output I would never have realized it. I'm not sure if there is any noticeable difference between 3 and 4 links between the HBA and expander, but it can't hurt.

2. 'Display PHY Error Count'
I'm not yet sure if this is actually useful, but I bet if you were having problems it would be worth checking...
Code:
Diagnostics menu, select an option:  [1-99 or e/p/w or 0 to quit] 12

Adapter Phy 0:  Link Up
  Invalid DWord Count                                      16,011
  Running Disparity Error Count                            16,123
  Loss of DWord Synch Count                                     0
  Phy Reset Problem Count                                       0
Adapter Phy 1:  Link Up, No Errors
Adapter Phy 2:  Link Up, No Errors
Adapter Phy 3:  Link Up, No Errors
Adapter Phy 4:  Link Down, No Errors
Adapter Phy 5:  Link Down, No Errors
Adapter Phy 6:  Link Down, No Errors
Adapter Phy 7:  Link Down, No Errors
Expander (Handle 0009) Phy 0:  Link Up
  Invalid DWord Count                                     745,086
  Running Disparity Error Count                           710,287
  Loss of DWord Synch Count                                     1
  Phy Reset Problem Count                                       0
Expander (Handle 0009) Phy 1:  Link Up
  Invalid DWord Count                                     709,679
  Running Disparity Error Count                           667,688
  Loss of DWord Synch Count                                     1
  Phy Reset Problem Count                                       0
Expander (Handle 0009) Phy 2:  Link Up
  Invalid DWord Count                                     745,093
  Running Disparity Error Count                             2,252
  Loss of DWord Synch Count                                     1
  Phy Reset Problem Count                                       0
Expander (Handle 0009) Phy 3:  Link Up
  Invalid DWord Count                                     745,074
  Running Disparity Error Count                           737,221
  Loss of DWord Synch Count                                     1
  Phy Reset Problem Count                                       0
Expander (Handle 0009) Phy 4:  Link Up, No Errors
Expander (Handle 0009) Phy 5:  Link Up, No Errors
Expander (Handle 0009) Phy 6:  Link Up, No Errors
Expander (Handle 0009) Phy 7:  Link Up, No Errors
Expander (Handle 0009) Phy 8:  Link Up, No Errors
Expander (Handle 0009) Phy 9:  Link Up, No Errors
Expander (Handle 0009) Phy 10:  Link Up, No Errors
Expander (Handle 0009) Phy 11:  Link Up, No Errors
Expander (Handle 0009) Phy 12:  Link Down, No Errors
Expander (Handle 0009) Phy 13:  Link Down, No Errors
Expander (Handle 0009) Phy 14:  Link Down, No Errors
Expander (Handle 0009) Phy 15:  Link Down, No Errors
Expander (Handle 0009) Phy 16:  Link Up, No Errors
Expander (Handle 0009) Phy 17:  Link Up, No Errors
Expander (Handle 0009) Phy 18:  Link Up, No Errors
Expander (Handle 0009) Phy 19:  Link Up, No Errors
Expander (Handle 0009) Phy 20:  Link Down, No Errors
Expander (Handle 0009) Phy 21:  Link Up, No Errors
Expander (Handle 0009) Phy 22:  Link Down, No Errors
Expander (Handle 0009) Phy 23:  Link Down, No Errors
Expander (Handle 0009) Phy 24:  Link Down, No Errors
Expander (Handle 0009) Phy 25:  Link Down, No Errors
Expander (Handle 0009) Phy 26:  Link Down, No Errors
Expander (Handle 0009) Phy 27:  Link Down, No Errors
Expander (Handle 0009) Phy 28:  Link Up, No Errors
Expander (Handle 0009) Phy 29:  Link Down, No Errors

Diagnostics menu, select an option:  [1-99 or e/p/w or 0 to quit]

You can see which PHY links are up or down and the error counts for each. The 4 links with all those errors are the expander's 4 links to the HBA. I'm not sure when those errors occurred. I'm resetting these counters, also via lsiutil, to see if they come back. Again, if you have bad cabling or a suspect setup, it would be interesting to look at all the PHY error counts.

Also, my setup has been working perfectly since I changed the jumper setting on the expander and the timeout values in the LSI BIOS. At this point I also know I had a bad SAS cable; perhaps that had something to do with all the problems I was having, who knows.
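On the 3-vs-4 links question, a quick sanity check of lsiutil's "links are" line plus some back-of-the-envelope bandwidth math. A minimal Python sketch, assuming the summary-line format shown in the 'Display attached devices' output and that every up PHY on the HBA goes to the expander (true in this setup, not in general). The bandwidth figure is the theoretical 8b/10b payload rate for 3 Gbit/s SAS, not a measured number; the function names are hypothetical.

```python
def count_links(link_line):
    """Count PHYs reported up in lsiutil's "<chip>'s links are ..." summary line."""
    _, _, states = link_line.partition("links are")
    entries = [s.strip() for s in states.split(",") if s.strip()]
    return sum(1 for s in entries if s != "down")


def wide_port_mb_per_s(links, gbit_per_link=3.0):
    """Theoretical payload bandwidth of a wide port, before protocol overhead.

    3 Gbit/s SAS uses 8b/10b encoding, so only 8 of every 10 line bits are data:
    3.0 Gbit/s -> 2.4 Gbit/s -> 300 MB/s per link.
    """
    line_rate_mbit = gbit_per_link * 1000
    payload_mbit = line_rate_mbit * 8 / 10
    return links * payload_mbit / 8


if __name__ == "__main__":
    line = "SAS1068E's links are 3.0 G, 3.0 G, 3.0 G, 3.0 G, down, down, down, down"
    up = count_links(line)
    print(up, "links up,", wide_port_mb_per_s(up), "MB/s theoretical")
```

By that math, 4 links give roughly 1200 MB/s and 3 links roughly 900 MB/s of raw link bandwidth, both above the 500+ MB/sec streamed earlier in the thread, which would be consistent with the bad cable never showing up as a throughput problem.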
 
Yeah, I believe that patch fixes ATA pass-through/smartctl; I was testing it for a few days and was unable to get the controller to hang using smartctl/hddtemp.

However, after I got excited that the mptsas driver was fixed, I twice had all the drives fall off the controller (without any use of smartctl/hddtemp/ATA pass-through). Also, the LSI rep says there is no hardware bug and seemed to imply the patch should not be needed, which would suggest there is a bug somewhere else... I dunno, perhaps I read his comment wrong.

I've got an mpt2sas controller on my desk at home; I'm going to switch to that and compare.
 