ZFS and SAS expanders with SATA drives a toxic combo?

mikesm · Sep 18, 2010

So tonight I was looking at a link from picker's NAS assembly blog and noticed this post: http://gdamore.blogspot.com/2010/08/why-sas-sata-is-not-such-great-idea.html

Is this really true? I have used my HP SAS expander with no issue at all with hardware RAID and SATA disks, and any of the high port count controllers use on-board SAS expanders. Additionally, if you are using multiple enclosures, SAS expanders are the only practical way of handling the interconnection. So if ZFS has some fatal issue with this configuration, it would seem to be a major issue.

Is this true for all platforms regardless of operating system? What's the issue here that is driving this problem with ZFS that doesn't show itself with linux software raid and windows hardware raid?

thanks,
Mike

sweloop64 · Sep 19, 2010

If someone listed the type of debug-info that the sata-protocol carries that the sas-protocol might silently drop one might be able to do a risk analysis from that...

My guess is that it's related to the combination of sas<->sata (+ expander) + zfs (+ mpt driver) that triggers something...

As he says in his update "I can say that we (meaning Nexenta) have verified that SAS/SATA expanders combined with high loads of ZFS activity have proven conclusively to be highly toxic.[...] You may think SATA looks like a bargain, but when your array goes offline during ZFS scrub or resilver operations because the expander is choking on cache sync commands, you'll really wish you had spent the extra cash up front.".

Hard to draw any conclusions from that... He doesn't even mention any debug-info from the sata-channel in the update, but instead focuses on general issues...
It's just as general as when a friend says "the pc reboots/hangs during high load, like games, rendering etc" and I would as a solution say "you need to buy the most expensive top of the line pc there is, that will most likely fix everything".

It could be as simple as heat, as most expanders have passive cooling or that zfs is flooding the system with commands that require some kind of debug-info back to let it know when to back off...

sub.mesa · Sep 19, 2010

Well i generally recommend to use plain HBAs without expanders or port multipliers or other technology that lowers the dedicated bandwidth or complicates your setup. The more complex setup you have, the more things that can go wrong. Expanders due to their nature can reduce performance; the bandwidth isn't actually there.

Isn't buying one or two HBA cards an option? They're not that expensive. You can try with your HP expanders and if that gives you trouble replace it for real HBAs instead.

Not sure whether ZFS is extra sensitive to expanders or something; but ZFS is highly threaded; highly parallel I/O. If you use technology that disrupts or hinders the ability to do parallel I/O (like expanders or PCI bus) then you might notice that in your performance levels.

Expanders may be very nice on WHS where things work as JBOD and only one or a few disks is accessed at a time; not when ALL disks are accessed constantly at the SAME time. Dedicated bandwidth is a plus here, not hard to imagine.
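
A rough back-of-the-envelope sketch of the shared-bandwidth point above. The figures are assumptions for illustration, not measurements from this thread: one 4-lane SAS uplink at 3 Gb/s per lane (about 300 MB/s usable per lane after 8b/10b encoding) and disks that can each stream roughly 100 MB/s.

```python
# Rough oversubscription estimate for disks behind one SAS expander uplink.
# All figures are illustrative assumptions, not measurements from this thread.

LANE_MB_S = 300        # ~usable MB/s per 3 Gb/s SAS lane after 8b/10b encoding
LANES_PER_UPLINK = 4   # one x4 wide-port cable between HBA and expander


def per_disk_bandwidth(disks: int, disk_mb_s: float = 100.0) -> float:
    """Return the MB/s each disk can get when all disks stream at once."""
    uplink = LANE_MB_S * LANES_PER_UPLINK
    return min(disk_mb_s, uplink / disks)


def oversubscription(disks: int, disk_mb_s: float = 100.0) -> float:
    """Ratio of total disk demand to uplink capacity (>1 means a bottleneck)."""
    return (disks * disk_mb_s) / (LANE_MB_S * LANES_PER_UPLINK)


if __name__ == "__main__":
    for n in (4, 12, 24):
        print(n, per_disk_bandwidth(n), oversubscription(n))
```

Under these assumed numbers, 24 disks scrubbing at once would get about 50 MB/s each, while 4 disks on a dedicated HBA port would not be limited by the link at all.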

mikesm · Sep 19, 2010

sub.mesa said:
Well i generally recommend to use plain HBAs without expanders or port multipliers or other technology that lowers the dedicated bandwidth or complicates your setup. The more complex setup you have, the more things that can go wrong. Expanders due to their nature can reduce performance; the bandwidth isn't actually there.

Isn't buying one or two HBA cards an option? They're not that expensive. You can try with your HP expanders and if that gives you trouble replace it for real HBAs instead.

Not sure whether ZFS is extra sensitive to expanders or something; but ZFS is highly threaded; highly parallel I/O. If you use technology that disrupts or hinders the ability to do parallel I/O (like expanders or PCI bus) then you might notice that in your performance levels.

Expanders may be very nice on WHS where things work as JBOD and only one or a few disks is accessed at a time; not when ALL disks are accessed constantly at the SAME time. Dedicated bandwidth is a plus here, not hard to imagine.

Dedicated HBA's are just not practical when you have a large number of drives, and definitely if you have 2 or more enclosures. Using an enclosure with an expander builtin is a fantastic way of expanding a large setup.

Given that all large port count raid controllers use on-board expanders, and that expanders are supported fine under hardware RAID, the issue here (if it exists) is with ZFS, not the expander.

Are you saying that you agree the combo is toxic, or that you don't believe it's toxic, just a more complicated solution? When you are talking about 20 or more disks, everything is complicated, so that isn't really a valid criterion in these configurations.

The Nexenta people of all people should know what works and what doesn't I would think.

thx
mike

sub.mesa · Sep 19, 2010

Well my experience is limited so i'm not competent to make a judgement about expanders. But i think many will agree that the more complex your setup is with more and different points of failure, the higher the chance for any potential issues.

But keep in mind home I/O workloads are different from server I/O workloads. Many people would be doing mainly sequential I/O to the fileserver, storing large files. I could imagine the expanders creating some latency problems which is more severe with some workloads than others.

If you want to store your HDDs externally, you could consider external Mini-SAS connectors; each cable serving 4 disks full bandwidth. Some controllers have 2 external and 2 internal Mini-SAS ports; but be careful about OS support; though if it uses LSI 1068E chip it should work.

For internal 20-disk storage you only need 2 controllers: 2x8=16 and you should have 6 onboard SATA as well which you should use since these are your fastest ports. So with just two cheap controllers you serve 22 disks.

I've also seen solutions which have a cable with PCI-express x16 that also carries power and that you can insert a PCI-express x8/x16 controller in the external casing. But can't find it now and availability would probably suck.

So how many disks do you want to serve, and why not build separate boxes i.e. one main one backup? That's the setup i use. So it depends all on your individual needs; do what option sounds the best to you. But the less potential bottlenecks or source of headaches, the better i think.

picker · Sep 19, 2010

Like others have said, perhaps ZFS doesn't cause the toxicity Garrett refers to; rather, it exposes bugs present in all expanders tunneling sata because it exercises them harder than anything else.

netapp uses an interposer card at each sata disk for their sas shelf. Perhaps someone can think of a larger sata tunnel user? I'd say zfs with expanders tunneling sata is the most reliable and tested implementation of tunneling out there; it's just not perfect, which is why Garrett recommends clients buy sas disks for their sas shelves.

sweloop64 · Sep 19, 2010

All I know is that the HP SAS Expander(which is designed for servers, with massive airflow in mind) gets very hot during (heavy) work with the standard passive cooling in a regular case...

I'd like to see some detailed specifications of the setups that Garrett refers to before any conclusions, other than wild guesses, can be made...

sub.mesa · Sep 19, 2010

sub.mesa said:
I've also seen solutions which have a cable with PCI-express x16 that also carries power and that you can insert a PCI-express x8/x16 controller in the external casing. But can't find it now and availability would probably suck.

I found this:

http://www.ioi.com.tw/products/proddetail.aspx?CatID=113&DeviceID=3021&HostID=2041&ProdID=1130002

sweloop64 · Sep 19, 2010

Went over a few nexenta/solaris/open solaris forums and one feeling that struck me was "they have broken the mpt driver", noted were some random incompatibilities and broken hardware...
Only expanders I found mentioned were based on the lsi sas(gen 1) chip, not one mentioned the fw of the expander...
Only found one that cross tested his hardware, and he found an incompatible(or faulty) mobo that was triggered by high i/o (scrub)...
Most were posting random mpt errors...
Frequently mentioned HBA-chip was LSI 1068E...
All in all there were very few(less than 10 that I found) individuals reporting issues....

To sum it up, too little to go on, but probably something to keep an eye on if something more comes up...

quillo · Sep 19, 2010

I have a massive issue with my LSI 3081E-R (rebadged Intel) causing all sorts of weird errors. I still haven't been able to figure out the cause, but all IOPS to disks that are attached to that controller (via expander) will suddenly stop and often require the whole system to be hard rebooted.

I'm using a HP SAS expander (new version) with the previous firmware version on FreeBSD. As has been pointed out, that article doesn't really explain in much detail what the problem is but it doesn't sound like it can simply be resolved with a new driver version.

It would be nice to know that other people are having the same problem because it's got me pulling my hair out. This is a personal server though, not for a business or anything which is a plus.

Originally I was experiencing this problem whenever data was accessed from an idle state, or when streaming data (e.g. movies). I suspected that the F3EG drives I was using were causing a port timeout as they spin up from idle. Increasing the timeout settings on the controller seems to have partially fixed the problem, as it doesn't occur as regularly, so my next hunch is that it may be related to the TLER setting on my WD drives.

mikesm · Sep 19, 2010

I am wondering if they did break the mpt driver in some ugly way that is not broken under solaris itself. Sun sells large numbers of JBOD chassis that are based on SAS expanders and explicitly support SATA disks in them. So this is making me think that the issue may be associated with the LSI controllers.

I am going to play around with using an old adaptec controller I have in JBOD mode with an expander and see how it functions. Adaptec support in Solaris is quite good, and if it is the LSI chip drivers that are the source of the problem, it should avoid them.

For a Nexenta engineer to sound such a direct alarm about this combination without explaining the issue seems irresponsible to me. But I have seen code collapse under load in strange ways, and the more complex the setup, the more likely odd conditions will get exercised.

Still, something to consider when looking at ZFS vs hardware raid or linux md raid.

sub.mesa · Sep 20, 2010

quillo: you haven't yet stated if the problems persisted when you connect disks directly without using an expander. How are your disks being detected? You have the controller running in IT-mode?

mikesm · Sep 20, 2010

here's an interesting post on Solaris forums:

ronnyegn

Re: I/O to zpool stalls under high I/O load
Posted: Feb 14, 2010 11:33 AM in response to: nwsmith
To: Communities » storage » discuss

Hi,

We've chosen the Adaptec controller mainly because we had to attach 40 disks and the controller offers up to 24 SATA ports, so we don't need a SATA expander.

The bug with the SUNWaac driver not recognizing the disks is a known one. There is an open bug which says to use the driver from Adaptec instead, which is what we did.

In the meantime we replaced both Adaptec controllers with some LSI 1068E based ones. In order to attach 40 disks we had to use two controllers with one expander each. After replacing the Adaptec controllers we observed the same errors and I/O completely locked up.

It turned out the I/O hangs were caused by a bug in the SATA disks we used (Seagate ST31000340NS). According to the Seagate forums, these disks have an issue with NCQ which will hang your SAS HBA.

But after disabling NCQ everything was fine on the LSI controllers.

The strange thing is: before switching to LSI-based controllers I had already tried turning off NCQ in /etc/system AND at controller level, but to no avail.

Once I/O no longer locked up the system completely, we occasionally observed some "failed reads", which could be solved by fixing the SATA disks to 1.5 Gbit/s speed.

I will summarize these findings in part III, to be released soon (I hope in the coming week). Currently I am playing with the system, especially with FC-COMSTAR, and testing I/O figures with the Oracle ORION test suite.

Yours sincerely
Ronny Egner

sub.mesa · Sep 20, 2010

Yes older disks have bad NCQ implementation that adds latency and slows down I/O and may have other bugs as well. But more recent HDDs should do NCQ fine. This should help with multiqueue random reads, and ZFS does those plenty.

If you want ZFS to queue less I/O's, then you can also look at tuning the /boot/loader.conf with:
vfs.zfs.vdev.min_pending="1"
vfs.zfs.vdev.max_pending="1"
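
A small sketch for sanity-checking those two tunables before rebooting. It only parses loader.conf-style `name="value"` text passed in as a string; the function names are made up for illustration, and the "min must not exceed max" check is an assumption about what a sensible configuration looks like, not documented behaviour.

```python
import re

# Matches simple loader.conf assignments such as: vfs.zfs.vdev.max_pending="1"
_LINE = re.compile(r'^\s*([A-Za-z0-9_.]+)\s*=\s*"([^"]*)"\s*(?:#.*)?$')


def parse_loader_conf(text: str) -> dict:
    """Return {name: value} for every simple name="value" line in text."""
    settings = {}
    for line in text.splitlines():
        m = _LINE.match(line)
        if m:
            settings[m.group(1)] = m.group(2)
    return settings


def vdev_pending(text: str) -> tuple:
    """Return (min_pending, max_pending) as ints; reject min > max."""
    conf = parse_loader_conf(text)
    lo = int(conf["vfs.zfs.vdev.min_pending"])
    hi = int(conf["vfs.zfs.vdev.max_pending"])
    if lo > hi:
        raise ValueError("min_pending must not exceed max_pending")
    return lo, hi


if __name__ == "__main__":
    sample = 'vfs.zfs.vdev.min_pending="1"\nvfs.zfs.vdev.max_pending="1"\n'
    print(vdev_pending(sample))
```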

mikesm · Sep 20, 2010

sub.mesa said:
Yes older disks have bad NCQ implementation that adds latency and slows down I/O and may have other bugs as well. But more recent HDDs should do NCQ fine. This should help with multiqueue random reads, and ZFS does those plenty.

If you want ZFS to queue less I/O's, then you can also look at tuning the /boot/loader.conf with:
vfs.zfs.vdev.min_pending="1"
vfs.zfs.vdev.max_pending="1"

Is there a way to control this per drive or per pool? My adaptec lets me disable or enable this on a per drive basis.

sub.mesa · Sep 20, 2010

No, this setting is system-wide; like other settings in /boot/loader.conf it applies to every pool.

Not sure how/if you can disable NCQ on a per drive basis. But i did come across the "camcontrol tags" command:

Code:

# camcontrol tags pass6 -v
(pass6:mpt0:0:0:0): dev_openings  255
(pass6:mpt0:0:0:0): dev_active    0
(pass6:mpt0:0:0:0): devq_openings 255
(pass6:mpt0:0:0:0): devq_queued   0
(pass6:mpt0:0:0:0): held          0
(pass6:mpt0:0:0:0): mintags       2
(pass6:mpt0:0:0:0): maxtags       255

And i think you might also change this in the controller's BIOS though i haven't done this on my SuperMicro HBAs. But i know it does allow a lot of options.
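
If you want to watch those queue counters over time, here is a minimal sketch that turns the text printed by `camcontrol tags <dev> -v` into a dictionary. It assumes the output has exactly the shape pasted above; the function name is hypothetical and nothing here talks to the device itself.

```python
import re

# Lines look like: (pass6:mpt0:0:0:0): dev_openings  255
_TAG_LINE = re.compile(r'^\((?P<path>[^)]+)\):\s+(?P<key>\w+)\s+(?P<value>\d+)$')


def parse_camcontrol_tags(output: str) -> dict:
    """Map each counter name (maxtags, dev_active, ...) to its integer value."""
    counters = {}
    for line in output.splitlines():
        m = _TAG_LINE.match(line.strip())
        if m:
            counters[m.group("key")] = int(m.group("value"))
    return counters


if __name__ == "__main__":
    sample = """# camcontrol tags pass6 -v
(pass6:mpt0:0:0:0): dev_openings  255
(pass6:mpt0:0:0:0): dev_active    0
(pass6:mpt0:0:0:0): mintags       2
(pass6:mpt0:0:0:0): maxtags       255
"""
    print(parse_camcontrol_tags(sample))
```

Capturing the text (for example with `subprocess.run`) and logging `dev_active` during a scrub would show how many commands are actually outstanding when a hang starts.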

quillo · Sep 20, 2010

sub.mesa said:
quillo: you haven't yet stated if the problems persisted when you connect disks directly without using an expander. How are your disks being detected? You have the controller running in IT-mode?

Controller is in IT mode and seems to detect the disks as though they were directly attached.

I haven't tested with just the controller because I'm not really sure how to reproduce the problem, sometimes it might happen during heavy IO (e.g. multiple file copies) and sometimes it will happen with light IO (e.g. streaming).

Now that I know about the camcontrol tags command I might take a look at that next time to see how many commands are queued. I've also dropped the vfs.zfs.vdev.min/max_pending values to 4/8.

quillo · Sep 21, 2010

Well I managed to provoke a response from my controller today while running two scrubs... Looks like it might have killed an array out of spite

Code:

  pool: online0
 state: UNAVAIL
status: One or more devices are faulted in response to IO failures.
action: Make sure the affected devices are connected, then run 'zpool clear'.
   see: http://www.sun.com/msg/ZFS-8000-HC
 scrub: scrub completed after 3h5m with 0 errors on Tue Sep 21 14:33:37 2010
config:

        NAME        STATE     READ WRITE CKSUM
        online0     UNAVAIL      1    38     2  insufficient replicas
          mirror    ONLINE       2    76     5
            da15    ONLINE       9 1.21K     0
            da16    ONLINE       4    80     5
          mirror    ONLINE       0     0     0
            da17    ONLINE       3   592     0
            da18    ONLINE       0     0     0
        spares
          da19      AVAIL   

errors: 5 data errors, use '-v' for a list

  pool: vault0
 state: DEGRADED
status: One or more devices are faulted in response to IO failures.
action: Make sure the affected devices are connected, then run 'zpool clear'.
   see: http://www.sun.com/msg/ZFS-8000-HC
 scrub: scrub in progress for 5h11m, 57.79% done, 3h47m to go
config:

        NAME        STATE     READ WRITE CKSUM
        vault0      DEGRADED     3     0     0
          raidz2    DEGRADED     7     3   116
            da2     ONLINE       6     7    15  130K repaired
            da3     FAULTED     16   298    65  corrupted data
            da4     ONLINE       7     2     0  1K repaired
            da5     ONLINE       7     2     0  1K repaired
            da6     ONLINE       5     2     2  2K repaired
            da7     ONLINE      10     3     2  2K repaired
            da8     ONLINE       7     2     0  512 repaired
            da9     ONLINE     129    59     1
            da10    ONLINE       7     4    90  1.02M repaired
            da11    ONLINE       6     2     0  1.50K repaired
            da12    ONLINE       7     3     0  1K repaired
            da13    ONLINE       7     3     0  1.50K repaired
        spares
          da14      AVAIL
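
For anyone comparing output like the above across runs, a rough sketch that picks out every row of the `zpool status` config table with a nonzero READ/WRITE/CKSUM counter. It assumes the column layout shown in this post, the function name is made up, and pool and vdev rows (e.g. `mirror`) are reported alongside the disks.

```python
def devices_with_errors(status: str) -> list:
    """Return (name, state, read, write, cksum) for rows with nonzero counters."""
    rows = []
    for line in status.splitlines():
        fields = line.split()
        # A config row has at least: NAME STATE READ WRITE CKSUM
        if len(fields) < 5 or fields[0] == "NAME":
            continue
        name, state, read, write, cksum = fields[:5]
        # Counters always start with a digit (e.g. "0", "592", "1.21K").
        if not all(f[0].isdigit() for f in (read, write, cksum)):
            continue
        if (read, write, cksum) != ("0", "0", "0"):
            rows.append((name, state, read, write, cksum))
    return rows


if __name__ == "__main__":
    sample = """
        NAME        STATE     READ WRITE CKSUM
          mirror    ONLINE       2    76     5
            da15    ONLINE       9 1.21K     0
            da18    ONLINE       0     0     0
    """
    for row in devices_with_errors(sample):
        print(row)
```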

Code:

Sep 21 16:17:34 xxxxxx kernel: mpt0: request 0xffffff80005c36c0:26594 timed out for ccb 0xffffff00072db000 (req->ccb 0xffffff00072db000)
Sep 21 16:17:34 xxxxxx kernel: mpt0: attempting to abort req 0xffffff80005c36c0:26594 function 0
Sep 21 16:17:35 xxxxxx kernel: mpt0: mpt_wait_req(1) timed out
Sep 21 16:17:35 xxxxxx kernel: mpt0: mpt_recover_commands: abort timed-out. Resetting controller
Sep 21 16:18:59 xxxxxx kernel: mpt0: mpt_cam_event: 0x0
Sep 21 16:18:59 xxxxxx kernel: mpt0: mpt_cam_event: 0x0
Sep 21 16:18:59 xxxxxx kernel: mpt0: completing timedout/aborted req 0xffffff80005c36c0:26594
Sep 21 16:18:59 xxxxxx kernel: mpt0: mpt_cam_event: 0x16
Sep 21 16:18:59 xxxxxx kernel: mpt0: mpt_cam_event: 0x12
Sep 21 16:18:59 xxxxxx kernel: mpt0: mpt_cam_event: 0x12
Sep 21 16:18:59 xxxxxx kernel: mpt0: mpt_cam_event: 0x1b
Sep 21 16:18:59 xxxxxx kernel: mpt0: mpt_cam_event: 0x12
Sep 21 16:18:59 xxxxxx last message repeated 54 times
Sep 21 16:18:59 xxxxxx kernel: mpt0: mpt_cam_event: 0x16
Sep 21 16:18:59 xxxxxx kernel: (da3:mpt0:0:44:0): Synchronize cache failed, status == 0x4e, scsi status == 0x0
Sep 21 16:18:59 xxxxxx kernel: (da0:mpt0:0:41:0): WRITE(10). CDB: 2a 0 3 7a 54 5f 0 0 8 0 
Sep 21 16:18:59 xxxxxx kernel: (da0:mpt0:0:41:0): CAM Status: SCSI Status Error
Sep 21 16:18:59 xxxxxx kernel: (da0:mpt0:0:41:0): SCSI Status: Check Condition
Sep 21 16:18:59 xxxxxx kernel: (da0:mpt0:0:41:0): UNIT ATTENTION asc:29,0
Sep 21 16:18:59 xxxxxx kernel: (da0:mpt0:0:41:0): Power on, reset, or bus device reset occurred
Sep 21 16:18:59 xxxxxx kernel: (da0:mpt0:0:41:0): Retrying Command (per Sense Data)
---snip---
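
A minimal sketch for tallying the events in a log like the one above, so different runs or machines can be compared. The regular expressions are written against the exact message wording pasted here and are an assumption for any other driver version; the function name is hypothetical.

```python
import re
from collections import Counter

# Patterns modelled on the FreeBSD mpt(4) messages pasted in this post.
_TIMEOUT = re.compile(r'kernel: (mpt\d+): request \S+ timed out')
_RESET = re.compile(r'kernel: (mpt\d+): .*Resetting controller')
_DEVICE = re.compile(
    r'kernel: \((da\d+):mpt\d+:[\d:]+\): (?:Synchronize cache failed|CAM Status:)'
)


def summarize(log: str) -> dict:
    """Count request timeouts and resets per controller, and errors per disk."""
    timeouts, resets, devices = Counter(), Counter(), Counter()
    for line in log.splitlines():
        for pattern, counter in ((_TIMEOUT, timeouts),
                                 (_RESET, resets),
                                 (_DEVICE, devices)):
            m = pattern.search(line)
            if m:
                counter[m.group(1)] += 1
                break
    return {"timeouts": dict(timeouts),
            "resets": dict(resets),
            "device_errors": dict(devices)}


if __name__ == "__main__":
    sample = (
        "Sep 21 16:17:35 xxxxxx kernel: mpt0: mpt_recover_commands: "
        "abort timed-out. Resetting controller\n"
        "Sep 21 16:18:59 xxxxxx kernel: (da0:mpt0:0:41:0): "
        "CAM Status: SCSI Status Error\n"
    )
    print(summarize(sample))
```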

Stanza33 · Sep 21, 2010

Sounds like just performing a scrub can cause errors for this guy
http://www.nexenta.org/issues/214

Interestingly, the blog linked in the first post belongs to the Nexenta engineer who responds there

mikesm · Sep 21, 2010

Stanza33 said:
Sounds like just performing a scrub can cause errors for this guy
http://www.nexenta.org/issues/214

Interestingly, the blog linked in the first post belongs to the Nexenta engineer who responds there

These guys are in denial. Sun's OWN JBOD's use expanders, and have for some time and supported SATA disks in them as well.

This is the problem with Oracle having bought Sun - real engineers aren't working on these issues anymore it appears. If Nexenta doesn't get with the program here soon, the entire market is going to write off ZFS.

Stories like this one scare the hell out of IT managers - no one wants something like that happening to them!

picker · Sep 21, 2010

> while running two scrubs ... might have killed an array out of spite

with sooo many drives spewing perhaps your PS isn't up to the task...

BTW, the mpt driver isn't zfs, its another group..

mikesm · Sep 21, 2010

picker said:
> while running two scrubs ... might have killed an array out of spite

with sooo many drives spewing perhaps your PS isn't up to the task...

BTW, the mpt driver isn't zfs, its another group..

If you look at the threads on this bug, you see folks who have 20 servers running the same config and seeing exactly the same issues. This is not a PSU issue.

I recognize that ZFS and the MPT driver are in different groups. But folks who use hardware raid are not seeing this issue, so from a reputation perspective, it's really a problem only for folks who are using ZFS on expanders hooked to an LSI card.

The point is the same - this system configuration leads to toxic behavior, and no IT manager when they hear about this kind of bug is going to go anywhere near it until it's fixed. If it isn't fixed and soon, then admins will just remember the problem and that no one fixed it, and go sour on the whole platform.

I now understand why some of the Sun ZFS engineers are calling Nexenta a joke.

People who are entrusted with massive storage implementations for their enterprises are hypersensitive to potential problems that affect the availability and integrity of that storage. People should be pulling the fire alarm over a bug like this, not saying that the users' configurations aren't valid.

picker · Sep 21, 2010

> Sun's OWN JBOD's use expanders,
yup.... but they don't sell sata disks for them

> and have for some time and supported SATA disks in them as well.
all expanders "support" sata drives, but sun and nexenta don't.
BTW, Garrett is VP of engineering at nexenta and one of three people
who have committed code into http://www.illumos.org/

if you want to try b147 goto http://openindiana.org/download/

> scare the hell out of IT managers
correct, lets all watch the FUD..

sweloop64 · Sep 21, 2010

correct me if I'm wrong, but if I were to test b147, I wouldn't be able to return to b128 with my zfs-setup would I?

sub.mesa · Sep 21, 2010

Yes, you can go back; just don't upgrade the pool or filesystems. If you keep them at their current version you should only have to import the pool again on the original system/OS.

mikesm · Sep 21, 2010

picker said:
> Sun's OWN JBOD's use expanders,
yup.... but they don't sell sata disks for them

> and have for some time and supported SATA disks in them as well.
all expanders "support" sata drives, but sun and nexenta don't.
BTW, Garrett is VP of engineering at nexenta and one of three people
who have committed code into http://www.illumos.org/

if you want to try b147 goto http://openindiana.org/download/

> scare the hell out of IT managers
correct, lets all watch the FUD..

Sun does sell JBOD's with SATA disks: See the data sheet here: http://www.sun.com/storage/disk_systems/expansion/datasheet.pdf (which says they support up to 192 SATA disks). They do this not via HBA's but via expanders.

The nexenta folks should be working on finding an answer to the problem rather than saying not to use expanders.

picker · Sep 21, 2010

> They do this not via HBA's but via expanders.

true.. but like netapp, they use an interposer card on each drive tray.

this card issues the GUID and the bus disconnects (NCQ) plus the dual ports and such, all in sas to the expander. No sata tunnel across the expander.

EDIT: oops, the photo is of a sata to FC interposer card. I can get a photo of a SAS one when I'm back at work, but it's about the same.
http://blogs.sun.com/greg/entry/welcome_to_fishworks claims sata is tunneled. I stand corrected.

mikesm · Sep 21, 2010

picker said:
> They do this not via HBA's but via expanders.

true.. but like netapp, they use an interposer card on each drive tray.

this card issues the GUID and the bus disconnects (NCQ) plus the dual ports and such, all in sas to the expander. No sata tunnel across the expander.

EDIT: oops, the photo is of a sata to FC interposer card. I can get a photo of a SAS one when I'm back at work, but it's about the same.
http://blogs.sun.com/greg/entry/welcome_to_fishworks claims sata is tunneled. I stand corrected.

This is what I thought - they do use tunneling like everyone else, but the interesting point of that post is the "heavily modified mpt(4) driver". I wonder if this is the root of some of these problems...

quillo · Sep 21, 2010

Looking through the OpenSolaris forums and bug report regarding this I've disabled MSI/MSIX and reduced min_pending and max_pending to 1. So far so good, it has even given a huge boost to multi-threaded read/writes.

mikesm · Sep 22, 2010

quillo said:
Looking through the OpenSolaris forums and bug report regarding this I've disabled MSI/MSIX and reduced min_pending and max_pending to 1. So far so good, it has even given a huge boost to multi-threaded read/writes.

That usually hoses your sequential performance. Have you tested that?

This smells like a driver bug...

sweloop64 · Sep 22, 2010

mikesm said:
This is what I thought - they do use tunneling like everyone else, but the interesting point of that post is the "heavily modified mpt(4) driver". I wonder if this is the root of some of these problems...

Do you mean that they use a different (working) mpt driver in their enterprise products than the one that is used in the regular (open)solaris?

Stanza33 · Sep 22, 2010

Not that I understand it completely.... but I was reading at www.scsi.org

found this

http://serialstoragewire.net/Articles/2004_0225/developer_article_2_feb.html

snip
By comparison, the ATA protocol is not capable of multi-initiator access or true dual-port capability. SATA devices have no notion of multiple SATA hosts. SATA devices maintain only a single ATA task file register image. Within a SAS domain, it's possible that more than one STP initiator port might be vying for access to the same SATA device. This can impact performance or create potential deadlock conditions.

SAS initiators rely on SL_CC (connection control) link layer state machine as the primary mechanism for managing STP connections. In the example above, an STP target port establishes a connection with an STP initiator port by responding to STP_Open with Open_Accept. The SL_CC1 arb select state machine transitions to SL_CC3 connected by transmitting connection open confirmation. Once the connection is open, the STP target port (in the expander) rejects all subsequent connection requests from STP initiator ports by sending an OPEN_REJECT (STP Resources Busy) message to the SL transmitter.
snip

Make any sense to anyone?
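
One way to read the snippet above is as a one-connection-at-a-time lock in front of each SATA disk. The toy model below sketches only that idea: the class and method names are invented for illustration, and it ignores the real SL_CC state machine, arbitration and timeouts entirely.

```python
class StpTargetPort:
    """Toy model of an expander's STP target port in front of one SATA disk."""

    def __init__(self):
        self.owner = None  # initiator currently holding the connection, if any

    def open(self, initiator: str) -> str:
        """Accept the first open request; reject everyone else until close."""
        if self.owner is None:
            self.owner = initiator
            return "OPEN_ACCEPT"
        return "OPEN_REJECT (STP RESOURCES BUSY)"

    def close(self, initiator: str) -> None:
        """Only the initiator that holds the connection can release it."""
        if self.owner == initiator:
            self.owner = None


if __name__ == "__main__":
    port = StpTargetPort()
    print(port.open("hba0"))   # first initiator gets the connection
    print(port.open("hba1"))   # second initiator is rejected while it is held
    port.close("hba0")
    print(port.open("hba1"))   # now the second initiator can connect
```

If the initiator holding the connection stalls (say, waiting on a slow cache flush), every other request for that disk just keeps getting rejected, which is roughly the performance and deadlock risk the article describes.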

quillo · Sep 22, 2010

mikesm said:
That usually hoses your sequential performance. Have you tested that?

This smells like a driver bug...

Yep... Sequential reads are actually faster now. Unsure if it was because of the change to max_pending or MSI settings though.

mikesm · Sep 22, 2010

quillo said:
Yep... Sequential reads are actually faster now. Unsure if it was because of the change to max_pending or MSI settings though.

Cool. If it works that's all you can ask for...

bexamous · Sep 22, 2010

Well LSI driver blows under Linux so I'm not sure why it would be any better in Solaris.

d00dz · Sep 22, 2010

Can someone tell me if I'm affected?
I have a Supermicro USAS-L8i Card connected to a Chenbro Expander (24port)
Running OpenSolaris snv_134.

I dont have a problem with scrubs.

Code:

scrub: scrub completed after 9h12m with 0 errors on Wed Sep 15 19:54:02 2010

What's this mpt driver thingy?

Read/Writes to the server via a 1Gbe connection seem fine.

quillo · Sep 23, 2010

d00dz said:
Can someone tell me if I'm affected?
I have a Supermicro USAS-L8i Card connected to a Chenbro Expander (24port)
Running OpenSolaris snv_134.

Maybe... The L8i uses the same LSI chipset (1068E) but according to the OSol bug report it is only confirmed to exist up to snv_126, so maybe 134 is unaffected?

http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6894775

Normally array scrubs are fine for me too, I think what killed it for me last time was running a scrub on two arrays at once. I can also trigger the problem by running a large number of file copies to or from the array that run for 1h+.

d00dz · Sep 23, 2010

quillo said:
Maybe... The L8i uses the same LSI chipset (1068E) but according to the OSol bug report it is only confirmed to exist up to snv_126, so maybe 134 is unaffected?

http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6894775

Normally array scrubs are fine for me too, I think what killed it for me last time was running a scrub on two arrays at once. I can also trigger the problem by running a large number of file copies to or from the array that run for 1h+.

Hmmm Not sure. But I have 20 hard drives, using raidz2. Split in two. So 10 x 2 Devs? It shows up as one big zpool, but split into two.

When I do a scrub it's just zpool scrub POOLNAME.

Is that two "arrays"?

I was using snv_116, then updated to snv_134. The zpool is still using the version from snv_116, i.e. I haven't updated the pool. I updated from snv_116 because of the cifs windows reboot issue, which is solved in snv_130+ I think, but definitely in snv_134.

sub.mesa · Sep 23, 2010

bexamous said:
Well LSI driver blows under Linux so I'm not sure why it would be any better in Solaris.

Under Linux the Marvell 88SE6480 chip used in SuperMicro AOC-SASLP-MV8 works pretty well; but at least in BSD this controller sucks with continuous timeouts and other problems. I've also heard such problems on OpenSolaris and Linux, frankly. People had to compile experimental drivers and it still wouldn't run properly.

SuperMicro AOC-SASLP-MV8, using Marvell 88SE6480 which apparently work in Linux but not BSD.

Honestly i've seen SuperMicro USAS-L8i used in both Linux, BSD and OpenSolaris setups. Doesn't mean you won't ever get problems with that card, but it seems to have the best compatibility.

poloser · Oct 22, 2010

I read somewhere that changing the HBA to one with the new SAS2008 chipset worked in the same situation (mpt timeouts).
Guys who are suffering from this issue, can you try this?
