OI KP...

bexamous

Okay, I've got an ESX5/OI server with an LSI SAS2008 controller (passed through to OI). Connected to it is an HP SAS expander, and attached to the expander is a bunch of SATA drives.

I'm running into kernel panics (KP). With heavy use, e.g. a scrub, I can sometimes get one to happen within a few hours; in normal use it seems to happen about once every other week. The log is below. I'm fairly sure the controller, expander, and disks are good: I ran Linux on this same hardware for a long time with no issues. Even if a disk is going bad, there is still a bug somewhere if it ends up in a NULL dereference.

Not sure what the easiest way of dealing with this is.

- Currently I actually have two LSI2008 controllers, each with its own zpool: one has 8 drives, the other has the expander with 16 drives. Should I maybe put all the drives on the expander and connect it to the onboard LSI1068E? How do I move zpools from the LSI2008 to the 1068E controller, though? The drive names are different: on the SAS2 controller I get a GUID, and on the 1068E the drives use some other naming method.

- Both zpools I use are v28. Should I just ditch oi_151a and move to Solaris11 or FreeBSD9?

Getting rid of the SAS expander isn't really a long-term option; I hope to add more drives sometime soon.

- I guess another option would be to bang my head against the wall and try to figure out why I'm hitting this NULL dereference. Not sure I want to waste time on this if there is some easy alternative, e.g. the options above.


Dec 1 17:40:40 eight scsi: [ID 107833 kern.warning] WARNING: /pci@0,0/pci15ad,7a0@15/pci1000,3020@0 (mpt_sas0):
Dec 1 17:40:40 eight Disconnected command timeout for Target 15
Dec 1 17:40:40 eight scsi: [ID 243001 kern.warning] WARNING: /pci@0,0/pci15ad,7a0@15/pci1000,3020@0 (mpt_sas0):
Dec 1 17:40:40 eight mptsas_handle_event_sync: IOCStatus=0x8000, IOCLogInfo=0x31110d01
Dec 1 17:40:40 eight scsi: [ID 243001 kern.warning] WARNING: /pci@0,0/pci15ad,7a0@15/pci1000,3020@0 (mpt_sas0):
Dec 1 17:40:40 eight mptsas_handle_event_sync: IOCStatus=0x8000, IOCLogInfo=0x31120436
Dec 1 17:40:41 eight scsi: [ID 365881 kern.info] /pci@0,0/pci15ad,7a0@15/pci1000,3020@0 (mpt_sas0):
Dec 1 17:40:41 eight Log info 0x31120436 received for target 28.
Dec 1 17:40:41 eight scsi_status=0x0, ioc_status=0x804b, scsi_state=0xc
Dec 1 17:40:41 eight scsi: [ID 365881 kern.info] /pci@0,0/pci15ad,7a0@15/pci1000,3020@0 (mpt_sas0):
Dec 1 17:40:41 eight Log info 0x31120436 received for target 28.
Dec 1 17:40:41 eight scsi_status=0x0, ioc_status=0x804b, scsi_state=0xc
Dec 1 17:40:41 eight scsi: [ID 365881 kern.info] /pci@0,0/pci15ad,7a0@15/pci1000,3020@0 (mpt_sas0):
Dec 1 17:40:41 eight Log info 0x31120436 received for target 28.
Dec 1 17:40:41 eight scsi_status=0x0, ioc_status=0x804b, scsi_state=0xc
Dec 1 17:40:41 eight scsi: [ID 365881 kern.info] /pci@0,0/pci15ad,7a0@15/pci1000,3020@0 (mpt_sas0):
Dec 1 17:40:41 eight Log info 0x31120436 received for target 28.
Dec 1 17:40:41 eight scsi_status=0x0, ioc_status=0x804b, scsi_state=0xc
Dec 1 17:40:41 eight scsi: [ID 365881 kern.info] /pci@0,0/pci15ad,7a0@15/pci1000,3020@0 (mpt_sas0):
Dec 1 17:40:41 eight Log info 0x31120436 received for target 28.
Dec 1 17:40:41 eight scsi_status=0x0, ioc_status=0x804b, scsi_state=0xc
Dec 1 17:40:41 eight scsi: [ID 365881 kern.info] /pci@0,0/pci15ad,7a0@15/pci1000,3020@0 (mpt_sas0):
Dec 1 17:40:41 eight Log info 0x31120436 received for target 28.
Dec 1 17:40:41 eight scsi_status=0x0, ioc_status=0x804b, scsi_state=0xc
Dec 1 17:40:41 eight scsi: [ID 365881 kern.info] /pci@0,0/pci15ad,7a0@15/pci1000,3020@0 (mpt_sas0):
Dec 1 17:40:41 eight Log info 0x31120436 received for target 28.
Dec 1 17:40:41 eight scsi_status=0x0, ioc_status=0x804b, scsi_state=0xc
Dec 1 17:40:41 eight scsi: [ID 365881 kern.info] /pci@0,0/pci15ad,7a0@15/pci1000,3020@0 (mpt_sas0):
Dec 1 17:40:41 eight Log info 0x31120436 received for target 28.
Dec 1 17:40:41 eight scsi_status=0x0, ioc_status=0x804b, scsi_state=0xc
Dec 1 17:40:41 eight scsi: [ID 365881 kern.info] /pci@0,0/pci15ad,7a0@15/pci1000,3020@0 (mpt_sas0):
Dec 1 17:40:41 eight Log info 0x31120436 received for target 28.
Dec 1 17:40:41 eight scsi_status=0x0, ioc_status=0x804b, scsi_state=0xc
Dec 1 17:40:41 eight scsi: [ID 365881 kern.info] /pci@0,0/pci15ad,7a0@15/pci1000,3020@0 (mpt_sas0):
Dec 1 17:40:41 eight Log info 0x31120436 received for target 28.
Dec 1 17:40:41 eight scsi_status=0x0, ioc_status=0x804b, scsi_state=0xc
Dec 1 17:40:42 eight scsi: [ID 243001 kern.warning] WARNING: /pci@0,0/pci15ad,7a0@15/pci1000,3020@0 (mpt_sas0):
Dec 1 17:40:42 eight mptsas_handle_event_sync: IOCStatus=0x8000, IOCLogInfo=0x31170000
Dec 1 17:40:44 eight scsi: [ID 365881 kern.info] /pci@0,0/pci15ad,7a0@15/pci1000,3020@0 (mpt_sas0):
Dec 1 17:40:44 eight Log info 0x31140000 received for target 15.
Dec 1 17:40:44 eight scsi_status=0x0, ioc_status=0x8048, scsi_state=0xc
Dec 1 17:40:44 eight scsi: [ID 365881 kern.info] /pci@0,0/pci15ad,7a0@15/pci1000,3020@0 (mpt_sas0):
Dec 1 17:40:44 eight Log info 0x31140000 received for target 15.
Dec 1 17:40:44 eight scsi_status=0x0, ioc_status=0x8048, scsi_state=0xc
Dec 1 17:40:44 eight scsi: [ID 365881 kern.info] /pci@0,0/pci15ad,7a0@15/pci1000,3020@0 (mpt_sas0):
Dec 1 17:40:44 eight Log info 0x31140000 received for target 15.
Dec 1 17:40:44 eight scsi_status=0x0, ioc_status=0x8048, scsi_state=0xc
Dec 1 17:40:44 eight scsi: [ID 365881 kern.info] /pci@0,0/pci15ad,7a0@15/pci1000,3020@0 (mpt_sas0):
Dec 1 17:40:44 eight Log info 0x31140000 received for target 15.
Dec 1 17:40:44 eight scsi_status=0x0, ioc_status=0x8048, scsi_state=0xc
Dec 1 17:40:44 eight scsi: [ID 107833 kern.warning] WARNING: /pci@0,0/pci15ad,7a0@15/pci1000,3020@0 (mpt_sas0):
Dec 1 17:40:44 eight mptsas_check_task_mgt: Task 0x3 failed. IOCStatus=0x4a IOCLogInfo=0x0 target=15
Dec 1 17:40:44 eight scsi: [ID 107833 kern.warning] WARNING: /pci@0,0/pci15ad,7a0@15/pci1000,3020@0 (mpt_sas0):
Dec 1 17:40:44 eight mptsas_ioc_task_management failed try to reset ioc to recovery!
Dec 1 17:40:45 eight scsi: [ID 365881 kern.info] /pci@0,0/pci15ad,7a0@15/pci1000,3020@0 (mpt_sas0):
Dec 1 17:40:45 eight mpt0 Firmware version v11.0.0.0 (?)
Dec 1 17:40:45 eight scsi: [ID 365881 kern.info] /pci@0,0/pci15ad,7a0@15/pci1000,3020@0 (mpt_sas0):
Dec 1 17:40:45 eight mpt0: IOC Operational.
Dec 1 17:40:53 eight unix: [ID 836849 kern.notice]
Dec 1 17:40:53 eight ^Mpanic[cpu0]/thread=ffffff0011e1dc40:
Dec 1 17:40:53 eight genunix: [ID 335743 kern.notice] BAD TRAP: type=e (#pf Page fault) rp=ffffff0011e1d930 addr=20 occurred in module "unix" due to a NULL pointer dereference
Dec 1 17:40:53 eight unix: [ID 100000 kern.notice]
Dec 1 17:40:53 eight unix: [ID 839527 kern.notice] sched:
Dec 1 17:40:53 eight unix: [ID 753105 kern.notice] #pf Page fault
Dec 1 17:40:53 eight unix: [ID 532287 kern.notice] Bad kernel fault at addr=0x20
Dec 1 17:40:53 eight unix: [ID 243837 kern.notice] pid=0, pc=0xfffffffffb86ccfb, sp=0xffffff0011e1da28, eflags=0x10246
Dec 1 17:40:53 eight unix: [ID 211416 kern.notice] cr0: 8005003b<pg,wp,ne,et,ts,mp,pe> cr4: 6b8<xmme,fxsr,pge,pae,pse,de>
Dec 1 17:40:53 eight unix: [ID 624947 kern.notice] cr2: 20
Dec 1 17:40:53 eight unix: [ID 625075 kern.notice] cr3: 4400000
Dec 1 17:40:53 eight unix: [ID 625715 kern.notice] cr8: c
Dec 1 17:40:53 eight unix: [ID 100000 kern.notice]
Dec 1 17:40:53 eight unix: [ID 592667 kern.notice] rdi: 20 rsi: ffffff035c8de8f8 rdx: ffffff0011e1dc40
Dec 1 17:40:53 eight unix: [ID 592667 kern.notice] rcx: 9f5e8000 r8: 37bb9f5a07e40 r9: ffffff035c8de6c8
Dec 1 17:40:53 eight unix: [ID 592667 kern.notice] rax: 0 rbx: 30780d289 rbp: ffffff0011e1db70
Dec 1 17:40:53 eight unix: [ID 592667 kern.notice] r10: 3a75b4b7 r11: 0 r12: 0
Dec 1 17:40:53 eight unix: [ID 592667 kern.notice] r13: 0 r14: 20 r15: ffffff035c8de8f8
Dec 1 17:40:53 eight unix: [ID 592667 kern.notice] fsb: fffffd7fff022a40 gsb: fffffffffbc304a0 ds: 0
Dec 1 17:40:53 eight unix: [ID 592667 kern.notice] es: 0 fs: 0 gs: 0
Dec 1 17:40:53 eight unix: [ID 592667 kern.notice] trp: e err: 2 rip: fffffffffb86ccfb
Dec 1 17:40:53 eight unix: [ID 592667 kern.notice] cs: 30 rfl: 10246 rsp: ffffff0011e1da28
Dec 1 17:40:53 eight unix: [ID 266532 kern.notice] ss: 38
Dec 1 17:40:53 eight unix: [ID 100000 kern.notice]
Dec 1 17:40:53 eight genunix: [ID 655072 kern.notice] ffffff0011e1d810 unix:die+dd ()
Dec 1 17:40:53 eight genunix: [ID 655072 kern.notice] ffffff0011e1d920 unix:trap+1799 ()
Dec 1 17:40:53 eight genunix: [ID 655072 kern.notice] ffffff0011e1d930 unix:cmntrap+e6 ()
Dec 1 17:40:53 eight genunix: [ID 655072 kern.notice] ffffff0011e1db70 unix:mutex_enter+b ()
Dec 1 17:40:53 eight genunix: [ID 655072 kern.notice] ffffff0011e1dc20 genunix:taskq_thread+285 ()
Dec 1 17:40:53 eight genunix: [ID 655072 kern.notice] ffffff0011e1dc30 unix:thread_start+8 ()
Dec 1 17:40:53 eight unix: [ID 100000 kern.notice]
Dec 1 17:40:53 eight genunix: [ID 672855 kern.notice] syncing file systems...
Dec 1 17:40:53 eight genunix: [ID 904073 kern.notice] done
Dec 1 17:40:54 eight genunix: [ID 111219 kern.notice] dumping to /dev/zvol/dsk/rpool/dump, offset 65536, content: kernel
Dec 1 17:41:05 eight genunix: [ID 100000 kern.notice]
Dec 1 17:41:05 eight genunix: [ID 665016 kern.notice] ^M 43% done: 286611 pages dumped,
Dec 1 17:41:05 eight genunix: [ID 495082 kern.notice] dump failed: error 28
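
For anyone who wants to see which targets are involved without reading every line, the mpt_sas messages above can be tallied with a small script. This is only a rough sketch: the regexes are written against the exact message text in the log above, and the names `summarize` and `SAMPLE` are made up for illustration. The sample is a handful of lines copied from the log; feeding it the full messages file is left as an assumption about where your syslog lives.

```python
import re
from collections import Counter

# A few lines copied from the log above, used here as sample input.
SAMPLE = """\
Dec 1 17:40:40 eight Disconnected command timeout for Target 15
Dec 1 17:40:41 eight Log info 0x31120436 received for target 28.
Dec 1 17:40:41 eight Log info 0x31120436 received for target 28.
Dec 1 17:40:44 eight Log info 0x31140000 received for target 15.
"""

# Patterns match the message wording shown in the log; other driver
# versions may word these differently.
LOG_INFO = re.compile(r"Log info (0x[0-9a-fA-F]+) received for target (\d+)")
TIMEOUT = re.compile(r"Disconnected command timeout for Target (\d+)")


def summarize(text):
    """Count (target, loginfo) pairs and command timeouts per target."""
    log_info = Counter()
    timeouts = Counter()
    for line in text.splitlines():
        m = LOG_INFO.search(line)
        if m:
            log_info[(int(m.group(2)), m.group(1))] += 1
            continue
        m = TIMEOUT.search(line)
        if m:
            timeouts[int(m.group(1))] += 1
    return log_info, timeouts


if __name__ == "__main__":
    log_info, timeouts = summarize(SAMPLE)
    for (target, code), count in sorted(log_info.items()):
        print(f"target {target}: loginfo {code} x{count}")
    for target, count in sorted(timeouts.items()):
        print(f"target {target}: {count} command timeout(s)")
```

On the log above this would show the timeout starting on target 15, with target 28 reporting a burst of log info messages in the same second, which at least narrows down which drives or expander ports to look at first.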
 
So FWIW, just using the fileserver as normal, the issue would repro about once a week. If I put all the drives on the HP expander and scrubbed the array, I could get it to fail within a few hours.

I ended up swapping the HP SAS2 expander for an LSI SAS expander, and it's currently on its 3rd scrub without issue. Seems like that fixed the problem, not sure why/how.

The downside is it's slower: I only get about 300MB/sec from the array I put on the LSI expander, vs closer to 600MB/sec when using the HP expander. The good thing about the LSI expander, though, is that it doesn't need a PCIe slot.
 
I am glad you (may) have fixed the problem! I guess the HP expander just doesn't work as well as people say it does. :(
 
I just finished my second system using the Intel expander with an LSI2008 card. This combination works well for me.

Plus, if you have a full-size case (for a 12x13 motherboard) and install a normal-sized motherboard, you can mount the expander on the extra motherboard screws and not waste a slot.
 