
LSI HBA Errors - How to interpret?

packetboy

Limp Gawd
Joined
Aug 2, 2009
Messages
288
Setup:
Solaris 11
Supermicro motherboard
8 x 3TB Hitachi SATA drives (mix of 5400 and 7200RPM)
Sans Digital 8-drive SAS enclosure (no Expander)
Array is ZFS RaidZ1

The array keeps intermittently dropping out. It shows a CRC error for each of the 8 drives; I reboot the server and then all is good again for a couple of days or weeks.

Diags from LSIUtil show this:

Code:
Diagnostics menu, select an option:  [1-99 or e/p/w or 0 to quit] 12

Adapter Phy 0:  Link Up
  Invalid DWord Count                                          18
  Running Disparity Error Count                                16
  Loss of DWord Synch Count                                     4
  Phy Reset Problem Count                                       0

Adapter Phy 1:  Link Up
  Invalid DWord Count                                          18
  Running Disparity Error Count                                17
  Loss of DWord Synch Count                                     4
  Phy Reset Problem Count                                       0

Adapter Phy 2:  Link Up
  Invalid DWord Count                                          16
  Running Disparity Error Count                                15
  Loss of DWord Synch Count                                     4
  Phy Reset Problem Count                                       0

Adapter Phy 3:  Link Up
  Invalid DWord Count                                          63
  Running Disparity Error Count                                60
  Loss of DWord Synch Count                                     4
  Phy Reset Problem Count                                       0

Adapter Phy 4:  Link Up
  Invalid DWord Count                                          16
  Running Disparity Error Count                                16
  Loss of DWord Synch Count                                     4
  Phy Reset Problem Count                                       0

Adapter Phy 5:  Link Up
  Invalid DWord Count                                          16
  Running Disparity Error Count                                15
  Loss of DWord Synch Count                                     4
  Phy Reset Problem Count                                       0

Adapter Phy 6:  Link Up
  Invalid DWord Count                                          16
  Running Disparity Error Count                                16
  Loss of DWord Synch Count                                     4
  Phy Reset Problem Count                                       0

Adapter Phy 7:  Link Up
  Invalid DWord Count                                          16
  Running Disparity Error Count                                16
  Loss of DWord Synch Count                                     4
  Phy Reset Problem Count                                       0

Basically a small number of errors spread across ALL drives, with Phy 3 noticeably higher than the rest.

Not sure if we have an HBA, cable, enclosure or drive problem.

Thoughts on best approach to isolate?
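
One way to make the comparison across phys less error-prone is to parse the LSIUtil output and flag any phy whose counter sits well above the median. The following is only a rough sketch: the function names, the 2x threshold, and the shortened sample are my own assumptions, and a flagged phy only says where to look first (cable lane, enclosure slot, drive), not which component is at fault.

```python
import re
from statistics import median

# Shortened sample in the same layout as the LSIUtil option 12 output above.
SAMPLE = """\
Adapter Phy 2:  Link Up
  Invalid DWord Count                                          16
  Running Disparity Error Count                                15
  Loss of DWord Synch Count                                     4
  Phy Reset Problem Count                                       0

Adapter Phy 3:  Link Up
  Invalid DWord Count                                          63
  Running Disparity Error Count                                60
  Loss of DWord Synch Count                                     4
  Phy Reset Problem Count                                       0

Adapter Phy 4:  Link Up
  Invalid DWord Count                                          16
  Running Disparity Error Count                                16
  Loss of DWord Synch Count                                     4
  Phy Reset Problem Count                                       0
"""


def parse_phy_counters(text):
    """Return {phy_number: {counter_name: value}} from the pasted text."""
    phys = {}
    current = None
    for line in text.splitlines():
        header = re.match(r"\s*Adapter Phy (\d+):", line)
        if header:
            current = int(header.group(1))
            phys[current] = {}
            continue
        # Counter lines: indented name, a run of spaces, then an integer.
        counter = re.match(r"\s+(\S.*?)\s{2,}(\d+)\s*$", line)
        if counter and current is not None:
            phys[current][counter.group(1)] = int(counter.group(2))
    return phys


def outliers(phys, counter, factor=2.0):
    """List phys whose counter exceeds factor x the median (hypothetical threshold)."""
    values = [c[counter] for c in phys.values() if counter in c]
    baseline = max(median(values), 1)
    return [p for p, c in sorted(phys.items()) if c.get(counter, 0) > factor * baseline]


if __name__ == "__main__":
    phys = parse_phy_counters(SAMPLE)
    for name in ("Invalid DWord Count", "Running Disparity Error Count"):
        print(name, "-> outlier phys:", outliers(phys, name))
```

Clearing the counters, running a scrub, and then re-running this against fresh output should show whether the errors are still accumulating and whether they follow a particular phy when cables or drives are swapped.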

On bootup, I get this in /var/adm/messages:

Code:
Mar  3 09:04:45 zulu04 scsi: [ID 243001 kern.info] /pci@0,0/pci8086,340e@7/pci1000,3080@0 (mpt_sas12):
Mar  3 09:04:45 zulu04  mptsas_handle_event_sync: IOCLogInfo=0x31170000
Mar  3 09:04:45 zulu04 scsi: [ID 243001 kern.info] /pci@0,0/pci8086,340e@7/pci1000,3080@0 (mpt_sas12):
Mar  3 09:04:45 zulu04  mptsas_handle_event: IOCLogInfo=0x31170000
Mar  3 09:04:45 zulu04 scsi: [ID 243001 kern.info] /pci@0,0/pci8086,340e@7/pci1000,3080@0 (mpt_sas12):
Mar  3 09:04:45 zulu04  mptsas_handle_event_sync: IOCLogInfo=0x31170000
Mar  3 09:04:45 zulu04 scsi: [ID 243001 kern.info] /pci@0,0/pci8086,340e@7/pci1000,3080@0 (mpt_sas12):
Mar  3 09:04:45 zulu04  mptsas_handle_event: IOCLogInfo=0x31170000
Mar  3 09:04:45 zulu04 scsi: [ID 243001 kern.info] /pci@0,0/pci8086,340e@7/pci1000,3080@0 (mpt_sas12):
Mar  3 09:04:45 zulu04  mptsas_handle_event_sync: IOCLogInfo=0x31170000
Mar  3 09:04:45 zulu04 scsi: [ID 243001 kern.info] /pci@0,0/pci8086,340e@7/pci1000,3080@0 (mpt_sas12):
Mar  3 09:04:45 zulu04  mptsas_handle_event: IOCLogInfo=0x31170000
Mar  3 09:04:45 zulu04 scsi: [ID 243001 kern.info] /pci@0,0/pci8086,340e@7/pci1000,3080@0 (mpt_sas12):
Mar  3 09:04:45 zulu04  mptsas_handle_event_sync: IOCLogInfo=0x31170000
Mar  3 09:04:45 zulu04 scsi: [ID 243001 kern.info] /pci@0,0/pci8086,340e@7/pci1000,3080@0 (mpt_sas12):
Mar  3 09:04:45 zulu04  mptsas_handle_event: IOCLogInfo=0x31170000
Mar  3 09:04:45 zulu04 scsi: [ID 243001 kern.info] /pci@0,0/pci8086,340e@7/pci1000,3080@0 (mpt_sas12):
Mar  3 09:04:45 zulu04  mptsas_handle_event_sync: IOCLogInfo=0x31170000
Mar  3 09:04:45 zulu04 scsi: [ID 243001 kern.info] /pci@0,0/pci8086,340e@7/pci1000,3080@0 (mpt_sas12):
Mar  3 09:04:45 zulu04  mptsas_handle_event: IOCLogInfo=0x31170000
Mar  3 09:04:45 zulu04 scsi: [ID 243001 kern.info] /pci@0,0/pci8086,340e@7/pci1000,3080@0 (mpt_sas12):
Mar  3 09:04:45 zulu04  mptsas_handle_event_sync: IOCLogInfo=0x31170000
Mar  3 09:04:45 zulu04 scsi: [ID 243001 kern.info] /pci@0,0/pci8086,340e@7/pci1000,3080@0 (mpt_sas12):
Mar  3 09:04:45 zulu04  mptsas_handle_event: IOCLogInfo=0x31170000
Mar  3 09:04:45 zulu04 scsi: [ID 243001 kern.info] /pci@0,0/pci8086,340e@7/pci1000,3080@0 (mpt_sas12):
Mar  3 09:04:45 zulu04  mptsas_handle_event_sync: IOCLogInfo=0x31170000
Mar  3 09:04:45 zulu04 scsi: [ID 243001 kern.info] /pci@0,0/pci8086,340e@7/pci1000,3080@0 (mpt_sas12):
Mar  3 09:04:45 zulu04  mptsas_handle_event: IOCLogInfo=0x31170000
Mar  3 09:04:45 zulu04 scsi: [ID 243001 kern.info] /pci@0,0/pci8086,340e@7/pci1000,3080@0 (mpt_sas12):
Mar  3 09:04:45 zulu04  mptsas_handle_event_sync: IOCLogInfo=0x31170000
Mar  3 09:04:45 zulu04 scsi: [ID 243001 kern.info] /pci@0,0/pci8086,340e@7/pci1000,3080@0 (mpt_sas12):
Mar  3 09:04:45 zulu04  mptsas_handle_event: IOCLogInfo=0x31170000
Mar  3 09:04:46 zulu04 scsi: [ID 243001 kern.info] /pci@0,0/pci8086,340e@7/pci1000,3080@0 (mpt_sas12):
Mar  3 09:04:46 zulu04  mptsas_check_scsi_io: IOCStatus=0x4b IOCLogInfo=0x31110d00
Mar  3 09:04:46 zulu04 scsi: [ID 243001 kern.info] /pci@0,0/pci8086,340e@7/pci1000,3080@0 (mpt_sas12):
Mar  3 09:04:46 zulu04  mptsas_check_scsi_io: IOCStatus=0x4b IOCLogInfo=0x31110d00
Mar  3 09:04:46 zulu04 scsi: [ID 243001 kern.info] /pci@0,0/pci8086,340e@7/pci1000,3080@0 (mpt_sas12):
Mar  3 09:04:46 zulu04  mptsas_check_scsi_io: IOCStatus=0x4b IOCLogInfo=0x31110d00
Mar  3 09:04:46 zulu04 scsi: [ID 243001 kern.info] /pci@0,0/pci8086,340e@7/pci1000,3080@0 (mpt_sas12):
Mar  3 09:04:46 zulu04  mptsas_check_scsi_io: IOCStatus=0x4b IOCLogInfo=0x31110d00
Mar  3 09:04:46 zulu04 scsi: [ID 243001 kern.info] /pci@0,0/pci8086,340e@7/pci1000,3080@0 (mpt_sas12):
Mar  3 09:04:46 zulu04  mptsas_check_scsi_io: IOCStatus=0x4b IOCLogInfo=0x31110d00
Mar  3 09:04:46 zulu04 scsi: [ID 243001 kern.info] /pci@0,0/pci8086,340e@7/pci1000,3080@0 (mpt_sas12):
Mar  3 09:04:46 zulu04  mptsas_check_scsi_io: IOCStatus=0x4b IOCLogInfo=0x31110d00
Mar  3 09:04:46 zulu04 scsi: [ID 243001 kern.info] /pci@0,0/pci8086,340e@7/pci1000,3080@0 (mpt_sas12):
Mar  3 09:04:46 zulu04  mptsas_check_scsi_io: IOCStatus=0x4b IOCLogInfo=0x31110d00
Mar  3 09:04:46 zulu04 scsi: [ID 243001 kern.info] /pci@0,0/pci8086,340e@7/pci1000,3080@0 (mpt_sas12):
Mar  3 09:04:46 zulu04  mptsas_check_scsi_io: IOCStatus=0x4b IOCLogInfo=0x31110d00
Mar  3 09:04:46 zulu04 scsi: [ID 243001 kern.info] /pci@0,0/pci8086,340e@7/pci1000,3080@0 (mpt_sas12):
Mar  3 09:04:46 zulu04  mptsas_check_scsi_io: IOCStatus=0x4b IOCLogInfo=0x31110d00
Mar  3 09:04:46 zulu04 scsi: [ID 243001 kern.info] /pci@0,0/pci8086,340e@7/pci1000,3080@0 (mpt_sas12):
Mar  3 09:04:46 zulu04  mptsas_check_scsi_io: IOCStatus=0x4b IOCLogInfo=0x31110d00
Mar  3 09:04:46 zulu04 scsi: [ID 243001 kern.info] /pci@0,0/pci8086,340e@7/pci1000,3080@0 (mpt_sas12):
Mar  3 09:04:46 zulu04  mptsas_check_scsi_io: IOCStatus=0x4b IOCLogInfo=0x31110d00
Mar  3 09:04:46 zulu04 scsi: [ID 243001 kern.info] /pci@0,0/pci8086,340e@7/pci1000,3080@0 (mpt_sas12):
Mar  3 09:04:46 zulu04  mptsas_check_scsi_io: IOCStatus=0x4b IOCLogInfo=0x31110d00
Mar  3 09:04:46 zulu04 scsi: [ID 243001 kern.info] /pci@0,0/pci8086,340e@7/pci1000,3080@0 (mpt_sas12):
Mar  3 09:04:46 zulu04  mptsas_check_scsi_io: IOCStatus=0x4b IOCLogInfo=0x31110d00
Mar  3 09:04:46 zulu04 scsi: [ID 243001 kern.info] /pci@0,0/pci8086,340e@7/pci1000,3080@0 (mpt_sas12):
Mar  3 09:04:46 zulu04  mptsas_check_scsi_io: IOCStatus=0x4b IOCLogInfo=0x31110d00
Mar  3 09:04:46 zulu04 scsi: [ID 243001 kern.info] /pci@0,0/pci8086,340e@7/pci1000,3080@0 (mpt_sas12):
Mar  3 09:04:46 zulu04  mptsas_check_scsi_io: IOCStatus=0x4b IOCLogInfo=0x31110d00
Mar  3 09:04:46 zulu04 scsi: [ID 243001 kern.info] /pci@0,0/pci8086,340e@7/pci1000,3080@0 (mpt_sas12):
Mar  3 09:04:46 zulu04  mptsas_check_scsi_io: IOCStatus=0x4b IOCLogInfo=0x31110d00
Mar  3 09:04:47 zulu04 scsi: [ID 243001 kern.info] /pci@0,0/pci8086,340e@7/pci1000,3080@0 (mpt_sas12):
Mar  3 09:04:47 zulu04  mptsas_check_scsi_io: IOCStatus=0x48 IOCLogInfo=0x31130000
Mar  3 09:04:47 zulu04 scsi: [ID 243001 kern.info] /pci@0,0/pci8086,340e@7/pci1000,3080@0 (mpt_sas12):
Mar  3 09:04:47 zulu04  mptsas_check_scsi_io: IOCStatus=0x48 IOCLogInfo=0x31130000
Mar  3 09:04:47 zulu04 scsi: [ID 243001 kern.info] /pci@0,0/pci8086,340e@7/pci1000,3080@0 (mpt_sas12):
Mar  3 09:04:47 zulu04  mptsas_check_scsi_io: IOCStatus=0x48 IOCLogInfo=0x31130000
Mar  3 09:04:47 zulu04 scsi: [ID 243001 kern.info] /pci@0,0/pci8086,340e@7/pci1000,3080@0 (mpt_sas12):
Mar  3 09:04:47 zulu04  mptsas_check_scsi_io: IOCStatus=0x48 IOCLogInfo=0x31130000
Mar  3 09:04:47 zulu04 scsi: [ID 243001 kern.info] /pci@0,0/pci8086,340e@7/pci1000,3080@0 (mpt_sas12):
Mar  3 09:04:47 zulu04  mptsas_check_scsi_io: IOCStatus=0x48 IOCLogInfo=0x31130000
Mar  3 09:04:47 zulu04 scsi: [ID 243001 kern.info] /pci@0,0/pci8086,340e@7/pci1000,3080@0 (mpt_sas12):
Mar  3 09:04:47 zulu04  mptsas_check_scsi_io: IOCStatus=0x48 IOCLogInfo=0x31130000
Mar  3 09:04:47 zulu04 scsi: [ID 243001 kern.info] /pci@0,0/pci8086,340e@7/pci1000,3080@0 (mpt_sas12):
Mar  3 09:04:47 zulu04  mptsas_check_scsi_io: IOCStatus=0x48 IOCLogInfo=0x31130000
Mar  3 09:04:47 zulu04 scsi: [ID 243001 kern.info] /pci@0,0/pci8086,340e@7/pci1000,3080@0 (mpt_sas12):
Mar  3 09:04:47 zulu04  mptsas_check_scsi_io: IOCStatus=0x48 IOCLogInfo=0x31130000
Mar  3 09:06:20 zulu04 scsi: [ID 243001 kern.info] /pci@0,0/pci8086,340e@7/pci1000,3080@0 (mpt_sas12):
Mar  3 09:06:20 zulu04  mptsas_access_config_page: IOCStatus=0x22 IOCLogInfo=0x30030116
Mar  3 09:06:21 zulu04 genunix: [ID 483743 kern.info] /scsi_vhci/disk@g5000cca228c1b5f7 (sd3) multipath status: degraded: path 17 mpt_sas19/disk@w5000cca228c1b5f7,0 is online
Mar  3 09:06:21 zulu04 scsi: [ID 243001 kern.info] /pci@0,0/pci8086,340e@7/pci1000,3080@0 (mpt_sas12):
Mar  3 09:06:21 zulu04  mptsas_access_config_page: IOCStatus=0x22 IOCLogInfo=0x30030116
Mar  3 09:06:21 zulu04 genunix: [ID 483743 kern.info] /scsi_vhci/disk@g5000cca228c1859a (sd6) multipath status: degraded: path 10 mpt_sas14/disk@w5000cca228c1859a,0 is online
Mar  3 09:06:22 zulu04 scsi: [ID 243001 kern.info] /pci@0,0/pci8086,340e@7/pci1000,3080@0 (mpt_sas12):
Mar  3 09:06:22 zulu04  mptsas_access_config_page: IOCStatus=0x22 IOCLogInfo=0x30030116
Mar  3 09:06:23 zulu04 genunix: [ID 483743 kern.info] /scsi_vhci/disk@g5000cca228c1b645 (sd1) multipath status: degraded: path 12 mpt_sas16/disk@w5000cca228c1b645,0 is online
Mar  3 09:06:23 zulu04 scsi: [ID 243001 kern.info] /pci@0,0/pci8086,340e@7/pci1000,3080@0 (mpt_sas12):
Mar  3 09:06:23 zulu04  mptsas_access_config_page: IOCStatus=0x22 IOCLogInfo=0x30030116
Mar  3 09:06:23 zulu04 genunix: [ID 483743 kern.info] /scsi_vhci/disk@g5000cca225eb9c2f (sd7) multipath status: degraded: path 13 mpt_sas17/disk@w5000cca225eb9c2f,0 is online
Mar  3 09:06:25 zulu04 scsi: [ID 243001 kern.info] /pci@0,0/pci8086,340e@7/pci1000,3080@0 (mpt_sas12):
Mar  3 09:06:25 zulu04  mptsas_access_config_page: IOCStatus=0x22 IOCLogInfo=0x30030116
Mar  3 09:06:25 zulu04 genunix: [ID 483743 kern.info] /scsi_vhci/disk@g5000cca225ee29e9 (sd4) multipath status: degraded: path 14 mpt_sas13/disk@w5000cca225ee29e9,0 is online
Mar  3 09:06:25 zulu04 scsi: [ID 243001 kern.info] /pci@0,0/pci8086,340e@7/pci1000,3080@0 (mpt_sas12):
Mar  3 09:06:25 zulu04  mptsas_access_config_page: IOCStatus=0x22 IOCLogInfo=0x30030116
Mar  3 09:06:26 zulu04 genunix: [ID 483743 kern.info] /scsi_vhci/disk@g5000cca225ecbb95 (sd2) multipath status: degraded: path 11 mpt_sas18/disk@w5000cca225ecbb95,0 is online
Mar  3 09:06:26 zulu04 scsi: [ID 243001 kern.info] /pci@0,0/pci8086,340e@7/pci1000,3080@0 (mpt_sas12):
Mar  3 09:06:26 zulu04  mptsas_access_config_page: IOCStatus=0x22 IOCLogInfo=0x30030116
Mar  3 09:06:27 zulu04 genunix: [ID 483743 kern.info] /scsi_vhci/disk@g5000cca228c1aa0a (sd5) multipath status: degraded: path 15 mpt_sas20/disk@w5000cca228c1aa0a,0 is online
Mar  3 09:06:27 zulu04 scsi: [ID 243001 kern.info] /pci@0,0/pci8086,340e@7/pci1000,3080@0 (mpt_sas12):
Mar  3 09:06:27 zulu04  mptsas_access_config_page: IOCStatus=0x22 IOCLogInfo=0x30030116
Mar  3 09:06:27 zulu04 genunix: [ID 483743 kern.info] /scsi_vhci/disk@g5000cca225eddfcf (sd8) multipath status: degraded: path 16 mpt_sas15/disk@w5000cca225eddfcf,0 is online

Thinking I should probably disable multipathing in the OS as that never seems to cause anything but trouble.
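
Since the log is mostly the same few codes repeated, a small script can tally the distinct IOCStatus/IOCLogInfo pairs and split each IOCLogInfo value into its fields. This is a sketch under assumptions: the field layout (4-bit bus type, 4-bit originator, 8-bit code, 16-bit subcode) and the originator names are the convention I understand LSI's MPI log-info headers to use, and the function names are made up here. It does not say what each code means; that still has to be looked up in the LSI header.

```python
import re
from collections import Counter

# A few message lines in the same format as the /var/adm/messages excerpt above.
SAMPLE = """\
Mar  3 09:04:45 zulu04  mptsas_handle_event_sync: IOCLogInfo=0x31170000
Mar  3 09:04:45 zulu04  mptsas_handle_event: IOCLogInfo=0x31170000
Mar  3 09:04:46 zulu04  mptsas_check_scsi_io: IOCStatus=0x4b IOCLogInfo=0x31110d00
Mar  3 09:04:46 zulu04  mptsas_check_scsi_io: IOCStatus=0x4b IOCLogInfo=0x31110d00
Mar  3 09:04:47 zulu04  mptsas_check_scsi_io: IOCStatus=0x48 IOCLogInfo=0x31130000
Mar  3 09:06:20 zulu04  mptsas_access_config_page: IOCStatus=0x22 IOCLogInfo=0x30030116
"""

# Assumed originator encoding: IOP = I/O processor, PL = protocol layer,
# IR = integrated RAID.
ORIGINATORS = {0: "IOP", 1: "PL", 2: "IR"}

LINE_RE = re.compile(
    r"(mptsas_\w+): (?:IOCStatus=(0x[0-9a-fA-F]+) )?IOCLogInfo=(0x[0-9a-fA-F]+)"
)


def split_loginfo(value):
    """Split a 32-bit IOCLogInfo value into its assumed bit fields."""
    return {
        "bus_type": (value >> 28) & 0xF,
        "originator": ORIGINATORS.get((value >> 24) & 0xF, "unknown"),
        "code": (value >> 16) & 0xFF,
        "subcode": value & 0xFFFF,
    }


def tally(text):
    """Count each (function, IOCStatus, IOCLogInfo) combination in the log text."""
    return Counter(LINE_RE.findall(text))


if __name__ == "__main__":
    for (func, status, loginfo), count in tally(SAMPLE).most_common():
        fields = split_loginfo(int(loginfo, 16))
        print(f"{count:3d}  {func:28s} status={status or '-':5s} {loginfo} "
              f"origin={fields['originator']} code=0x{fields['code']:02x} "
              f"subcode=0x{fields['subcode']:04x}")
```

Run against the full file, this should collapse the boot-time burst into a handful of rows, which makes it easier to see whether the same codes reappear when the array drops out later.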
 