ESXi 5.0 + LSI MegaRAID SAS 9240-8i

nate1280

So I just recently got my hands on a Lenovo ThinkServer TS430, which happens to have an LSI 9240-8i controller card. I don't have any drives in it yet because I'm trying to source the hot-swap trays for cheaper than $70 bucks :( I do have the drives (1TB WD Blacks), and once I have trays I'm going to set up a six-drive RAID10 array for a 3TB datastore.

In the meantime, I hooked up a drive to the internal headers just so I had some space to play with, and of course put a USB key in to install ESXi 5.0 onto.

I had a heck of a time getting it to install: it always seemed to hang while loading the megaraid_sas driver, but it turned out I was just impatient, and it simply took a really long time to get past that point. I used the original release ISO first; I can't quite remember, but I don't think that one even booted past the driver. I then tried the Driver Rollup 2 ISO, which seemed fine, and then updated to U1. It still loads, but there's quite a delay at that specific driver.

Watching the log while it loads, and checking the vmkernel log afterwards, I see:

Code:
2012-03-04T18:42:33.518Z cpu4:2178)megaraid_sas: HBA reset handler invoked without an internal reset condition.

It seems to reset the controller twice, for some reason, and based on the timestamps it takes about 1 minute to fully load. One thing I'm wondering is whether it's doing this because there are no drives attached or arrays configured. I'm also wondering whether this is normal behaviour: this is the first machine I've used with a hardware RAID controller, so I don't know what load time to expect from the hypervisor. My other machine, running 4.0 U1, loads nice and quick, but it has no hardware RAID controller card.
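If you want to check the "resets it twice" observation against your own log rather than eyeballing it, here's a minimal Python sketch that counts the abort, reset-request and reset-completed messages in a copy of vmkernel.log (on ESXi 5.x it lives at /var/log/vmkernel.log). The marker strings are assumptions taken from the megaraid_sas output posted in this thread, and other driver builds may word them differently; `count_resets` is just a hypothetical helper name.

```python
import re
import sys
from typing import Dict, Iterable

# Marker strings copied from the megaraid_sas output in this thread (assumption:
# other driver versions may phrase these messages differently).
ABORT = re.compile(r"megasas: ABORT sn \d+")          # e.g. "megasas: ABORT sn 52 ..."
RESET_REQUEST = re.compile(r"megasas: RESET -?\d+")   # e.g. "megasas: RESET -52 ..."
RESET_DONE = re.compile(r"megasas: reset successful") # driver reports a finished reset


def count_resets(lines: Iterable[str]) -> Dict[str, int]:
    """Count abort, reset-request and reset-completed messages in vmkernel.log lines."""
    counts = {"aborts": 0, "requested": 0, "completed": 0}
    for line in lines:
        if ABORT.search(line):
            counts["aborts"] += 1
        if RESET_REQUEST.search(line):
            counts["requested"] += 1
        if RESET_DONE.search(line):
            counts["completed"] += 1
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python count_resets.py vmkernel.log
    with open(sys.argv[1], encoding="utf-8", errors="replace") as f:
        print(count_resets(f))
```

Two reset requests paired with two successful resets would line up with what's described here; a clean load should show zeros across the board.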

Checking the HCL, both the TS430 and the 9240-8i are listed as supported, although only one network adapter is usable (unless the 82579LM driver is loaded).
 
When I got the machine, the firmware was 20.10.1-0029, but I updated it to the latest firmware, 20.10.1-0077, from LSI's website while I was doing the U1 update.

But the delay is still there on the latest firmware.
 
Try disabling IRQ remapping... but that issue normally pops up well after boot.
 
I disabled IRQ remapping and rebooted the host; unfortunately, there's still the lengthy delay and the resets.

Thought I'd post a bigger snippet of the log containing all the megaraid_sas info, from loading to success:

Code:
2012-03-04T18:41:53.426Z cpu4:2626)Loading module megaraid_sas ...
2012-03-04T18:41:53.428Z cpu4:2626)Elf: 1862: module megaraid_sas has license GPL
2012-03-04T18:41:53.428Z cpu4:2626)module heap: Initial heap size: 1048576, max heap size: 43311104
2012-03-04T18:41:53.428Z cpu4:2626)vmklnx_module_mempool_init: Mempool max 43311104 being used for module: 32

2012-03-04T18:41:53.428Z cpu4:2626)vmk_MemPoolCreate passed for 256 pages

2012-03-04T18:41:53.428Z cpu4:2626)module heap: using memType 2
2012-03-04T18:41:53.429Z cpu4:2626)module heap vmklnx_megaraid_sas: creation succeeded. id = 0x41000ac00000
2012-03-04T18:41:53.429Z cpu4:2626)<6>megasas: 00.00.05.34 Mon. May 2 17:00:00 PDT 2011
2012-03-04T18:41:53.429Z cpu4:2626)PCI: driver megaraid_sas is looking for devices
<6>megasas: 0x1000:0x73:0x1000:0x9240: 2012-03-04T18:41:53.429Z cpu4:2626)domain 0 bus 2:slot 0:func 0
2012-03-04T18:41:53.429Z cpu4:2626)DMA: 524: DMA Engine 'vmklnxpci-0:2:0.0' created.
2012-03-04T18:41:53.429Z cpu4:2626)DMA: 524: DMA Engine 'vmklnxpci-0:2:0.0' created.

 megasas_init_mfi:  Line 4372: skinny Template. instance->pdev->device = 0x732012-03-04T18:41:53.429Z cpu4:2626)megasas: FW now in Ready state
2012-03-04T18:41:53.429Z cpu4:2626)IntrVector: 283: 0x29
2012-03-04T18:41:53.429Z cpu4:2626)VMK_PCI: 1177: device 000:002:00.0 allocated 1 vectors (intrType 3)
2012-03-04T18:41:53.429Z cpu4:2626)MSIX enabled for dev 0000:02:00.0
megasas_init: fw_support_ieee=671088642012-03-04T18:41:53.451Z cpu4:2626)<3>megasas: INIT adapter done 
2012-03-04T18:41:53.516Z cpu0:2626)<3>megasas: max_sectors_per_req = 0x1e0
2012-03-04T18:41:53.516Z cpu0:2626)<3>megasas: tmp_sectors = 0x140
2012-03-04T18:41:53.516Z cpu0:2626)IDT: 991: 0x29 <megasas> sharable (entropy source), flags 0x10
2012-03-04T18:41:53.516Z cpu0:2626)VMK_VECTOR: 137: Added handler for shared vector 41, flags 0x10
2012-03-04T18:41:53.516Z cpu0:2626)<6>megasas: io_attach:  host->irq: 184  host->unique_id: 512  host->can_queue: 25.
2012-03-04T18:41:53.516Z cpu0:2626)<7>megasas: max_sectors : 0x100, cmd_per_lun : 0x80
2012-03-04T18:41:53.516Z cpu0:2626)LinPCI: LinuxPCI_DeviceIsPAECapable:532: PAE capable device at 0000:02:00.0
2012-03-04T18:41:53.516Z cpu0:2626)VMK_PCI: 684: Device 000:002:00.0 name: vmhba2
2012-03-04T18:41:53.516Z cpu0:2626)DMA: 524: DMA Engine 'vmhba2' created.
2012-03-04T18:41:53.982Z cpu6:2650)<6>e1000e: vmnic0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX/TX
2012-03-04T18:41:54.886Z cpu3:2083)NetPort: 1427: disabled port 0x2
2012-03-04T18:41:54.886Z cpu3:2083)Uplink: 5244: enabled port 0x2 with mac 50:e5:49:a3:b3:86
2012-03-04T18:42:33.518Z cpu4:2178)megasas: ABORT sn 52 cmd=0xa0 retries=0 tmo=0
2012-03-04T18:42:33.518Z cpu4:2178)<5>0 :: megasas: RESET -52 cmd=a0 retries=0
2012-03-04T18:42:33.518Z cpu4:2178)megaraid_sas: HBA reset handler invoked without an internal reset condition.
2012-03-04T18:45:35.676Z cpu0:2178)<7>megaraid_sas: megasas_wait_for_outstanding: line 2133: AFTER HBA reset handler invoked without an internal reset condition:   took 180 seconds. Max is 180.
2012-03-04T18:45:35.676Z cpu0:2178)megaraid_sas: megasas_do_ocr: line 4136: pdev->device:  0x73 :: disableonlineCtrlReset: 0x0 :: issuepend_done: 0x1
2012-03-04T18:45:35.676Z cpu0:2178)megasas: moving cmd[0]:0x41000ac013a0:0:0x41240151a000 on the defer queue as internal reset in progress.
2012-03-04T18:45:35.676Z cpu0:2178)megasas: waiting_for_outstanding: after issue OCR. 
2012-03-04T18:45:35.676Z cpu0:2178)<5>megasas: reset successful

2012-03-04T18:45:35.676Z cpu0:2178)WARNING: LinScsi: SCSILinuxAbortCommands:1798:Failed, Driver LSI Logic SAS based MegaRAID driver, for vmhba2
2012-03-04T18:45:35.676Z cpu2:2656)<7>megaraid_sas: process_fw_state_change_wq:  instance addr:  0x0x410014a047c0, adprecovery:  0x1
2012-03-04T18:45:35.676Z cpu0:2178)WARNING: ScsiPath: 6045: Set retry timeout for failed TaskMgmt abort for CmdSN  0x0, status Failure, path vmhba2:C0:T0:L0
2012-03-04T18:45:35.676Z cpu2:2656)megaraid_sas: FW detected to be in fault state, restarting it...
2012-03-04T18:45:36.688Z cpu2:2656)ADP_RESET_GEN2: HostDiag=a0
2012-03-04T18:45:37.678Z cpu0:2178)megasas: ABORT sn 52 cmd=0xa0 retries=0 tmo=0
2012-03-04T18:45:37.678Z cpu0:2178)<5>0 :: megasas: RESET -52 cmd=a0 retries=0
2012-03-04T18:45:37.678Z cpu0:2178)megasas: HBA reset handler invoked while adapter internal reset in progress, wait till that's over...
2012-03-04T18:45:46.806Z cpu2:2656)megaraid_sas: FW was restarted successfully, initiating next stage...
2012-03-04T18:45:46.806Z cpu2:2656)megaraid_sas: HBA recovery state machine, state 2 starting...
2012-03-04T18:46:16.818Z cpu2:2656)<6>megasas: Waiting for FW to come to ready state
2012-03-04T18:46:16.840Z cpu2:2656)megasas: FW now in Ready state
2012-03-04T18:46:16.884Z cpu0:2656)megaraid_sas: second stage of reset complete, FW is ready now.
2012-03-04T18:46:17.146Z cpu0:2178)megasas: HBA internal reset condition discovered to be cleared.
2012-03-04T18:46:17.146Z cpu0:2178)megasas: 0:0x41000ac013a0 reset scsi command [a0], 0x34
2012-03-04T18:46:17.146Z cpu0:2178)megaraid_sas: All pending commands have been cleared for reset condition.
2012-03-04T18:46:17.146Z cpu0:2178)<5>megasas: reset successful

2012-03-04T18:46:17.162Z cpu0:2626)<6>megasas_register_aen[5]: already registered
2012-03-04T18:46:17.162Z cpu0:2626)PCI: driver megaraid_sas claimed device 0000:02:00.0
2012-03-04T18:46:17.162Z cpu0:2626)PCI: driver megaraid_sas claimed 1 device 
2012-03-04T18:46:17.162Z cpu0:2626)ScsiNpiv: 1525: GetInfo for adapter vmhba2, [0x4100080d0f80], max_vports=0, vports_inuse=0, linktype=0, state=0, failreason=0, rv=-1, sts=bad0020
2012-03-04T18:46:17.162Z cpu0:2626)Mod: 4015: Initialization of megaraid_sas succeeded with module ID 32.
2012-03-04T18:46:17.162Z cpu0:2626)megaraid_sas loaded successfully.
2012-03-04T18:46:17.255Z cpu5:2675)WARNING: LinuxSignal: 761: ignored unexpected signal flags 0x2 (sig 17)
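To put a number on the delay, the load time can be read straight off the timestamps. Here's a minimal Python sketch, assuming each relevant vmkernel.log line starts with an ISO-8601 UTC timestamp as in the log above, and using the "Loading module megaraid_sas" and "megaraid_sas loaded successfully" lines as start and end markers; `module_load_seconds` is a hypothetical helper, not part of any VMware tool.

```python
import re
import sys
from datetime import datetime
from typing import Iterable, Optional

# vmkernel.log lines start with a UTC timestamp such as "2012-03-04T18:41:53.426Z".
TIMESTAMP = re.compile(r"^(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d+)Z")


def parse_ts(line: str) -> Optional[datetime]:
    """Return the leading timestamp of a log line, or None if it has none."""
    m = TIMESTAMP.match(line)
    return datetime.strptime(m.group(1), "%Y-%m-%dT%H:%M:%S.%f") if m else None


def module_load_seconds(lines: Iterable[str], module: str = "megaraid_sas") -> Optional[float]:
    """Seconds between 'Loading module <module>' and '<module> loaded successfully'."""
    start = end = None
    for line in lines:
        ts = parse_ts(line)
        if ts is None:
            continue  # skip continuation lines that don't begin with a timestamp
        if start is None and f"Loading module {module}" in line:
            start = ts
        elif start is not None and f"{module} loaded successfully" in line:
            end = ts
            break
    if start is None or end is None:
        return None  # one of the markers was not found
    return (end - start).total_seconds()


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], encoding="utf-8", errors="replace") as f:
        secs = module_load_seconds(f)
    print("load markers not found" if secs is None else f"megaraid_sas took {secs:.1f} s to load")
```

Fed the log above, it reports about 264 seconds (18:41:53.426 to 18:46:17.162), most of it spent in the 180-second wait after the first HBA reset.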
 
I would call IBM. No reason the firmware should come up in a fault state at boot.
 
Might be, but that'd be the first time I've ever seen something like that. Normally it just comes up, reports no logical units, and life moves on.
 
Thanks for the help, lopoetve. I'm going to try to see what IBM/Lenovo have to say, if anything.
 
Just an update on this: I did some more searching for new info and came across a post on the VMware Communities forums (http://communities.vmware.com/message/2023326).

I searched for this specific option with regard to the TS430 and found a Lenovo support article (http://support.lenovo.com/en_US/downloads/detail.page?DocID=HT072721). While that doesn't relate directly to ESXi 5.0, it does involve the 9240-8i RAID controller.

So I rebooted the machine, changed this option in the BIOS, saved it, and restarted.

With fingers crossed, I watched the ESXi boot sequence, and it loaded right past the megaraid_sas driver without stalling or freezing. It just worked! The machine booted quickly, as it's supposed to.

After it booted, I grabbed the vmkernel.log file and searched for the loading of the megaraid_sas driver. The output is very different this time: it doesn't reset the controller twice, it just loads straight through as it should.

Here is the snippet from my log pertaining to the loading of megaraid_sas:

Code:
2012-04-13T17:12:55.602Z cpu2:2626)Loading module megaraid_sas ...
2012-04-13T17:12:55.603Z cpu2:2626)Elf: 1862: module megaraid_sas has license GPL
2012-04-13T17:12:55.604Z cpu2:2626)module heap: Initial heap size: 1048576, max heap size: 43311104
2012-04-13T17:12:55.604Z cpu2:2626)vmklnx_module_mempool_init: Mempool max 43311104 being used for module: 32

2012-04-13T17:12:55.604Z cpu2:2626)vmk_MemPoolCreate passed for 256 pages

2012-04-13T17:12:55.604Z cpu2:2626)module heap: using memType 2
2012-04-13T17:12:55.604Z cpu2:2626)module heap vmklnx_megaraid_sas: creation succeeded. id = 0x41000ae00000
2012-04-13T17:12:55.604Z cpu2:2626)<6>megasas: 00.00.05.34-1vmw Mon. May 2 17:00:00 PDT 2011
2012-04-13T17:12:55.604Z cpu2:2626)PCI: driver megaraid_sas is looking for devices
<6>megasas: 0x1000:0x73:0x1000:0x9240: 2012-04-13T17:12:55.604Z cpu2:2626)domain 0 bus 2:slot 0:func 0
2012-04-13T17:12:55.604Z cpu2:2626)DMA: 524: DMA Engine 'vmklnxpci-0:2:0.0' created.
2012-04-13T17:12:55.604Z cpu2:2626)DMA: 524: DMA Engine 'vmklnxpci-0:2:0.0' created.

 megasas_init_mfi:  Line 4371: skinny Template. instance->pdev->device = 0x732012-04-13T17:12:55.604Z cpu2:2626)megasas: FW now in Ready state
2012-04-13T17:12:55.604Z cpu2:2626)IntrVector: 283: 0x31
2012-04-13T17:12:55.604Z cpu2:2626)VMK_PCI: 1177: device 000:002:00.0 allocated 1 vectors (intrType 3)
2012-04-13T17:12:55.604Z cpu2:2626)MSIX enabled for dev 0000:02:00.0
megasas_init: fw_support_ieee=671088642012-04-13T17:12:55.626Z cpu2:2626)<3>megasas: INIT adapter done 
2012-04-13T17:12:55.691Z cpu2:2626)<3>megasas: max_sectors_per_req = 0x1e0
2012-04-13T17:12:55.691Z cpu2:2626)<3>megasas: tmp_sectors = 0x140
2012-04-13T17:12:55.691Z cpu2:2626)IDT: 991: 0x31 <megasas> sharable (entropy source), flags 0x10
2012-04-13T17:12:55.691Z cpu2:2626)VMK_VECTOR: 137: Added handler for shared vector 49, flags 0x10
2012-04-13T17:12:55.691Z cpu2:2626)<6>megasas: io_attach:  host->irq: 184  host->unique_id: 512  host->can_queue: 25.
2012-04-13T17:12:55.691Z cpu2:2626)<7>megasas: max_sectors : 0x100, cmd_per_lun : 0x80
2012-04-13T17:12:55.691Z cpu2:2626)LinPCI: LinuxPCI_DeviceIsPAECapable:532: PAE capable device at 0000:02:00.0
2012-04-13T17:12:55.691Z cpu2:2626)VMK_PCI: 684: Device 000:002:00.0 name: vmhba2
2012-04-13T17:12:55.691Z cpu2:2626)DMA: 524: DMA Engine 'vmhba2' created.
2012-04-13T17:12:55.709Z cpu2:2626)PCI: driver megaraid_sas claimed device 0000:02:00.0
2012-04-13T17:12:55.709Z cpu2:2626)PCI: driver megaraid_sas claimed 1 device 
2012-04-13T17:12:55.709Z cpu2:2626)ScsiNpiv: 1525: GetInfo for adapter vmhba2, [0x4100080d1080], max_vports=0, vports_inuse=0, linktype=0, state=0, failreason=0, rv=-1, sts=bad0020
2012-04-13T17:12:55.709Z cpu2:2626)Mod: 4015: Initialization of megaraid_sas succeeded with module ID 32.
2012-04-13T17:12:55.709Z cpu2:2626)megaraid_sas loaded successfully.
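Pulling this snippet out of the full log by hand is easy enough, but here's a minimal Python sketch of the same search: it slices vmkernel.log from the "Loading module megaraid_sas" line through "megaraid_sas loaded successfully" and flags whether any abort/reset messages appear in between. The marker strings are assumptions based on the two logs in this thread, and the helper names are hypothetical.

```python
import re
import sys
from typing import Iterable, List

START = "Loading module megaraid_sas"
END = "megaraid_sas loaded successfully"
# Abort/reset messages as they appear in the megaraid_sas output in this thread (assumption).
RESET_MARKERS = re.compile(r"megasas: (ABORT|RESET)|reset successful")


def extract_load_block(lines: Iterable[str]) -> List[str]:
    """Return the lines from the first START marker through the following END marker."""
    block: List[str] = []
    inside = False
    for line in lines:
        if not inside and START in line:
            inside = True
        if inside:
            block.append(line.rstrip("\n"))
            if END in line:
                break
    return block


def block_has_resets(block: Iterable[str]) -> bool:
    """True if any line in the block looks like a controller abort or reset."""
    return any(RESET_MARKERS.search(line) for line in block)


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], encoding="utf-8", errors="replace") as f:
        block = extract_load_block(f)
    print("\n".join(block))
    print("resets seen:", block_has_resets(block))
```

On the snippet above this should report no resets, while the earlier log with the ABORT/RESET lines would be flagged.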

Anyway, I thought I'd post this bit of information in the hope that it helps anyone else who might be struggling with this issue.

Now, I'll admit I really have no idea what this option changed to produce the proper behavior, but I'm just glad it did.
 