ESXi 5.5 U1/U2 hangs randomly at install time

dxun1

n00b
Joined
Aug 8, 2014
Messages
21
I have been trying to install ESXi 5.5 for two weeks now and have been plagued by ESXi 5.5 U1/U2 hanging at random stages for no apparent reason. The absence of any log or error hint whatsoever is driving me mad.

Random stages involved are:
- user loading successfully
- loading lgb
- loading lacp
- loading ipmi_si_drv
I could not discern any pattern among these occurrences (I have reset the system at least 100 times).

I have a full console capture of one of these hangs here -> https://www.youtube.com/watch?v=KFzFjyw9kBU
This crash was captured with the ignoreHeadless=TRUE flag passed. Without that flag, this step completes successfully, but the install then fails at the "loading lgb...." step. The result is the same: just a blank screen, and either the physical machine resets itself or it must be restarted manually.

This was captured with ESXi 5.5 U1, but the same thing happens with U2. I am literally lost and cannot even begin to think what to do next. Any help is greatly appreciated.

The hardware is shown in the video, but here it is again:
- SuperMicro X10SLM+-F
- 16 GB Crucial RAM ECC
- booting off of Sandisk Extreme 32GB USB stick (used Rufus to put the ISO on stick, using MBR and not using UEFI)

ignoreHeadless doesn't help.

The ignoreHeadless=TRUE flag used to help until I decided to lower the fan speed further and did the following:
- used FreeIPMI 1.3.4 to check out the fan settings and commit new ones (went without problems)
- since the fans wouldn't change their speed initially I tried to Load Optimized Defaults (that didn't help but I soon discovered I had to turn off/on power to motherboard fully until IPMI loaded the new settings)

So I currently suspect that one of these things "broke" ESXi install - even though I would find that incredible. Trying to load Factory IPMI defaults didn't do anything.

Any ideas or suggestions? I am beyond desperate at this point.
 
I don't have that MB, but when you loaded the defaults, did you make sure that AHCI was still on?
 
AHCI?

I am not sure what you mean - if you're referring to SATA AHCI mode, then yes AHCI was on. In my further experiments I even turned the SATA controller completely off and it didn't do anything. I have also turned off all serial ports, redirections, virtualization capabilities and VT-d instructions in BIOS - to no effect.

My current thinking is to simply disable as many features in the BIOS as I can and see whether that changes how the installation behaves:
- disable on-board LAN totally (if there even is such an option, I can't see it. I've just seen the option to provide options for on-board LAN ROM [Disabled/PXE/iScsi] which I am setting to Disabled)
- I'll be removing one RAM stick (out of two; I've seen people report that this sometimes helps, even though I would find that bizarre)
- doing a full memtest (even though I've been able to boot Linux live distros)
- clear CMOS (perhaps there is simply a setting hidden that makes ESXi go crazy)
- try to perform installation without KVM-over-IP, so using physical monitor and keyboard (perhaps I'll glean some additional info that I am not seeing over IPMI console); I just hope I'll be able to find some vga cable, haven't used those guys in years
- start looking into Xen virtualization software :confused:

Other than that, I cannot imagine what else there is to do. This is a bare-bones system as it is - no hard disks, no discrete video cards, no expansion cards; just a Xeon CPU, cooler, MBO, memory and a USB drive. The power supply is an Antec/Seasonic 400 W unit that should be ridiculously overpowered for these components. I have not seen any signs of CPU overheating (temperature rarely exceeds 40 C).

The components are "server-grade" and if I am having so much trouble even running the install as it is due to any of the components, installing ESXi on any consumer-grade hardware would have to be a complete nightmare and disaster (which it isn't - I have seen very few instances of people having the kind of trouble I am having and they do regularly install ESXi hosts on commodity hardware).

I find it incredible to have such problems with such respectable software and hardware - I am especially dismayed at ESXi installation error reporting - I find it incredible that a critical/enterprise piece of software would simply crash and reboot in 2014 without ANY hint/error message however cryptic it may be. This is nearing the simplest possible installation with fewest moving parts there is - what could I be possibly doing wrong?
 
Try replacing the Sandisk USB stick with another brand. I remember seeing something about Sandisk USB sticks causing issues.
 
I second trying another USB stick. I have the X10SL7-F, which should be very similar to your setup in basic operation. Running ESXi 5.5U2 off a MicroCenter USB 3.0 16GB stick without problems. Installed a custom ESXi ISO from the IPMI virtual media interface - custom install ISO was just so I could include the NIC drivers for the two onboard ports.

If you haven't updated the BIOS, should just be able to reset to optimized defaults and go.
 
I've found that even though this board is "server" grade, it is beyond picky about hardware. Neither an older Corsair 64GB SSD nor a Sandisk 64GB SSD would be reliably recognized on either the onboard SATA ports or the LSI SAS controller. Half the time I'd boot and the disks wouldn't be found. Same with an Intel Pro/1000 PT Quad-Port NIC.

Now I just reboot until the Quad-Port NIC comes online - which once it does it works just fine. For hard drives, I'm using two Crucial MX100 256GB drives and two Samsung 2TB HDDs. All of those work just fine. Also have an older AMD GPU installed, eventually will pass that to a Win7 VM.
 
I just built the setup listed below. When I went to load ESXi on it, I was having problems too. After some research, I found that people were having to re-flash the BIOS even though it showed the same version number and date. I re-flashed mine and now it works fine.

-RaidMax CobraZ case
-Antec 500W PSU
-SM X10SLH-F-O Motherboard
-Intel Xeon E3-1231v3
-32GB Kingston ECC
-2X1GBe Intel NIC
-3x 256GB Samsung SSD
-2x WD Red 2TB
-1x WD Black 1TB
 
So after much fussing around and hair pulling, I must admit I had accidentally stumbled upon the solution - perhaps this might even merit being a sticky or should be added to a general virtualization guide for beginners (if we have one) and for people wanting to build a lean, diskless home VM lab (relying on existing NAS as datastore).

The issue I was having was not odd hardware or some technical failure (the RAM sticks are healthy, and removing them or disabling LAN did nothing to help) - it was the fact that I was trying to install the ESXi host without any SATA disks attached at all. As soon as I attached a SATA drive I just happened to have handy, everything worked flawlessly.

This was a completely accidental discovery - in a particular moment of desperation, I tried to run the CentOS 7 install ISO in an effort to install and boot it from a USB flash drive (hopefully to go even deeper and do additional diagnostics). It failed with exactly the same symptoms as ESXi, except that it threw a message akin to "no controller attached". That immediately got my attention, as this is the latest stable, server-grade free distribution and isn't supposed to have any issues. Therefore it must be that the kernel itself (which is probably the only thing the ESXi distro and CentOS 7 have in common) is behaving erratically when no SATA drives are connected. Helped by the message (which was lacking from the ESXi attempts), I connected the HDD and - lo and behold - everything ran without a hitch.

My conclusion is this: for reasons that escape me, it seems the (stock?) Linux kernel has major issues being installed on a system without a SATA disk attached. Does it manifest this way only with this particular motherboard or some other hardware component? I have no idea and did not want to test it any further - I am just glad it works right now. Also, this begs the question of how radical a departure from Linux the ESXi host really is - it seems they still have much in common (even though I've read a few articles claiming they've practically written their own proprietary distro with their own, heavily-customized kernel).

Lastly, I've been working with computers for the past 25 years and this has to be one of the top 5 bizarre/wtf moments I have ever had. HTH someone.
 
You don't need a SATA drive hooked up to install ESXi on a thumb drive. I have done many installs without one hooked up.
 
Exactly, that's what I would've sworn to a couple of days ago as well - and had I not seen it all with my own eyes (and lost 2+ weeks), I would've had a hard time believing it. However, in the 4 separate tests I have done, booting ESXi (or CentOS 7) from a thumb drive has each time led to inexplicable random install freezes (ESXi) or simply a failure to boot, with some spurious error message printed before the freeze (CentOS 7). And each time I hooked up the SATA drive, everything happily chugged on.

Mind you, I've changed the boot thumb drive as well (I've replaced Sandisk with Kingston's 8GB drive) - no changes, everything was the same: no SATA drive, freeze; SATA drive, all green.
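For anyone retracing this kind of elimination, here is a minimal Python sketch of the reasoning above. The four rows are my reading of the results as summarized in these posts (two sticks, with and without the SATA drive), not a log of every test run, and the function names are made up for illustration. It only checks which factor lines up with the outcome; it says nothing about the root cause.

```python
from collections import defaultdict

# Each row: (usb_stick, sata_attached, booted_ok).
# Reconstructed from the summary in the post; illustrative, not a full test log.
observations = [
    ("Sandisk", False, False),
    ("Sandisk", True, True),
    ("Kingston", False, False),
    ("Kingston", True, True),
]


def outcomes_by_factor(obs, index):
    """Group the observed boot outcomes by the value of one factor (column index)."""
    grouped = defaultdict(set)
    for row in obs:
        grouped[row[index]].add(row[2])
    return dict(grouped)


def factor_explains_outcome(obs, index):
    """True if each value of the factor always gives one outcome and the outcomes differ between values."""
    grouped = outcomes_by_factor(obs, index)
    consistent = all(len(results) == 1 for results in grouped.values())
    distinct = len({next(iter(results)) for results in grouped.values()}) > 1
    return consistent and distinct


print("USB stick brand explains outcome:", factor_explains_outcome(observations, 0))
print("SATA drive attached explains outcome:", factor_explains_outcome(observations, 1))
```

With these rows, only the SATA column separates the freezes from the clean boots, which matches what was seen on this particular board.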

If there is a BIOS setting I've overlooked or some other faux pas I've committed, I would really like to know, but as it stands right now, it doesn't boot without a SATA drive no matter what I do to it.

* * *

I have noticed one other thing (probably orthogonal to this issue) - if I plug the HDD into SATA Ports 0 - 3, the hard drive spins up, starts to click as if it's in its death throes and, after clicking for 5-10 s, stops altogether. If I plug it into SATA Ports 5 or 6, it runs without any problems. :confused:

Of course, I've checked and re-checked the disk on two separate computers using a USB HDD dock and naturally it checked out fine, SMART didn't complain at all and surface scan reported no errors. That's also another thing I could not explain rationally.
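For reference, a SMART health check like the one mentioned above could be scripted roughly as follows. This is a sketch under assumptions: it assumes smartmontools is installed, the device path passed to `run_health_check` is whatever the disk shows up as on your machine, and the sample text is illustrative rather than output captured from this drive. Some USB docks may also need an extra `-d sat` option for smartctl, which is not shown here.

```python
import subprocess

# Illustrative sample only, NOT output captured from the drive in this thread.
SAMPLE_OUTPUT = """\
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
"""


def smart_health_passed(smartctl_output):
    """Parse `smartctl -H` text and return True if the overall health line says PASSED."""
    for line in smartctl_output.splitlines():
        if "overall-health self-assessment test result" in line:
            return line.rsplit(":", 1)[1].strip() == "PASSED"
    raise ValueError("no overall-health line found in smartctl output")


def run_health_check(device):
    """Run `smartctl -H` on a device path (for example /dev/sdb, an assumption) and parse the result."""
    result = subprocess.run(
        ["smartctl", "-H", device],
        capture_output=True,
        text=True,
        check=False,  # smartctl uses non-zero exit codes as status bits
    )
    return smart_health_passed(result.stdout)


# Parse the sample text without touching any real hardware.
print(smart_health_passed(SAMPLE_OUTPUT))
```

A passing SMART result only says the drive itself reports no problems; it would not explain why the same disk clicks on some ports and not on others.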

The disk in question is Samsung Spinpoint HD103J 1TB 3.5'' drive.
 
Some BIOSes treat ports 1-4 and 5-6 separately. That could be the issue. They can have different settings for RAID, AHCI, or IDE.
 
That is an incorrect assumption. I currently have 5 hosts in my lab and in production running with 0 SATA drives in them. All of them are using USB sticks and SAN for VMFS. I had zero issues installing ESXi on them, nor am I having any issues running vSphere 5.5 on them.
 
Almost sounds like a motherboard issue to me. If having a SATA disk present was in any way required, Cisco UCS wouldn't be possible, nor would quite a lot of huge enterprise customers' infrastructure for VMware products. While I'm glad that worked for you, that is not "normal" behavior. For what it's worth, my lab is 3 hosts, none with a single hard drive in them. At work, we have around 50 hosts with no hard drives at all.
 
My vote is a motherboard issue as well, or at least a quirk. My X10SL7-F had weird issues with two separate SATA SSDs, but no issues with a third. Could install ESXi directly to USB w/o a drive attached.

I played with a ton of BIOS settings, none seemed to make a difference. I wouldn't be surprised if there is a reason why your system won't run w/o a SATA drive attached, but that's based on my experience with my SM board.

Overall I really like my SM board... but I'm not convinced their QA is all that great. Would be curious to hear other people's input on other motherboard vendors with similar features (onboard SAS, IPMI, etc).
 
I would also agree with you - this seems like a motherboard quirk to me as well. This is an X10SLM+-F, which is supposed to work great with ESXi 5.5, and I haven't seen any reports of the behaviour I described.

Do you think I should RMA this board, or at least inform their support of my findings, or do you think I shouldn't bother with this at all?
 
Have you tried reflashing the BIOS, even if it's the same version? Also, I'd try a few different hard drives. If your experience with hard drives is like my experience with SSDs, then that particular hard drive could be incompatible.

If neither of those makes a difference, it sounds like an RMA would be the next step. You should be able to boot & install directly to USB with the IPMI virtual media without anything else attached.
 
I haven't tried it and I don't think I will - currently, it's working ok and I don't want to waste more time chasing ghosts. If it becomes a problem, I'll deal with it.

All, thanks for your help.
 