ESXi 4.1 on X8SIL-F

nate1280

n00b
Joined
Feb 14, 2011
Messages
35
After lots of reading, I finally have an ESXi 4.1u1 setup running on an X8SIL-F (w/ X3450, 8 GB RAM), but I seem to be running into a slight problem. The problem persists on 4.1 and 4.1u1; I haven't tried 4.0.

The /var/log/ipmi/0 folder grows until there is no space left, so any configuration changes etc. don't get saved; it happened twice before I finally figured out this was what was causing the other problems. I've searched the VMware communities and on here, and the handful of postings I found all say that disabling IPMI (VMKernel.Boot.ipmiEnabled) or disabling CIM (UserVars.CIMEnabled) works around it, and from my testing it does. But of course, one loses all the sensor data as a result. With these enabled, after about 6-7 minutes the SEL files in the folder fill up, causing the out-of-space errors, which in turn means any configuration changes made after that point aren't saved.

Since this board seems to be fairly popular among [H] users for lab setups, I'm surprised I've never seen this mentioned in my reading on here (which I've done a lot of; long-time lurker). Is there something I'm missing in configuring this so it doesn't happen?

I hope I've explained everything properly...

Anyway, is there a way to change the logging behavior, like setting a hard file-size limit or something?
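In the meantime I've been thinking about hacking around it from the tech support shell with something like this. Just a sketch, assuming the busybox sh on ESXi; the 1 MB cap and the trim_logs name are my own, not anything official:

```shell
# Sketch of a stopgap, not an official ESXi knob: empty the SEL logs once
# they pass a size cap. The cap and the function name are my own picks.
trim_logs() {
    dir=$1
    cap=$2
    for f in "$dir/sel" "$dir/sel.raw"; do
        [ -f "$f" ] || continue
        size=$(wc -c < "$f" | tr -d ' ')
        if [ "$size" -gt "$cap" ]; then
            : > "$f"    # truncate to zero bytes
        fi
    done
}

trim_logs /var/log/ipmi/0 1048576   # 1 MB cap per file
```

You'd still lose the SEL history each time it trims, so it's damage control, not a fix.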

 
Last edited:
Interesting. I'm running an X8DTH-F on my ESXi 4.1 machine and my /var/log/ipmi/0 directory has about 36 KB of data in it. It's really tiny.

Subbed hoping you find an answer. :)
 
So I managed to get this working; I don't know if it was a fluke or what.

But I had decided to set the FRU asset tag (I don't know why, but I did), and all of a sudden it started behaving. It's almost as if that alone caused the IPMI to initialize and start working properly, because after that, two of my fan headers went from normal to alert status (since there is nothing connected to them).

Now I see all the hardware information in the vSphere client: fan speeds, temps, voltages, etc.

One temperature I don't see is the CPU temp, which I do find strange, but I haven't looked into it.
 

How do you set this tag?
 
I used the program from http://ipmiutil.sourceforge.net/

Specifically the ifru app, with the -a switch, and just set the asset tag to X8SIL-F; be sure to specify the node (-N) and the user and password (-U and -P).

As I had mentioned, it almost seems like this kick-started something into working, because since then everything has been fine.
 

So how do you install this?

on the site it says "The ipmiutil software compiles under Linux (Makefile), Windows (buildwin.cmd), Solaris (Makefile), and FreeBSD (Makefile).
See descriptions of each utility function below. "

There is no buildwin.cmd in the files that you download.

When I double-click the ifru one, it does nothing.
 
Grab the win32 zip (ipmiutil-2.7.4-win32.zip), extract it to a folder, open a command prompt, go to that folder, and run:

Code:
ifru -N <ip> -U <user> -P <pass> -a <tag you want>

That should do it.
 

Thanks, I will give this a try. Do I only have to do it once, or every time IPMI is powered down?

So for default settings it should be:

ifru -N 192.168.1.X -U ADMIN -P ADMIN -a

Does that look right?
 
Yep, make sure to put a string for the asset tag, and you only need to set it once.

Did you check to see if it was the IPMI logs filling up?

After you set it, I'd reboot the machine so ESXi starts from a fresh boot, and monitor the /var/log/ipmi/0 folder for those 2 files; if you have some fans not attached you'll get some initial alerts because of that.

Also, on the IPMI page under sensor readings, you should see any fans that aren't connected change from green to red (at least they did for me).
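If you want a number instead of eyeballing ls, something like this in the tech support shell works. Just a sketch, assuming busybox sh; the check_growth name and the interval are mine:

```shell
# Rough growth check: sample a file's size twice, N seconds apart, and
# print how many bytes it grew. The function name is just for illustration.
check_growth() {
    file=$1
    interval=${2:-60}
    first=$(wc -c < "$file" | tr -d ' ')
    sleep "$interval"
    second=$(wc -c < "$file" | tr -d ' ')
    echo $((second - first))
}

# e.g. check_growth /var/log/ipmi/0/sel 60
# a steadily positive number means the runaway logging is still going
```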
 
ifru -N 192.168.1.X -U ADMIN -P ADMIN -a X8SIL-F

X8SIL-F becomes the asset tag, limited to 16 characters; or use whatever you'd like for the asset tag.
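Since the tag tops out at 16 characters, a quick sanity check before you run ifru. Just a sketch; the TAG value is only an example:

```shell
# Check the asset tag length before sending it with ifru; the 16-char
# limit is per the post above, and the TAG value here is just an example.
TAG="X8SIL-F"
if [ "${#TAG}" -gt 16 ]; then
    echo "asset tag too long: ${#TAG} chars (max 16)" >&2
else
    echo "tag OK (${#TAG} chars)"
fi
```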
 
Ok, looks like I did it. How does it look?



You only need to do this once, right? Not every time IPMI loses power?

 
This has not worked; maybe it is something else you did?

I kept doing vdf -h, watching MAINSYS fill all the way up :confused:
 
When I was dealing with that issue, it worked for me. I'm guessing you rebooted after setting the asset tag?
 
I'm not sure then; when I was going through everything, that's the only thing I did that seemed to correct it.

Are you using all the fan headers?
 
On the IPMI web page, under Server Health, Sensor Readings: do the remaining 4 fans show as red with Lower Non-recoverable as the status, or green with the status as Normal?
 
They do not show a color, just this:

FAN 1 Normal 1880 R.P.M
FAN 2 N/A Not Present!
FAN 3 N/A Not Present!
FAN 4 N/A Not Present!
FAN 5 N/A Not Present!
 
I know one other thing I tried when I was troubleshooting this: I grabbed some extra fans and just plugged them in so it registered them as being there (I don't think I rebooted when I did this, though), but the log files still grew. They showed as normal with an RPM afterwards, so they were being sensed, but I just disconnected them since they seemed to have no effect.

That's when I tried setting the IPMI asset tag. Then I just left the machine alone, swapped the two variables I had mentioned (disabled the IPMI boot variable, and re-enabled the CIM), and left the machine alone again while I worked on some VMs.

So I went and checked the log file out of curiosity again, and the folder was there (although it wasn't when I had rebooted; it came back because I re-enabled the CIM). The sel and sel.raw were there, but they weren't growing, and vCenter confirmed this: it showed me the alerts that were generated from the fan status.

Not sure if any of that really made sense; I was getting kind of frustrated over it at the time, so I didn't really document what I was doing, and now I'm just trying to recall what I did do just before it seemed to correct itself.
 
Ok, looks like disabling IPMI support in ESXi solves the problem, but then you cannot monitor most of the system health.

 
I have a solution to this issue that doesn't involve disabling IPMI in ESXi or CIM.

It seems as though, after a complete power loss (so a new board, or unplugging the power), the sensor statuses are reset, so all fans are listed in a normal state.

Now, if you do not use all the fan headers, the remaining fan headers will stay in a normal state with an RPM of 0, which seems to cause issues for ESXi (causing the /var/log/ipmi/0/sel and /var/log/ipmi/0/sel.raw files to fill up until MAINSYS is full).

So the solution is to either:

a) use all the fan headers if you have room for the fans

b) boot the machine into the BIOS and sit there, then take a spare fan, or one you've already got plugged in, and plug it into each unused fan header, wait a few seconds, then unplug it.

The effect of doing (b) is an alert on the unused fan headers, causing them to go into a Lower Non-recoverable state and show a red box on the IPMI page under sensor readings. Depending on the number of fans you're not using, you'll now have 3 events per fan in your event log (and subsequently the sel and sel.raw will have these as well, but will no longer grow).

I've tested this on my board: I had it powered down and unplugged for a day or two, and when I went to power it back on, the log files started filling again, so I did solution (b) and everything is working again.

I'll update my first post with this as well.
 
Thanks nate. That would explain why I haven't seen this...I'm using all the fan headers. Mind if I use your info in a post on my blog?
 
Forgot to mention a side effect of solution (b): every time power is completely lost, the IPMI will be reset, so you need to do the fan thing again.
 
Thanks Nate, but to save on the power bill I turn off my ESXi box when I am not using it. I'm sure a lot of people in the home lab environment do the same.
 
By complete power loss I don't mean just turning it off; I mean a physical loss of power, so either unplugging it from the outlet or a power outage.

Even if you just power the machine down, it's still drawing power (the IPMI and ethernet lights will still be active, and you'll see the IPMI heartbeat LED blinking on the board).
 
I'm having the same issue as well. However, if I restart the machine after disabling IPMI support, I can't connect to the host via vSphere. I have to click "Reset System Configuration" on the ESXi host.
 
Nate, you are the man. I had a spare fan lying around and plugged it into each of the open fan headers until the BIOS registered it. Now MAINSYS is at 3%. Excellent work.
 
An update to this...

Went looking for patches today and noticed the patch bundle ESXi410-201104001. After reading about the 2 updates in it, the ESXi410-201104401-SG update has an interesting entry in its KB (http://kb.vmware.com/kb/1035108), which I pasted below.

The CPU usage of sfcbd becomes higher than the normal, which is around 40% to 60%. The /var/log/sdr_content.raw and /var/log/sel.raw log files might contain the efefefefefefefefefefefef text, and the /var/log might contain a SDR response buffer was wrong size message. This issue occurs because IpmiProvider might use CPU for a long time to process meaningless text such as efefef.
This issue is seen particularly on Fujitsu PRIMERGY servers, but might occur on any other system.

I went ahead and applied this patch bundle to my ESXi, rebooted, re-enabled all the IPMI settings I had disabled, and even removed all power so the fan states were reset, then powered everything back up.

After monitoring the sel and sel.raw for the past couple of hours, there have only been 2 asserts (chassis-intrusion ones upon boot), but they don't seem to be growing out of control. I also reset the sensors and deleted the sel and sel.raw just to clear them out, and they haven't come back (yet).

So it looks like that patch fixed this for the X8SIL-F board (and probably other Supermicro boards that had this issue). Going to keep monitoring for the next couple of days of course, but it looks good so far :D
 
I tried the update and it didn't seem to fix it... I might be applying it wrong; I used the vSphere Update Manager thing.

How did you apply yours, Nate?


The weird thing is I have 2 of these motherboards in my test lab, and only one is having the issue.
 
I used the command-line esxupdate via the tech support shell, since I don't have vCenter loaded.

I put ESXi in maintenance mode first, then uploaded the bundle zip to a datastore and from there ran esxupdate:

Code:
esxupdate --bundle=ESXi410-201104001.zip stage
esxupdate --bundle=ESXi410-201104001.zip update
Then I rebooted, re-enabled IPMI, shut down, removed power to reset the sensors, then powered it back up.
 
Something to think about if you don't have (or can't use) fans to cover the unused headers:

As I recall, the tachometer signal from a computer fan is just one pulse per rotation. You could take the power and ground off the header, and run it into a 555 timer to generate the pulses and keep the motherboard happy. If anyone is interested, I can put a computer fan on an oscilloscope to figure out a good frequency to run the 555 at (and post up a schematic/parts list).
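To put rough numbers on it before getting the scope out, here's a quick calculation with assumed values (a 2000 RPM fan, one pulse per revolution as recalled above, and arbitrary R/C picks), using the standard 555 astable formula f = 1.44 / ((R1 + 2*R2) * C):

```shell
# 555 astable back-of-envelope: target the tach frequency of an assumed
# 2000 RPM fan at one pulse per revolution, with example R/C values.
awk 'BEGIN {
    rpm = 2000; pulses_per_rev = 1
    target_hz = rpm * pulses_per_rev / 60          # ~33.3 Hz
    r1 = 1000; r2 = 21000; c = 1e-6                # 1k, 21k, 1 uF (assumed parts)
    f = 1.44 / ((r1 + 2 * r2) * c)                 # standard 555 astable formula
    printf "target %.1f Hz, 555 gives %.1f Hz\n", target_hz, f
}'
# prints: target 33.3 Hz, 555 gives 33.5 Hz
```

If real fans turn out to put out two pulses per revolution, the target frequency just doubles and the resistors scale down accordingly.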
 
Well shit, apparently it's applied but didn't help...
Grrrr

I'm going to try reflashing my USB stick and see if I can install it.
 