ESXi 4.1 on X8SIL-F

Discussion in 'Virtualized Computing' started by nate1280, Feb 14, 2011.

  1. nate1280

    nate1280 n00b

    Messages:
    35
    Joined:
    Feb 14, 2011
    After lots of reading, I finally have an ESXi 4.1u1 setup running on an X8SIL-F (w/ X3450, 8GB RAM), but I seem to be running into a slight problem. The problem persists on 4.1 and 4.1u1; I haven't tried 4.0.

    The /var/log/ipmi/0 folder grows until there is no space left, so configuration changes don't get saved. It happened twice before I figured out that this was causing the other problems. I've searched the VMware communities and here and have seen only a handful of posts about it. They all say that disabling IPMI (VMKernel.Boot.ipmiEnabled) or disabling CIM (UserVars.CIMEnabled) works around it, and from my testing it does. But of course, one loses all the sensor data as a result. With these enabled, after about 6-7 minutes the SEL files in the folder fill up and cause out-of-space errors, which in turn means any configuration changes made after that point aren't saved.

    Since this board seems to be fairly popular among [H] users for lab setups, I'm surprised I've never seen this mentioned in my reading on here (which I've done a lot of; long-time lurker). Is there something I'm missing in the configuration that would keep this from happening?

    I hope I've explained everything properly...

    Anyways, is there a way to change the logging behavior, like set a hard file size limit, or something?

    === SOLUTION ===

    I have a solution to this issue that doesn't involve disabling IPMI in ESXi or CIM.

    It seems that after a complete power loss (a new board, or unplugging the power) the sensor statuses are reset, so all fans are listed in a normal state.

    Now, if you do not use all the fan headers, the unused headers stay in a normal state with an RPM of 0, which seems to cause issues for ESXi (the /var/log/ipmi/0/sel and /var/log/ipmi/0/sel.raw files fill up until MAINSYS is full).

    So the solution is to either:

    a) use all the fan headers if you have room for the fans

    b) boot the machine into the BIOS and leave it there, then take a spare fan (or one you've already got plugged in), plug it into each unused fan header, wait a few seconds, then unplug it.

    Doing (b) triggers an alert on each unused fan header, putting it into a Lower Non-recoverable state with a red box on the IPMI page under sensor readings. You'll now have 3 events per unused fan header in your event log (the sel and sel.raw files will contain these as well, but will no longer grow).

    I've tested this on my board. I had it powered down and unplugged for a day or two; when I powered it back on, the log files started filling again, so I did solution (b) and everything is working again.

    Additional note: solution (b) needs to be redone every time there is a complete power loss.
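
    A minimal sketch of how one might check whether the SEL files are still growing, assuming Python 3 on a machine that can read the log directory. The thread itself only mentions checking the folder and running vdf -h, so the function names and the sampling approach here are illustrative assumptions, not anything shipped with ESXi.

```python
import os


def directory_size(path):
    """Return the total size in bytes of all regular files under path."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            try:
                total += os.path.getsize(os.path.join(root, name))
            except OSError:
                # The file disappeared between listing and stat; skip it.
                pass
    return total


def growth_per_minute(samples):
    """Estimate growth in bytes per minute.

    samples is a time-ordered list of (seconds, size_in_bytes) tuples.
    Only the first and last samples are compared.
    """
    if len(samples) < 2:
        return 0.0
    (t0, s0), (t1, s1) = samples[0], samples[-1]
    if t1 <= t0:
        return 0.0
    return (s1 - s0) * 60.0 / (t1 - t0)
```

    One way to use it: call directory_size("/var/log/ipmi/0") every minute or so, collect the (time, size) pairs, and pass them to growth_per_minute. A rate that stays at zero after the initial fan alerts would match the "will no longer grow" behaviour described above.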
     
    Last edited: Mar 1, 2011
  2. Mindflux

    Mindflux Limp Gawd

    Messages:
    251
    Joined:
    Feb 5, 2011
    Interesting. I'm running an X8DTH-F on my ESXi 4.1 machine and my /var/log/ipmi/0 directory has about 36 kb of data in it. It's really tiny.

    Subbed hoping you find an answer. :)
     
  3. nate1280

    nate1280 n00b

    Messages:
    35
    Joined:
    Feb 14, 2011
    So I managed to get this working; I don't know if it was a fluke or what.

    I had decided to set the FRU Asset Tag (I don't know why, but I did), and all of a sudden it started behaving. It's almost as if that alone caused the IPMI to initialize and start working properly, because after that, two of my fan headers went from normal to alert status (because there is nothing connected to them).

    Now I see all the hardware information in the vSphere client: fan speeds, temps, voltages, etc.

    One temperature I don't see is the CPU temp, which I find strange but haven't looked into.
     
  4. AMD_Gamer

    AMD_Gamer [H]ard as it Gets

    Messages:
    18,277
    Joined:
    Jan 20, 2002
    How do you set this tag?
     
  5. nate1280

    nate1280 n00b

    Messages:
    35
    Joined:
    Feb 14, 2011
    Using the program from http://ipmiutil.sourceforge.net/

    I used the ifru app with the -a switch and just set the asset tag to X8SIL-F. Be sure to specify the node (-N), user and password (-U and -P).

    As I mentioned, it almost seems like this kick-started something into working, because since then everything has been fine.
     
  6. AMD_Gamer

    AMD_Gamer [H]ard as it Gets

    Messages:
    18,277
    Joined:
    Jan 20, 2002
    So how do you install this?

    On the site it says "The ipmiutil software compiles under Linux (Makefile), Windows (buildwin.cmd), Solaris (Makefile), and FreeBSD (Makefile).
    See descriptions of each utility function below."

    There is no buildwin.cmd in the files that you download.

    When I double-click on the ifru one, it does nothing.
     
  7. nate1280

    nate1280 n00b

    Messages:
    35
    Joined:
    Feb 14, 2011
    Grab the win32 zip (ipmiutil-2.7.4-win32.zip), extract it to a folder, open a command prompt, go to that folder, and run ifru -N <ip> -U <user> -P <pass> -a <tag you want>

    That should do it.
     
  8. AMD_Gamer

    AMD_Gamer [H]ard as it Gets

    Messages:
    18,277
    Joined:
    Jan 20, 2002
    Thanks, I will give this a try. Do I only have to do it once, or every time the IPMI is powered down?

    So for default settings it should be

    ifru -N 192.168.1.X -U ADMIN -P ADMIN -a

    Does that look right?
     
  9. nate1280

    nate1280 n00b

    Messages:
    35
    Joined:
    Feb 14, 2011
    Yep, make sure to put a string for the asset tag. You only need to set it once.

    Did you check to see if it was the IPMI logs filling up?

    After you set it, I'd reboot the machine so ESXi starts off from a fresh boot, and monitor the /var/log/ipmi/0 folder for those 2 files. If you have some fans not attached, you'll get some initial alerts because of that.

    Also, in the IPMI page sensor readings, you should see any fans that aren't connected change from green to red (at least they did for me).
     
  10. AMD_Gamer

    AMD_Gamer [H]ard as it Gets

    Messages:
    18,277
    Joined:
    Jan 20, 2002
    What do you mean by a string for the asset tag?
     
  11. nate1280

    nate1280 n00b

    Messages:
    35
    Joined:
    Feb 14, 2011
    ifru -N 192.168.1.X -U ADMIN -P ADMIN -a X8SIL-F

    X8SIL-F becomes the asset tag, which is limited to 16 characters. Or use whatever you'd like for the asset tag.
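
    As a rough sketch, the command above could be assembled and sanity-checked before running it. This assumes Python 3; the helper name build_ifru_command is made up for illustration, the flags (-N, -U, -P, -a) are the ones quoted in this thread, and the 16-character limit is taken from the post above rather than verified against the ipmiutil documentation.

```python
# Limit quoted in the thread; treated here as an assumption, not a verified spec value.
MAX_ASSET_TAG_LEN = 16


def build_ifru_command(host, user, password, asset_tag):
    """Return the ifru argument list for setting the FRU asset tag.

    Raises ValueError when the tag is missing or longer than the limit,
    which covers the "ifru ... -a" with no string case asked about earlier.
    """
    if not asset_tag:
        raise ValueError("asset tag must be a non-empty string")
    if len(asset_tag) > MAX_ASSET_TAG_LEN:
        raise ValueError(
            "asset tag is %d characters; limit is %d"
            % (len(asset_tag), MAX_ASSET_TAG_LEN)
        )
    return ["ifru", "-N", host, "-U", user, "-P", password, "-a", asset_tag]
```

    The returned list could be handed to subprocess.run on a machine where ifru is on the PATH; that step is left out here because it needs a reachable BMC.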
     
  12. AMD_Gamer

    AMD_Gamer [H]ard as it Gets

    Messages:
    18,277
    Joined:
    Jan 20, 2002
  13. AMD_Gamer

    AMD_Gamer [H]ard as it Gets

    Messages:
    18,277
    Joined:
    Jan 20, 2002
    Do you know if a BIOS update fixes this problem?
     
  14. AMD_Gamer

    AMD_Gamer [H]ard as it Gets

    Messages:
    18,277
    Joined:
    Jan 20, 2002
    OK, looks like I did it. How does it look?

    You only need to do this once, right? Not every time the IPMI loses power?

    [IMG]
     
  15. Mindflux

    Mindflux Limp Gawd

    Messages:
    251
    Joined:
    Feb 5, 2011
    You can set this with ipmiview too.
     
  16. AMD_Gamer

    AMD_Gamer [H]ard as it Gets

    Messages:
    18,277
    Joined:
    Jan 20, 2002
    This has not worked. Maybe it is something else you did?

    I kept running vdf -h and watching MAINSYS fill all the way up. :confused:
     
  17. nate1280

    nate1280 n00b

    Messages:
    35
    Joined:
    Feb 14, 2011
    When I was dealing with that issue, it worked for me. I'm guessing you rebooted after setting the asset tag?
     
  18. AMD_Gamer

    AMD_Gamer [H]ard as it Gets

    Messages:
    18,277
    Joined:
    Jan 20, 2002
    Yes, I did a reboot.
     
  19. nate1280

    nate1280 n00b

    Messages:
    35
    Joined:
    Feb 14, 2011
    I'm not sure then. When I was going through everything, that's the only thing I did that seemed to correct it.

    Are you using all the fan headers?
     
  20. AMD_Gamer

    AMD_Gamer [H]ard as it Gets

    Messages:
    18,277
    Joined:
    Jan 20, 2002
    No, just the CPU one.
     
  21. nate1280

    nate1280 n00b

    Messages:
    35
    Joined:
    Feb 14, 2011
    In the IPMI web page, under server health, sensor readings: do the remaining 4 fans show as red with Lower Non-recoverable as the status, or green with Normal as the status?
     
  22. AMD_Gamer

    AMD_Gamer [H]ard as it Gets

    Messages:
    18,277
    Joined:
    Jan 20, 2002
    They do not show a color, just this:

    FAN 1    Normal    1880 R.P.M
    FAN 2    N/A       Not Present!
    FAN 3    N/A       Not Present!
    FAN 4    N/A       Not Present!
    FAN 5    N/A       Not Present!
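
    A small sketch, assuming Python 3, of how a pasted sensor listing like the one above could be checked for the state nate1280 suspects of causing trouble: a fan reported as Normal with an RPM of 0. The line layout is assumed from this post alone, and a "Normal 0 R.P.M" line is hypothetical since the thread never shows one; the function names are illustrative.

```python
import re

# "FAN 1 Normal 1880 R.P.M" style lines: name, status text, numeric RPM.
RPM_LINE = re.compile(r"^\s*(FAN\s+\d+)\s+(.*?)\s+(\d+)\s+R\.P\.M\.?\s*$")
# Any other fan line, e.g. "FAN 2 N/A Not Present!": name, then free text.
ANY_LINE = re.compile(r"^\s*(FAN\s+\d+)\s+(.*\S)\s*$")


def parse_fan_line(line):
    """Return (name, status, rpm) for a fan line, or None for other lines.

    rpm is None when the line carries no numeric reading.
    """
    m = RPM_LINE.match(line)
    if m:
        return (" ".join(m.group(1).split()), m.group(2), int(m.group(3)))
    m = ANY_LINE.match(line)
    if m:
        return (" ".join(m.group(1).split()), m.group(2), None)
    return None


def suspect_fans(text):
    """List fans reported as Normal while reading 0 RPM."""
    suspects = []
    for line in text.splitlines():
        parsed = parse_fan_line(line)
        if parsed is None:
            continue
        name, status, rpm = parsed
        if status.lower() == "normal" and rpm == 0:
            suspects.append(name)
    return suspects
```

    For the listing in this post, suspect_fans returns an empty list, because the unused headers report "N/A Not Present!" rather than Normal with 0 RPM.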
     
  23. nate1280

    nate1280 n00b

    Messages:
    35
    Joined:
    Feb 14, 2011
    I know one other thing I tried when I was troubleshooting this: I grabbed some extra fans and plugged them in so the board registered them as present (I don't think I rebooted when I did this, though), but the log files still grew. The fans showed as normal with an RPM afterwards, so they were being sensed, but I disconnected them since they seemed to have no effect.

    That's when I tried setting the IPMI asset tag. Then I left the machine alone for a bit, swapped the two variables I had mentioned (disabled the IPMI boot variable and re-enabled CIM), and left it alone again while I worked on some VMs.

    So I checked the log folder again out of curiosity, and the folder was there (although it wasn't when I had rebooted; it came back because I re-enabled CIM). The sel and sel.raw files were there, but they weren't growing, and vCenter confirmed this: it showed me the alerts that were generated from the fan status.

    Not sure if any of that really made sense. I was getting frustrated over it at the time, so I didn't really document what I was doing, and now I'm just trying to recall what I did just before it seemed to correct itself.
     
  24. AMD_Gamer

    AMD_Gamer [H]ard as it Gets

    Messages:
    18,277
    Joined:
    Jan 20, 2002
    OK, looks like disabling IPMI support in ESXi solves the problem, but then you cannot monitor most of the system health.

    [IMG]
     
  25. nate1280

    nate1280 n00b

    Messages:
    35
    Joined:
    Feb 14, 2011
    I have a solution to this issue that doesn't involve disabling IPMI in ESXi or CIM.

    It seems that after a complete power loss (a new board, or unplugging the power) the sensor statuses are reset, so all fans are listed in a normal state.

    Now, if you do not use all the fan headers, the unused headers stay in a normal state with an RPM of 0, which seems to cause issues for ESXi (the /var/log/ipmi/0/sel and /var/log/ipmi/0/sel.raw files fill up until MAINSYS is full).

    So the solution is to either:

    a) use all the fan headers if you have room for the fans

    b) boot the machine into the BIOS and leave it there, then take a spare fan (or one you've already got plugged in), plug it into each unused fan header, wait a few seconds, then unplug it.

    Doing (b) triggers an alert on each unused fan header, putting it into a Lower Non-recoverable state with a red box on the IPMI page under sensor readings. You'll now have 3 events per unused fan header in your event log (the sel and sel.raw files will contain these as well, but will no longer grow).

    I've tested this on my board. I had it powered down and unplugged for a day or two; when I powered it back on, the log files started filling again, so I did solution (b) and everything is working again.

    I'll update my first post with this as well.
     
  26. NetJunkie

    NetJunkie [H]ardForum Junkie

    Messages:
    9,682
    Joined:
    Mar 16, 2001
    Thanks nate. That would explain why I haven't seen this... I'm using all the fan headers. Mind if I use your info in a post on my blog?
     
  27. nate1280

    nate1280 n00b

    Messages:
    35
    Joined:
    Feb 14, 2011
    Sure, go for it...
     
  28. nate1280

    nate1280 n00b

    Messages:
    35
    Joined:
    Feb 14, 2011
    Forgot to mention a side effect of solution (b): every time power is completely lost, the IPMI is reset, so you need to do the fan thing again.
     
  29. AMD_Gamer

    AMD_Gamer [H]ard as it Gets

    Messages:
    18,277
    Joined:
    Jan 20, 2002
    Thanks Nate, but to save on the power bill I turn off my ESXi box when I am not using it. I'm sure a lot of people in the home lab environment do the same.
     
  30. nate1280

    nate1280 n00b

    Messages:
    35
    Joined:
    Feb 14, 2011
    By complete power loss I don't mean just turning it off; I mean a physical loss of power, so either unplugging it from the outlet or a power outage.

    Even if you just power the machine down, it's still drawing power (the IPMI and Ethernet lights will still be active, and you'll see the IPMI heartbeat LED blinking on the board).
     
  31. jenuster

    jenuster Limp Gawd

    Messages:
    154
    Joined:
    Mar 8, 2011
    I'm having the same issue as well. However, if I restart the machine after disabling IPMI support, I can't connect to the host via the vSphere Client. I have to select "Reset System Configuration" on the ESXi host.
     
  32. jenuster

    jenuster Limp Gawd

    Messages:
    154
    Joined:
    Mar 8, 2011
    Nate, you are the man. I had a spare fan lying around and plugged it into each open fan header until the BIOS registered it. Now MAINSYS is at 3%. Excellent work.
     
  33. Dave Mishchenko

    Dave Mishchenko n00b

    Messages:
    1
    Joined:
    Apr 19, 2011
  34. Netwerkz101

    Netwerkz101 Gawd

    Messages:
    663
    Joined:
    Jul 11, 2010
    Welcome to the [H], Dave!
     
  35. nate1280

    nate1280 n00b

    Messages:
    35
    Joined:
    Feb 14, 2011
    An update to this...

    Went looking for patches today and noticed the patch bundle ESXi410-201104001. After reading about the 2 updates in it, I found that the ESXi410-201104401-SG update has an interesting entry in its KB article (http://kb.vmware.com/kb/1035108), which I pasted below.

    I applied this patch bundle to my ESXi host, rebooted, re-enabled all the IPMI settings that I had disabled, and even removed all power so the fan states were reset, then powered everything back up.

    After monitoring the sel and sel.raw files for the past couple of hours, there have only been 2 asserts (the chassis intrusion ones at boot), and the files don't seem to be growing out of control. I also reset the sensors and deleted sel and sel.raw just to clear them out, and they haven't come back (yet).

    So it looks like that patch fixed this for the X8SIL-F board (and probably other Supermicro boards that had this issue). I'm going to keep monitoring for the next couple of days of course, but it looks good so far :D
     
  36. NetJunkie

    NetJunkie [H]ardForum Junkie

    Messages:
    9,682
    Joined:
    Mar 16, 2001
    Nice. Thanks for the find.
     
  37. ManateeMatt

    ManateeMatt Limp Gawd

    Messages:
    145
    Joined:
    May 27, 2009
    I tried the update and it didn't seem to fix it... I might be applying it wrong; I used the vSphere Update Manager thing.

    How did you apply yours, Nate?

    The weird thing is I have 2 of these motherboards in my test lab, and only one is having the issue.
     
  38. nate1280

    nate1280 n00b

    Messages:
    35
    Joined:
    Feb 14, 2011
    I used the command-line esxupdate via the Tech Support shell, since I don't have vCenter loaded.

    I put ESXi in maintenance mode first, then uploaded the bundle zip to a datastore and ran esxupdate from there:

    Code:
    esxupdate --bundle=ESXi410-201104001.zip stage
    esxupdate --bundle=ESXi410-201104001.zip update
    
    Then I rebooted, re-enabled IPMI, shut down, removed power to reset the sensors, and powered it back up.
     
  39. cymon

    cymon Limp Gawd

    Messages:
    453
    Joined:
    Apr 16, 2009
    Something to think about, if you don't have (or can't use) fans to cover up the unused headers:

    As I recall, the tachometer signal from a computer fan is just one pulse per rotation. You could take the power and ground off the header, and run it into a 555 timer to generate the pulses and keep the motherboard happy. If anyone is interested, I can put a computer fan on an oscilloscope to figure out a good frequency to run the 555 at (and post up a schematic/parts list).
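
    To put rough numbers on the idea, here is a sketch of the arithmetic only, assuming Python 3. It uses the textbook 555 astable formula f = 1 / (ln 2 * (R1 + 2*R2) * C). The post recalls one pulse per rotation; PC fans are commonly documented as emitting two pulses per revolution, so pulses_per_rev is left as a parameter. The function names and the example component values are illustrative, and nothing here addresses signal levels on the header.

```python
import math


def tach_frequency_hz(rpm, pulses_per_rev=1):
    """Tach signal frequency for a fan spinning at rpm."""
    return rpm * pulses_per_rev / 60.0


def astable_555_frequency_hz(r1_ohms, r2_ohms, c_farads):
    """Textbook 555 astable frequency: 1 / (ln 2 * (R1 + 2*R2) * C)."""
    return 1.0 / (math.log(2) * (r1_ohms + 2.0 * r2_ohms) * c_farads)


def r2_for_frequency(target_hz, r1_ohms, c_farads):
    """Solve the astable formula for R2 given a target frequency, R1 and C."""
    r2 = (1.0 / (math.log(2) * target_hz * c_farads) - r1_ohms) / 2.0
    if r2 <= 0:
        raise ValueError("no positive R2 for these values; try a smaller R1 or C")
    return r2
```

    For example, tach_frequency_hz(1200, 2) gives 40.0 Hz, and r2_for_frequency can then be used with a chosen R1 and C to find a resistor value that would produce that frequency.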
     
  40. ManateeMatt

    ManateeMatt Limp Gawd

    Messages:
    145
    Joined:
    May 27, 2009
    Well shit, apparently it's applied but didn't help...
    Gurrrrr

    I'm going to try reflashing my USB stick and see if I can install it