Need Help starting up 4P SM H8QGI+-F

Discussion in 'Distributed Computing' started by thinklet, Jul 21, 2012.

  1. thinklet

    thinklet Limp Gawd

    Messages:
    142
    Joined:
    Jul 4, 2012
    My 2nd folding rig is not starting up at all. Just assembled.
    Symptoms:
    No Video, No sound from speaker during POST
    6128s, 16 x 1 GB; power switch and reset switch work normally
    Fans all power up
    DP1 Led comes on solid, stays on for 30 seconds then fast blink
    DP3 Power Led never lights.
    Monitor was hooked up to first machine along with KB and Mouse
    New Modular power supply

    I need some direction to help me solve this problem. All equipment
    was bought from another folder on Team 33.

    Any help would be most appreciated.
    Best regards, Charlie
     
  2. tear

    tear [H]ard|DCer of the Year 2011

    Messages:
    1,567
    Joined:
    Jul 25, 2011
    When building such a big system it's always good practice to make it
    an incremental process... so in situations like yours you don't go nuts when someone
    advises you to...

    ...remove all CPUs and RAM except for the first CPU and its RAM.

    Ok, I'm not advising it just *yet*... it may come to that though.



    Also, very important, don't let the board rest on an ESD bag. Make sure it's lifted a bit.
    A thick magazine works short-term, too; not sure if this applies to you.

    Make sure that JPW2 and JPW3 are connected.

    Reset CMOS (per the manual). With power completely removed from the board
    (PSU unplugged) short JBT1 contacts for 30 seconds (to be certain).
     
  3. thinklet

    thinklet Limp Gawd

    Unplugged the power supply, waited 2 minutes, removed the battery (it measures 3.03 V), and shorted the JBT1 pads for 45 seconds. Put everything back together; nothing happened until I pushed the power switch, and then the unit came on with no change. Measured the power supply: all outputs are normal (+12.14, -11.96, +3.41, +5.04 V). It is a 750 watt single rail modular unit. Best, Charlie
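
    For reference, the "normal" judgment on those readings can be checked with a little arithmetic. The sketch below assumes the usual ATX tolerances (5% on the +3.3 V, +5 V and +12 V rails, 10% on -12 V); `check_rail` is a hypothetical helper name, not a standard tool.

```shell
# check_rail NOMINAL MEASURED TOLERANCE_PERCENT
# Prints how far a measured voltage is from nominal and whether that is
# within the given tolerance. The tolerance values are assumptions taken
# from the common ATX limits, not from this thread.
check_rail() {
    awk -v nom="$1" -v got="$2" -v tol="$3" 'BEGIN {
        dev = (got - nom) / nom * 100
        if (dev < 0) dev = -dev
        status = (dev <= tol) ? "OK" : "OUT OF RANGE"
        printf "%s V rail: measured %s V, %.1f%% off nominal: %s\n", nom, got, dev, status
    }'
}

# Readings quoted in the post above
check_rail 12 12.14 5
check_rail -12 -11.96 10
check_rail 3.3 3.41 5
check_rail 5 5.04 5
```

    All four readings fall inside those limits, which agrees with the poster's own conclusion. A multimeter reading at idle says nothing about behavior under load.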
     
  4. KMac

    KMac [H]ard|DCer of the Month - June 2012

    Messages:
    554
    Joined:
    Dec 30, 2002
    I had a similar issue with a GL board. Swapping CPU's 1 and 2 fixed it for me.
    Good luck.
     
  5. thinklet

    thinklet Limp Gawd

    tear, I forgot to answer about the mounting: I installed the board on a custom wire frame with 1" standoffs. JPW2 and JPW3 are connected; the power supply has dual CPU connectors. Thanks, Charlie

    So should I remove the CPUs and memory in reverse order (4, 3, 2), or should I remove everything but CPU 1 and its memory (4 sticks in the blue sockets)?
    I used musky's mod for the fans, but I also used 1-1/8" socket head cap screws with a 7/64 ball driver for attachment. That worked great for me and made it easy to install.
    Best, Charlie
     
    Last edited: Jul 22, 2012
  6. tear

    tear [H]ard|DCer of the Year 2011

    No other ideas than to boot only with CPU1 and its RAM...

    I'm a bit tired though so I may not be thinking clearly... luckily, we have many 4P users :)
     
  7. thinklet

    thinklet Limp Gawd

  8. thinklet

    thinklet Limp Gawd

    tear, I removed the #4 CPU and the board powered up and loaded Ubuntu off the HDD.
    How do you recommend that I proceed? I have another set of 6128's that I could substitute for #4. I assume that if another CPU works in #4, the original CPU is bad; alternatively, if no other CPU works in #4, the socket is bad? Thanks for all your help. Best regards, Charlie
     
  9. bowlinra

    bowlinra Limp Gawd

    Messages:
    195
    Joined:
    Apr 30, 2012
    I believe I remember some other threads about some troubleshooting steps, so I'm running off memory.

    As I understand it, the motherboard should be able to boot normally with just one CPU and one stick of RAM. I'd suggest going to the minimum until you have something to build on. If it can't POST, swap in another CPU, then a different stick of memory.

    I would print out the manual and check all the jumpers are set to the default positions.

    Best of luck.
     
  10. bowlinra

    bowlinra Limp Gawd

    Good to hear you have some progress.

    I'd isolate the CPU and see if you can boot: remove all the CPUs, put the CPU in question in socket #1 with memory, and try to boot.
     
  11. Grandpa_01

    Grandpa_01 [H]ard|DCer of the Year 2013

    Messages:
    1,157
    Joined:
    Jun 4, 2011
    Sometimes a CPU will not boot in one socket but will in another. Since you have extra CPUs lying around, I would just put one of the others in that socket and see if it works. You may also have a bad stick of memory: remove all of the memory except 1 stick at socket #4 and try to boot. If it does not boot, try a different stick, and so on.
     
  12. thinklet

    thinklet Limp Gawd

    Sounds like a great plan of action. The system is running as of now with 3 CPUs, but it is reporting only 10 GB of memory when it should be 12 GB. Thanks everyone for all the help.
    Best regards, Charlie
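
    The post does not say how the 10 GB figure was read. One way to check it from a terminal, as a minimal sketch: read `MemTotal` from `/proc/meminfo`, which the Linux kernel reports in kB. `mem_total_mib` is a hypothetical helper name, and the optional file argument exists only so the function can be tried against a saved copy.

```shell
# mem_total_mib [FILE]
# Prints the kernel's MemTotal in MiB. FILE defaults to /proc/meminfo.
mem_total_mib() {
    awk '/^MemTotal:/ { printf "%d\n", $2 / 1024 }' "${1:-/proc/meminfo}"
}

# Usage on the running system:
#   mem_total_mib
```

    The kernel normally reports somewhat less than the installed total, so with 1 GB sticks the thing to look for is a shortfall of roughly 1024 MiB per missing DIMM rather than an exact match.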
     
  13. tear

    tear [H]ard|DCer of the Year 2011

    TIM residue on CPU pads or (worse) in a socket could be causing both your issues.
    I've seen cases of TIM "dropping" from CPU or LGA load plate into the socket... always
    carefully check CPU pads and sockets w/used hardware.

    Bent socket pins are another (though unwelcome) possibility.

    I think thorough visual inspection is in order. You could also speculatively clean CPU
    pads with Q-tips + alcohol...

    Grandpa's suggestion of retrying CPU4 w/o its RAM is also worth trying (after visual
    inspection of CPU pads and the socket, ofc).
    ________________________________________________________________________


    TPC should help you identify CPU w/failing DIMM: http://turionpowercontrol.googlecode.com/files/tpc-0.44-rc1-src.tar.gz

    Install it:
    Code:
    cd ~
    wget http://turionpowercontrol.googlecode.com/files/tpc-0.44-rc1-src.tar.gz
    tar -xzf tpc-0.44-rc1-src.tar.gz
    cd tpc-0.44-rc1-src
    make
    sudo make install
    sudo cp /usr/bin/TurionPowerControl /usr/bin/tpc
    Run it:
    Code:
    sudo modprobe msr
    sudo modprobe cpuid
    sudo tpc -dram
    Look for Node + DCT that has a missing or failed LDIMM. Paste complete output when in doubt.
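
    Since the full output runs to dozens of lines, a small filter can do the looking. This is a minimal sketch that assumes the output format shown later in this thread (a `Node N ---` line, then `DCTn:` lines, then `LDIMMx=STATUS/STATUS` lines); `find_failed_ldimms` is a hypothetical helper, not part of TPC. It only catches slots marked FAILED. A slot that is populated but reported as EMPTY still has to be spotted by eye.

```shell
# Reads `tpc -dram` output on stdin and prints one line per LDIMM entry
# that contains FAILED, tagged with the Node and DCT it was listed under.
find_failed_ldimms() {
    awk '
        /^[ \t]*Node [0-9]+/ { node = $2 }          # remember current node
        /DCT[0-9]+:/ {                              # remember current DCT
            match($0, /DCT[0-9]+/)
            dct = substr($0, RSTART, RLENGTH)
        }
        /LDIMM[0-9]+=/ {                            # scan each slot field
            for (i = 1; i <= NF; i++)
                if ($i ~ /^LDIMM[0-9]+=.*FAILED/)
                    print "Node " node " " dct " " $i
        }
    '
}

# Usage:
#   sudo tpc -dram | find_failed_ldimms
```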

    Node/DCT map (assuming all CPUs are populated): [image not preserved]
     
    Last edited: Dec 19, 2012
  14. thinklet

    thinklet Limp Gawd

    tear, it started to install and got as far as make, which failed with: g++: Command not found
    I am not sure what this means; I'm only starting to learn Linux.

    thinklet@SamsungSSD:~$ wget http://darkswarm.org/tpc-svn64-tear2.tar.gz
    --2012-07-22 06:43:12-- http://darkswarm.org/tpc-svn64-tear2.tar.gz
    Resolving darkswarm.org... 85.11.66.60
    Connecting to darkswarm.org|85.11.66.60|:80... connected.
    HTTP request sent, awaiting response... 200 OK
    Length: 102258 (100K) [application/x-gzip]
    Saving to: `tpc-svn64-tear2.tar.gz'

    100%[======================================>] 102,258 121K/s in 0.8s

    2012-07-22 06:43:14 (121 KB/s) - `tpc-svn64-tear2.tar.gz' saved [102258/102258]

    thinklet@SamsungSSD:~$ tar -xzf tpc-svn64-tear2.tar.gz
    thinklet@SamsungSSD:~$ cd tpc-svn64-tear2
    thinklet@SamsungSSD:~/tpc-svn64-tear2$ make
    mkdir -p obj/x86_64
    g++ -O2 -MMD -MF obj/x86_64/.TurionPowerControl.d -MT obj/x86_64/TurionPowerControl.o -c -o obj/x86_64/TurionPowerControl.o TurionPowerControl.cpp
    make: g++: Command not found
    make: *** [obj/x86_64/TurionPowerControl.o] Error 127
    thinklet@SamsungSSD:~/tpc-svn64-tear2$ sudo make install
    [sudo] password for thinklet:

    Best, Charlie
     
  15. tear

    tear [H]ard|DCer of the Year 2011

    You're missing the compiler...

    Run: sudo apt-get install g++

    Then resume at the failing step.
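
    To find out up front whether a build tool is missing, something like the following can be run before `make`. It is a minimal sketch: `need_cmd` is a hypothetical helper, and the install hint assumes the Ubuntu package has the same name as the command, which holds for `g++` and `make` but not for every tool.

```shell
# need_cmd NAME
# Reports whether NAME is available on the PATH; returns non-zero if not.
need_cmd() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1: found"
    else
        echo "$1: missing (try: sudo apt-get install $1)"
        return 1
    fi
}

# Usage:
#   need_cmd g++ && need_cmd make && make
```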
     
  16. thinklet

    thinklet Limp Gawd

    thinklet@SamsungSSD:~/fah$ Run: sudo apt-get install g++
    Run:: command not found
    thinklet@SamsungSSD:~/fah$
     
  17. tear

    tear [H]ard|DCer of the Year 2011

    Without the Run: part :)
    Code:
    sudo apt-get install g++
     
  18. thinklet

    thinklet Limp Gawd

    usr/bin/tpcsudo modprobe msr
    mv: target `msr' is not a directory
    thinklet@SamsungSSD:~/fah/tpc-svn64-tear2$ sudo modprobe cpuid
    thinklet@SamsungSSD:~/fah/tpc-svn64-tear2$ sudo tpc -dram
    sudo: tpc: command not found
    thinklet@SamsungSSD:~/fah/tpc-svn64-tear2$
     
  19. thinklet

    thinklet Limp Gawd

    thinklet@SamsungSSD:~/fah/tpc-svn64-tear2$ sudo make install
    install -ps TurionPowerControl /usr/bin
    thinklet@SamsungSSD:~/fah/tpc-svn64-tear2$ sudo mv /usr/bin/TurionPowerControl /usr/bin/tpcsudo modprobe msr
    mv: target `msr' is not a directory
    thinklet@SamsungSSD:~/fah/tpc-svn64-tear2$ sudo modprobe cpuid
    thinklet@SamsungSSD:~/fah/tpc-svn64-tear2$ sudo tpc -dram
    sudo: tpc: command not found
     
  20. tear

    tear [H]ard|DCer of the Year 2011

    sudo mv /usr/bin/TurionPowerControl /usr/bin/tpcsudo modprobe msr
    ^^ two commands got combined here

    Redo them please:
    Code:
    sudo mv /usr/bin/TurionPowerControl /usr/bin/tpc
    Code:
    sudo modprobe msr
     
  21. thinklet

    thinklet Limp Gawd

    Turion Power States Optimization and Control - by blackshard - v0.43


    DRAM Configuration Status

    Node 0 ---
    DCT0: memory type: DDR3 frequency: 1332 MHz
    Tcl=9 Trcd=9 Trp=9 Tras=24 Access Mode:1T Trtp=5 Trc=33 Twr=10 Trrd=4 Tcwl=7 Tfaw=20
    TrwtWB=8 TrwtTO=7 Twtr=5 Twrrd=2 Twrwr=4 Trdrd=3 Tref=2 Trfc0=0 Trfc1=2 Trfc2=0 Trfc3=0 MaxRdLatency=52
    LDIMM0=EMPTY/EMPTY LDIMM1=OK/EMPTY LDIMM2=EMPTY/EMPTY LDIMM3=EMPTY/EMPTY

    DCT1: memory type: DDR3 frequency: 1332 MHz
    Tcl=9 Trcd=9 Trp=9 Tras=24 Access Mode:1T Trtp=5 Trc=33 Twr=10 Trrd=4 Tcwl=7 Tfaw=20
    TrwtWB=8 TrwtTO=7 Twtr=5 Twrrd=2 Twrwr=4 Trdrd=3 Tref=2 Trfc0=0 Trfc1=2 Trfc2=0 Trfc3=0 MaxRdLatency=52
    LDIMM0=EMPTY/EMPTY LDIMM1=OK/EMPTY LDIMM2=EMPTY/EMPTY LDIMM3=EMPTY/EMPTY


    Node 1 ---
    DCT0: memory type: DDR3 frequency: 1332 MHz
    Tcl=9 Trcd=9 Trp=9 Tras=24 Access Mode:1T Trtp=5 Trc=33 Twr=10 Trrd=4 Tcwl=7 Tfaw=20
    TrwtWB=8 TrwtTO=7 Twtr=5 Twrrd=2 Twrwr=4 Trdrd=3 Tref=2 Trfc0=0 Trfc1=2 Trfc2=0 Trfc3=0 MaxRdLatency=51
    LDIMM0=EMPTY/EMPTY LDIMM1=OK/EMPTY LDIMM2=EMPTY/EMPTY LDIMM3=EMPTY/EMPTY

    DCT1: memory type: DDR3 frequency: 1332 MHz
    Tcl=9 Trcd=9 Trp=9 Tras=24 Access Mode:1T Trtp=5 Trc=33 Twr=10 Trrd=4 Tcwl=7 Tfaw=20
    TrwtWB=8 TrwtTO=7 Twtr=5 Twrrd=2 Twrwr=4 Trdrd=3 Tref=2 Trfc0=0 Trfc1=2 Trfc2=0 Trfc3=0 MaxRdLatency=51
    LDIMM0=EMPTY/EMPTY LDIMM1=OK/EMPTY LDIMM2=EMPTY/EMPTY LDIMM3=EMPTY/EMPTY


    Node 2 ---
    DCT0: memory type: DDR3 frequency: 1332 MHz
    Tcl=9 Trcd=9 Trp=9 Tras=24 Access Mode:1T Trtp=5 Trc=33 Twr=10 Trrd=4 Tcwl=7 Tfaw=20
    TrwtWB=6 TrwtTO=5 Twtr=5 Twrrd=2 Twrwr=4 Trdrd=3 Tref=2 Trfc0=0 Trfc1=2 Trfc2=0 Trfc3=0 MaxRdLatency=64
    LDIMM0=EMPTY/EMPTY LDIMM1=FAILED/EMPTY LDIMM2=EMPTY/EMPTY LDIMM3=EMPTY/EMPTY

    DCT1: memory type: DDR3 frequency: 1332 MHz
    Tcl=9 Trcd=9 Trp=9 Tras=24 Access Mode:1T Trtp=5 Trc=33 Twr=10 Trrd=4 Tcwl=7 Tfaw=20
    TrwtWB=8 TrwtTO=7 Twtr=5 Twrrd=2 Twrwr=4 Trdrd=3 Tref=2 Trfc0=0 Trfc1=2 Trfc2=0 Trfc3=0 MaxRdLatency=51
    LDIMM0=EMPTY/EMPTY LDIMM1=OK/EMPTY LDIMM2=EMPTY/EMPTY LDIMM3=EMPTY/EMPTY


    Node 3 ---
    DCT0: memory type: DDR3 frequency: 1332 MHz
    Tcl=9 Trcd=9 Trp=9 Tras=24 Access Mode:1T Trtp=5 Trc=33 Twr=10 Trrd=4 Tcwl=7 Tfaw=20
    TrwtWB=8 TrwtTO=7 Twtr=5 Twrrd=2 Twrwr=4 Trdrd=3 Tref=2 Trfc0=0 Trfc1=2 Trfc2=0 Trfc3=0 MaxRdLatency=51
    LDIMM0=EMPTY/EMPTY LDIMM1=OK/EMPTY LDIMM2=EMPTY/EMPTY LDIMM3=EMPTY/EMPTY

    DCT1: memory type: DDR3 frequency: 1332 MHz
    Tcl=9 Trcd=9 Trp=9 Tras=24 Access Mode:1T Trtp=5 Trc=33 Twr=10 Trrd=4 Tcwl=7 Tfaw=20
    TrwtWB=8 TrwtTO=7 Twtr=5 Twrrd=2 Twrwr=4 Trdrd=3 Tref=2 Trfc0=0 Trfc1=2 Trfc2=0 Trfc3=0 MaxRdLatency=50
    LDIMM0=EMPTY/EMPTY LDIMM1=OK/EMPTY LDIMM2=EMPTY/EMPTY LDIMM3=EMPTY/EMPTY


    Node 4 ---
    DCT0: memory type: DDR3 frequency: 1332 MHz
    Tcl=9 Trcd=9 Trp=9 Tras=24 Access Mode:1T Trtp=5 Trc=33 Twr=10 Trrd=4 Tcwl=7 Tfaw=20
    TrwtWB=6 TrwtTO=5 Twtr=5 Twrrd=2 Twrwr=4 Trdrd=3 Tref=2 Trfc0=0 Trfc1=2 Trfc2=0 Trfc3=0 MaxRdLatency=60
    LDIMM0=EMPTY/EMPTY LDIMM1=FAILED/EMPTY LDIMM2=EMPTY/EMPTY LDIMM3=EMPTY/EMPTY

    DCT1: memory type: DDR3 frequency: 1332 MHz
    Tcl=9 Trcd=9 Trp=9 Tras=24 Access Mode:1T Trtp=5 Trc=33 Twr=10 Trrd=4 Tcwl=7 Tfaw=20
    TrwtWB=8 TrwtTO=7 Twtr=5 Twrrd=2 Twrwr=4 Trdrd=3 Tref=2 Trfc0=0 Trfc1=2 Trfc2=0 Trfc3=0 MaxRdLatency=51
    LDIMM0=EMPTY/EMPTY LDIMM1=OK/EMPTY LDIMM2=EMPTY/EMPTY LDIMM3=EMPTY/EMPTY


    Node 5 ---
    DCT0: memory type: DDR3 frequency: 1332 MHz
    Tcl=9 Trcd=9 Trp=9 Tras=24 Access Mode:1T Trtp=5 Trc=33 Twr=10 Trrd=4 Tcwl=7 Tfaw=20
    TrwtWB=8 TrwtTO=7 Twtr=5 Twrrd=2 Twrwr=4 Trdrd=3 Tref=2 Trfc0=0 Trfc1=2 Trfc2=0 Trfc3=0 MaxRdLatency=51
    LDIMM0=EMPTY/EMPTY LDIMM1=OK/EMPTY LDIMM2=EMPTY/EMPTY LDIMM3=EMPTY/EMPTY

    DCT1: memory type: DDR3 frequency: 1332 MHz
    Tcl=9 Trcd=9 Trp=9 Tras=24 Access Mode:1T Trtp=5 Trc=33 Twr=10 Trrd=4 Tcwl=7 Tfaw=20
    TrwtWB=8 TrwtTO=7 Twtr=5 Twrrd=2 Twrwr=4 Trdrd=3 Tref=2 Trfc0=0 Trfc1=2 Trfc2=0 Trfc3=0 MaxRdLatency=51
    LDIMM0=EMPTY/EMPTY LDIMM1=OK/EMPTY LDIMM2=EMPTY/EMPTY LDIMM3=EMPTY/EMPTY



    Done.
    thinklet@SamsungSSD:~/fah/tpc-svn64-tear2$
     
  22. thinklet

    thinklet Limp Gawd

    Node 2 ---
    DCT0: memory type: DDR3 frequency: 1332 MHz
    Tcl=9 Trcd=9 Trp=9 Tras=24 Access Mode:1T Trtp=5 Trc=33 Twr=10 Trrd=4 Tcwl=7 Tfaw=20
    TrwtWB=6 TrwtTO=5 Twtr=5 Twrrd=2 Twrwr=4 Trdrd=3 Tref=2 Trfc0=0 Trfc1=2 Trfc2=0 Trfc3=0 MaxRdLatency=64
    LDIMM0=EMPTY/EMPTY LDIMM1=FAILED/EMPTY LDIMM2=EMPTY/EMPTY LDIMM3=EMPTY/EMPTY

    Node 4 ---
    DCT0: memory type: DDR3 frequency: 1332 MHz
    Tcl=9 Trcd=9 Trp=9 Tras=24 Access Mode:1T Trtp=5 Trc=33 Twr=10 Trrd=4 Tcwl=7 Tfaw=20
    TrwtWB=6 TrwtTO=5 Twtr=5 Twrrd=2 Twrwr=4 Trdrd=3 Tref=2 Trfc0=0 Trfc1=2 Trfc2=0 Trfc3=0 MaxRdLatency=60
    LDIMM0=EMPTY/EMPTY LDIMM1=FAILED/EMPTY LDIMM2=EMPTY/EMPTY LDIMM3=EMPTY/EMPTY
     
  23. thinklet

    thinklet Limp Gawd

    tear, I am assuming that since it shows nodes 0-5, this equals 3 CPUs. Does this mean the second memory slot has failed on CPU #2 and #3?
     
  24. tear

    tear [H]ard|DCer of the Year 2011

    Ow, that hurts. These correspond to P2_DIMM1A and P3_DIMM1A.
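
    That mapping fits the pattern in the output above: three CPUs show up as six nodes, so each 6128 package accounts for two consecutive node numbers. A minimal sketch of the arithmetic, assuming two nodes per package and sockets populated in order starting from CPU1 (`node_to_cpu` is a hypothetical helper):

```shell
# node_to_cpu NODE
# Maps a node number from `tpc -dram` to a 1-based CPU socket number,
# assuming two nodes per CPU and no empty socket before it.
node_to_cpu() {
    echo $(( $1 / 2 + 1 ))
}

# Usage:
#   node_to_cpu 2
#   node_to_cpu 4
```

    Nodes 2 and 4 come out as CPU 2 and CPU 3, matching the P2 and P3 slot names. Which DIMM slot a given DCT and LDIMM number corresponds to is not derivable from this arithmetic; that comes from the board's Node/DCT map.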

    You could try and reseat them or otherwise check for proper contact (foreign bodies in the slot or whatnot)...

    Is this board running [H] OCNG BIOS?
     
  25. thinklet

    thinklet Limp Gawd

    It is running whatever it came with. Is there an easy way to check?
     
  26. tear

    tear [H]ard|DCer of the Year 2011

    Yes, load optimal defaults in the BIOS, then reboot. If you see the [H] logo on the screen, you're using the custom BIOS.
     
  27. thinklet

    thinklet Limp Gawd

    I have a meeting that will last about 2 hours; the current fold will be finished by then too. If I reseat or replace the chips in those two locations and nothing changes, is the next step to clean the pads and thoroughly check the sockets? I think I was told that it has the stock BIOS. What should I look for?
    Thanks again for all the help! Best regards, Charlie
     
  28. thinklet

    thinklet Limp Gawd

    I did not realise that you had sent the last message when I sent mine. Is there any way for the thread to update without refreshing?
     
  29. tear

    tear [H]ard|DCer of the Year 2011

    No problem.

    Forum doesn't refresh automatically if I remember correctly...
     
  30. thinklet

    thinklet Limp Gawd

    tear, Sequence of events:
    At present the system is operating with 3 CPUs and sees 12 GB of memory; it boots 10.10 and runs folding. I cleaned everything.
    Booted with #1 CPU and 4 GB: works.
    Found TIM on the pads of #3 CPU; cleaned it off with a swab and alcohol.
    Moved #4 CPU (the original that would not boot) to socket #2: works, reads 5 GB; added 3 GB, reads 8 GB.
    Moved #3 CPU back to socket #3 with 1 GB: booted, but read only 8 GB, not 9 GB.
    Moved #2 CPU to socket #3 with 1 GB: booted, reads 9 GB; added 3 GB, reads 12 GB.
    Moved #3 CPU to socket #4: failed, would not boot. It appears there might be a pin in socket
    #4 that looks strange, but only because it does not quite look like the others. Old eyes!
    Moved a different 6128 to socket #4: failed, no boot.
    Moved a 2nd 6128 to socket #4: failed, no boot.

    At this point it would appear that the board has a flaw in socket #4. The BIOS is dated 10/28/2011 and appears to be original?
    Does SuperMicro repair these, and is it cost effective? I do not know when this board was originally purchased.
    What would you suggest at this point? The board will limp along on 3 CPUs and 12 GB. What type of folding would it be capable of? Will the OC mod still work? I need to explore all options. Thanks again for all your help. Best regards, Charlie
    Wow, what a learning experience!
     
  31. 402blownstroker

    402blownstroker [H]ard|DCer of the Month - Nov. 2012

    Messages:
    3,156
    Joined:
    Jan 5, 2006
    Request an RMA from SM. They will get back to you on whether the board is still under warranty and whether it can be fixed. I have had very good experiences with SM customer service :)
     
  32. thinklet

    thinklet Limp Gawd

    Thanks, I have contacted SuperMicro about the procedure to check out the board, seemed like real nice folks to deal with. Since I live on the Central Coast of California, shipping to their office should be easy as they are in San Jose, Ca. At this point the board is functional but wounded. Best regards, Charlie