Supermicro H8QGi+-F (tear ?)

torin3

I've been having a problem with my Supermicro H8QGi+-F. It has gone through a few RMAs. This is the most recent contact.

Hello, sorry to bother you again, but I found the cause of the failure of the board I sent back to you.

It was not the power supply. When the component failed on the first motherboard, it damaged one or more of the CPUs. This is what resulted in the second board failing. Unfortunately, I discovered this when assembling my system with the new board. I had purchased a totally new 1000 watt Corsair power supply, so the only components that were the same as before were the CPUs and the memory. On the first attempt to power up, several pins on CPU socket 2 shorted out and left the tips of a few of the pins attached to the CPU pads.

Since the other 3 sockets and CPUs appeared undamaged, I started to follow the troubleshooting procedures you suggested the last time. However, with one CPU in socket 1 and one memory RDIMM installed, the fans started up as soon as power was supplied to the board, even before trying the power switch. Trying the power switch had no effect. Holding the power switch for more than 4 seconds did not turn off the motherboard fan. There was no video output through all this.

So, currently I have one verified bad CPU, 3 possibly bad CPUs, and a non-functional board. Since I believe this is from damage caused by the first board failing, I would like to request an RMA. Also, I would like to send the verified bad CPU, and if you have a means of testing the other 3 CPUs, I would like to send them as well. If you don't have a means of testing them, I will purchase a single socket G34 motherboard and test each CPU individually.

Is there any more information you need from me, or any more troubleshooting that you would like me to do?

Thank you.

torin3

I wound up getting a phone call from the tech. The upshot is they are denying the RMA based on my using an unrecommended PSU (though there doesn't appear to be a list of recommended PSUs) that voided my warranty because it damaged the first board due to 'having a different startup timing'. The PSU was an OCZ 1000 Watt model: OCZ1000PXS. It worked for a week after I had the problem with the first board in another single processor system.

Orion456 over at OCF suggested I come over here and see if tear can help or have a suggestion.

Thanks!
 
I guess I'm asking if anybody here has used that model power supply with one of the SM 4P boards and what their experience was. Or if anybody has an idea of how to repair it. Or how to deal with SM RMA in this situation. Sorry if I wasn't clear earlier.

Thank you.
 
I do have 1 OCZ 1000w on my rig and it's been working fine. I don't have the model off the top of my head right now. Will check and let you know.
 
Question. Are you running the stock BIOS or the [H] OC one?

Normally a PSU should have zero effect on a MB. If you are OCing the boards, the VRMs could draw much more voltage/amperage than they were designed for, which could cause them to burn out. Even when OCing the boards, I have not heard of a CPU being fried.
 
Thank you, much appreciated!

I was planning on eventually switching to the [H] OC one, but I hadn't gotten that far. I was running with desktop DDR3 for about a week, and then I switched over to RDIMM DDR3, but it wouldn't run under load with all 16 slots filled. It would run with 12 slots filled. I thought it might be a memory problem and got a few more of the RDIMMs, and started swapping them to try and find a good set to run with 16. I wasn't able to. And it started failing with 12 sticks. When it fails, I have to turn off the PSU and let the caps on the board drain before restarting, or it wouldn't restart. One of these times, when I turned on the PSU, a component near the 2 8-pin EPS connectors sparked, and let out some smoke. After I sent the board in for RMA, I moved the power supply to a different system and ran that one off of it for about a week with no problems. It was running 2 7970 cards folding.
 
So you cannot get it to POST no matter what?

What chips? And we are loading only socket 1?

Do try to clear the BIOS and clean the pads. What cooler are you using? For kicks, try to just set it on there. We want to verify that your pads are clean, and that nothing is overtightened, which may also cause incorrect contact with the pads. Do also inspect the socket to ensure that there are no bent pins or contamination. Do blow out all DIMM sockets as well with compressed air.

Try a single chip and a single stick of ram. If you get it to post, ditch their bios. I had a hard time getting things to work with their latest bios, iirc, 3.x. I would have to look but I should have an older bios available. The OCN bios is based off of 1.x iirc.

I had troubles getting things to work with their bios and would go to an older bios and it would just work. Not sure what the changes were.

That's pretty shitty they will not warranty it. I used OCZ 1200w psus as well as 500w/620w/750w psus.
 
No POST. And the fan headers power on as soon as the PSU is turned on, before using the power switch on the motherboard. I've tried one of the visually undamaged CPUs in socket 1 with one RDIMM in the blue tabbed memory slot farthest away from the CPU. Heatsink was just sitting on the CPU for the test (CM 212+). I blew out the board well before trying to start it up the first time, before it was damaged by the bad CPU. I also blew it out again with compressed air before trying with just the single CPU.

I'm going to be getting a known good CPU, either another 6172 or a 6168 (a friend close by recently upgraded from 6168s to 6172s), and I will try again, but given the fans' behavior, I'm not that hopeful.
 
Try the tabs closest to the CPU; iirc, white is where you start. (It's been a while.)
 
Have you also cleaned the contact pads on the CPU with rubbing alcohol? Sometimes dirty pads can cause problems.

If the fans turn on but the system does not boot, it sounds like the MB might be in power failure mode. Unplug the PSU, clear the BIOS/CMOS, pull the battery, and then reassemble. Try this with only one CPU installed in socket #1 and a single DIMM installed. Make sure you have the right CPU and DIMM sockets.
 
I'll clean the pads and test again following the procedure you recommend, 402.

Btw, the conversation I'm having with the tech at SM is getting real frustrating. They don't have a list of compatible power supplies and the tech doesn't seem to get why I'm saying they don't have a real warranty on the bare motherboards if they can deny warranty work for using an incompatible power supply.
:mad:
 
Who are you talking to over there? Most of the people I dealt with there were rude and did not listen. There are 2 guys in tech that I spoke with. One was very helpful and the other just kept insisting I was doing it wrong.

If they cannot produce a list of approved PSUs, I would agree they are in the wrong. But you also said that you have a known bad cpu?
 
A guy named Jerry.

The known bad CPU is, I believe, bad because it was damaged by the component failing on the first motherboard. It worked for about 3 weeks. After the first board died, the CPU appears to have killed the 2nd board (though there were no visible signs of damage in the sockets or on the CPU). On the 3rd board, it shorted 2 sets of 4-6 pins and welded the tips of 3 of them to the pads on the CPU.

So, I'm pretty sure this CPU will kill any board it is put in and powered up, but I'm sure it became like this due to being damaged by the failure of the first board.
 
I feel the need to share my pain....

torin3 said:
From: torin3@
Sent: Friday, April 04, 2014 8:03 AM

To: Technical Support
Subject: Re: ticket #SM1402200243---System will not power on. [JH]



I have been looking and I am unable to find a list of approved/recommended power supplies on your website. Could you send me the link for it, please?


Jerry said:

torin3 said:
So, effectively, you don't have a warranty on your motherboards, only the server barebones.

Jerry said:
Not sure what do you mean. We do have warranty on the motherboard. It’s 3 year warranty.


torin3 said:
If you are going to deny a warranty based on using the wrong power supply, then you need to have a list of right / acceptable power supplies. If not, you don't have a warranty, you have a discretionary repair policy.

Jerry said:
If you buy a Corsair power supply you can’t find a compatible motherboard list as well. There is too many power supplies out there that we cannot guarantee any one of them you picked up is 100% compatible with Supermicro motherboards.

torin3 said:
I'm sorry if I'm not being clear, but your point is largely irrelevant to what I'm saying.

I'm certainly not asking SM to test all the power supplies on the market. But if SM is able to deny warranty work based on an incompatible power supply, SM then needs to have at least one power supply listed that they have verified is compatible. You are doing that for the barebones, though it only lists the wattage and 80 plus rating, so the customer has no idea the manufacturer or model until purchasing a barebones.

I am planning on getting another 4 processor motherboard, and was looking to find what power supply I would have to get to make sure it would be under the warranty. If it had been listed before I bought my first one, I would have gotten one on that list to avoid this problem in the first place. So I now need to decide if I'm going to spring for a full barebones server to make sure I'm covered by the warranty, or if I will buy the motherboard, knowing that it does not have a real warranty, but that it will be up to Supermicro's discretion as to whether or not they will repair or replace it if it fails due to a bad component.

Also, I have purchased quite a few motherboards from other companies that do indeed list compatible items, usually specific brands/models of RAM, and state that they won't warranty the board if items that were not listed as compatible are used.

Jerry said:
We do have compatible memory modules and hard drive list as well.



It’s under “links and resources” http://www.supermicro.com/aplus/motherboard/opteron6000/sr56x0/h8qgi_-f.cfm

torin3 said:
Is there a supervisor I can speak to?
 
Torin,
The OCZ PSU I use is an OCZ-XZ 1000 ... and 2 PC Power & Cooling 950w .. they are all working fine.
 
I've heard of them doing repairs on boards for $50. Their service does suck. I got a board back once they said was "new", but was smeared with TIM all over it, front and back. It was gross.

tear has been unplugged taking care of some other business, you may be able to PM him but I would not hold my breath for a response. I know him and FNtastic had done a "no post" bios flash.
 
Well, I didn't get to talk to a supervisor, but he did agree to take it back and check/repair it. Though I'm still not sure he really understood what I was saying to him.
 
That's the feel I get from them. One board I sent them, they sent it back and said nothing was wrong. WTF. Wasted 4-5 weeks x2 to send it right back to them.
 
Clearing the BIOS seems to have stopped the fans coming on as soon as power is applied. However, with each of the 3 unwelded CPUs, I get no video, beep, or POST. Also, once the board has been turned on with the power button, holding that button for more than 4 seconds will not turn the fans off.

The closest white tab slot is labeled P1DIMM4B. The one farthest away that is blue is labeled P1DIMM1A. I tried both, but there was no difference.

I did order a 2P G34 board today from eBay, and I've got a friend who is giving me a set of 4 6100 series CPUs (I don't remember the model number, but they are 8 cores). I'll be able to verify the board is good with those chips, and then use the board to see if my 6172s are good or not. (I'm not going to try the one that got welded.) Then, when I get the board back, I should have at least one set of known good CPUs and fully tested RAM to work with.
 
If I understand the email correctly, they will replace the board.

I got a 2P board and have tested my RAM and 3 remaining 6172 CPUs. I found 2 bad sticks, and I seem to have one marginal stick that I haven't isolated yet. I got 1 set of errors on test 7 out of a 24 hour run with 16 sticks of 1 GB RDIMMs. I'll gradually rotate my RAM to try and find the bad stick.

But when I get the board back I should be good to go.

Thanks everybody for the help.
 
Apparently I did not understand the email correctly.

The tech, knowing I had burnt pins in the socket, told me to send it in. The RMA department there denied the RMA for burnt pins.

When I asked the tech why he would have me waste the $30 to ship it to them when it would be denied for burnt pins, I was told their techs needed to verify it.

@()%&*)@#($&* git!
 
Did you ask them what it would take to repair? I had an older Socket F 4P that died after a couple of months of owning it. When I sent it in, they denied the RMA because it was out of warranty (OEM vs. retail). I asked if anything could be done and they said they could repair it for $100. At the time, the board was worth $600, so the $100 for repairing was worth it.
 
Following up on it, they said that the reason I had to send it in was to see if they could repair the socket. They say it is unrepairable, even if I paid them to replace the socket.
 
Found a deal on 3 bent 6174s for $100, as-is. 1 out of the 3 worked. I'm thinking the other two might be fine if I could find some way to bend them back so that the pins can touch all the pads.

H8QG6+-F arrived yesterday. Working just fine right now with 1 6174 and 3 6172s.
 