dual xeon E5310 help

RAD_MAN (n00b, joined Dec 5, 2005, 31 messages)
I have a dual Xeon E5310 (1.6 GHz, 2x4 cores) box on an Intel S5000PSL mainboard. I used to run SMP on it without issue, but now she keeps erroring inside the first few percent and the core crashes. I have a dual X5550 right next to it running -bigadv without issues. Any tips on getting that puppy stable again?

CPU: 2x E5310
MB: Intel S5000PSLSATA
RAM: 8GB DDR2 FB-DIMM
 
Having tons of Clovertown experience, I would look at the memory first. FB-DIMMs run hot, really hot, and they are prone to failure without active cooling. Even with additional cooling they can still develop problems, especially with all slots populated. Beyond that, the next most common problem I've encountered is SATA-related. Change ports to see if the problem goes away. I've seen two S-771 boards that developed onboard SATA problems, forcing me to use a PCI-E SATA card in each issue-plagued system, though those were Supermicro boards.
 
If the crashes you're seeing are all on the same WU, then I'd point towards a corrupt WU. Sometimes a bad WU will keep being resent to your client until it's reported and removed; that's the way Stanford assigns work. If the crashes are on different WUs, then the problem lies elsewhere. Hope you get things resolved.
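
To make the same-WU check concrete, here is a rough sketch in Python. It assumes you have already noted by hand which WU each crash happened on; the function name and the "WU-A" style labels are made up for illustration and are not anything the client produces.

```python
def diagnose_crashes(crashed_wus):
    """Classify a crash history by whether every crash hit the same WU.

    crashed_wus: one label per crash, written down by hand
    (hypothetical format, e.g. "WU-A").
    """
    if len(crashed_wus) < 2:
        # A single crash cannot distinguish a bad WU from a bad machine.
        return "not enough data"
    if len(set(crashed_wus)) == 1:
        return "same WU every time: suspect a corrupt WU"
    return "different WUs: suspect the machine"


if __name__ == "__main__":
    print(diagnose_crashes(["WU-A", "WU-A", "WU-A"]))
    print(diagnose_crashes(["WU-A", "WU-B", "WU-C"]))
```

A mixed history (mostly one WU plus a few others) still falls into the "different WUs" bucket here, which errs on the side of checking the hardware.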
 
i have tried SMP and -bigadv (even though i know it won't finish) and it keeps crashing within an hour or 2. i have run a bunch of stress tests on this guy and i can't get anything to error. i have all the dimm slots populated but they are cooled via ducting in the case with 2 120 mm fans blowing on them.
i really don't use SATA on the board except for the CD-ROM drive, HDD's are on a SAS RAID controller.

i would really hate to have to replace the RAM on that board as that memory is kinda expensive =(

any tests i can run short of pulling all but one DIMM at a time and running an SMP unit?
 
I have some fbram if you have bad sticks... I would try those...
I have nothing that needs the ram anymore...
 
i have tried SMP and -bigadv (even though i know it won't finish) and it keeps crashing within an hour or 2. i have run a bunch of stress tests on this guy and i can't get anything to error. i have all the dimm slots populated but they are cooled via ducting in the case with 2 120 mm fans blowing on them.
i really don't use SATA on the board except for the CD-ROM drive, HDD's are on a SAS RAID controller.
Then it's a good bet the storage subsystem is not the issue and we can eliminate it.

i would really hate to have to replace the RAM on that board as that memory is kinda expensive =(
Yes, memory was the pitfall with that architecture until Intel released San Clemente but that was late in the game.

any tests i can run short of pulling all but one DIMM at a time and running an SMP unit?
If you've already run a battery of tests, I can't really suggest anything else for you to try except removing the DIMMs and testing them individually, unfortunately. :(

BTW, I also have one system that is prone to crashes and I could not narrow it down, but that system is OCed and I'm not going to change anything because I have one foot out of Stanford's door and don't have patience to fiddle with settings anymore.

I have some fbram if you have bad sticks... I would try those... I have nothing that needs the ram anymore...
Well, if I weren't getting out of folding next month I would take you up on that offer, since I have 6 systems that use FB-DIMMs. But in such a hypothetical scenario, I would only be interested in 800MHz RAM because it's very hard to find around here and it's faster, easier to OC and apparently runs cooler.
 
i have changed the flags on this box and set it to -smp 4 (instead of -smp 8) and she is running solid for 2 days. would that most likely be heat from the CPUs overheating the RAM? (or something close to that nature). thanks
 
It could be, if there's a lot of heat blowing onto the memory modules. You said there's active cooling on the RAM, so I doubt that would be a problem unless something happened to the fans that made them less effective. More likely it's the CPU(s) overheating, if the heat sinks are standard low-profile server models with standard cooling. Many servers aren't designed with folding in mind and use less than optimal HSF components compared to the enthusiast aftermarket.

Regarding the flags, you should try -smp 7, then -smp 6, and continue to lower the thread count until the box is stable. That way we can better determine if it's a heat-related issue or possibly something else. Becoming stable only after dropping straight from a full thread count to half of it could point to any of several causes, so the intermediate steps matter. Did you open the case to check that heat is actually being carried away properly?
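
The step-down procedure is just a linear search from the top. A minimal sketch in Python, assuming you record by hand whether each -smp setting survived a test run; the function name and the `observed` results are illustrative, not real measurements.

```python
def highest_stable_threads(max_threads, is_stable):
    """Step the thread count down one at a time, starting from max_threads.

    is_stable: caller-supplied function that reports whether a run with
    the given thread count stayed up (in practice, your own notes after
    running the client for a day or two at that setting).
    Returns the highest stable count, or None if nothing was stable.
    """
    for threads in range(max_threads, 0, -1):
        if is_stable(threads):
            return threads
    return None


if __name__ == "__main__":
    # Hypothetical notes: settings not tested yet count as unstable.
    observed = {8: False, 7: False, 6: False, 5: False, 4: True}
    best = highest_stable_threads(8, lambda n: observed.get(n, False))
    print("use -smp", best)
```

Going down one thread at a time is slower than halving, but it shows where stability starts, which is the point of the exercise.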
 
I had some old xeons... p4 class 2p server... no thermal paste on the heatsinks by default... I couldn't believe it...
 
it is an Intel server case (SC5400 series). the issue is most likely the fact that there is a duct and fan for each CPU that blows air through the CPU heatsink and then blows that (now heated) air through ducting for 4 DIMM modules (2 CPU, 4 dimm per cpu, 8 total). i built a ton of these servers but i never ran them this hard (CPU wise... was always disk I/O bound when customers bought them). i could rig the fans so they run 100% but it is just SOO loud! i might check into adding a fan in the back to increase air flow without blowing my eardrums out.

i guess as just a test i can run the fans 100% and see if it stays stable. i'll let y'all know.
 
kk... I understand the noise thing... its about 90db in my lab... I keep my earbuds in to preserve my hearing...
 
it is an Intel server case (SC5400 series). the issue is most likely the fact that there is a duct and fan for each CPU that blows air through the CPU heatsink and then blows that (now heated) air through ducting for 4 DIMM modules (2 CPU, 4 dimm per cpu, 8 total). i built a ton of these servers but i never ran them this hard (CPU wise... was always disk I/O bound when customers bought them).
From what I'm gathering, the RAM isn't really actively cooled, not directly in any case. I custom built ducts that direct room-temp air from the back of the case right to the modules. There is a fan at the duct intake (case back) and another fan at the end where the memory modules are. It might seem like overkill, but in mid-summer the modules still get toasty and I run all my cases OPEN... This is what I meant by active cooling. Anything less than this and it's error city, forcing me to back my overclocks off substantially. Without these and similar measures, I doubt I would have maintained my top 20 position since '07.

i could rig the fans so they run 100% but it is just SOO loud! i might check into adding a fan in the back to increase air flow without blowing my eardrums out.
Thanks for mentioning these details, it could make all the difference. I never run my systems, servers or otherwise, at anything less than 100% speed when folding. That includes the GPUs, all one dozen of them. Consequently, it sounds like a server room here but I'm guaranteed good OCs with dependable stability year round.

i guess as just a test i can run the fans 100% and see if it stays stable. i'll let ya'll know.
Please do because I'm much more certain that is the problem now. These systems really run on the hot side, especially with factory server cooling components and closed cases. They are simply not designed for 24/7 folding, unfortunately. If you discover heat to be the issue, lower the thread count one core at a time until you establish stability. Each time you lower the flag number, you should drop a couple of degrees.
 
I changed the settings in the BIOS from acoustic to performance. This made the fans run significantly faster, but it also seems to do much less throttling of the RAM. Now I can't even run with -smp 3. I'm starting to think I'll need dedicated ducting for the memory to keep it cool. Running so few cores on this guy makes it not worth the electricity. Don't know what else I can try.
 
I fully understand where you're coming from. That's one of the big reasons this architecture wasn't a huge success story in the performance arena. I think Intel made a mistake in going with FB-DIMM memory. Glad they decided to ditch it.
 