
6903 Frame Time Question

Edward2

n00b
Joined
Jul 8, 2002
Messages
63
I got my new 4p server up and running this morning. Here are the parts that I have.
  • Tyan S8812 mobo
  • 4x 6168 12-core 1.9 GHz CPUs
  • 4x Dynatron A6 heatsinks/fans
  • 16x G Skill DDR3-1600 CL9 1GB memory (16GB total)
  • WD 80GB SATA hard drive
  • Turiq 1000w psu

I installed Ubuntu 10.10, theKraken pre15, Langouste, FAH, HFM, etc. Everything seems to be working, but I wonder about my frame times (25 min for the 6903). From what I have seen, other people are running in the 15-20 min range. Any ideas on what I might have set up wrong or what I need to change? Any help would be much appreciated.
 
Yes, you should be faster than that. I would type top in the terminal and make sure the kraken is at 4800%. The flags on the client might be set wrong. If it is running at full speed on all cores, then also run lscpu to make sure it is reporting the right speed and specs.
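
For anyone wondering where the 4800% figure comes from: top reports 100% per fully busy core, so the target is just the core count times 100. A minimal sketch, assuming the 4-socket, 12-core-per-socket layout from the parts list (the variable names are only for illustration):

```shell
# Topology from the parts list: 4 sockets, 12 cores per socket.
sockets=4
cores_per_socket=12

# top counts 100% per fully busy core, so a saturated box shows cores * 100.
expected=$(( sockets * cores_per_socket * 100 ))
echo "expected top reading for a fully loaded client: ${expected}%"
```

A reading well below that number would suggest the client is not using every core.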
 
TOP shows theKraken-FahCo at roughly 4800% cpu
LSCPU shows 48 cpus, 4 sockets, 12 cores per cpu, 1.9GHz, 64-bit

I don't know how to make a screenshot in Ubuntu, or I would post one.

I have flags set: big, no advmethods, smp, bigadv, verbosity 9
 
What else does top show running? HFM, a sensors applet, anything like that?
 
Congratulations on the build :)

First, I'd recommend going over performance checklist:
http://hardforum.com/showpost.php?p=1038322969
with emphasis on BIOS settings, memory population, et al.

Also, non-FAH CPU activity will result in a performance penalty,
so make sure to take measurements when the machine is otherwise idle.

Personally, I don't like having background tasks on dedicated folders
-- HFM, system monitor, gkrellm, and other things do cause
a performance hit. Web browser/irc client are absolute no-nos.

I also recommend turning IRQ balancer off (then reboot):
Code:
sudo sed -i -e 's/^ENABLED.*/ENABLED="0"/' /etc/default/irqbalance
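
If you want to see what that substitution does before touching the real file, here is a rough sketch that runs it on a scratch copy. The sample file contents are an assumption about what /etc/default/irqbalance looks like, not copied from a real system.

```shell
# Work on a scratch copy; the sample contents below are assumed for illustration.
tmp=$(mktemp)
printf '%s\n' '#Configuration for the irqbalance daemon' 'ENABLED="1"' 'ONESHOT="0"' > "$tmp"

# Same substitution as the command above, with -i and -e kept separate
# and the sed script quoted so the shell leaves it alone.
sed -i -e 's/^ENABLED.*/ENABLED="0"/' "$tmp"

# Show the resulting setting.
result=$(grep '^ENABLED' "$tmp")
echo "$result"
rm -f "$tmp"
```

Keeping -i and -e apart matters with GNU sed, where -ie is read as "edit in place and keep a backup with suffix e".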
 
Thanks for the replies everyone.

TOP shows thekraken-fahco, xorg, top, irqbalance, events/9, gnome-terminal, init, kthreadd, and then a bunch of watchdog, migration, ksoftirqd. I'm not running anything besides FAH on the server on purpose.

Using TPC, my CPU temps are between 57-67.

All nodes show the same memory, 2097152 kB (except nodes 0 and 1, which are slightly different).

All nodes show DDR3-1332 MHz, 9-9-9-24-1T, and latency of 49-53.

TEAR, I did see that thread previously, and I verified the BIOS settings you mentioned. I did have a couple of them set wrong, but I am still getting 25 min frame times.

I have not turned the IRQ balancer off yet. I will do that next.
 
Temps of 67 deg are very close to thermal threshold (70) == CPUs may be throttling back.
Use this to check for overheating events:
Code:
sudo TurionPowerControl -htc | grep in.past
 
That may be the problem. I got the following - true, false, true, false, true, false, true, false

Does that mean that all 4 CPUs are overheating at some point? Or just 1 or 2 CPUs overheating multiple times?

Also, I've never used such large CPUs before. Maybe I didn't put enough AS5 paste between the CPU and the heatsink.
 
That means every CPU overheated at some point in the past (HTC is only enabled on first node/die
of each CPU*). Are your fans set up at full speed? As you're suspecting, TIM may also be a factor.

*) run plain: sudo TurionPowerControl -htc for full information

Also, I remember I had an issue w/PWM control on Tyan board == wouldn't set PWM to 100%; needed
to remove PWM wire from HSF connector.
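
To make the reading of that true/false list concrete, here is a small sketch that counts how many packages report a past thermal event. It assumes the values come back one per node in node order, with the first node of each two-node package carrying the HTC flag; the function name is made up for illustration.

```shell
# Count packages whose first node reports a past thermal event.
# Input: one true/false value per node, in node order (assumed format).
count_throttled() {
  echo "$1" | tr -d ' ' | tr ',' '\n' |
    awk 'NR % 2 == 1 && $0 == "true" { n++ } END { print n + 0 }'
}

# The values posted earlier in the thread.
flags="true, false, true, false, true, false, true, false"
throttled=$(count_throttled "$flags")
echo "$throttled of 4 packages report a past HTC event"
```

The "false" entries belong to the second node of each package, where HTC is not enabled, so they do not mean that node stayed cool.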
 
I removed all 4 heatsinks and the TIM did not look good. There was only a small dot, slightly smaller than a dime. So I cleaned it all off and re-applied the TIM. I then rapidly monitored the CPU temps.

Node 0/1 - varies between 65 - 69
Node 2/3 - varies between 57 - 61
Node 4/5 - varies between 61 - 65
Node 6/7 - varies between 55 - 59

Auto Fan Control is Disabled in the BIOS. I assume this means that the fans will run at 100%. The owner's manual is not very clear about this.
 
Good find :)

I'm pretty sure that nodes 0/1 are still hitting thermal threshold... nodes 4/5 are (relatively) pretty hot too
Per http://hardforum.com/showthread.php?t=1647656, they correspond to CPU0 (0/1) and CPU1 (4/5).

As far as auto fan control goes -- you can run a quick test: compare board header vs molex->fan
adapter == "pitch" and level should be the same... if they aren't then BIOS is not setting things up
properly.

Any possibility of cooling upgrade? 212+ work nicely with Tyan boards with minimal mods.
Noctuas are (always) an option too...
 
I agree that CPU0 is still hitting the thermal threshold, and CPU1 is warmer than I would like.

I don't have a molex to 4-pin fan adapter. I may have to pick up a few of those. I actually have 4x hyper 212+ heatsinks. I bought them before I bought this setup. I don't have the parts to mod them with yet, but I guess I need to start looking into that also.

Thank you for all of your help. I really appreciate it. It looks like I've got some work to do to make this setup run at full speed.
 
Two pins are good enough. The idea is to take PWM control out of the equation (the "4th" pin).

You don't need tach pin (the "3rd" pin) either -- if there is a difference, you'll hear it right away.

EDIT: alternatively, you can examine CPU temp difference w/board header vs the adapter
 
I modded a molex to 3-pin fan adapter, so that I could connect the 4-pin cpu fan adapter. It made a HUGE difference. The fan was much, much louder. So I went ahead and modded 3 more, and I now have all 4 cpu fans running at 100%. The CPU temps are 60, 51, 52, 48, and "sudo tpc -htc" says that there has been no throttling.

A new problem arose. The Turiq 1000w PSU has 4 separate 12V rails. Apparently I overloaded one of the rails and 2 of the CPU fans stopped turning, so I immediately killed the power to the computer. I replaced the power supply with a Corsair 850w PSU that I had. It has a single 70A rail, so hopefully I won't have any more PSU problems.

Now I am just waiting to see what my frame times are. Will let you know.
 
THANK YOU TEAR !!!!

Your assistance and persistence have paid off big. I have now completed 3 frames with the fans at 100%, and my frame time has dropped from 25 min to 16 min 12 sec.
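
For a sense of scale, the drop from 25 min to 16 min 12 sec works out as below. This is plain integer arithmetic on the two frame times quoted above; the variable names are only for illustration.

```shell
before=$((25 * 60))        # 25 min per frame, in seconds
after=$((16 * 60 + 12))    # 16 min 12 s per frame, in seconds

# Integer percentage reduction in frame time (rounded down).
reduction=$(( (before - after) * 100 / before ))
echo "frame time: ${before}s -> ${after}s (${reduction}% shorter)"
```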
 
np :)

BTW, it's also possible that your memory can be flashed so the board
uses XMP timings thus further improving performance.

What's the full model name of the memory?
 
G Skill ----- F3-12800CL9D-2GBNQ
It is DDR3-1600, PC3-12800, 2 x 1GB, CL9-9-9-24, 1.5V.
 
Never mind, you posted that before. 9-9-9-24.
 
How much of a difference does memory speed make to PPD? I also have some G Skill F3-10666CL7Q-8GBXH (DDR3-1333, PC3-10666, 4x 2GB, CL7-7-7-21, 1.5V). I bought this memory along with the Hyper 212+ heatsinks before I bought this setup.
 
The jump from CL9 to CL7/1333 is noticeable (measured 15s on a P6901, 4p 6174).
 
I will wait on trying the eeprog and d3sak procedures for now, but I will save the links for possible future use. Next time I reboot the server, I'm going to install the CL7 memory. I also want to do something about the CPU0 temp (60C). I may try installing the Hyper 212+ heatsinks. Hopefully they will be much quieter. The Dynatrons are very noisy with the fans at 100%.

Thanks again for all of your assistance. I have learned a lot.
 
......... I also want to do something about the CPU0 temp (60C). I may try installing the Hyper 212+ heatsinks. Hopefully they will be much quieter. The Dynatron's are very noisy with the fans at 100%.....

So much quieter you will want to replace them all. The difference is so drastic that you may be able to hear yourself think again. ;)
 
Definitely with the 212+. My initial set was $120ish for 4 of them and definitely the best $120 I have spent in a while. 10C lower temps and I can stand to be in the same room :D
 
FYI, I re-applied the AS5 to CPU0 and it seems to have helped a lot. All 4 CPU temps are in the 48-53C range for the 6903 WU, and in the 50-55C range for the 8101 WU that it just downloaded.

Even with all the reboots and frame time problems I had with the 6903 WU, I still got 400k points for it.
 
I tried to "dump" the SPD settings, but I cannot seem to get it to work.

sudo /usr/bin/eeprog-spd-dump-g34 all
Checking for id head sed od tr dd dmidecode modprobe i2cset eeprog-tear ...done.
Processing CPU 0 ...
Reading DIMM at address 0x50 ...
DIMM at address 0x50 not populated (or encountered I2C error)
Reading DIMM at address 0x51 ...
ERROR: cannot access /dev/i2c-0

The "i2c-0" file does exist in the "dev" directory. Any ideas on what I am doing wrong?
 
Is this 0.7.6-tear10 ?
Hmm, I've just checked it on my Tyan and it's working fine.

No, it doesn't seem you're doing anything wrong. It was able to determine there's
nothing at 0x50. Yet, it failed at 0x51. Very odd.

Looks like we may need to dig into this one if you're up for it.

If you are, what does ls -l /sys/bus/i2c/devices give?
And (if you haven't rebooted the rig since) -- dmesg | tail -10
 
Yes, it is eeprog-0.7.6-tear10.tar.gz.

edward2@Tyan-S8812:~/eeprog$ ls -l /sys/bus/i2c/devices
total 0
lrwxrwxrwx 1 root root 0 2012-07-08 15:17 0-0051 -> ../../../devices/pci0000:00/0000:00:14.0/i2c-0/0-0051
lrwxrwxrwx 1 root root 0 2012-07-08 15:17 0-0053 -> ../../../devices/pci0000:00/0000:00:14.0/i2c-0/0-0053
lrwxrwxrwx 1 root root 0 2012-07-08 15:17 0-0055 -> ../../../devices/pci0000:00/0000:00:14.0/i2c-0/0-0055
lrwxrwxrwx 1 root root 0 2012-07-08 15:17 0-0057 -> ../../../devices/pci0000:00/0000:00:14.0/i2c-0/0-0057
lrwxrwxrwx 1 root root 0 2012-07-04 11:44 i2c-0 -> ../../../devices/pci0000:00/0000:00:14.0/i2c-0
 
Cool, that explains it. Module "eeprog" has been loaded.
Can you unload it (sudo modprobe -r eeprog) and try again, please?
 
edward2@Tyan-S8812:~/eeprog$ sudo modprobe -r eeprog
[sudo] password for edward2:
FATAL: Module eeprog not found.
edward2@Tyan-S8812:~/eeprog$ ls -l /sys/bus/i2c/devices
total 0
lrwxrwxrwx 1 root root 0 2012-07-08 15:17 0-0051 -> ../../../devices/pci0000:00/0000:00:14.0/i2c-0/0-0051
lrwxrwxrwx 1 root root 0 2012-07-08 15:17 0-0053 -> ../../../devices/pci0000:00/0000:00:14.0/i2c-0/0-0053
lrwxrwxrwx 1 root root 0 2012-07-08 15:17 0-0055 -> ../../../devices/pci0000:00/0000:00:14.0/i2c-0/0-0055
lrwxrwxrwx 1 root root 0 2012-07-08 15:17 0-0057 -> ../../../devices/pci0000:00/0000:00:14.0/i2c-0/0-0057
lrwxrwxrwx 1 root root 0 2012-07-04 11:44 i2c-0 -> ../../../devices/pci0000:00/0000:00:14.0/i2c-0
 
I don't know if it matters, but I renamed the eeprog..... directory to be just eeprog. Shorter to type.
 
I overdosed eeprog, that was meant to read eeprom :)
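
In case it helps, here is a rough sketch of checking whether a module shows up as loaded before trying to unload it. The lsmod listing is a made-up sample and module_loaded is a hypothetical helper; on a live system you would pipe real lsmod output into it instead.

```shell
# Return success if the named module appears in lsmod-style output on stdin.
module_loaded() {
  awk -v m="$1" '$1 == m { found = 1 } END { exit !found }'
}

# Sample lsmod output (made up for illustration).
sample='Module                  Size  Used by
eeprom                  6154  0
i2c_piix4               8335  0'

if printf '%s\n' "$sample" | module_loaded eeprom; then
  status="loaded"      # on a live system: sudo modprobe -r eeprom
else
  status="not loaded"
fi
echo "eeprom is $status"
```

Matching on the first column only avoids false hits from the "Used by" column.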
 
Yeah, that worked. I was able to dump all 16 SPD files, and it appears that the md5sums are all the same.
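
One way to check that a set of dumps really is identical is to count the distinct checksums, sketched here. The file names and contents are stand-ins, since the actual dump file names are not shown in the thread.

```shell
# Scratch directory with stand-in dump files (names and contents are hypothetical).
dir=$(mktemp -d)
for i in 1 2 3 4; do
  printf 'spd-bytes' > "$dir/dimm$i.spd"
done

# Number of distinct checksums across all dumps; 1 means every dump is identical.
unique=$(md5sum "$dir"/*.spd | awk '{ print $1 }' | sort -u | wc -l | tr -d ' ')
echo "distinct checksums: $unique"
rm -rf "$dir"
```

Anything other than 1 would mean at least one DIMM carries different SPD data.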
 
Thanks again Tear. :)

I flashed the memory 1 cpu at a time without any problems. I ran Memtest86+ for about 2 hours without any problems. The server is now back up and running again, with the memory at 7-7-7-21-1T.
 