I have some problems with my new Xeon 4P rig, need some help?

I don't actively monitor power draw, and it wouldn't be a good comparison anyway since my machines are all 4p G34s.

Maybe I misunderstood, since you wrote: "Your results are similar to mine - I see ~ 740K ppd with 8102s and ~ 440K ppd with 8101s." So I thought you had the same type of machine?
 
The TPF for p8101 is too slow. Have you checked the exact frequency your 4p E5 is running at?
You can check it with a frequency-check tool like turbostat or i7z, or with a script provided by tear:
http://darkswarm.org/freqcheck.sh
The right value should be 3.1GHz for your rig.

How do I run this freqcheck.sh script tool?

I have downloaded it, but cannot figure out how to run it.
 
You need to run:
Code:
chmod +x freqcheck.sh
./freqcheck.sh 2700
and then press enter.

Thanks. I ran it and ended up in a loop like this:

vidar@RiggE54650:~/fah$ ./freqcheck.sh 2700
CPUS: 64
FREQ(hex): 0xA8C
Press ENTER to continue or Ctrl+C to exit.

Is there not any kind of spoiler tag I could put the text in?

Quote
./freqcheck.sh: line 28: wrmsr: command not found
./freqcheck.sh: line 27: wrmsr: command not found
./freqcheck.sh: line 28: wrmsr: command not found
(some text deleted)
./freqcheck.sh: line 33: rdmsr: command not found
(standard_in) 2: syntax error
CPU0: MHz
(standard_in) 2: syntax error
CPU1: MHz
(standard_in) 2: syntax error
CPU2: MHz

CPU62: MHz
(standard_in) 2: syntax error
CPU63: MHz
End quote
 
It looks like you don't have msr-tools on your system. You can download it from kernel.org:
Code:
wget http://www.kernel.org/pub/linux/utils/cpu/msr-tools/msr-tools-1.1.2.tar.gz
and then install it by typing the following commands:
Code:
tar -zxf msr-tools-1.1.2.tar.gz
cd msr-tools-1.1.2
make install
Now you should be able to run freqcheck.sh without problems.
 
Make install did not work as expected?
Quote
vidar@RiggE54650:~/msr-tools-1.1.2$ make install
gcc -Wall -g -O2 -fomit-frame-pointer -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -o wrmsr wrmsr.c
gcc -Wall -g -O2 -fomit-frame-pointer -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -o rdmsr rdmsr.c
install -m 755 wrmsr rdmsr /usr/sbin
install: cannot create regular file `/usr/sbin/wrmsr': Permission denied
install: cannot create regular file `/usr/sbin/rdmsr': Permission denied
make: *** [install] Error 1
vidar@RiggE54650:~/msr-tools-1.1.2$
End quote

Update: I used "sudo make install" and things did work, but then I get this error when running the script:

wrmsr:open: No such file or directory
wrmsr:open: No such file or directory
wrmsr:open: No such file or directory
wrmsr:open: No such file or directory
wrmsr:open: No such file or directory
wrmsr:open: No such file or directory
wrmsr:open: No such file or directory
wrmsr:open: No such file or directory
wrmsr:open: No such file or directory
wrmsr:open: No such file or directory
wrmsr:open: No such file or directory
CPU0: MHz
(standard_in) 2: syntax error
CPU1: MHz
(standard_in) 2: syntax error
CPU2: MHz
(standard_in) 2: syntax error
CPU3: MHz
(standard_in) 2: syntax error

(some text deleted)

CPU61: MHz
(standard_in) 2: syntax error
CPU62: MHz
(standard_in) 2: syntax error
CPU63: MHz
====
wrmsr:open: No such file or directory
wrmsr:open: No such file or directory
wrmsr:open: No such file or directory
wrmsr:open: No such file or directory
 
Try running 'sudo modprobe msr' and 'sudo modprobe cpuid' first. And you also need root permission to run freqcheck.sh:
sudo ./freqcheck.sh
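
For anyone hitting the same "No such file or directory" message: rdmsr and wrmsr open a per-CPU device node, /dev/cpu/N/msr, and that node only exists once the msr module is loaded. Here is a minimal sketch of a check that could be run before the script. The function name msr_ready and the DEV_ROOT override are made up for illustration; they are not part of freqcheck.sh or msr-tools.

```shell
# Report whether the MSR device node that rdmsr/wrmsr open exists for a CPU.
# DEV_ROOT is a hypothetical override for testing; the real location is /dev/cpu.
DEV_ROOT=${DEV_ROOT:-/dev/cpu}

msr_ready() {
    cpu=${1:-0}
    if [ -e "$DEV_ROOT/$cpu/msr" ]; then
        echo "cpu$cpu: msr device present"
    else
        echo "cpu$cpu: msr device missing, try: sudo modprobe msr"
        return 1
    fi
}

# Check the first CPU; every other CPU has its own node.
msr_ready 0 || true
```

If the node is there and the tools still fail, the remaining suspect is permissions, which is why the script itself is run with sudo.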

Thanks, that did the trick!

Looks like it is running at 2.7GHz or slower, not 3.1GHz.

CPU0: 2697 MHz
CPU1: 2699 MHz
CPU2: 2699 MHz
CPU3: 2699 MHz
CPU4: 2673 MHz
CPU5: 2674 MHz
CPU6: 2674 MHz
CPU7: 2674 MHz
CPU8: 2675 MHz
CPU9: 2676 MHz
CPU10: 2676 MHz
CPU11: 2676 MHz
CPU12: 2677 MHz
CPU13: 2693 MHz
CPU14: 2699 MHz
CPU15: 2699 MHz
CPU16: 2699 MHz
CPU17: 2699 MHz
CPU18: 2699 MHz
CPU19: 2699 MHz
CPU20: 2699 MHz
CPU21: 2699 MHz
CPU22: 2699 MHz
CPU23: 2699 MHz
CPU24: 2699 MHz
CPU25: 2679 MHz
CPU26: 2678 MHz
CPU27: 2686 MHz
CPU28: 2678 MHz
CPU29: 2679 MHz
CPU30: 2679 MHz
CPU31: 2679 MHz
CPU32: 2699 MHz
CPU33: 2699 MHz
CPU34: 2699 MHz
CPU35: 2699 MHz
CPU36: 2699 MHz
CPU37: 2699 MHz
CPU38: 2699 MHz
CPU39: 2699 MHz
CPU40: 2690 MHz
CPU41: 2688 MHz
CPU42: 2687 MHz
CPU43: 2697 MHz
CPU44: 2694 MHz
CPU45: 2687 MHz
CPU46: 2680 MHz
CPU47: 2680 MHz
CPU48: 2686 MHz
CPU49: 2699 MHz
CPU50: 2699 MHz
CPU51: 2699 MHz
CPU52: 2699 MHz
CPU53: 2699 MHz
CPU54: 2699 MHz
CPU55: 2699 MHz
CPU56: 2698 MHz
CPU57: 2696 MHz
CPU58: 2698 MHz
CPU59: 2699 MHz
CPU60: 2699 MHz
CPU61: 2699 MHz
CPU62: 2699 MHz
CPU63: 2699 MHz
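
Sixty-four lines are hard to take in at a glance. As a rough sketch, output in this "CPUn: NNNN MHz" form could be fed through a small awk filter that reports the minimum, maximum and mean. summarize_freq is a hypothetical helper name, and the three sample lines are the CPU0, CPU1 and CPU4 readings from the list above.

```shell
# Summarise "CPUn: NNNN MHz" lines from stdin: count, min, max and mean.
# Lines without a numeric reading (e.g. "CPU0: MHz") are skipped.
summarize_freq() {
    awk '$3 == "MHz" && $2 ~ /^[0-9]+$/ {
             f = $2 + 0
             if (n == 0 || f < min) min = f
             if (n == 0 || f > max) max = f
             sum += f
             n++
         }
         END { if (n) printf "cpus=%d min=%d max=%d mean=%.0f\n", n, min, max, sum / n }'
}

printf 'CPU0: 2697 MHz\nCPU1: 2699 MHz\nCPU4: 2673 MHz\n' | summarize_freq
# prints: cpus=3 min=2673 max=2699 mean=2690
```

Saving the script output to a file first and then running summarize_freq on that file avoids having to deal with the ENTER prompt inside a pipe.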

I reset the machine, loaded the Optimal Defaults, then turned off power saving for the CPUs, and voila:

vidar@RiggE54650:~$ sudo ./freqcheck.sh 2700
CPUS: 64
FREQ(hex): 0xA8C
Press ENTER to continue or Ctrl+C to exit.

CPU0: 3099 MHz
CPU1: 3109 MHz
CPU2: 3101 MHz
CPU3: 3099 MHz
CPU4: 3099 MHz
CPU5: 3101 MHz
CPU6: 3101 MHz
CPU7: 3100 MHz
CPU8: 3099 MHz
CPU9: 3099 MHz
CPU10: 3099 MHz
CPU11: 3099 MHz
CPU12: 3099 MHz
CPU13: 3099 MHz
CPU14: 3099 MHz
CPU15: 3099 MHz
CPU16: 3099 MHz
CPU17: 3099 MHz
CPU18: 3099 MHz
CPU19: 3099 MHz
CPU20: 3099 MHz
CPU21: 3099 MHz
CPU22: 3099 MHz
CPU23: 3099 MHz
CPU24: 3099 MHz
CPU25: 3099 MHz
CPU26: 3098 MHz
CPU27: 3090 MHz
CPU28: 3095 MHz
CPU29: 3098 MHz
CPU30: 3097 MHz
CPU31: 3099 MHz
CPU32: 3099 MHz
CPU33: 3099 MHz
CPU34: 3096 MHz
CPU35: 3096 MHz
CPU36: 3096 MHz
CPU37: 3097 MHz
CPU38: 3099 MHz
CPU39: 3100 MHz
CPU40: 3094 MHz
CPU41: 3097 MHz
CPU42: 3099 MHz
CPU43: 3098 MHz
CPU44: 3099 MHz
CPU45: 3099 MHz
CPU46: 3099 MHz
CPU47: 3099 MHz
CPU48: 3099 MHz
CPU49: 3099 MHz
CPU50: 3099 MHz
CPU51: 3099 MHz
CPU52: 3099 MHz
CPU53: 3099 MHz
CPU54: 3099 MHz
CPU55: 3099 MHz
CPU56: 3099 MHz
CPU57: 3099 MHz
CPU58: 3099 MHz
CPU59: 3099 MHz
CPU60: 3099 MHz
CPU61: 3099 MHz
CPU62: 3099 MHz
CPU63: 3099 MHz

Thanks quickz

I will be back with the result.
 
Update: TPF is now 9:22, PPD ~ 606K. Not optimal, but better! This result is 15 seconds off what Patriot/Eagle07 get.
Power draw is now 695W.

Do you have any more tricks quickz?
 
After a change of WU (from 8101 to another 8101) the TPF has fallen back from 9:22 to 9:50 and stays there. Is there anything one can do to make it fold at its best?

Update: After a reset of the rig the TPF is down again, and better than before: 9:11 and PPD ~ 624K.
 
8101 varies quite a lot. I have seen a 2 minute TPF swing on my rig. I'm not sure what causes it, but it is inherent in the WU.
 
This is a mysterious phenomenon, I've been curious about it for a long time.
Assuming the TPF is 9:11 now, what will the TPF be if you stop all FAH clients, wait for several minutes and then restart them again? (make sure DLB is engaged)
That is:
Code:
killall -9 fah6
sleep 100
./fah6 -bigadv -smp -forceasm -verbosity 9

A restarted WU's frame times cannot be trusted, especially the first frame: you normally resume from a checkpoint partway through the percent, so that frame ends more quickly.

The first few frames after starting back are often quicker and the later ones longer; it typically averages back to where you were.
 
Thank you. I understand that if the current status is "slow TPF", restarting FAH cannot make it faster (only a system reboot would do it).
However, I want to know: if the current status is "fast TPF", would restarting FAH turn it into a "slow TPF" status?

I would look into system processes that can autostart if I were you.

Many of us have noticed oddities in system performance fixed by a restart.
 
Just out of curiosity I will try this when the next slow WU comes! This one has been very stable; the TPF has varied between 9:11 and 9:13.

Update: After a careful review of the TPF log I can see that there was a significant slowdown between 70 and 71%, with the TPF going up to 10:12, but already at 72% it was down again to 9:11. I do not know what causes this.
 
killall -9 fah6
sleep 100

did not stop the fah client. I had to use Ctrl+C to make it stop. I started it again after a while with the command line: ./fah6 -bigadv -smp -forceasm -verbosity 9

The TPF went higher than 10 minutes and did not come down again, so I stopped it once again and reset the machine. When it started again the TPF was back to normal at 9:11.
 
Yes, and this phenomenon is also very similar to the E5-2690 test I referred to.
When the system dropped into the "slow TPF" state, all the frequency-check tools still reported a frequency of 3.3GHz, and even the power draw did not change, but the real performance was precisely the same as at 2.9GHz. I don't think this is by chance. I guess the CPU had entered some "protected" mode.
Another question: what is the CPU temperature of your rig when 8101's TPF is 9:11?
 
The CPU temperature is hard to tell exactly at the moment because I do not have software that works with the sensors of the mobo. It should be possible to use the IPMI functions, but I do not know how at the moment. I believe I have to connect to the IPMI network port from another machine, but before I can do that I have to get myself a new switch. Anyway, there is only cold air coming out of the rig, so I believe the temperature is quite low.

After the last reset of the rig, the TPF is now as low as 9:08, and that is only 1 sec behind Patriot, so the total performance is about equal now. It seems like there are some small differences from WU to WU for 8101. The power draw is still as low as 705W from the wall, but I guess that is mostly because the AC voltage at my place is 230V.
 
I have a small program for measuring the CPU temperature of Intel Xeons. Would you give it a try?
Save the code as mytemp.c, and then:
Code:
cc mytemp.c -o mytemp
./mytemp

Code:
#include <stdio.h>
#include <stdlib.h>

int main(void) {
        FILE *fp;
        int i;
        /* Probe coretemp.0 .. coretemp.7 */
        for(i=0; i<8; i++) {
                char cmd[256];
                snprintf(cmd, sizeof cmd, "/sys/devices/platform/coretemp.%d/name", i);
                fp=fopen(cmd, "r");
                if(!fp) continue;       /* no such device, skip it */
                fclose(fp);
                /* Each *input* file holds one reading in millidegrees Celsius */
                snprintf(cmd, sizeof cmd, "cat /sys/devices/platform/coretemp.%d/*input*", i);
                fp=popen(cmd, "r");
                if(fp) {
                        char buf[256];
                        int n;
                        double s, v;
                        s=0;
                        n=0;
                        printf("CPU%d: ", i);
                        while(1) {
                                if(!fgets(buf, sizeof buf, fp)) break;
                                v=atoi(buf);
                                s+=v/1000.0;
                                printf("%4.1f ", v/1000.0);
                                n++;
                        }
                        if(n>0) printf("avg:%.1f", s/n);  /* avoid dividing by zero */
                        printf("\n");
                        pclose(fp);
                }
        }
        return 0;
}
 
I will give it a try!

Update: I tried on my Xeon 5645 first, but nothing happened:

E5645@vidar:~$ cc mytemp.c -o mytemp
E5645@vidar:~$ ./mytemp

Or did the output go elsewhere?
 
Try "sudo modprobe coretemp" first?
Here is my result on a dual-E5645:
Code:
[root@pc1 ~]# ./mytemp
CPU0: 28.0 27.0 17.0 28.0 29.0 32.0 avg:26.8
CPU1: 26.0 21.0 22.0 26.0 26.0 30.0 avg:25.2
 
Thanks, it is working now, but only for CPU0.

CPU0: 54.0 51.0 50.0 50.0 48.0 51.0 53.0 52.0 54.0 avg:51.4
 
After a reset earlier today the TPF went down to 9:06 for 8101, and that's a new record for me with this rig. It has been folding steadily between 9:06 and 9:08 for a while now.
 
This result is a little strange to me. The code should work for all CPUs if it works for one.
Could you show me the output of the following commands?
Code:
ls -al /sys/devices/platform/
ls -al /sys/devices/platform/*/name
9:06 for 8101 is not bad, but it still could be improved I think. Is the power draw for folding on 8101 still significantly lower than 8102?

What is the average? (whole unit / 100)
I have seen a few frames dip quite nicely below my 9:07 average, but there is typically a random high frame to keep the average at 9:07.
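
The average asked about here is just the total time divided by the number of frames. Below is a minimal sketch in shell, assuming the frame times have already been pulled out of the log as M:SS values. avg_tpf is a hypothetical helper, and the four sample values are only TPFs mentioned earlier in the thread, not a real frame log.

```shell
# Average frame times given as M:SS and print the result as M:SS,
# rounded to the nearest second.
avg_tpf() {
    total=0
    count=0
    for t in "$@"; do
        min=${t%%:*}
        sec=${t#*:}
        # Drop one leading zero so that "08" is not parsed as octal.
        min=${min#0}
        sec=${sec#0}
        total=$((total + ${min:-0} * 60 + ${sec:-0}))
        count=$((count + 1))
    done
    [ "$count" -gt 0 ] || return 1
    avg=$(( (total + count / 2) / count ))
    printf '%d:%02d\n' $((avg / 60)) $((avg % 60))
}

avg_tpf 9:11 9:13 10:12 9:11
# prints: 9:27
```

The sample shows how a single slow frame moves the average: three frames near 9:11 plus one at 10:12 already average 9:27.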


9:07 for 8101 is not bad, but it still could be improved I think. Is the power draw for folding on 8101 still significantly lower than 8102?

I doubt it can be done from our end. Different units have different bottlenecks based on what they are computing. Some stress a different part of the chips more than others. I had not heard that 8102 power draw was more but I never got to see one... 8101 had the highest power draw of any unit to date so I doubt you will get more out of it.
 
I have been away this weekend, and before I can answer that exactly I have to wait until the next WU comes along. To achieve a TPF between 9:06 and 9:10 I have to reset the machine after 1% has been done, but then it will fold steadily between 9:06 and 9:10. So I guess the answer for a best average TPF is something like 9:08.

If I do not reset the machine the average TPF is around 9:18, as it is now after 67%, maybe a little better after 100% because it improves towards the end.

After the rig was tuned, the power draw when folding 8102 stabilized at 700W from the wall. With 8101 the rig uses 705W as it is tuned now, so in my experience these WUs use almost the same power on this rig. On my G34 rigs the difference is bigger.

I will be back with an answer for the cmds you gave me.

Update results:
vidar@RiggE54650:~$ ls -al /sys/devices/platform/
total 0
drwxr-xr-x 11 root root 0 2012-09-01 12:22 .
drwxr-xr-x 20 root root 0 2012-09-01 12:22 ..
drwxr-xr-x 3 root root 0 2012-09-01 12:22 alarmtimer
drwxr-xr-x 4 root root 0 2012-09-01 12:22 Fixed MDIO bus.0
drwxr-xr-x 3 root root 0 2012-09-01 12:22 GHES.0
drwxr-xr-x 3 root root 0 2012-09-01 12:22 GHES.1
drwxr-xr-x 3 root root 0 2012-09-01 12:22 pcspkr
drwxr-xr-x 2 root root 0 2012-09-02 21:26 power
drwxr-xr-x 3 root root 0 2012-09-01 12:22 reg-dummy
drwxr-xr-x 4 root root 0 2012-09-01 12:22 serial8250
-rw-r--r-- 1 root root 4096 2012-09-01 10:22 uevent
drwxr-xr-x 4 root root 0 2012-09-01 10:22 vesafb.0
vidar@RiggE54650:~$ ls -al /sys/devices/platform/*/name
ls: cannot access /sys/devices/platform/*/name: No such file or directory
 
Are you running kraken? Sounds like you're turning on DLB by restarting the client, which would result in shorter frame times.
 
As you mentioned in #181, the power draw when folding 8102 was 730W at that time. How could it be tuned to only 700W?
I'm sorry that I didn't make it clear: you also need to run "sudo modprobe coretemp" before running the ls commands. Would you try it again?
One piece of good news is that I will have a chance to test several 4p E5 rigs this week. The ideal TPF I expect for 8101 is 8:30. I will try my best to make it happen.

Yes, I am running kraken and DLB, and it looks like you are right that DLB is not turning on again after a change of WU. Is there any way to make that happen automatically without having to reset the rig?

That's right, at first the power consumption was 730W, here: http://hardforum.com/showpost.php?p=1039069903&postcount=181

But after a while and some adjustments it stabilized at 700W for 8102, here: http://hardforum.com/showpost.php?p=1039071792&postcount=191

For a time it was actually down at 695W, but the TPF was slow, so I reset the values in the BIOS to optimal defaults and turned off the power saving options. After that the power draw has been 705W for 8101, here: http://hardforum.com/showpost.php?p=1039081669&postcount=211

I will run sudo modprobe coretemp and then these again:
ls -al /sys/devices/platform/
ls -al /sys/devices/platform/*/name

But first I have to make some breakfast. :D

Update, output results:

vidar@RiggE54650:~$ ls -al /sys/devices/platform/
total 0
drwxr-xr-x 15 root root 0 2012-09-03 10:30 .
drwxr-xr-x 20 root root 0 2012-09-03 10:30 ..
drwxr-xr-x 3 root root 0 2012-09-03 10:30 alarmtimer
drwxr-xr-x 4 root root 0 2012-09-03 09:48 coretemp.0
drwxr-xr-x 4 root root 0 2012-09-03 09:48 coretemp.16
drwxr-xr-x 4 root root 0 2012-09-03 09:48 coretemp.24
drwxr-xr-x 4 root root 0 2012-09-03 09:48 coretemp.8
drwxr-xr-x 4 root root 0 2012-09-03 10:30 Fixed MDIO bus.0
drwxr-xr-x 3 root root 0 2012-09-03 10:30 GHES.0
drwxr-xr-x 3 root root 0 2012-09-03 10:30 GHES.1
drwxr-xr-x 3 root root 0 2012-09-03 10:30 pcspkr
drwxr-xr-x 2 root root 0 2012-09-03 09:48 power
drwxr-xr-x 3 root root 0 2012-09-03 10:30 reg-dummy
drwxr-xr-x 4 root root 0 2012-09-03 10:30 serial8250
-rw-r--r-- 1 root root 4096 2012-09-03 08:30 uevent
drwxr-xr-x 4 root root 0 2012-09-03 08:30 vesafb.0
vidar@RiggE54650:~$ ls -al /sys/devices/platform/*/name
-r--r--r-- 1 root root 4096 2012-09-03 09:48 /sys/devices/platform/coretemp.0/name
-r--r--r-- 1 root root 4096 2012-09-03 09:48 /sys/devices/platform/coretemp.16/name
-r--r--r-- 1 root root 4096 2012-09-03 09:48 /sys/devices/platform/coretemp.24/name
-r--r--r-- 1 root root 4096 2012-09-03 09:48 /sys/devices/platform/coretemp.8/name
vidar@RiggE54650:~$

And at the moment the power draw is down to 700W again!
I think there are variations between the WUs that may cause this.

It could also be caused by different temperatures in the room, which make the fans run slower and use less power. In this box there are 6 fans that run at 8000 RPM (I think it was) at maximum and 2000 RPM at minimum. The temperature has fallen from 20 Celsius yesterday to 12 - 13 Celsius today, and this causes the fans to run slower. The temperature in the room is almost the same as outdoors.
 
OK, I see it, so the following modified code should work for all of your CPUs.

Code:
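#include <stdio.h>
#include <stdlib.h>

int main(void) {
        FILE *fp;
        int i;
        /* Probe coretemp.0 .. coretemp.63 to cover device numbers such as 8, 16 and 24 */
        for(i=0; i<64; i++) {
                char cmd[256];
                snprintf(cmd, sizeof cmd, "/sys/devices/platform/coretemp.%d/name", i);
                fp=fopen(cmd, "r");
                if(!fp) continue;       /* no such device, skip it */
                fclose(fp);
                /* Each *input* file holds one reading in millidegrees Celsius */
                snprintf(cmd, sizeof cmd, "cat /sys/devices/platform/coretemp.%d/*input*", i);
                fp=popen(cmd, "r");
                if(fp) {
                        char buf[256];
                        int n;
                        double s, v;
                        s=0;
                        n=0;
                        printf("CPU%d: ", i);
                        while(1) {
                                if(!fgets(buf, sizeof buf, fp)) break;
                                v=atoi(buf);
                                s+=v/1000.0;
                                printf("%4.1f ", v/1000.0);
                                n++;
                        }
                        if(n>0) printf("avg:%.1f", s/n);  /* avoid dividing by zero */
                        printf("\n");
                        pclose(fp);
                }
        }
        return 0;
}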
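
Because the directory listing shows the devices as coretemp.0, coretemp.8, coretemp.16 and coretemp.24, any fixed loop bound is a guess. Here is a rough shell sketch that globs for whatever coretemp devices exist instead. It assumes the readings are exposed as temp*_input files directly under each device directory, which may differ on other kernels. show_temps and the PLATFORM_DIR override are hypothetical names used only for illustration.

```shell
# Print every coretemp reading plus a per-device average, whatever the
# device numbers are. PLATFORM_DIR is a hypothetical override for testing.
PLATFORM_DIR=${PLATFORM_DIR:-/sys/devices/platform}

show_temps() {
    for dev in "$PLATFORM_DIR"/coretemp.*; do
        [ -d "$dev" ] || continue      # the glob matched nothing
        # Each temp*_input file holds one value in millidegrees Celsius.
        cat "$dev"/temp*_input 2>/dev/null |
            awk -v name="${dev##*/}" '
                {
                    sep = (NR > 1) ? " " : name ": "
                    printf "%s%.1f", sep, $1 / 1000
                    sum += $1 / 1000
                }
                END { if (NR > 0) printf " avg:%.1f\n", sum / NR }'
    done
}

show_temps
```

As with mytemp, 'sudo modprobe coretemp' has to be run first, otherwise there are no coretemp directories and the loop prints nothing.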
 
Thanks, but no change I think!

vidar@RiggE54650:~$ ./mytemp
CPU0: 54.0 52.0 53.0 51.0 48.0 51.0 53.0 54.0 54.0 avg:52.2
vidar@RiggE54650:~$
 
Sorry, it's C code, so you also need to compile it first as described in #224 (and run 'sudo modprobe coretemp' as well).

vidar@RiggE54650:~$ ./mytemp
./mytemp: line 3: syntax error near unexpected token `('
./mytemp: line 3: `int main() {'
 
I am sorry too. I did forget to compile at first, but then remembered, compiled it and ran 'sudo modprobe coretemp', but there is no change from before.
 
Yes, I have the Kraken 0.7-pre15.

Is the issue with NUMA and node interleaving as important for a Xeon MP rig as it is for G34 & SR-2?

Update: Logged in with the IPMI GUI and got this report on temps etc.


HTML:
Name              Status   Reading
CPU1 Temp                  Low
CPU2 Temp                  Low
CPU3 Temp                  Medium
CPU4 Temp                  Medium
System Temp       Normal   20 degrees C
Peripheral Temp   Normal   30 degrees C
PCH Temp          Normal   43 degrees C
FAN1              Normal   4950 R.P.M
FAN2              N/A      Not Present!
FAN3              Normal   4875 R.P.M
FAN4              N/A      Not Present!
FAN5              Normal   4950 R.P.M
FAN6              N/A      Not Present!
FAN7              N/A      Not Present!
FAN8              Normal   5400 R.P.M
FAN9              Normal   5250 R.P.M
FAN10             Normal   5250 R.P.M
VTT               Normal   1.056 Volts
CPU1 Vcore        Normal   1.136 Volts
CPU2 Vcore        Normal   1.104 Volts
CPU3 Vcore        Normal   1.104 Volts
CPU4 Vcore        Normal   1.088 Volts
VDIMM ABCD        Normal   1.52 Volts
VDIMM EFGH        Normal   1.52 Volts
VDIMM JKLM        Normal   1.52 Volts
VDIMM NPRT        Normal   1.52 Volts
3.3V              Normal   3.312 Volts
+3.3VSB           Normal   3.312 Volts
12V               Normal   11.872 Volts
VBAT              Normal   3.12 Volts
Chassis Intru              General Chassis Intrusion.
PS Status                  Presence detected.
It all looks normal to me!
 