Drool on this, gents. Too bad it won't stay at 100%... and I can't keep it!

fastgeek

[H]ard|DCOTM x4 aka "That Company"
Joined
Jun 6, 2000
Messages
6,520
So I have this for a very, VERY short amount of time. It's. Well. Let's just say it's the most expensive box I've ever had in my lab. Take a gander at those specs! :cool:

However the damn thing refuses to stay solid at 100%. It'll hang out there for a while, then drop down to something-or-other, then eventually get back up to 100. Thought it might have been cooling, but that's not it. (Ramped the fans up to obnoxious levels, which dropped the temps big time, but no difference.) Turned off all power savings. Just installed all the available updates thinking that might've helped. Told BOINC to use 128 cores via cc_config (or whatever it is). Maybe BOINC simply can't deal? Did look at the disk usage (RAID0 with two SAS12 SSDs because it's for testing and I'm lazy) and there's no usage to speak of.

Sure would be nice if this would stay solid. Of course this is more worrying than for just this little experiment. Will have to try some other tests to make damn sure this thing will peg and stay there... else we've problems! :eek:

Kahuna2.jpg


Kahuna2a.jpg
 
That is beautiful, keep up the good work trying to get all 128 pegged at 100%!

Have you tried Prime95 in an attempt to get everything at 100%?
 
It's on my to-do list for tomorrow. This will get to almost 100% (as shown in 1st pic) and stay there for an hour or so before doing the slow yo-yo thing.
 
fastgeek, I love the Prime95 suggestion. Hint* there are two DC projects that use that client and we rarely have team members helping on them. Wink Wink. Check the All inclusive DC list. We also have installation guides...

My guess is that the threads are running into memory bandwidth issues of some kind. Maybe too much hitting the cache at once or something. You should hit up Patriot in IRC and see what he thinks.

GIMPS and Seventeen or Bust are the two projects...
 
Gilthanis, it's an R930 with a bloody terabyte of RAM spread across umpteen DDR4 modules... I can't believe that running 128 SkyNet jobs would saturate the memory bus! The weird thing is it fluctuates. A minute ago it was at ~60% and now it's at 90.

I'll try what I can try tomorrow. Normally I have these boxes at my disposal for weeks/months; but alas not this time. :(
 
What is happening in the BOINC manager? Are the work units pausing/suspending at all when the cores go unused? I know it can be hard to track 128 work units in the client, so I'm sure it isn't an easy task to figure out.
 
Many BOINC workers (don't know about Skynet specifically) frequently reach to disk, be it to read
data to process or to write (partial) results or checkpoints. When they do, the computation is halted.

On desktop boards (few threads) it's not a big deal, but when you multiply this by 64 or more, BOINC
can saturate your disk and you get something like what you're seeing. Ofc, it is not the only explanation,
but I've seen exactly the same effect on a 4p.

That said,
  1. Can you determine if the delay (aka "off-time") is caused by I/O? You could do that (to
    some extent) with Windows Resource Monitor or with Process Monitor (by filtering on a
    single worker PID and checking how much disk I/O it does).
     
  2. Alternatively, can you try a purely computational load? LINPACK or even Prime95 [!]
     
  3. Not sure how to do that on Windows, but there should be a way to determine (from the process's
    scheduling statistics) run times and I/O wait times. On a dedicated system (no other running
    processes), their sum should almost equal the total run time (on the wall clock).
    Examining them would also tell you if it's a scheduling problem or an I/O problem.
     
  4. Gilthanis' suggestion is also viable: BOINC itself may be pausing tasks to satisfy CPU load
    limits (can you double check there are no limits as far as CPU usage is concerned? note: I've
    seen BOINC ignore "run always" and resort to punched-in numbers); another thing happening
    there might simply be tasks being completed and new tasks being started (no idea how
    quickly it burns through a Skynet task)
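
To make the accounting idea in point 3 concrete, here is a minimal Python sketch. It only measures the Python process it runs in, using the standard `time` module; the `fake_worker` function is a made-up stand-in for a real task, and for an actual BOINC worker you would need per-process counters from the OS instead. The point is just the arithmetic: whatever part of the wall clock is not CPU time is "off-time" spent waiting or suspended.

```python
import time

def off_time_fraction(cpu_seconds, wall_seconds):
    """Fraction of wall-clock time a single-threaded worker was NOT on a CPU."""
    if wall_seconds <= 0:
        raise ValueError("wall_seconds must be positive")
    # Clamp at zero: timer granularity can make CPU time slightly exceed wall time.
    return max(0.0, 1.0 - cpu_seconds / wall_seconds)

def measure(work):
    """Run work() once and return (cpu_seconds, wall_seconds) for this process."""
    cpu_start, wall_start = time.process_time(), time.perf_counter()
    work()
    return time.process_time() - cpu_start, time.perf_counter() - wall_start

def fake_worker():
    # Hypothetical stand-in for a work unit: some math, then a blocking wait.
    sum(i * i for i in range(200_000))
    time.sleep(0.2)  # stands in for disk I/O or a suspended task

if __name__ == "__main__":
    cpu, wall = measure(fake_worker)
    print(f"cpu={cpu:.3f}s wall={wall:.3f}s off-time={off_time_fraction(cpu, wall):.0%}")
```

A worker that is purely compute bound should show an off-time close to zero; a large off-time with idle disks would point away from I/O and toward scheduling or the client pausing tasks.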
 
so much for what I think I'm helping with my 3930k when not in use at home.......
 
Hopefully you're just kidding around. Every little bit helps and there's always going to be people with access to more hardware. Don't let it get you down!

As for the system in general...

Recorded an insanely boring screen cap video of the system running with Perfmon open to the disks tab. It's about as exciting as watching paint dry. Absolutely nothing of consequence happens. Disk usage level never hits 1%, QD stayed around 0.1 and maybe a few MB/s of transfer. These are SAS12 SSDs connected to a PERC H730... one would hope they can cope with a lot more than this.

Prime95 crashes after trying to start all the workers. Turned off HT, thus dropping the "cores" from 128 to 64; still crashes.

Ran LinX for giggles. Hit 460Gflops peak. Will try the Intel Stress test later. If anyone has some good Windows based utils, mention it and I'll give it a go (time permitting).

Might try PrimeGrid later. Also trying to see just how much power I can make this thing pull down. Right now it peaks at ~6.3A @ 208V.
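
For anyone wanting that in watts, a quick back-of-the-envelope sketch in Python. It assumes the 6.3A reading is the total draw on a single 208V feed, and the power factor is a guess (server PSUs are usually close to 1, but it wasn't measured here), so treat the result as a ballpark of roughly 1.3 kVA rather than a measurement.

```python
def apparent_power_va(volts, amps):
    """Apparent power in volt-amperes for a single feed."""
    return volts * amps

def real_power_w(volts, amps, power_factor=1.0):
    """Real power in watts; power_factor is an assumption, not a measured value."""
    if not 0.0 < power_factor <= 1.0:
        raise ValueError("power_factor must be in (0, 1]")
    return apparent_power_va(volts, amps) * power_factor

if __name__ == "__main__":
    volts, amps = 208.0, 6.3  # figures from the post above
    print(f"apparent power: {apparent_power_va(volts, amps):.0f} VA")
    # 0.95 is a hypothetical power factor, used only to show the effect.
    print(f"real power at PF 0.95: {real_power_w(volts, amps, 0.95):.0f} W")
```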
 
gawkgawk, I run DC projects on cell phones as well... sooo like fastgeek says... if you can afford to run it, it is worth running.
 
Results for Intel LINPACK with Hyper-Threading DISABLED

Code:
Number of equations to solve (problem size) : 1000  2000  3000  4000  5000  10000 15000 20000 25000 30000 35000 40000
Leading dimension of array                  : 1000  2000  3000  4000  5000  10000 15000 20000 25000 30000 35000 40000
Number of trials to run                     : 4     4     4     4     4     2     2     2     2     1     1     1    
Data alignment value (in Kbytes)            : 4     4     4     4     4     4     4     4     4     4     4     4    
Intel(R) Optimized LINPACK Benchmark data

Current date/time: Fri Sep 04 15:22:28 2015

CPU frequency:    2.692 GHz
Number of CPUs: 4
Number of cores: 64
Number of threads: 64

Parameters are set to:

Number of tests: 12

Maximum memory requested that can be used=4210869504, at the size=40000

=================== Timing linear equation system solver ===================

Size   LDA    Align. Time(s)    GFlops   Residual     Residual(norm) Check
1000   1000   4      0.013      51.7699  8.453724e-013 2.882937e-002   pass
1000   1000   4      0.008      83.0579  8.453724e-013 2.882937e-002   pass
1000   1000   4      0.008      83.2125  8.453724e-013 2.882937e-002   pass
1000   1000   4      0.008      84.4632  8.453724e-013 2.882937e-002   pass
2000   2000   4      0.027      197.1362 3.928358e-012 3.417190e-002   pass
2000   2000   4      0.027      197.9035 3.928358e-012 3.417190e-002   pass
2000   2000   4      0.039      135.9905 3.928358e-012 3.417190e-002   pass
2000   2000   4      0.027      200.3234 3.928358e-012 3.417190e-002   pass
3000   3000   4      0.041      437.5907 8.775924e-012 3.379398e-002   pass
3000   3000   4      0.039      463.6948 8.775924e-012 3.379398e-002   pass
3000   3000   4      0.039      463.4198 8.775924e-012 3.379398e-002   pass
3000   3000   4      0.039      463.7638 8.775924e-012 3.379398e-002   pass
4000   4000   4      0.070      609.6111 1.281314e-011 2.792745e-002   pass
4000   4000   4      0.065      655.9171 1.281314e-011 2.792745e-002   pass
4000   4000   4      0.065      657.7768 1.281314e-011 2.792745e-002   pass
4000   4000   4      0.065      652.3550 1.281314e-011 2.792745e-002   pass
5000   5000   4      0.122      682.7128 1.869693e-011 2.607137e-002   pass
5000   5000   4      0.110      758.2912 2.442485e-011 3.405849e-002   pass
5000   5000   4      0.108      770.5042 1.968273e-011 2.744598e-002   pass
5000   5000   4      0.108      770.8491 2.049766e-011 2.858233e-002   pass
10000  10000  4      0.484      1376.7964 6.077157e-011 2.142867e-002   pass
10000  10000  4      0.470      1418.2639 7.044615e-011 2.484003e-002   pass
15000  15000  4      1.339      1680.9144 1.564782e-010 2.464557e-002   pass
15000  15000  4      1.317      1708.2513 1.602432e-010 2.523857e-002   pass
20000  20000  4      2.860      1865.2687 2.403335e-010 2.127478e-002   pass
20000  20000  4      2.859      1865.8210 2.403335e-010 2.127478e-002   pass
25000  25000  4      5.343      1949.6990 3.636776e-010 2.068104e-002   pass
25000  25000  4      5.359      1943.8980 3.636776e-010 2.068104e-002   pass
30000  30000  4      9.012      1997.6439 5.755403e-010 2.268786e-002   pass
35000  35000  4      14.240     2007.3896 8.181071e-010 2.374841e-002   pass
40000  40000  4      21.578     1977.4606 8.983340e-010 1.997925e-002   pass

Performance Summary (GFlops)

Size   LDA    Align.  Average  Maximal
1000   1000   4       75.6259  84.4632 
2000   2000   4       182.8384 200.3234
3000   3000   4       457.1173 463.7638
4000   4000   4       643.9150 657.7768
5000   5000   4       745.5893 770.8491
10000  10000  4       1397.5301 1418.2639
15000  15000  4       1694.5829 1708.2513
20000  20000  4       1865.5448 1865.8210
25000  25000  4       1946.7985 1949.6990
30000  30000  4       1997.6439 1997.6439
35000  35000  4       2007.3896 2007.3896
40000  40000  4       1977.4606 1977.4606

Residual checks PASSED

End of tests

Fri 09/04/2015 
03:29 PM


Results for Intel LINPACK with Hyper-Threading ENABLED

Code:
Number of equations to solve (problem size) : 1000  2000  3000  4000  5000  10000 15000 20000 25000 30000 35000 40000
Leading dimension of array                  : 1000  2000  3000  4000  5000  10000 15000 20000 25000 30000 35000 40000
Number of trials to run                     : 4     4     4     4     4     2     2     2     2     1     1     1    
Data alignment value (in Kbytes)            : 4     4     4     4     4     4     4     4     4     4     4     4    
Intel(R) Optimized LINPACK Benchmark data

Current date/time: Fri Sep 04 15:44:47 2015

CPU frequency:    2.691 GHz
Number of CPUs: 4
Number of cores: 64
Number of threads: 128

Parameters are set to:

Number of tests: 12

Maximum memory requested that can be used=4210869504, at the size=40000

=================== Timing linear equation system solver ===================

Size   LDA    Align. Time(s)    GFlops   Residual     Residual(norm) Check
1000   1000   4      0.022      30.6336  8.453724e-013 2.882937e-002   pass
1000   1000   4      0.009      70.9545  8.453724e-013 2.882937e-002   pass
1000   1000   4      0.009      72.5890  8.453724e-013 2.882937e-002   pass
1000   1000   4      0.009      73.0926  8.453724e-013 2.882937e-002   pass
2000   2000   4      0.031      174.2066 3.928358e-012 3.417190e-002   pass
2000   2000   4      0.056      95.9648  3.928358e-012 3.417190e-002   pass
2000   2000   4      0.030      178.9065 3.928358e-012 3.417190e-002   pass
2000   2000   4      0.030      180.2913 3.928358e-012 3.417190e-002   pass
3000   3000   4      0.038      476.9612 8.775924e-012 3.379398e-002   pass
3000   3000   4      0.036      504.7496 8.775924e-012 3.379398e-002   pass
3000   3000   4      0.036      505.5226 8.775924e-012 3.379398e-002   pass
3000   3000   4      0.059      305.1830 8.775924e-012 3.379398e-002   pass
4000   4000   4      0.067      634.8317 1.281314e-011 2.792745e-002   pass
4000   4000   4      0.063      675.3228 1.281314e-011 2.792745e-002   pass
4000   4000   4      0.064      666.0002 1.281314e-011 2.792745e-002   pass
4000   4000   4      0.065      655.2243 1.281314e-011 2.792745e-002   pass
5000   5000   4      0.123      677.1031 1.608047e-011 2.242292e-002   pass
5000   5000   4      0.114      729.6020 1.776682e-011 2.477439e-002   pass
5000   5000   4      0.115      725.8751 1.895514e-011 2.643142e-002   pass
5000   5000   4      0.103      811.9679 1.664718e-011 2.321316e-002   pass
10000  10000  4      0.619      1076.7933 6.903533e-011 2.434256e-002   pass
10000  10000  4      0.778      856.7418 6.447082e-011 2.273306e-002   pass
15000  15000  4      1.881      1196.4614 1.511855e-010 2.381197e-002   pass
15000  15000  4      2.599      865.7998 1.620746e-010 2.552701e-002   pass
20000  20000  4      4.286      1244.5057 2.403335e-010 2.127478e-002   pass
20000  20000  4      4.216      1265.3435 2.403335e-010 2.127478e-002   pass
25000  25000  4      10.735     970.4831 3.636776e-010 2.068104e-002   pass
25000  25000  4      7.962      1308.4262 3.636776e-010 2.068104e-002   pass
30000  30000  4      13.843     1300.4528 5.755403e-010 2.268786e-002   pass
35000  35000  4      22.055     1296.1197 8.181071e-010 2.374841e-002   pass
40000  40000  4      33.888     1259.1399 8.983340e-010 1.997925e-002   pass

Performance Summary (GFlops)

Size   LDA    Align.  Average  Maximal
1000   1000   4       61.8174  73.0926 
2000   2000   4       157.3423 180.2913
3000   3000   4       448.1041 505.5226
4000   4000   4       657.8447 675.3228
5000   5000   4       736.1370 811.9679
10000  10000  4       966.7676 1076.7933
15000  15000  4       1031.1306 1196.4614
20000  20000  4       1254.9246 1265.3435
25000  25000  4       1139.4547 1308.4262
30000  30000  4       1300.4528 1300.4528
35000  35000  4       1296.1197 1296.1197
40000  40000  4       1259.1399 1259.1399

Residual checks PASSED

End of tests

Fri 09/04/2015 
03:53 PM

With HT Enabled, Task Manager would occasionally show 50% of the cores shut off (it showed a 52% usage level, with half the cores at 100%). This would correlate with a spike on the meter to ~7A. This would last for a few seconds, then it was back to 100% of the cores online. The percentages would never change. It was either 52 or 100.

With HT Disabled I didn't see this. While I didn't watch it like a hawk, I did look at it often and never saw it deviate from a solid 100% at all times.

As for POGS/TSN, with HT off it stays at 100% at all times too.
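
For a rough side-by-side of the two LINPACK runs, here is a small Python sketch. The numbers are the average GFlops copied from the two Performance Summary tables above (problem sizes 10000 and up); nothing else is measured. It just divides HT-enabled by HT-disabled throughput per problem size.

```python
# Average GFlops from the Performance Summary tables above, keyed by problem size.
HT_OFF = {10000: 1397.5301, 15000: 1694.5829, 20000: 1865.5448, 25000: 1946.7985,
          30000: 1997.6439, 35000: 2007.3896, 40000: 1977.4606}
HT_ON = {10000: 966.7676, 15000: 1031.1306, 20000: 1254.9246, 25000: 1139.4547,
         30000: 1300.4528, 35000: 1296.1197, 40000: 1259.1399}

def ht_ratio(size):
    """HT-enabled throughput as a fraction of HT-disabled throughput."""
    return HT_ON[size] / HT_OFF[size]

if __name__ == "__main__":
    for size in sorted(HT_OFF):
        print(f"size {size:>5}: HT on / HT off = {ht_ratio(size):.2f}")
```

At these sizes the HT-enabled run lands at roughly 59% to 69% of the HT-disabled throughput, which fits with the cores not all being kept busy when HT is on.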
 
Just recalled this: https://msdn.microsoft.com/en-us/library/windows/desktop/dd405503(v=vs.85).aspx

Seems that unless an app explicitly spills over to another processor group,
it will never utilize more than 64 "logical" processors:
"An application that requires the use of multiple groups so that it can run on more than 64 processors must explicitly determine where to run its threads and is responsible for setting the threads' processor affinities to the desired groups."

Out of curiosity, what do coreinfo -g and coreinfo -n (https://technet.microsoft.com/en-us/sysinternals/cc835722.aspx) report w/HT enabled?

EDIT: seems nothing has been done about that in BOINC: https://github.com/BOINC/boinc/issues/1357
     also seems that Intel's LINPACK doesn't have proper support either -- oops!
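
To show why this would line up with the numbers reported earlier in the thread, here is a tiny Python model. It is a simplification and an assumption on my part: it splits the logical processors into the fewest groups of at most 64, sized evenly, whereas real Windows also takes NUMA nodes into account when forming groups. It then computes the best overall utilization a process confined to one group could reach.

```python
import math

MAX_GROUP_SIZE = 64  # Windows limit on logical processors per processor group

def split_into_groups(logical_processors, max_group_size=MAX_GROUP_SIZE):
    """Simplified model: fewest groups that fit, sized as evenly as possible."""
    if logical_processors < 1:
        raise ValueError("need at least one logical processor")
    n_groups = math.ceil(logical_processors / max_group_size)
    base, extra = divmod(logical_processors, n_groups)
    return [base + (1 if i < extra else 0) for i in range(n_groups)]

def max_utilization_single_group(logical_processors):
    """Best-case overall CPU utilization for a process stuck in one group."""
    groups = split_into_groups(logical_processors)
    return max(groups) / logical_processors

if __name__ == "__main__":
    for n in (64, 128):  # HT disabled vs HT enabled on this box
        print(n, split_into_groups(n), f"{max_utilization_single_group(n):.0%}")
```

Under this model, 64 logical processors fit in one group (100% reachable), while 128 split into two groups and a single-group process tops out at half the machine, close to the ~52% seen with HT enabled.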
 
Awesome!! Love big core machines :D. That would be at least 50% faster than my IL 64 core opteron machine. Run it for as long as you can :D.
 
 
Here's a small utility that lets you identify applications that aren't coded with multiple groups in mind: http://darkswarm.org/pgatester/

Sample output:
Code:
C:\Users\Administrator\Downloads>pgatester.exe
Processor Group Affinity Tester
Copyright (c) 2015 by Kris Rusocki <[email protected]>
Licensed under GPL version 2

Usage:
        pgatester.exe <pid1> <pid2> ...

C:\Users\Administrator\Downloads>pgatester.exe 4 1196
Processor Group Affinity Tester
Copyright (c) 2015 by Kris Rusocki <[email protected]>
Licensed under GPL version 2

4 processor groups total.
Group 0: 16 processor(s)
Group 1: 16 processor(s)
Group 2: 16 processor(s)
Group 3: 16 processor(s)

PID 4: group(s) 0 1 2 3
PID 1196: group(s) 2

C:\Users\Administrator\Downloads>
In this picture:
1. PID 1196 can only run on processors from group 2.
2. Threads of PID 4 can run on processors from all groups (but only if they
   have been deployed to proper groups with, say, SetThreadGroupAffinity).

Bottom line: if you see a single group next to a PID, it surely won't use > 64 processors;
             if you see multiple groups next to a PID, the process, if properly coded, may be using > 64 processors.
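
If you want to check a pile of PIDs at once, the output can be post-processed. Below is a minimal Python sketch that parses text in the shape of the sample output above and lists the PIDs confined to a single group. The line format is assumed purely from that sample; other versions of the tool may print something different, and the function names here are my own.

```python
import re

# Matches lines like "PID 1196: group(s) 2" or "PID 4: group(s) 0 1 2 3".
PID_LINE = re.compile(r"^PID (\d+): group\(s\)((?: \d+)+)$")

def parse_pid_groups(output):
    """Map each PID to the list of processor groups reported for it."""
    result = {}
    for line in output.splitlines():
        match = PID_LINE.match(line.strip())
        if match:
            result[int(match.group(1))] = [int(g) for g in match.group(2).split()]
    return result

def single_group_pids(pid_groups):
    """PIDs that can only ever run inside one processor group."""
    return sorted(pid for pid, groups in pid_groups.items() if len(groups) == 1)

if __name__ == "__main__":
    sample = """4 processor groups total.
Group 0: 16 processor(s)
PID 4: group(s) 0 1 2 3
PID 1196: group(s) 2
"""
    groups = parse_pid_groups(sample)
    print("confined to one group:", single_group_pids(groups))
```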
 