Request from the DAB, 6903/04 TPFs

6903 and 6904 frame times below...

If this is ultimately for Stanford, perhaps Stanford should just mine the logs for this? When the work unit results are uploaded, log files are included, which have the TPF and various performance statistics. Stanford should be able to see this data

I'm sure that they do but maybe they just need to be "nudged" to evaluate the data.
 
Well, we have been going over the data.....

We have a system that is off balance.
Core count is not a good measure.
A mini benchmark should be built into the client to solve this long term.
In the short term, different flags to denote system speed are a possibility.

The problem: both solutions take a fair amount of coding (the mini bench way more), and PG is up to its elbows, if not eyeballs, in coding requirements, so none of this can be implemented in the near future.

They are listening and know what needs to be done. It is just an issue of needing coding that they can't get done currently.
 
6903 and 6904 frame times below...

If this is ultimately for Stanford, perhaps Stanford should just mine the logs for this? When the work unit results are uploaded, log files are included, which have the TPF and various performance statistics. Stanford should be able to see this data across the entire folding population...

busy work to keep us out of their hair?
idk...that is a valid point...They already have the data...
They probably just don't want to use their resources to mine the data.
 
Kendrak, that sounds fair.

Am I correct to assume we're talking about server-side benchmark?
 
What has been talked about...... and this is ONLY talk, is to have a small benchmark run the first time you install the client. This would give a score, and with that score the server would know what kind of WU to send you.

This way a super fast 4 core system will get WU that will run well with it. And slower 12 core HE chips would get smaller WU in comparison.

I would like to say again this is talk and would take a large amount of time that PG just doesn't have free at the moment (or near future). This is, however, seen as a permanent fix to the issue.
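As a rough illustration of that idea, the install-time benchmark and score lookup might look something like this. Everything here is invented for the sketch: the workload, the thresholds, and the class names are assumptions, not anything PG has specified.

```python
# Hypothetical sketch: run a short fixed workload once at install time,
# derive a score, and let the assignment server map that score to a
# work-unit class. Thresholds and class names are made up.
import time

def mini_bench(iterations=200_000):
    """Time a fixed amount of floating-point work; higher score = faster box."""
    start = time.perf_counter()
    acc = 0.0
    for i in range(1, iterations):
        acc += (i * 1.0001) ** 0.5
    elapsed = time.perf_counter() - start
    return iterations / elapsed  # rough operations per second

def wu_class_for(score, thresholds=((5e6, "bigadv"), (1e6, "smp"))):
    """Server-side lookup: pick the largest WU class the score qualifies for."""
    for cutoff, wu_class in thresholds:  # thresholds sorted high to low
        if score >= cutoff:
            return wu_class
    return "uniprocessor"
```

A fast 4-core box with a high score would land in a bigger class than a slow 12-core box, which is the point of the proposal: the score, not the core count, drives assignment.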
 
What has been talked about...... and this is ONLY talk, is to have a small benchmark run the first time you install the client. This would give a score, and with that score the server would know what kind of WU to send you.

This way a super fast 4 core system will get WU that will run well with it. And slower 12 core HE chips would get smaller WU in comparison.

I would like to say again this is talk and would take a large amount of time that PG just doesn't have free at the moment (or near future). This is, however, seen as a permanent fix to the issue.

Sorry for the thread hijack, but what are PG's priorities at the moment besides new WUs?
 
I think ironing out v7 is taking a large amount of time.
 
6903 and 6904 frame times below...

If this is ultimately for Stanford, perhaps Stanford should just mine the logs for this? When the work unit results are uploaded, log files are included, which have the TPF and various performance statistics. Stanford should be able to see this data across the entire folding population...

-smp 48, 2.7ghz 6-7-5-20-1440. NB 1944 (default @ 1800, but refclock=216).

with dlb engaged + kraken.

6903: 11:43TPF
6904: 16:04TPF

busy work to keep us out of their hair?
idk...that is a valid point...They already have the data...
They probably just don't want to use their resources to mine the data.

Busy work or not, is the improvement not worth your effort? (not that the hair messer-uppers would ever be deterred by such trivial tasks) :p

PG wanting to do it is different from having the ability and the priority to do it. Donors have the ability, and can give it higher priority than PG can.

Yes, TPF is in there, but is the core count? And if the core count is in there, is it accurate, or was it fooled by some scripting to make an i7 look like 12 cores? Does the log even show the actual hardware config? What good is TPF without a valid hardware reference?

Please don't just assume "it's all in there, why don't they use it?" and then blame PG. Is the data really all in there in a way that is usable or not?
 
Busy work or not, is the improvement not worth your effort? (not that the hair messer-uppers would ever be deterred by such trivial tasks) :p

PG wanting to do it is different from having the ability and the priority to do it. Donors have the ability, and can give it higher priority than PG can.

Yes, TPF is in there, but is the core count? And if the core count is in there, is it accurate, or was it fooled by some scripting to make an i7 look like 12 cores? Does the log even show the actual hardware config? What good is TPF without a valid hardware reference?

Please don't just assume "it's all in there, why don't they use it?" and then blame PG. Is the data really all in there in a way that is usable?

If it's just busy work that they will keep ignoring, then yes, my time is more valuable than that...
About the only good way to get good info is to read msr data... /proc/cpuinfo is not accurate.

The beta team also normally starts a tpf thread with hardware specs for new projects...

notes that the troll made a semi useful post... impressed.
 
Yes, TPF is in there, but is the core count? And if the core count is in there, is it accurate, or was it fooled by some scripting to make an i7 look like 12 cores? Does the log even show the actual hardware config? What good is TPF without a valid hardware reference?

The point has always been, who cares about hardware specs? TPF is the ONLY thing that matters for system performance. We collect them outside of Stanford for guidance on what to expect for a potential new system. To Stanford, the only spec that might matter is memory amount, which is recorded in the config file (I think) and could be returned. Other hardware specs don't matter.
 
I agree for the most part.

If the optimum time to return a WU is, say, 3 days or less, no matter the size of the WU, the deadline should be 3.5 to 4 days.

A system to assign the proper WU to the proper systems is key. I've said in the DAB forum and here, that will solve the problems and allow the project to get important data back even faster on average.

Stanford knows this; it is just a question of how to move from the current system, keep the project running, and find time for all the extra coding this would take. It isn't an easy thing.
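The deadline rule above can be sketched as a simple multiplier. The 1.25x slack factor is an assumption chosen only so a 3-day target lands inside the "3.5 to 4 days" band mentioned above; it is not a PG number.

```python
# Hypothetical sketch of "deadline = target return time plus some slack".
# The slack_factor is an illustrative assumption, not an official value.

def deadline_days(target_days, slack_factor=1.25):
    """Deadline as a fixed multiple of the optimum return time."""
    return target_days * slack_factor

# e.g. a 3-day target gives a 3.75-day deadline, inside the 3.5-4 day band
```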
 
The point has always been, who cares about hardware specs? TPF is the ONLY thing that matters for system performance. We collect them outside of Stanford for guidance on what to expect for a potential new system. To Stanford, the only spec that might matter is memory amount, which is recorded in the config file (I think) and could be returned. Other hardware specs don't matter.

I think 7im's point was about the data Stanford has... how do they know what the system's specs are...

I agree that TPF is the only factor that should be looked at... as far as whether the system is capable...
but... for Stanford to dish units out... they need to be able to assign units to systems that are capable...
for this there needs to be a finer grain of control for dishing units out... I would rather it be on the user side...
 
Client-side benchmark could work if the deadlines are tight
enough to discourage benchmark manipulation (which,
obviously, can be done as it's a *client*-side benchmark).

Personally, I recommend implementing server-side benchmark
(i.e. based exclusively on WU round-trip-times).
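A minimal sketch of that server-side approach, assuming the only input is the round-trip time of each returned WU relative to its project's target. The thresholds and class labels are illustrative assumptions, not anything from the actual assignment servers.

```python
# Hypothetical server-side classifier: rate a machine purely from the
# round-trip times of WUs it has already returned. No client-reported
# hardware data is trusted. Thresholds are made up for illustration.

def speed_ratio(returned_seconds, target_seconds):
    """Below 1.0 means the WU came back faster than the target."""
    return returned_seconds / target_seconds

def classify(history):
    """history: list of (returned_seconds, target_seconds) for past WUs."""
    if not history:
        return "unknown"   # new machine, nothing to go on yet
    avg = sum(speed_ratio(r, t) for r, t in history) / len(history)
    if avg <= 0.6:
        return "fast"      # eligible for larger WUs
    if avg <= 1.0:
        return "normal"
    return "slow"          # assign smaller WUs
```

Because the classification is computed entirely from data the server already records, there is nothing on the client side to game except actually finishing work faster.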
 
How do you go about doing server side with multiple systems running?

Each system has its own passkey? That seems like a huge hassle.
 
AFAICT, per-machine data already exist (vide machinedependent.dat
w/Linux client) and are specifically used for the purpose of
machine identification.
 
AFAICT, per-machine data already exist (vide machinedependent.dat
w/Linux client) and are specifically used for the purpose of
machine identification.

Cool, learned something new.

However, that would require a whole new DB with performance data linked to the assignment servers, correct?
 
It would. Some interface work would need to be done there.

I'm not insisting -- if deadlines are tight enough to discourage*
manipulation things should work out all right (I think).

EDIT:
*) in a way that cheating the benchmark doesn't increase production
no matter what
 
I think it should be based off of completed work units. I understand it would take a lot of work, and resources are spread thin. My thought is that you download a client, the first WU is, say, a uniprocessor WU, and based on the speed of completion the assignment server assigns the next highest work unit class; in short: uniprocessor --> smp --> bigadv --> big beta (I know it's beta, but it's just an example to illustrate the point). The first two steps would sort themselves out on the first day of folding for even an average system today. Then we would not need specific hardware identification, just the results of a previous work unit to assess what the next work unit should be.

Then Stanford would no longer need to worry about thread and core counts. This also allows for further stratification. In the future this could lead to the assignment server dynamically assigning the best work unit to the client, and would ensure that the machine is capable of meeting deadlines. This change could also allow them to push critical work units to reliable machines that can complete them on time.

There could also be a points reward for those who have the best turnaround in a specific period, with a metric based on, say, a downloaded-WU to completed-WU comparison, or something.

Just a bunch of random thoughts, thanks for reading.
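The promotion ladder described above can be sketched in a few lines. The class names come straight from the post; the one-step-up on success / one-step-down on a missed deadline rule is an assumption added to make the example concrete.

```python
# Hypothetical promotion ladder: each WU completed on time moves the
# machine up one class, a missed deadline moves it down one. The rule
# itself is an illustrative assumption, not an actual FAH mechanism.

LADDER = ["uniprocessor", "smp", "bigadv", "big beta"]

def next_class(current, completed_on_time):
    """Return the WU class the assignment server should try next."""
    i = LADDER.index(current)
    if completed_on_time:
        return LADDER[min(i + 1, len(LADDER) - 1)]  # promote, cap at top
    return LADDER[max(i - 1, 0)]                    # demote, floor at bottom
```

As the post notes, an average system would climb the first two rungs within its first day of folding, after which it settles wherever its actual turnaround supports.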
 
Server side is really the best way to do this.

Server already has all necessary information, at least for two permutations:
1. Effective work-unit turn-around time. This is already used to calculate bonus, etc.
2. TPF and effective gflops of the system. This is included in the log file sent to the server. An excerpt of this from a completed 6903 is below. Technically, this could be gamed (edit the log on the client side), but the effective turn-around time, bonus, and deadline take care of this.

This could also be approached client side, but the server side work is likely going to be useful for other things down the road, like projecting the sweet spot of folding systems, gauging when larger work-units should be introduced, enabling better planning of resource usage, etc. Client side, a small (~2MB) .TPR file could be run to assess the speed of the system. It's probably "good enough" to just cache this performance result when the machine dependent data is set up (or the client config has been set).

I would keep core count, frequency, and other subjective things out of the picture when deciding on whether assignment happens, given the long history of problems here.



Code:
     R E A L   C Y C L E   A N D   T I M E   A C C O U N T I N G

 Computing:         Nodes     Number     G-Cycles    Seconds     %
-----------------------------------------------------------------------
 Domain decomp.        48      49423   292197.176   112254.6     3.2
 DD comm. load         48      49423     3542.710     1361.0     0.0
 DD comm. bounds       48      49423     5030.404     1932.6     0.1
 Vsite constr.         48     247116    22950.320     8816.9     0.3
 Comm. coord.          48     247116    68109.266    26165.8     0.8
 Neighbor search       48      49424   855630.934   328711.3     9.5
 Force                 48     247116  5120394.462  1967123.5    56.9
 Wait + Comm. F        48     247116    55454.706    21304.3     0.6
 PME mesh              48     247116  2116600.881   813143.5    23.5
 Vsite spread          48     296540     9881.106     3796.1     0.1
 Write traj.           48         90     2227.164      855.6     0.0
 Update                48     247116   115858.100    44509.7     1.3
 Constraints           48     247116   211938.847    81421.4     2.4
 Comm. energies        48      49425    55127.483    21178.6     0.6
 Rest                  48               69522.109    26708.6     0.8
-----------------------------------------------------------------------
 Total                 48             9004465.669  3459283.6   100.0
-----------------------------------------------------------------------
-----------------------------------------------------------------------
 PME redist. X/F       48     494232   514607.445   197698.9     5.7
 PME spread/gather     48     494232  1008929.184   387604.6    11.2
 PME 3D-FFT            48     494232   534920.908   205502.8     5.9
 PME solve             48     247116    57877.097    22234.9     0.6
-----------------------------------------------------------------------

	Parallel run - timing based on wallclock.

               NODE (s)   Real (s)      (%)
       Time:  72068.408  72068.408    100.0
                       20h01:08
               (Mnbf/s)   (GFlops)   (ns/day)  (hour/ns)
Performance:   2993.220    157.449      1.185     20.253
Finished mdrun on node 0 Tue Jun  7 17:12:22 2011
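As a sketch of the log-mining idea, the Performance line in that mdrun footer can be pulled apart with a few lines of Python. This assumes the exact column layout shown above (Mnbf/s, GFlops, ns/day, hour/ns); real logs may vary between GROMACS versions.

```python
# Hypothetical parser for the mdrun timing footer shown above. Assumes
# a "Performance:" line with four numeric columns in the order
# (Mnbf/s) (GFlops) (ns/day) (hour/ns); real logs may differ.

def parse_performance(log_text):
    """Return the GFlops and ns/day figures, or None if not found."""
    for line in log_text.splitlines():
        if line.startswith("Performance:"):
            mnbf, gflops, ns_day, hr_ns = map(float, line.split()[1:5])
            return {"gflops": gflops, "ns_per_day": ns_day}
    return None

sample = "Performance:   2993.220    157.449      1.185     20.253"
# parse_performance(sample) -> {'gflops': 157.449, 'ns_per_day': 1.185}
```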


Cool, learned something new.

However, that would require a whole new DB with performance data linked to the assignment servers, correct?
 
Client-side benchmarking solves several problems, like improving WU matching to hardware capability. As others have noted, with a fast 6-core box versus a slower 8-core box, maybe the 6-core box should get a big WU and the 8-core box should not.

Another thing client side benchmarking does is help to eliminate PPD fluctuations between projects. No more p2684 dogs. Your specific hardware should always perform about the same way after that.

Deadlines were never meant to be used as a tool, nor as a performance guideline. Deadlines were simply a "catch all" date to reassign a work unit if something went wrong with a download, a client, a computer, a network, etc.

Donors' behavior changed that, so PG is trying to adjust to that change. Points system changes, and other changes, will help realign donor behavior with Folding's goals.

BTW, a client side benchmark would probably run one good benchmark at client setup time, and then do small random samples to make sure the system performance hasn't been "gamed." And do things like compare cpu time to real time, and to upload and download times so clocks can't be gamed.

These anti-gaming requirements are what will make rolling out a client side benchmark take a lot longer, but it will also work a lot better.
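A minimal sketch of such a spot check, assuming a CPU-bound sample workload: re-run a small piece of work and compare CPU time against wall-clock time and against what the cached benchmark score predicts. The tolerance and the consistency rules are invented for illustration.

```python
# Hypothetical anti-gaming spot check: time a small sample of real work
# and flag machines whose clocks or claimed speed don't add up.
# The tolerance value and both consistency rules are assumptions.
import time

def spot_check(work_fn, expected_seconds, tolerance=0.5):
    """Return True if observed timing is consistent with the claimed speed."""
    cpu_start = time.process_time()
    wall_start = time.perf_counter()
    work_fn()
    cpu_elapsed = time.process_time() - cpu_start
    wall_elapsed = time.perf_counter() - wall_start
    # For CPU-bound work, wall time shouldn't be wildly shorter than CPU
    # time (a warped clock would show that), and the run should land near
    # what the cached benchmark score predicts.
    consistent_clocks = wall_elapsed >= cpu_elapsed * (1 - tolerance)
    near_expected = abs(wall_elapsed - expected_seconds) <= expected_seconds * tolerance
    return consistent_clocks and near_expected
```

Run occasionally at random, a check like this makes a one-time faked benchmark score show up as a mismatch on the very next sample.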

Until then, it's a manual process collecting and summarizing the performance data. Not "just" busy work, but manual work to help provide better feedback. To influence positive change. Whether you pull the data from the logs, or HFM, or wherever, PG doesn't have the bandwidth to do everything.

Kendrak is asking everyone to help him as DAB rep, which is to help PG directly, which in the end benefits you and your team. So I don't get why there is friction about "busy work" when people are making an effort to solve problems that benefit your team. You all know Kendrak better than me. Even so, I know he would not ask anyone to do worthless busy work. So get on board, or get out of the way.
 
I don't get why there is friction about "busy work" when people are making an effort to solve problems that benefit your team.
nuff said...

now... where is that 7rollbegone spray...
 
7im said:
BTW, a client side benchmark would probably run one good benchmark at client setup time, and then do small random samples to make sure the system performance hasn't been "gamed." And do things like compare cpu time to real time, and to upload and download times so clocks can't be gamed.
No matter how intricate, client-side benchmark can be gamed.
Something I will happily demonstrate if and when it comes to life.

EDIT: ofc, gaming the benchmark may be orthogonal to potential benefits -- something I emphasised in previous posts (vide: tight deadlines et al.)
 
Project: 6904
Average time/frame: 00:56:53 {in hh:mm:ss}
CPU: 970 @ 4.51 GHz
# of CPU sockets: 1
# of Physical cores: 6 (12 threads)

# of FAH CPU processes: 1
# of FAH GPU Clients: 0

RAM GB installed: 6
RAM Type: DDR3 1600
RAM Speed: 1720Mhz 9-9-9-24-1T

OS name/kernel version Ubuntu 10.10 2.6.35.30 generick-ck
Client: 6.34
Running in VM: No

Project: 6904
Average time/frame: 00:58:25 {in hh:mm:ss}
CPU: 970 @ 4.41 GHz
# of CPU sockets: 1
# of Physical cores: 6 (12 threads)

# of FAH CPU processes: 1
# of FAH GPU Clients: 0

RAM GB installed: 6
RAM Type: DDR3 1600
RAM Speed: 1680Mhz 9-9-9-24-1T

OS name/kernel version Ubuntu 10.10 2.6.35.30 generick-ck
Client: 6.34
Running in VM: No
 
Project: 6904
Average time/frame: 00:48:34 {in hh:mm:ss}
CPU: L5640 @ 3.60 GHz
# of CPU sockets: 2
# of Physical cores: 12 (24 threads)

# of FAH CPU processes: 1
# of FAH GPU Clients: 0

RAM GB installed: 12
RAM Type: DDR3 1600
RAM Speed: 1600Mhz 8-8-8-24-1T

OS name/kernel version Linux Kernel 3.0.1 BFS 406
Client: 6.34
Running in VM: Yes via Virtualbox with a Windows Server 2008 RC2 host
 
Project: 6903
Average time/frame: 00:52:18 {in hh:mm:ss}
CPU: 2600k @ 4.5GHz
# of CPU sockets: 1
# of Physical cores: 4 (8 threads)

# of FAH CPU processes: 1
# of FAH GPU Clients: 0

RAM GB installed: 8
RAM Type: DDR3 2133
RAM Speed: 2133Mhz 11-11-11-30-2T

OS name/kernel version Ubuntu 10.10 2.6.35.30 generick-ck
Client: 6.34
Running in VM: Yes


Project: 6904
Average time/frame: 01:08:40 {in hh:mm:ss}
CPU: 2600k @ 4.5GHz
# of CPU sockets: 1
# of Physical cores: 4 (8 threads)

# of FAH CPU processes: 1
# of FAH GPU Clients: 0

RAM GB installed: 8
RAM Type: DDR3 2133
RAM Speed: 2133Mhz 11-11-11-30-2T

OS name/kernel version Ubuntu 10.10 2.6.35.30 generick-ck
Client: 6.34
Running in VM: Yes
 
Project: 6903
Average time/frame: 47:55
CPU: Intel L5640 @ 3.871 GHz
# of CPU sockets: 1
# of Physical cores: 6 (12 logical)

# of FAH CPU processes: 1
# of FAH GPU Clients: 0

RAM GB installed: 6
RAM Type: DDR3
RAM Speed: 1600

OS name/kernel version: Ubuntu 10.10
Client: 6.34
Running in VM: {Yes/No}: No
 
Project: 6904
Average time/frame: 01:04:38
CPU: 2600k @ 4.7 GHz
# of CPU sockets: 1
# of Physical cores: 4 I think?

# of FAH CPU processes: 1
# of FAH GPU Clients: 0

RAM GB installed: 2x2GB
RAM Type: DDR3
RAM Speed: 2133 @ 9-10-9-24-1T 1.65v (Stock Specs)

OS name/kernel version: Ubuntu 10.10
Client: smp linux?
Running in VM: No
 
Project: 6903
Average time/frame: 00:57:26
CPU: 2x Intel(R) Xeon(R) CPU E5620 @ 2.40GHz
# of CPU sockets: 2
# of Physical cores: 2 x 4

# of FAH CPU processes: 1
# of FAH GPU Clients: 0

RAM GB installed: 8x4GB
RAM Type: DDR2
RAM Speed: PC2-5300 / 667 ECC/FBDIMM

OS name/kernel version: Gentoo / 2.6.36-ck-r5
Client: Linux SMP client 6.34
Running in VM: Nope
 
Project: 6904
Average time/frame: 01:07:07
CPU: 2x Intel(R) Xeon(R) CPU X5550 @ 2.66GHz
# of CPU sockets: 2
# of Physical cores: 8 (16 threads)

# of FAH CPU processes: 1
# of FAH GPU Clients: 0

RAM GB installed: 6x1GB
RAM Type: DDR3
RAM Speed: PC3-10600 / 1333 Mhz.

OS name/kernel version: Ubuntu 10.10 x64 Desktop
Client: Linux SMP client 6.34
Running in VM: Nope
 
Project: 6904
CPU: 2600k @ 4.7 GHz

OS name/kernel version: Ubuntu 10.10
Client: smp linux?
Running in VM: No

how are you running a 6904 on a 2600k without a VM? Did you spoof the kernel to tell it you have 12 CPU cores or something?
 
how are you running a 6904 on a 2600k without a VM? Did you spoof the kernel to tell it you have 12 CPU cores or something?

It's done by the guide over at Overclock.net here. Basically you trick the client into thinking you have 12 cores, but only run it on 8 with the -smp 8 switch.

VM has nothing to do with it; having a VM just means you can keep your Windows install and still fold BA (since BA requires 64-bit Linux). It's actually preferable to run native rather than in a VM; compare mine and d00dz's TPFs for proof of that (I'm running in a VM).

The cutoff is about 1h20m TPF, which you should easily be able to hit running a 2600k at 4.5GHz with 2133MHz RAM.
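For a rough sense of where a cutoff like that comes from: a WU reports 100 frames, so the average TPF times 100 has to beat the preferred deadline. The 5.5-day deadline figure below is an assumption for illustration, not the actual 6904 deadline.

```python
# Hypothetical back-of-envelope for a TPF cutoff: with 100 frames per WU,
# the worst acceptable TPF is the deadline spread evenly across frames.
# The 5.5-day example deadline is an assumption.

def max_tpf_minutes(deadline_days, frames=100):
    """Largest average TPF (in minutes) that still meets the deadline."""
    return deadline_days * 24 * 60 / frames

# e.g. max_tpf_minutes(5.5) -> 79.2 minutes, i.e. about 1h19m per frame
```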
 
Project: 6903
Average time/frame: 00:39:34 {in hh:mm:ss}
CPU: 970 @ 4.51 GHz
# of CPU sockets: 1
# of Physical cores: 6 (12 threads)

# of FAH CPU processes: 1
# of FAH GPU Clients: 0

RAM GB installed: 6
RAM Type: DDR3 1600
RAM Speed: 1720Mhz 9-9-9-24-1T

OS name/kernel version Ubuntu 10.10 2.6.35.30 generick-ck
Client: 6.34
Running in VM: No




Nice clocks on the 470's, sbinh.... water cooled I assume?

yup ...
 
Project: 6904
Average time/frame: 1:00:40
CPU: 980x @ 4.4 GHz
# of CPU sockets:1
# of Physical cores:6 (12 HT threads)

# of FAH CPU processes:1
# of FAH GPU Clients:0

RAM GB installed:12
RAM Type: DDR3 GSkill ECO 1600
RAM Speed: 1650 Mhz 8-8-8-24-1T

OS name/kernel version: slackware VM image w/ BFS
Client: 6.34
Running in VM: Yes. VirtualBox, Win 7 Pro host
 
Project: 6903
Average time/frame: 00:14:04
CPU: Opteron 6176 SE 2.30Ghz
# of CPU sockets:4
# of Physical cores:48

# of FAH CPU processes:1
# of FAH GPU Clients:0

RAM GB installed:32
RAM Type: DDR3 Corsair 1333
RAM Speed: 1333 Mhz 9-9-9-24 1T

OS name/kernel version: Ubuntu 11.10 (TheKraken)
Client: 6.34
Running in VM: No

Project: 6904
Average time/frame: 00:19:19
CPU: Opteron 6176 SE 2.30Ghz
# of CPU sockets:4
# of Physical cores:48

# of FAH CPU processes:1
# of FAH GPU Clients:0

RAM GB installed:32
RAM Type: DDR3 Corsair 1333
RAM Speed: 1333 Mhz 9-9-9-24 1T

OS name/kernel version: Ubuntu 11.10 (TheKraken)
Client: 6.34
Running in VM: No
 
Project: 6903
Average time/frame: 00:24:38
CPU: Xeon X5650 @ 4.1Ghz
# of CPU sockets:2
# of Physical cores:12 (24 with HT)

# of FAH CPU processes:1
# of FAH GPU Clients:0

RAM GB installed:12
RAM Type: DDR3 OCZ 1600
RAM Speed: 1600 Mhz 8-8-8-24 1T

OS name/kernel version: Ubuntu 11.10 (BFS & TheKraken)
Client: 6.34
Running in VM: No

Project: 6904
Average time/frame: 00:34:06
CPU: Xeon X5650 @ 4.1Ghz
# of CPU sockets:2
# of Physical cores:12 (24 with HT)

# of FAH CPU processes:1
# of FAH GPU Clients:0

RAM GB installed:12
RAM Type: DDR3 OCZ 1600
RAM Speed: 1600 Mhz 8-8-8-24 1T

OS name/kernel version: Ubuntu 11.10 (BFS & TheKraken)
Client: 6.34
Running in VM: No
 
Project: 6903
Average time/frame: 00:49:55
CPU: 970X @ 4.1Ghz
# of CPU sockets:1
# of Physical cores:6 (12 HT threads)

# of FAH CPU processes:1
# of FAH GPU Clients:0

RAM GB installed:6
RAM Type: DDR3 OCZ 1333
RAM Speed: 1600 Mhz 6-7-7-59 1T

OS name/kernel version: Windows Server 2008
Client: 6.34
Running in VM: Yes - VirtualBox 4.1 (Linux 3.0.1-corei7 + BFS)


Project: 6904
Average time/frame: 01:09:33
CPU: 970X @ 4.1Ghz
# of CPU sockets:1
# of Physical cores:6 (12 HT threads)

# of FAH CPU processes:1
# of FAH GPU Clients:0

RAM GB installed:6
RAM Type: DDR3 OCZ 1333
RAM Speed: 1600 Mhz 6-7-7-59 1T

OS name/kernel version: Windows Server 2008
Client: 6.34
Running in VM: Yes - VirtualBox 4.1 (Linux 3.0.1-corei7 + BFS)
 
Server 1 - Project 6904;

Average time/frame: 00:20:05
CPU: Xeon X7560 @ 2.27GHz
# of CPU sockets:8
# of Physical cores:64 (128 HT threads)
# RAM: unsure but 2TB's of it!
# of FAH CPU processes:1
# of FAH GPU Clients:0


OS name/kernel version: Ubuntu 10.10
Client: 7.1.24
Running in VM: No

Server 2 - Project 6904;

Average time/frame: 00:22:10
CPU: Xeon X6550 @ 2.0GHz
# of CPU sockets:8
# of Physical cores:64 (128 HT threads)
# RAM: unsure 512GB
# of FAH CPU processes:1
# of FAH GPU Clients:0


OS name/kernel version: Ubuntu 10.10
Client: 7.1.24
Running in VM: No

** Kraken not set up yet... just migrated from Windows 2008 R2 Datacenter. Both machines just received their first-ever 6904's; estimated time to finish: 1.33 days
 