"long pause after WU finish" strikes back

tear

[H]ard|DCer of the Year 2011
Joined
Jul 25, 2011
Messages
1,568
Per summary. Seems that updated Ubuntu 10.10 kernels make
the problem appear again (even w/ext3).

Looking for daring individuals with Ubuntu 10.10 (desktop) (UPDATE:
any kernel revision), HDD (no SSDs please!) that could run few
commands and post test results.

Test prep: log in/open terminal and type:
Code:
wget http://darkswarm.org/synctest.c
gcc -O2 -o synctest synctest.c

Acutal test (please post output from these):
Code:
uname -r
df -hT
grep . /sys/block/sd[a-z]/device/model
time ./synctest
time ./synctest 1024

Also, please locate BigAdv WU finish in your FAHlog.txt and post
this fragment (from "Writing" to "Folding@home Core Shutdown"):
Code:
[16:27:20] - Writing 49197146 bytes of core data to disk...
[16:27:20]   ... Done.                                  <-- Hangs here
[16:43:37] - Shutting down core
[16:43:37]
[16:43:37] Folding@home Core Shutdown: FINISHED_UNIT

Reference output follows:
Code:
user@host:~$ uname -r
2.6.26-2-amd64
user@host:~$ df -hT
Filesystem     Type   Size  Used Avail Use% Mounted on
/dev/md2       ext3   361G  330G   13G  97% /
tmpfs          tmpfs  2.0G     0  2.0G   0% /lib/init/rw
udev           tmpfs   10M  780K  9.3M   8% /dev
tmpfs          tmpfs  2.0G     0  2.0G   0% /dev/shm
/dev/md1       ext3   2.0G  124M  1.8G   7% /boot
user@host:~$ grep . /sys/block/sd[a-z]/device/model
/sys/block/sda/device/model:SAMSUNG HD403LJ 
/sys/block/sdb/device/model:SAMSUNG HD403LJ 
user@host:~$ time ./synctest
Performing 100 iterations, 4096 bytes each
Done.

real	0m0.128s
user	0m0.000s
sys	0m0.008s
user@host:~$ time ./synctest 1024
Performing 1024 iterations, 4096 bytes each
Done.

real	0m0.993s
user	0m0.000s
sys	0m0.084s
user@host:~$


Thanks in advance :)
 
Last edited:
Using 11.12. Hope this helps...


tony@brickhouse:~/fold$ uname -r
3.0.0-14-generic
tony@brickhouse:~/fold$ time ./synctest
Performing 100 iterations, 4096 bytes each
Done.

real 0m0.067s
user 0m0.000s
sys 0m0.008s
tony@brickhouse:~/fold$ time ./synctest 1024
Performing 1024 iterations, 4096 bytes each
Done.

real 0m0.962s
user 0m0.000s
sys 0m0.064s
 
What we are looking for are folks who have had an Ubuntu install running for a while without updating it. I am thinking kendrak and jebo, but I am sure there are others who set up an Ubuntu machine several months ago and have not touched it.
 
My install is about a month old I think and I have updates turned off. I can run this when i get home. Will f@h mess with the results or can I leave it running while I do this?
 
This will not mess with F@H - you can leave it running when you run this.

Just a note - we know 2.6.35-28 is affected and that 2.6.35-22 is not. What we need are kernel versions between those to see what is and is not affected.
 
I ended up reinstalling Ubuntu 10.10 w/o updates. :(
 
Last edited:
here we go from one of my machines

2.6.35-31-generic

Performing 100 iterations, 4096 bytes each
Done.

real 0m2.079s
user 0m0.010s
sys 0m0.000s
nathan@Hood:~$ time ./synctest 1024
Performing 1024 iterations, 4096 bytes each
Done.

real 0m21.448s
user 0m0.000s
sys 0m0.090s

Edit evidently the upgrade borked my BFS install, correcting now.

edit 2: Just found this in the logs - the updates were applied before this WU finished so not sure what is happening here???

[07:10:51] Leaving Run
[07:10:52] - Writing 230811534 bytes of core data to disk...
[07:11:31] Done: 230811022 -> 222454000 (compressed to 3.3 percent)
[07:11:32] ... Done.
[07:12:01] - Shutting down core
 
Last edited:
Exactly. Basically we need to find the exact version that started exhibiting the problem first or last
version that was free of the issue. Not enough time to go through them here.

Anyway, ~2s for the first case (100 iterations) and ~20s for the second case (1024 iterations) means
your kernel makes FahCore/client issue manifest (English: apparently "bad" result).
Anything below 1s for each means that kernel hides FahCore/client issue (fast but unreliable; English:
apparently "good" result).

Per musky, we know that 2.6.35-28 is "affected" and 2.6.35-22 is "not affected" == we need data from
kernel revs between the two...
 
Ok i've just checked my other linux box and i've got 2.6.35-30 installed so i can't be of more help unless someone is willing to post instructions on how to download and install a specific kernel.

I'm still a bit confused tho, with ext 4 the pause between writing core data to disk and done was upwards of 40 minutes, mine both use ext 3:-
[07:10:52] - Writing 230811534 bytes of core data to disk...
[07:11:31] Done: 230811022 -> 222454000 (compressed to 3.3 percent)
[07:11:32] ... Done.
Even tho my rigs have the affected kernel the write time is still less than a minute, i am not seeing the problem that TJ and Chandan are. Am i missing something or does it not affect every set up in the same way??

Specs are Hood and Warspite in sig, both are on HDD's not SSD's. If you need more info just ask.
 
Interesting... can you run the test anyway please?
 
No Problem, the test for Hood is in my earlier post. Test for warspite is as follows:-


2.6.35-30 generic-ck

Performing 100 iterations, 4096 bytes each
Done.

real 0m2.076s
user 0m0.010s
sys 0m0.000s
nathan@Warspite:~$ time ./synctest 1024
Performing 1024 iterations, 4096 bytes each
Done.

real 0m21.108s
user 0m0.040s
sys 0m0.090s
 
Do you host FAH client on a different partition or a ramdisk/tmpfs ?
Can you post output of 'df -hT' please? Something's not adding up.
 
Ah, here we go, seems that one needs to look all the way through shutting down core (or
subsequent msgs) -- from FF


Code:
[16:27:20] - Writing 49197146 bytes of core data to disk...
[16:27:20]   ... Done.                                  <-- Hangs here
[16:43:37] - Shutting down core
[16:43:37]
[16:43:37] Folding@home Core Shutdown: FINISHED_UNIT

Nathan, can you post a bit more of the log? Just curious.
 
can you downgrade from 2.6.35-31 to 2.6.35-22? I googled but nothing was clear or it wouldn't work.
22 was faster on my 4P than 31 is now, but does the kernel version affect wireless operation working.
When I upgraded, my wireless card started workin PnP but with 22 it would not, may have been driver addon that I didn't see happen though
Hope this is not too stupid of a question.
 
can you downgrade from 2.6.35-31 to 2.6.35-22? I googled but nothing was clear or it wouldn't work.
22 was faster on my 4P than 31 is now, but does the kernel version affect wireless operation working.
When I upgraded, my wireless card started workin PnP but with 22 it would not, may have been driver addon that I didn't see happen though
Hope this is not too stupid of a question.

Probably - was going to work on that this evening. You should be able to update grub to boot to a previous kernel. I just did something like this related to "uninstalling" BFS.

Your question about the wireless card support is also a good one, and also one I can't answer. Drivers shouldn't care about kernel versions I don't believe, but I really don't know.
 
I know you aren't looking for server versions but in case anyone wants to compare, because my times seemed weird also:

2.6.32-37-server

folding@AMD61284P:~$ time ./synctest
Performing 100 iterations, 4096 bytes each
Done.

real 0m3.650s
user 0m0.000s
sys 0m0.030s

folding@AMD61284P:~$ time ./synctest 1024
Performing 1024 iterations, 4096 bytes each
Done.

real 0m37.277s
user 0m0.000s
sys 0m0.210s
folding@AMD61284P:~$
 
After re-installation without updates:
Code:
tim@Voltron:~$ uname -r
2.6.35-22-generic
tim@Voltron:~$ time ./synctest
Performing 100 iterations, 4096 bytes each
Done.

real    0m1.983s
user    0m0.000s
sys    0m0.010s
tim@Voltron:~$ time ./synctest 1024
Performing 1024 iterations, 4096 bytes each
Done.

real    0m20.843s
user    0m0.000s
sys    0m0.160s
tim@Voltron:~$
 
Update on this:

The problem is not with the kernel version. It is with the Desktop (-generic) versus Server (-server) variants. The newest server version:
Code:
dave@bulldawg:~$ uname -r
2.6.35-31-server
dave@bulldawg:~$ time ./synctest
Performing 100 iterations, 4096 bytes each
Done.

real    0m0.123s
user    0m0.000s
sys     0m0.020s
dave@bulldawg:~$ time ./synctest 1024
Performing 1024 iterations, 4096 bytes each
Done.

real    0m1.173s
user    0m0.000s
sys     0m0.200s
That is actually slightly slower than the 2.6.35-22-server kernel, but not nearly as bad as the -generic kernels of any version.

What may be the more interesting part is that this machine is running Ubuntu Desktop. If it really does matter, I'll get something written up on how to change to the server version of the kernel.
 
I'm apparently running 2.6.38-8-generic and I am not affected by the issue.
 
2.6.38-13-generic


Performing 100 iterations, 4096 bytes each
Done.

real 0m4.610s
user 0m0.000s
sys 0m0.010s

Performing 1024 iterations, 4096 bytes each
Done.

real 0m56.103s
user 0m0.000s
sys 0m0.190s


Just got the 8P moved and decided to run the test... Looks like the worst results yet.

Going to wipe and start over again...
 
Here's mine:

Code:
2.6.35-30-generic

Performing 100 iterations, 4096 bytes each
Done.

real    0m2.071s
user    0m0.020s
sys     0m0.000s

Performing 1024 iterations, 4096 bytes each
Done.

real    0m20.657s
user    0m0.160s
sys     0m0.000s

Looking forward to improvement recommendations once the issue is resolved, thanks tear and musky!
 
Just did a fresh install of 11.04...

2.6.38-8-generic

Performing 100 iterations, 4096 bytes each
Done.

real 0m2.107s
user 0m0.010s
sys 0m0.010s

Performing 1024 iterations, 4096 bytes each
Done.

real 0m26.184s
user 0m0.000s
sys 0m0.190s


Crap, well its better than my last one by a mile, will have to use this for the time being.
 
Based on received feedback it seems synctest may not be simulating client/core behaviour accurately.
That said, we need to take few steps back and work individually with chandan, tj and UTO...

No need to panic, carry on with your business.
 
No need to panic, carry on with your business.

Unless you are being read Vogon poetry:
Oh flundered gruntbuggly,
Thy micturations are to me
As plurdled gabbleblotchits
On a lurgid bee
That mordiously hath bitled out
Its earted jurtles
Into a rancid festering
Now the jurpling slayjid agrocrustles
Are slurping hagrilly up the axlegrurts
And living glupules frart and slipulate
Like jowling meated liverslime
Groop, I implore thee, my foonting turlingdromes
And hooptiously drangle me
With crinkly bindlewurdles,
Or else I shall rend thee in the gobberwarts with my blurglecruncheon
See if I don't.
BTW, no issues to report during the current WU. I'll check the log on what happens once the WU completes later today.
 
Last edited:
This is the Dual X5570, Ubuntu 11.04, BFS

untitledone@folder:~$ uname -r
2.6.38-11-generic
untitledone@folder:~$ df -hT
Filesystem Type Size Used Avail Use% Mounted on
/dev/sda5 ext4 9.9G 4.1G 5.3G 44% /
none devtmpfs 3.0G 644K 3.0G 1% /dev
none tmpfs 3.0G 16K 3.0G 1% /dev/shm
none tmpfs 3.0G 400K 3.0G 1% /var/run
none tmpfs 3.0G 0 3.0G 0% /var/lock
/dev/sda2 ext3 274G 866M 260G 1% /home/untitledone/bootmount
untitledone@folder:~$ grep . /sys/block/sd[a-z]/device/model
ST3320613AS
untitledone@folder:~$ time ./synctest
Performing 100 iterations, 4096 bytes each
Done.

real 0m2.493s
user 0m0.000s
sys 0m0.020s
untitledone@folder:~$ time ./synctest 1024
Performing 1024 iterations, 4096 bytes each
Done.

real 0m26.079s
user 0m0.000s
sys 0m0.200s
 
Here's mine just in case it proves useful.

athan@Hood:~$ uname -r
2.6.35-31-generic
nathan@Hood:~$ df -hT
Filesystem Type Size Used Avail Use% Mounted on
/dev/sda5 ext3 67G 4.1G 59G 7% /
none devtmpfs 5.9G 208K 5.9G 1% /dev
none tmpfs 5.9G 116K 5.9G 1% /dev/shm
none tmpfs 5.9G 608K 5.9G 1% /var/run
none tmpfs 5.9G 0 5.9G 0% /var/lock

and from the log.

09:19:56] logfile size: 209495
[09:19:56] Leaving Run
[09:19:59] - Writing 230814231 bytes of core data to disk...
[09:20:38] Done: 230813719 -> 222456394 (compressed to 3.3 percent)
[09:20:38] ... Done.
[09:21:05] - Shutting down core
[09:21:05]
[09:21:05] Folding@home Core Shutdown: FINISHED_UNIT
[09:21:07] CoreStatus = 64 (100)
[09:21:07] Unit 7 finished with 83 percent of time to deadline remaining.
[09:21:07] Updated performance fraction: 0.818605
 
From the 6903 I finished this morning:

16:46:15] - Writing 230897532 bytes of core data to disk...
[16:47:17] Done: 230897020 -> 222553433 (compressed to 3.3 percent)
[16:47:18] ... Done.
[16:47:32] - Shutting down core
[16:47:32]
[16:47:32] Folding@home Core Shutdown: FINISHED_UNIT
 
This is the Dual X5570, Ubuntu 11.04, BFS

untitledone@folder:~$ uname -r
2.6.38-11-generic
untitledone@folder:~$ df -hT
Filesystem Type Size Used Avail Use% Mounted on
/dev/sda5 ext4 9.9G 4.1G 5.3G 44% /
none devtmpfs 3.0G 644K 3.0G 1% /dev
none tmpfs 3.0G 16K 3.0G 1% /dev/shm
none tmpfs 3.0G 400K 3.0G 1% /var/run
none tmpfs 3.0G 0 3.0G 0% /var/lock
/dev/sda2 ext3 274G 866M 260G 1% /home/untitledone/bootmount
untitledone@folder:~$ grep . /sys/block/sd[a-z]/device/model
ST3320613AS
untitledone@folder:~$ time ./synctest
Performing 100 iterations, 4096 bytes each
Done.

real 0m2.493s
user 0m0.000s
sys 0m0.020s
untitledone@folder:~$ time ./synctest 1024
Performing 1024 iterations, 4096 bytes each
Done.

real 0m26.079s
user 0m0.000s
sys 0m0.200s


[10:39:55] Finished Work Unit:
[10:39:55] - Reading up to 121622496 from "work/wudata_04.trr": Read 121622496
[10:39:56] trr file hash check passed.
[10:39:56] - Reading up to 108808872 from "work/wudata_04.xtc": Read 108808872
[10:39:56] xtc file hash check passed.
[10:39:56] edr file hash check passed.
[10:39:56] logfile size: 213136
[10:39:56] Leaving Run
[10:39:56] - Writing 230817496 bytes of core data to disk...
[10:40:35] Done: 230816984 -> 222462838 (compressed to 3.3 percent)
[10:40:35] ... Done.
[10:40:58] - Shutting down core
[10:40:58]
[10:40:58] Folding@home Core Shutdown: FINISHED_UNIT
[10:41:01] CoreStatus = 64 (100)
[10:41:01] Unit 4 finished with 77 percent of time to deadline remaining.
[10:41:01] Updated performance fraction: 0.776785
[10:41:01] Sending work to server
[10:41:01] Project: 6903 (Run 3, Clone 21, Gen 13)




Just grabbed this from the quoted machine. I finished that unit a few days ago.
 
Just did a fresh install of 11.04...

2.6.38-8-generic

Performing 100 iterations, 4096 bytes each
Done.

real 0m2.107s
user 0m0.010s
sys 0m0.010s

Performing 1024 iterations, 4096 bytes each
Done.

real 0m26.184s
user 0m0.000s
sys 0m0.190s


Crap, well its better than my last one by a mile, will have to use this for the time being.

Code:
[06:48:12]   ... Done.
[06:48:50] - Shutting down core

8P just finished a 6903 not that long ago, 38 seconds to write it with the kernel quoted.
 
Running 10.04 Server, untouched/updated since bringing it online many moons ago.

Code:
veeb0rg@happy:~$ uname -r
2.6.35-28-generic-ck
veeb0rg@happy:~$ df -hT
Filesystem    Type    Size  Used Avail Use% Mounted on
/dev/sda1     ext3     71G  1.5G   66G   3% /
none      devtmpfs    3.0G  172K  3.0G   1% /dev
none         tmpfs    3.0G     0  3.0G   0% /dev/shm
none         tmpfs    3.0G  352K  3.0G   1% /var/run
none         tmpfs    3.0G     0  3.0G   0% /var/lock
veeb0rg@happy:~$ grep . /sys/block/sda/device/model
WDC WD800JD-75MS
veeb0rg@happy:~$ time ./synctest
Performing 100 iterations, 4096 bytes each
Done.

real    0m2.081s
user    0m0.000s
sys     0m0.000s
veeb0rg@happy:~$ time ./synctest 1024
Performing 1024 iterations, 4096 bytes each
Done.

real    0m21.591s
user    0m0.070s
sys     0m0.000s
 
Last edited:
Just built last week following the musky guide, U10.10 desktop, all updates applied:

Code:
rick@Server5:~/Downloads$ uname -r
2.6.35-31-generic
rick@Server5:~/Downloads$ df -hT
Filesystem    Type    Size  Used Avail Use% Mounted on
/dev/sda1     ext3    221G  3.6G  206G   2% /
none      devtmpfs    3.9G  212K  3.9G   1% /dev
none         tmpfs    4.0G  152K  4.0G   1% /dev/shm
none         tmpfs    4.0G  416K  4.0G   1% /var/run
none         tmpfs    4.0G     0  4.0G   0% /var/lock
rick@Server5:~/Downloads$ grep . /sys/block/sd[a-z]/device/model
Maxtor 7Y250M0  
rick@Server5:~/Downloads$ time ./synctest
Performing 100 iterations, 4096 bytes each
Done.

real	0m2.140s
user	0m0.000s
sys	0m0.000s
rick@Server5:~/Downloads$ time ./synctest 1024
Performing 1024 iterations, 4096 bytes each
Done.

real	0m21.890s
user	0m0.070s
sys	0m0.000s

[13:26:00] Finished Work Unit:
[13:26:00] - Reading up to 121622496 from "work/wudata_01.trr": Read 121622496
[13:26:01] trr file hash check passed.
[13:26:01] - Reading up to 108765168 from "work/wudata_01.xtc": Read 108765168
[13:26:02] xtc file hash check passed.
[13:26:02] edr file hash check passed.
[13:26:02] logfile size: 221020
[13:26:02] Leaving Run
[13:26:05] - Writing 230781676 bytes of core data to disk...
[13:27:15] Done: 230781164 -> 222422001 (compressed to 3.3 percent)
[13:27:16]   ... Done.
[13:27:40] - Shutting down core
[13:27:40] 
[13:27:40] Folding@home Core Shutdown: FINISHED_UNIT
 
First 10.10 folder (no updates), 2.6.35.22 generic, JFS filesystem

Code:
apoc@apoc-H8QGL:~$ wget http://darkswarm.org/synctest.c
--2012-02-06 00:58:25--  http://darkswarm.org/synctest.c
Resolving darkswarm.org... 85.11.66.60
Connecting to darkswarm.org|85.11.66.60|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 604 [text/plain]
Saving to: `synctest.c'

100%[======================================>] 604         --.-K/s   in 0s      

2012-02-06 00:58:26 (93.0 MB/s) - `synctest.c' saved [604/604]

apoc@apoc-H8QGL:~$ gcc -O2 -o synctest synctest.c
apoc@apoc-H8QGL:~$ uname -r
2.6.35-22-generic
apoc@apoc-H8QGL:~$ df -hT
Filesystem    Type    Size  Used Avail Use% Mounted on
/dev/sda2      jfs     71G  3.1G   68G   5% /
none      devtmpfs     16G  212K   16G   1% /dev
none         tmpfs     16G  196K   16G   1% /dev/shm
none         tmpfs     16G   92K   16G   1% /var/run
none         tmpfs     16G     0   16G   0% /var/lock
apoc@apoc-H8QGL:~$ grep . /sys/block/sd[a-z]/device/model
WDC WD800JD-75MS
apoc@apoc-H8QGL:~$ time ./synctest
Performing 100 iterations, 4096 bytes each
Done.

real    0m1.132s
user    0m0.000s
sys    0m0.010s
apoc@apoc-H8QGL:~$ time ./synctest 1024
Performing 1024 iterations, 4096 bytes each
Done.

real    0m10.551s
user    0m0.000s
sys    0m0.120s
apoc@apoc-H8QGL:~$

~~EDIT~~

Code:
[09:29:05] - Reading up to 121622496 from "work/wudata_01.trr": Read 121622496
[09:29:06] trr file hash check passed.
[09:29:06] - Reading up to 108811624 from "work/wudata_01.xtc": Read 108811624
[09:29:07] xtc file hash check passed.
[09:29:07] edr file hash check passed.
[09:29:07] logfile size: 208590
[09:29:07] Leaving Run
[09:29:10] - Writing 230815702 bytes of core data to disk...
[09:30:25] Done: 230815190 -> 222473076 (compressed to 3.3 percent)
[09:30:25]   ... Done.                     <------ 22 minute delay
[09:52:30] - Shutting down core
[09:52:30] 
[09:52:30] Folding@home Core Shutdown: FINISHED_UNIT
 
Last edited:
Collective:

Since the beginning of this thread, I've not applied any of the kernel updates suggested but have been applying "other" updates. Do we know if there is actually an issue or at least a range of versions to avoid?

Many thanks
 
False alarm, seems. Evidently, synctest doesn't accurately reproduce application's behavior (which is odd). Haven't had time to dig more into this.

AFAICT, current recommendations are still good.
 
i have started getting this with 12.04 LTS deskop again. takes 20 min to shut down
 
Back
Top