My bigadv PPD was divided by two for no apparent reason

Discussion in 'Distributed Computing' started by Jeanjean, Nov 9, 2013.

  1. Jeanjean

    Jeanjean [H]Lite

    Messages:
    99
    Joined:
    Nov 22, 2011
    Hi.

    I have a problem with one of my 4P rigs, built on a Supermicro H8QGi board with four Opteron 6176s overclocked to 2652 MHz with tear's BIOS.

    I can't explain why my PPD has been cut in half after a simple reboot. :eek:

    Normally my TPF on an 8101 is about 12 minutes, and now it is about 21 minutes!

    Of course, I didn't change anything.

    It seems my processors aren't running at full speed, because the CPU temperature should be around 50 degrees and now it is only around 42 degrees.

    However, you can see that all 48 SMP cores are working.

    To help you help me, I've attached a screenshot of my system.

    Thanks in advance for your help.
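    The TPF figures in the first post (about 12 minutes normally, about 21 minutes now) are enough on their own to explain a PPD drop of roughly half. Below is a minimal Python sketch, assuming the usual quick-return-bonus formula points = base * sqrt(k * deadline / elapsed), a 100-frame WU (elapsed = 100 * TPF) and a bonus above 1. Under those assumptions PPD scales as TPF ** -1.5, so the unknown base, k and deadline cancel out; ppd_drop_factor is a hypothetical helper name.

```python
def ppd_drop_factor(old_tpf_min: float, new_tpf_min: float) -> float:
    """Return how many times lower PPD gets when TPF rises from old to new.

    Assumes the quick-return-bonus formula
        points = base * sqrt(k * deadline / elapsed)
    with elapsed = 100 * TPF (100 frames per WU) and the bonus above 1.
    PPD = points / elapsed is then proportional to TPF ** -1.5, so base,
    k and deadline cancel out of the ratio.
    """
    return (new_tpf_min / old_tpf_min) ** 1.5


if __name__ == "__main__":
    # TPF values from the first post: about 12 min normally, about 21 min now.
    print(f"PPD drops by a factor of about {ppd_drop_factor(12, 21):.1f}")
```

    With 12 and 21 minutes this comes out at roughly 2.3, which fits the "divided by two" in the title: a 75% longer TPF costs more than 75% of PPD because the bonus shrinks too.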

     
  2. Grandpa_01

    Grandpa_01 [H]ard|DCer of the Year 2013

    Messages:
    1,176
    Joined:
    Jun 4, 2011
    Try rebooting and resetting the BIOS to optimal defaults, then reset your OC through OCNG, power off, and reboot.
     
  3. Jeanjean

    Jeanjean [H]Lite

    Messages:
    99
    Joined:
    Nov 22, 2011
    Thanks, Grandpa.


    I will test your solution next Tuesday because I can't do it now.
    I'm not near the machine; I use TeamViewer to monitor my 4P system, and of course I can't reboot without losing control of it.

    Does anybody have another suggestion?
     
  4. Grandpa_01

    Grandpa_01 [H]ard|DCer of the Year 2013

    Messages:
    1,176
    Joined:
    Jun 4, 2011
    Could you run fahdiag | pastebinit and post the results? :)
     
  5. Jeanjean

    Jeanjean [H]Lite

    Messages:
    99
    Joined:
    Nov 22, 2011
    pastebinit is not installed on my machine.

    However, here is the output of the fahdiag command:

     
  6. Grandpa_01

    Grandpa_01 [H]ard|DCer of the Year 2013

    Messages:
    1,176
    Joined:
    Jun 4, 2011
    It appears you have something running in the background. Are you running System Monitor or some other program? If so, turn it off and your TPF should go back to normal.

    49 0 0 11769416 147024 674728 0 0 0 0 5228 4160 57 43 0 0
    This is your system: the 57 (us, user time) should be 90+ and the 43 (sy, system time) should be 10 or less.

    48 0 0 28160676 112352 1435628 0 0 0 0 5109 166 90 10 0 0
    This is mine: 90 user and 10 system.
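    The column reading above can be made explicit. This is a minimal Python sketch (not part of any FAH tool) that maps a vmstat data row onto the default Linux column order r b swpd free buff cache si so bi bo in cs us sy id wa and applies the thread's 90/10 rule of thumb. The function names and the exact thresholds are illustrative assumptions.

```python
from __future__ import annotations

# Column order of a default Linux vmstat data row (procps). Newer versions
# append an extra "st" column, which zip() below simply ignores.
VMSTAT_COLUMNS = ["r", "b", "swpd", "free", "buff", "cache", "si", "so",
                  "bi", "bo", "in", "cs", "us", "sy", "id", "wa"]


def parse_vmstat_line(line: str) -> dict[str, int]:
    """Map one numeric vmstat row onto its column names."""
    return dict(zip(VMSTAT_COLUMNS, (int(v) for v in line.split())))


def folding_looks_healthy(line: str, min_user: int = 90, max_system: int = 10) -> bool:
    """Rule of thumb from this thread: ~90% user (FAH) time, ~10% system time."""
    row = parse_vmstat_line(line)
    return row["us"] >= min_user and row["sy"] <= max_system


if __name__ == "__main__":
    slow = "49 0 0 11769416 147024 674728 0 0 0 0 5228 4160 57 43 0 0"
    good = "48 0 0 28160676 112352 1435628 0 0 0 0 5109 166 90 10 0 0"
    row = parse_vmstat_line(slow)
    print(row["us"], row["sy"])  # 57 43
    print(folding_looks_healthy(slow), folding_looks_healthy(good))  # False True
```

    The two sample rows are the ones quoted in the post; a single row is only a snapshot, so it is worth checking several before drawing conclusions.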
     
  7. sbinh

    sbinh Gawd

    Messages:
    957
    Joined:
    Jul 12, 2008
    It seems you have two clients running at the same time.

    Exit your current FAH client, then issue the command:

    ps -ef |grep -i fah6 | grep -v grep

    If it returns something like:

    (your_username) 34377 2120 0 Oct30 pts/0 00:01:27 ./fah6

    - Then kill that process:

    kill -9 34377 (your PID may differ)

    - Now, restart your fah6 client.
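    The ps/grep check above can also be scripted. This is a rough Python sketch, assuming standard ps -ef output (columns UID PID PPID C STIME TTY TIME CMD); find_client_pids is a hypothetical helper. Unlike the grep pipeline it searches only the CMD column, and it only lists PIDs, leaving the kill step to you.

```python
from __future__ import annotations


def find_client_pids(ps_ef_output: str, name: str = "fah6") -> list[int]:
    """Return PIDs of processes whose command contains `name` (any case).

    Roughly what `ps -ef | grep -i fah6 | grep -v grep` does, except that
    only the CMD column is searched. ps -ef prints the columns
    UID PID PPID C STIME TTY TIME CMD, so PID is field 1 and CMD field 7.
    """
    pids = []
    for line in ps_ef_output.splitlines():
        fields = line.split(None, 7)  # CMD may itself contain spaces
        if len(fields) < 8 or not fields[1].isdigit():
            continue  # header or malformed line
        cmd = fields[7]
        if name.lower() in cmd.lower() and "grep" not in cmd:
            pids.append(int(fields[1]))
    return pids


if __name__ == "__main__":
    # Hypothetical sample output; "folder" is a made-up user name.
    sample = """UID        PID  PPID  C STIME TTY          TIME CMD
folder   34377  2120  0 Oct30 pts/0    00:01:27 ./fah6
folder   34501  2120  0 Oct30 pts/1    00:00:00 grep -i fah6"""
    print(find_client_pids(sample))  # [34377]
```

    An empty list corresponds to the "returns nothing" case, i.e. only the client you just exited was running.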
     
  8. tear

    tear [H]ard|DCer of the Year 2011

    Messages:
    1,568
    Joined:
    Jul 25, 2011
    Get rid of these two and see if the problem persists...
     
  9. Jeanjean

    Jeanjean [H]Lite

    Messages:
    99
    Joined:
    Nov 22, 2011
    Hi.

    Your command returns nothing, so I suppose I have only one client running.

    Hi.

    I have killed Mozilla, but I can't kill TeamViewer because I would lose remote control of the system.

    Anyway, I have always used TeamViewer and never had this kind of problem.

    My main suspicion is a problem with the BIOS. I will try Grandpa's solution as soon as possible.

    Thanks.
     
  10. musky

    musky [H]ard|DCer of the Year 2012

    Messages:
    3,155
    Joined:
    Dec 14, 2009
    Even though I suspect that TeamViewer's CPU usage drops to almost nothing when you aren't connected, you should look into using ssh and PuTTY for remote access. You typically don't need GUI access on a folding box. To demonstrate the difference, here is what fahdiag shows on one of my boxes even when I am connected to it:

    top 5 CPU consumers:
    59637 6391 54 thekraken-FahCo
    60290 0.3 8 bash
    99 0.0 23 watchdog/23
    985 0.0 27 kworker/27:1
    98 0.0 23 ksoftirqd/23
     
  11. tear

    tear [H]ard|DCer of the Year 2011

    Messages:
    1,568
    Joined:
    Jul 25, 2011
    If you're sure there are no CPU consumers that could interfere with FAH, power-cycling
    may be the quickest way to get you going.

    Before power-cycling you could also run
    Code:
    sudo ht-retries
    and see what's going on in there (let it run for at least one minute before looking at Delta data).
     
  12. Jeanjean

    Jeanjean [H]Lite

    Messages:
    99
    Joined:
    Nov 22, 2011
    Here are the results of ht-retries:

    What does it mean exactly?

    I never understood what this command does. :eek:

    However, it seems there is some problem.

     
  13. tear

    tear [H]ard|DCer of the Year 2011

    Messages:
    1,568
    Joined:
    Jul 25, 2011
    An HT retry is an event that occurs whenever a HyperTransport link needs to be re-established.

    The generally accepted number of retries is 1 per minute per link.

    The current version of ht-retries (which I encourage you to use; it's part of ocng-utils, see
    http://hardforum.com/showthread.php?t=1677395 steps 18 through 22) refreshes
    every 60 seconds and prints cumulative and delta (increment) HT-retry counters.

    Put simply, if you see anything above 1 in the Delta section of ht-retries
    (the current version, not the one you've used) after at least 60 seconds,
    then things need a closer look for possible improvement.

    For instance:
    Code:
    ht-retries (OCNG4.4)
    Ctrl+C to interrupt, refreshes every 60s. Only non-zero values are reported.
    
    Cumulative, Sun Nov 10 13:48:44 MST 2013 (1384116524)
           L0S0 L1S0 L2S0 L3S0 L0S1 L1S1 L2S1 L3S1
    Node 0                     a6e8                
    Node 1                                         
    Node 2                                         
    Node 3                                         
    Node 4                                         
    Node 5                                         
    Node 6                                         
    Node 7                                         
    
    Delta, Sun Nov 10 13:48:44 MST 2013 (1384116524)
           L0S0 L1S0 L2S0 L3S0 L0S1 L1S1 L2S1 L3S1
    Node 0                     0001                
    Node 1                                         
    Node 2                                         
    Node 3                                         
    Node 4                                         
    Node 5                                         
    Node 6                                         
    Node 7                                         
    Node 0, Link 0, Sublink 1 shows one retry in the last 60 seconds (which is acceptable).
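    The "above 1 per link per minute" rule can be checked mechanically. The Python sketch below parses text laid out like the example above; links_over_limit is a hypothetical helper, not part of ocng-utils. It assumes the counters are hexadecimal (the cumulative value a6e8 suggests so), that each counter starts under its LxSy label, and that blank cells mean zero.

```python
from __future__ import annotations

import re

LINK_LABEL = re.compile(r"L\dS\d")          # e.g. L0S1 = link 0, sublink 1
NODE_ROW = re.compile(r"\s*Node (\d+)")
ONLY_COUNTERS = re.compile(r"[\s0-9A-Fa-f]*")
COUNTER = re.compile(r"[0-9A-Fa-f]+")


def links_over_limit(output: str, limit: int = 1) -> list[tuple[int, str, int]]:
    """Return (node, link label, retries) for Delta entries above `limit`."""
    lines = output.splitlines()
    start = next(i for i, l in enumerate(lines) if l.lstrip().startswith("Delta"))
    header = next(l for l in lines[start + 1:] if LINK_LABEL.search(l))
    columns = [(m.start(), m.group()) for m in LINK_LABEL.finditer(header)]
    flagged = []
    for line in lines[start + 1:]:
        if line.lstrip().startswith("Cumulative"):
            break  # the next 60-second refresh begins
        row = NODE_ROW.match(line)
        if not row or not ONLY_COUNTERS.fullmatch(line, row.end()):
            continue  # header, timestamp or prose, not a node row
        node = int(row.group(1))
        for cell in COUNTER.finditer(line, row.end()):
            # Attribute the counter to the label whose column it starts under.
            _, label = min(columns, key=lambda col: abs(col[0] - cell.start()))
            retries = int(cell.group(), 16)
            if retries > limit:
                flagged.append((node, label, retries))
    return flagged
```

    Fed a full refresh (Cumulative block, then Delta block), it ignores the cumulative counters and returns an empty list when every link stayed at 1 retry or fewer, as in the example above.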

    It seems you rebooted the machine on Fri Nov 8 09:06:43 CET 2013. Did the
    problems start then or before the reboot?

    Also, note that some 8101s fold waaaay slower than normal. We do not know
    why. If power-cycling the machine and resuming the unit doesn't help while diagnostics
    check out, consider the possibility that the WU is just... slow (whatever the reason), and
    monitor the performance of the next WU.
     
  14. Jeanjean

    Jeanjean [H]Lite

    Messages:
    99
    Joined:
    Nov 22, 2011
    On this machine, I'm only using version 3 of the OC BIOS.

    So I can't use OCNG.

    At the moment I'm folding an 8103, and I'm getting the same slowdown.

    So this is not a WU problem.

    My conclusion is that I must try Grandpa's solution before investigating further.

    Thanks, tear, for your help. :)
     
  15. tear

    tear [H]ard|DCer of the Year 2011

    Messages:
    1,568
    Joined:
    Jul 25, 2011
    The ht-retries from the new package works fine with any OCNG version, as well as with the stock BIOS.
     
  16. bowlinra

    bowlinra Limp Gawd

    Messages:
    195
    Joined:
    Apr 30, 2012
    Is there some chance you have a problem with memory? Are you using ECC or non-ECC? It seems like you might be getting A LOT of errors.
     
  17. Grandpa_01

    Grandpa_01 [H]ard|DCer of the Year 2013

    Messages:
    1,176
    Joined:
    Jun 4, 2011
    He is getting quite a few HT retries, but with the old version of the tool you can't tell whether they happened at boot or came later. Jeanjean, you need to do as tear suggests and install the new version of the tools.
     
  18. Linden

    Linden [H]ard|Gawd

    Messages:
    1,193
    Joined:
    Sep 8, 2005
    With respect to TeamViewer, it will slow down Folding on an AMD 4P if you have an open session from one computer to another. It will not slow down Folding if you have closed the session. Running as a service in the background, TeamViewer is benign.
     
  19. Jeanjean

    Jeanjean [H]Lite

    Messages:
    99
    Joined:
    Nov 22, 2011
    Here is the result with the new version of ht-retries:

    Hi.

    I am using non-ECC memory.

    Of course, it is possible that there is a problem with the memory.

    But at this stage, I don't know.

    Hi.

    I use TeamViewer on all my machines and have never had this sort of problem.
     
    Last edited: Nov 11, 2013
  20. Grandpa_01

    Grandpa_01 [H]ard|DCer of the Year 2013

    Messages:
    1,176
    Joined:
    Jun 4, 2011
    So most of those were at boot or before the last screenshot. It looks like there were none since the last screenshot, and I assume it was folding when you ran the test, so no real problem there. :)

    I would say a reboot will fix it, and if it gets a new WU before you get to it, it may fix itself, as tear mentioned earlier. It would be nice if you could run fahdiag after you reboot it, or once it starts running right.

    By the way, tell toTOW that Grandpa says hi. :)
     
  21. Jeanjean

    Jeanjean [H]Lite

    Messages:
    99
    Joined:
    Nov 22, 2011
    Yes, it has been folding since the last screenshot.

    So this is good news.

    As soon as I reboot (tomorrow morning, normally), I will let you know the result.

    And of course I will say hello to toTOW for you. ;)

    We have a small forum in France, but sadly it is much less active than HardForum.
     
  22. Jeanjean

    Jeanjean [H]Lite

    Messages:
    99
    Joined:
    Nov 22, 2011
    Hi.

    I'm back because I have done what Grandpa suggested in his first post.

    I rebooted, went into the BIOS and reset it to "optimal defaults".
    Then I reset the OC through OCNG and finally applied the OC again.

    And now everything is back to normal! See below.

    Cheers ! :cool:


     
  23. Grandpa_01

    Grandpa_01 [H]ard|DCer of the Year 2013

    Messages:
    1,176
    Joined:
    Jun 4, 2011
    Happy to see it is back to normal. If you get a chance, could you post a fresh fahdiag, just to see how things are running and whether we can spot anything that may have changed?
     
  24. Jeanjean

    Jeanjean [H]Lite

    Messages:
    99
    Joined:
    Nov 22, 2011
    Here is a fresh fahdiag.

     
  25. Grandpa_01

    Grandpa_01 [H]ard|DCer of the Year 2013

    Messages:
    1,176
    Joined:
    Jun 4, 2011
    Hmm, something is still amiss, and I would bet you can do better. Looking at your vmstat output, something is still running in the background and stealing CPU cycles.

    48 0 0 12819112 84652 419848 0 0 0 22 5222 4815 69 31 0 0
    48 0 0 12823032 84652 419848 0 0 0 0 5316 4963 71 29 0 0
    In these two lines, the columns I highlighted in red (69/31 and 71/29) are your user (FAH) and system (other OS activity) time. As you can see, yours are in the 70 range for FAH, which should be in the 90 range, and 30 for the system, which should be in the 10 range. In other words, your computer should be spending 90% of its time on FAH and 10% on other things.

    The pastebin link below shows 3 of my rigs running bigadv:
    1st: 8101
    2nd: 8104
    3rd: 8105
    As you can see, they are all running in the 90/10 range.
    http://pastebin.com/jx8HKKwL

    Run top in a command window and see whether it shows anything else using a few % of CPU time. Also run vmstat 10 in a command window for a couple of minutes and post the results here; we may be able to shave some time off your TPF and get you a little more PPD. :cool:
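    For a vmstat 10 run like the one requested, averaging the us and sy columns over all samples gives a steadier figure than reading single lines. This is a minimal Python sketch; average_user_system is a hypothetical helper. It finds the columns from vmstat's header row rather than assuming fixed positions, and it skips the first data row because vmstat's first report gives averages since boot rather than a live sample.

```python
from __future__ import annotations


def average_user_system(vmstat_output: str) -> tuple[float, float]:
    """Average the us and sy columns over the samples of `vmstat 10` output.

    Column positions come from the header row containing "us" and "sy",
    so extra columns such as "st" do not matter. The first data row
    reports averages since boot, so it is left out.
    """
    us_col = sy_col = None
    rows = []
    for line in vmstat_output.splitlines():
        tokens = line.split()
        if "us" in tokens and "sy" in tokens:
            us_col, sy_col = tokens.index("us"), tokens.index("sy")
        elif tokens and all(t.isdigit() for t in tokens):
            rows.append(tokens)
    if us_col is None or len(rows) < 2:
        raise ValueError("need a vmstat header and at least two data rows")
    samples = rows[1:]  # drop the since-boot row
    user = sum(int(r[us_col]) for r in samples) / len(samples)
    system = sum(int(r[sy_col]) for r in samples) / len(samples)
    return user, system
```

    Paste in the whole terminal output of a couple of minutes of vmstat 10 (repeated headers are fine); by the rule of thumb in this thread, a healthy bigadv rig should come out close to (90, 10).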
     
  26. Jeanjean

    Jeanjean [H]Lite

    Messages:
    99
    Joined:
    Nov 22, 2011
    Here are the results of the top and vmstat 10 commands.

    Please ignore TeamViewer_Desk; it usually isn't running.

     
  27. Grandpa_01

    Grandpa_01 [H]ard|DCer of the Year 2013

    Messages:
    1,176
    Joined:
    Jun 4, 2011
    Well, there are a couple of things you can do. Are you using the 4P as your main rig or just as a folding rig? I see you are still running Ubuntu 10.10. Also, is there any reason you need TeamViewer running on it? You can monitor it with HFM from another rig, and TeamViewer is using a bit of CPU time. Using the GUI also takes some resources (Xorg); you can ssh in to do most of what you need on a Linux 4P folding rig. Anyway, here are the things I would do.

    #1 Install 12.04 and use fahinstall from here: http://hardforum.com/showpost.php?p=1037125470&postcount=2 fahinstall includes some OS optimizations that help FAH. If you are old like me, you may be hesitant to change because you'd have to learn something new. :D I actually thought I was the last person to give up 10.10 and move to 12.04; the team browbeat me for quite a while to accomplish this feat.

    #2 Get rid of TeamViewer and monitor FAH remotely from another machine using HFM.

    #3 Use the GUI as little as possible, or not at all; you can go headless.

    If you use the rig as your daily internet machine, #2 and #3 obviously will not be possible. As for switching from TeamViewer to HFM on the box itself under #2, I do not know whether HFM uses fewer resources when used for monitoring locally.
    Anyway, at a minimum I would do #1. ;)
     
  28. Jeanjean

    Jeanjean [H]Lite

    Messages:
    99
    Joined:
    Nov 22, 2011
    Hi.

    Thanks for all these suggestions for improvement.

    I will give them a try as soon as I have some time. :)
     
  29. Linden

    Linden [H]ard|Gawd

    Messages:
    1,193
    Joined:
    Sep 8, 2005
    Grandpa will not steer you wrong.

    About TeamViewer: if you enjoy its simplicity, keep using it. Unless you have an open remote session, it does not slow down Folding on a 4P when it is running in the background.