Is something wrong, or do I just have crappy luck?

fastgeek

[H]ard|DCOTM x4 aka "That Company"
Seriously though! :p

Trying to get some "quality" work done with this dual E5-2680 R720 before I have to give it back (hoping to get a little quality time with our R820 with quad E5-4650s over the weekend and until they pry it out of my grubby mitts), but right now I can't get anything other than a 609x WU or other A3 core WUs! Am running Ubuntu; did decide to try out v12, but can't see that being the problem. Have tried both the v6 and v7 clients. Have my passkey in place. Learned from my last brain-dead mistake and made sure to spec "big" work units, along with the proper v6/v7 smp and bigadv flags. Even tried bigbeta, but based on a recent thread that seems to be dead. Haven't installed TK as there's really no reason to at the moment. Before setting the flags, the v7 client defaulted to A4 units, FWIW.

Thoughts? Suggestions? Point out something painfully obvious that I missed again?!
 
What version are you using now, v6 or v7? If v7, use:
client-type
bigadv

max-packet-size
big

If v6, make sure you have the packet size set to big, advanced methods set to no, and the flags -smp and -bigadv.
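Roughly where those v7 options live in config.xml, from memory, so double-check it against your actual file (the user/team/passkey lines below are just placeholders):

Code:
<config>
  <user value="fastgeek"/>
  <team value="33"/>
  <passkey value="PUT-YOUR-PASSKEY-HERE"/>
  <slot id="0" type="SMP">
    <client-type value="bigadv"/>
    <max-packet-size value="big"/>
  </slot>
</config>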
 
Check to all of that; that's exactly how I had the v7 and v6 clients configured, respectively. Was still getting 609x WUs. After reading your reply I double-checked everything and realized that, out of desperation, I had set the advmethods flag with v6 a little while ago. Killed that, wiped all the usual stuff and got a 6900. (Going to move that WU over to another machine; presuming it will work.) You know, this seems to happen a lot: have issues, ask for help, try the same bloody thing I did earlier without luck... then it works!

Now how about you work your "magic" when the R820 comes online? Never seen anything other than a 6900, and I don't mean 690x, on that box... and it's certainly capable of it. :p Was doing a 4m30s TPF on 6900's after all. :D
 
Hmmm, I thought you were running Ubuntu. You should not have pulled a 6900 using Ubuntu; a 6901, 6903, 6904 or 8101 is what you should have picked up. Are you running Windows?
 
Hmmm, I thought you were running Ubuntu. You should not have pulled a 6900 using Ubuntu; a 6901, 6903, 6904 or 8101 is what you should have picked up. Are you running Windows?

I think the 6900 is legit in the series. I have pulled a rare few with -bigadv.
It's the 609x I would be surprised at.
 
No, I am running Ubuntu; the only thing different there is that it's version 12 vs the usual version 11.

I have never seen anything other than a straight 6900 with bigadv. Believe me, I have paid attention too, as in the past people have really been interested in seeing what that quad E5 could do to an 8101 and such. The only thing I can think of is that it might have to do with the firewall at work. The initial server always fails, but the secondary goes through. I can put up the failed/successful name/IP when I get in.
 
Ubuntu 12 won't be the issue. Are you specifying the number of cores, i.e. -smp XX on the command line with v6?
 
Have tried both ways, with -smp 32 or -smp 48 (depending on the system) and just -smp by itself. However, the client has been recognizing the number of cores on its own.

As an aside, the system in question was assigned said 6900, but then was given a 6097 the next time around. I might be able to find the logs, but if I'm remembering correctly, I typically see 6097, 6098 and 6099s on a regular basis despite having the -bigadv flag set.
 
I have a feeling the OS may not be reporting your core count correctly to the assignment server. The 6900s are primarily assigned to 8-core systems; the server they are on is set to 8 cores. The 609x are on the same server and are primarily assigned to 8-core-or-less machines running the bigadv flag, or to multi-core rigs running the smp flag.

It appears there is either something wrong in your configuration or the OS is reporting the core count wrong. Have you ever used the core hack on the hard drive you are using, or anything like that?
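A quick sanity check, using standard Linux commands (nothing F@H-specific), is to see what core count the OS itself reports:

Code:
# logical CPUs the kernel exposes (what the client should be detecting)
nproc
# or, equivalently:
grep -c ^processor /proc/cpuinfo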
 
Nope. Have heard of the core hack, but know it's frowned on and never looked into it beyond that. Every time I get one of these systems back to folding it's always with a virgin installation of Ubuntu. Right now the system is just running with the regular -smp flag - here's a screen shot showing that it sees and uses the cores.

[Attached screenshot: ScreenShot514.jpg]


I will change that to -smp 32 and see how it behaves over night.

*edit* Spaced on making a backup of my FAH folder when I terminated and it lost the progress on the WU. Went ahead and deleted all the usual again... and got a new one... 6099. :(

But FWIW, the slow 48 core system I have *IS* using -smp 48 -bigadv and it too is typically getting 609x WU's with the rare 6900.

So, yeah, it's good that these bang out the 609x WU's pretty fast; but sure would be nice to see something that would stress them properly. :)
 
Oh, and here's what I see every time I get a new WU or send the results... getting a new WU in this particular case.

[Attached screenshot: ScreenShot515.jpg]
 
Are you running with -verbosity 9? If not, can you add it and see what you get the next time the client attempts to get a WU?

It smells like the client isn't succeeding in contacting the main AS and is falling back to the backup AS [?].
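Just tack it onto whatever you're already launching with, e.g. (assuming the same flags you've mentioned so far):

Code:
./fah6 -smp 32 -bigadv -verbosity 9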
 
Could you post a screenshot of -configonly? Something is amiss somewhere. Set it up like you had it set up for bigadv; edit your passkey out.
 
Doing this will work much better (no offense, Grandpa):
Code:
grep -v passkey client.cfg
 
tear - I will make that change in just a second and get a new WU.

The results of the grep command -

Code:
[settings]
username=fastgeek
team=33
asknet=no
machineid=1
bigpackets=big
extra_parms=-smp 32 -bigadv
local=6

[http]
active=no
host=localhost
port=8080

[core]
priority=96
checkpoint=3
addr=

[clienttype]
memory=48000
type=0
 
Results of -verbosity 9 (AFAIK at least) while getting a new WU. I did rename machinespecific.dat to generate a new one of those too.

Code:
someone@someone-somewhere:~/fah$ ./fah6 

Note: Please read the license agreement (fah6 -license). Further 
use of this software requires that you have read and accepted this agreement.

32 cores detected


--- Opening Log file [July 27 17:03:34 UTC] 


# Linux SMP Console Edition ###################################################
###############################################################################

                       Folding@Home Client Version 6.34

                          http://folding.stanford.edu

###############################################################################
###############################################################################

Launch directory: /home/someone/fah
Executable: ./fah6
Arguments: -smp 32 -bigadv -verbosity 9 

[17:03:34] - Ask before connecting: No
[17:03:34] - User name: fastgeek (Team 33)
[17:03:34] - User ID not found locally
[17:03:34] + Requesting User ID from server
[17:03:34] - Getting ID from AS: 
[17:03:34] Connecting to http://assign.stanford.edu:8080/
[17:04:37] - Couldn't send HTTP request to server
[17:04:37] + Could not connect to Primary Assignment Server for ID
[17:04:37] Connecting to http://assign2.stanford.edu:80/
[17:04:37] Posted data.
[17:04:37] Initial: F545; - Received User ID = 45F5055561DC962F
[17:04:37] - Machine ID: 1
[17:04:37] 
[17:04:37] Could not open work queue, generating new queue...
[17:04:37] - Preparing to get new work unit...
[17:04:37] Cleaning up work directory
[17:04:37] + Attempting to get work packet
[17:04:37] - Autosending finished units... [July 27 17:04:37 UTC]
[17:04:37] Passkey found
[17:04:37] Trying to send all finished work units
[17:04:37] - Will indicate memory of 48000 MB
[17:04:37] + No unsent completed units remaining.
[17:04:37] - Connecting to assignment server
[17:04:37] Connecting to http://assign.stanford.edu:8080/
[17:04:37] - Autosend completed
[17:05:41] - Couldn't send HTTP request to server
[17:05:41] + Could not connect to Assignment Server
[17:05:41] Connecting to http://assign2.stanford.edu:80/
[17:05:41] Posted data.
[17:05:41] Initial: 8F80; - Successful: assigned to (128.143.231.202).
[17:05:41] + News From Folding@Home: Welcome to Folding@Home
[17:05:41] Loaded queue successfully.
[17:05:41] Sent data
[17:05:41] Connecting to http://128.143.231.202:80/
[17:05:42] Posted data.
[17:05:42] Initial: 0000; - Receiving payload (expected size: 3811375)
[17:05:47] - Downloaded at ~744 kB/s
[17:05:47] - Averaged speed for that direction ~744 kB/s
[17:05:47] + Received work.
[17:05:47] + Closed connections
[17:05:47] 
[17:05:47] + Processing work unit
[17:05:47] Core required: FahCore_a3.exe
[17:05:47] Core found.
[17:05:47] Working on queue slot 01 [July 27 17:05:47 UTC]
[17:05:47] + Working ...
[17:05:47] - Calling './FahCore_a3.exe -dir work/ -nice 19 -suffix 01 -np 32 -priority 96 -checkpoint 3 -verbose -lifeline 16805 -version 634'

thekraken: The Kraken 0.7-pre15 (compiled Thu Jul 26 11:38:19 PDT 2012 by someone@someone-somewhere)
thekraken: Processor affinity wrapper for Folding@Home
thekraken: The Kraken comes with ABSOLUTELY NO WARRANTY; licensed under GPLv2
thekraken: PID: 16818
thekraken: Logging to thekraken.log
[17:05:47] 
[17:05:47] *------------------------------*
[17:05:47] Folding@Home Gromacs SMP Core
[17:05:47] Version 2.27 (Dec. 15, 2010)
[17:05:47] 
[17:05:47] Preparing to commence simulation
[17:05:47] - Looking at optimizations...
[17:05:47] - Created dyn
[17:05:47] - Files status OK
[17:05:48] - Expanded 3810863 -> 4169428 (decompressed 109.4 percent)
[17:05:48] Called DecompressByteArray: compressed_data_size=3810863 data_size=4169428, decompressed_data_size=4169428 diff=0
[17:05:48] - Digital signature verified
[17:05:48] 
[17:05:48] Project: 6097 (Run 0, Clone 31, Gen 363)
[17:05:48] 
[17:05:48] Assembly optimizations on if available.
[17:05:48] Entering M.D.
                         :-)  G  R  O  M  A  C  S  (-:

                   Groningen Machine for Chemical Simulation

                            :-)  VERSION 4.5.3  (-:

        Written by Emile Apol, Rossen Apostolov, Herman J.C. Berendsen,
      Aldert van Buuren, Pär Bjelkmar, Rudi van Drunen, Anton Feenstra, 
        Gerrit Groenhof, Peter Kasson, Per Larsson, Pieter Meulenhoff, 
           Teemu Murtola, Szilard Pall, Sander Pronk, Roland Schulz, 
                Michael Shirts, Alfons Sijbers, Peter Tieleman,

               Berk Hess, David van der Spoel, and Erik Lindahl.

       Copyright (c) 1991-2000, University of Groningen, The Netherlands.
            Copyright (c) 2001-2010, The GROMACS development team at
        Uppsala University & The Royal Institute of Technology, Sweden.
            check out http://www.gromacs.org for more information.


                               :-)  Gromacs  (-:

Reading file work/wudata_01.tpr, VERSION 4.5.1-dev-20100930-afd66-dirty (single precision)
Starting 32 threads
[17:05:54] Mapping NT from 32 to 32 
Making 3D domain decomposition 4 x 4 x 2
starting mdrun 'Solvated system'
182000000 steps, 728000.0 ps (continuing from step 181500000, 726000.0 ps).
[17:05:54] Completed 0 out of 500000 steps  (0%)
 
For comparison:
Code:
[settings]
username=Grandpa
team=33
asknet=no
machineid=1
extra_parms=-smp 48 -bigbeta
bigpackets=big
local=88

[http]
active=no
host=127.0.0.1
port=8880
usepasswd=no
 
The only variable I don't understand is local=6, or local=88 in your case. And IIRC BB is dead, but I had already tried it earlier w/o any luck.
 
Hmm, you seem to be having issues talking to assign.stanford.edu.

What does this give you:
Code:
wget -q -O - http://assign.stanford.edu:8080/


Expected output:
Code:
$ wget -q -O - http://assign.stanford.edu:8080/
<html><b>OK</b></html>
$

Is it possible outgoing traffic to :8080 is being actively rejected in your environment?
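If that just hangs, it's also worth hitting the backup AS on port 80 for comparison (that's the one your log shows succeeding):

Code:
# primary assignment server on :8080 (hangs if the firewall silently drops it)
wget -q -O - http://assign.stanford.edu:8080/
# backup assignment server on :80, for comparison
wget -q -O - http://assign2.stanford.edu:80/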
 
I would also try manually deleting the [clienttype] and [core] sections (to sync up with Grandpa's output) and try things again (start fresh w/o machinedependent.dat, work/ or queue.dat).
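In other words, something along these lines from the client directory, with the client stopped first (just a sketch; back up the folder beforehand if you care about the in-progress WU):

Code:
cd ~/fah
rm -f machinedependent.dat queue.dat
rm -rf work/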
 
Hmm, you seem to be having issues talking to assign.stanford.edu.

What does this give you:
Code:
wget -q -O - http://assign.stanford.edu:8080/


Expected output:
Code:
$ wget -q -O - http://assign.stanford.edu:8080/
<html><b>OK</b></html>
$

Is it possible outgoing traffic to :8080 is being actively rejected in your environment?

Yeah, I can tell you now that it doesn't like :8080 for whatever reason here at work. The connection is amazingly fast, 12 megaBYTES per second both ways, but IT has some things locked down. Have run that command and it just sits there. Are there any good proxies for something like this?
 
You mean proxy as in a proxy service that you could just use, or a proxy application that you could set up in another location that you control?

BTW, before we venture down the proxy path I'd try the client.cfg modification I suggested previously
and see if that makes any difference -- just in case....
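For when/if it does come to the proxy route: client.cfg already has an [http] section for that; something like the below, where the host and port are just placeholders for whatever proxy you end up using:

Code:
[http]
active=yes
host=proxy.example.com
port=3128
usepasswd=no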
 
Either should give you a non-P609x... still, if one doesn't cut it, I'd try the other just to cover all bases...
 
OK, I did as requested and am afraid it's the same deal.

Config as reported by the grep command:

Code:
someone@someone-somewhere:~/fah$ grep -v passkey client.cfg
[settings]
username=fastgeek
team=33
asknet=no
machineid=1
extra_parms=-smp 32 -bigadv
bigpackets=big
local=88

[http]
active=no
host=127.0.0.1
port=8880
usepasswd=no

FAH launched after making above changes and deleting all appropriate files.

Code:
32 cores detected
--- Opening Log file [July 27 19:35:09 UTC] 
# Linux SMP Console Edition ###################################################
###############################################################################
                       Folding@Home Client Version 6.34
                          http://folding.stanford.edu
###############################################################################

Launch directory: /home/someone/fah
Executable: ./fah6
Arguments: -smp 32 -bigadv 

[19:35:09] - Ask before connecting: No
[19:35:09] - User name: fastgeek (Team 33)
[19:35:09] - User ID not found locally
[19:35:09] + Requesting User ID from server
[19:36:12] - Couldn't send HTTP request to server
[19:36:12] + Could not connect to Primary Assignment Server for ID
[19:36:12] - Machine ID: 1
[19:36:12] 
[19:36:12] Could not open work queue, generating new queue...
[19:36:12] - Preparing to get new work unit...
[19:36:12] Cleaning up work directory
[19:36:12] + Attempting to get work packet
[19:36:12] Passkey found
[19:36:12] - Connecting to assignment server
[19:37:15] - Couldn't send HTTP request to server
[19:37:15] + Could not connect to Assignment Server
[19:37:16] - Successful: assigned to (128.143.231.202).
[19:37:16] + News From Folding@Home: Welcome to Folding@Home
[19:37:16] Loaded queue successfully.
[19:37:23] + Closed connections
[19:37:23] 
[19:37:23] + Processing work unit
[19:37:23] Core required: FahCore_a3.exe
[19:37:23] Core found.
[19:37:23] Working on queue slot 01 [July 27 19:37:23 UTC]
[19:37:23] + Working ...
thekraken: The Kraken 0.7-pre15 (compiled Thu Jul 26 11:38:19 PDT 2012 by someone@someone-somewhere)
thekraken: Processor affinity wrapper for Folding@Home
thekraken: The Kraken comes with ABSOLUTELY NO WARRANTY; licensed under GPLv2
thekraken: PID: 17185
thekraken: Logging to thekraken.log
[19:37:23] 
[19:37:23] *------------------------------*
[19:37:23] Folding@Home Gromacs SMP Core
[19:37:23] Version 2.27 (Dec. 15, 2010)
[19:37:23] 
[19:37:23] Preparing to commence simulation
[19:37:23] - Looking at optimizations...
[19:37:23] - Created dyn
[19:37:23] - Files status OK
[19:37:23] - Expanded 3810233 -> 4169428 (decompressed 109.4 percent)
[19:37:23] Called DecompressByteArray: compressed_data_size=3810233 data_size=4169428, decompressed_data_size=4169428 diff=0
[19:37:23] - Digital signature verified
[19:37:23] 
[19:37:23] Project: 6097 (Run 0, Clone 95, Gen 424)
[19:37:23] 
[19:37:23] Assembly optimizations on if available.
[19:37:23] Entering M.D.
                         :-)  G  R  O  M  A  C  S  (-:
                   Groningen Machine for Chemical Simulation
                            :-)  VERSION 4.5.3  (-:
        Written by Emile Apol, Rossen Apostolov, Herman J.C. Berendsen,
      Aldert van Buuren, Pär Bjelkmar, Rudi van Drunen, Anton Feenstra, 
        Gerrit Groenhof, Peter Kasson, Per Larsson, Pieter Meulenhoff, 
           Teemu Murtola, Szilard Pall, Sander Pronk, Roland Schulz, 
                Michael Shirts, Alfons Sijbers, Peter Tieleman,
               Berk Hess, David van der Spoel, and Erik Lindahl.

       Copyright (c) 1991-2000, University of Groningen, The Netherlands.
            Copyright (c) 2001-2010, The GROMACS development team at
        Uppsala University & The Royal Institute of Technology, Sweden.
            check out http://www.gromacs.org for more information.


                               :-)  Gromacs  (-:

Reading file work/wudata_01.tpr, VERSION 4.5.1-dev-20100930-afd66-dirty (single precision)
Starting 32 threads
[19:37:29] Mapping NT from 32 to 32 
Making 3D domain decomposition 4 x 4 x 2
starting mdrun 'Solvated system'
212500000 steps, 850000.0 ps (continuing from step 212000000, 848000.0 ps).
[19:37:30] Completed 0 out of 500000 steps  (0%)
NOTE: Turning on dynamic load balancing
[19:40:05] Completed 5000 out of 500000 steps  (1%)
[19:42:36] Completed 10000 out of 500000 steps  (2%)
^C
Folding@Home Client Shutdown.
someone@someone-somewhere:~/fah$ 

Received the INT signal, stopping at the next NS step

 Average load imbalance: 1.8 %
 Part of the total run time spent waiting due to load imbalance: 0.7 %
 Steps where the load balancing was limited by -rdd, -rcon and/or -dds: X 0 % Y 0 % Z 0 %

	Parallel run - timing based on wallclock.

               NODE (s)   Real (s)      (%)
       Time:    332.945    332.945    100.0
                       5:32
               (Mnbf/s)   (GFlops)   (ns/day)  (hour/ns)
Performance:   1705.804     87.974     11.284      2.127

Thanx for Using GROMACS - Have a Nice Day

Will now try the same with BB; but am expecting the same results.

As for the proxy - ideally one that I can just use. But if need be I could try setting one up elsewhere, presuming I can find a place with a fast enough connection for it not to be a bother.
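If I do end up hosting one myself, my rough (untested) thinking is to run an HTTP proxy like tinyproxy or squid on a box I control, tunnel to it over SSH if its port is blocked here too, and point the [http] section of client.cfg at the local end:

Code:
# forward local port 3128 to a proxy listening on 3128 on the remote box
ssh -N -L 3128:localhost:3128 me@outside-box
# then in client.cfg: [http] active=yes, host=127.0.0.1, port=3128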
 
BA = 6097
BB = 6098

Am bringing the 4P system online and am hoping against hope it'll have better luck. Oddly enough, Ubuntu 12 did NOT like that system at all (lots of errors regarding the processors), so I'm using 11 instead.
 