Newbie needs help with it 6274

-alias- · Apr 5, 2013

Then I am afraid I cannot help you any further by myself. I suspect that langouste is misconfigured. There is a guide you can use here: http://hardforum.com/showthread.php?p=1037125476#post1037125476 and a lot more here: http://hardforum.com/forumdisplay.php?f=133

musky · Apr 5, 2013

If you start langouste with the -D flag, it runs in daemon mode, which is why you don't see the first terminal output.

Right now, I would probably recommend disabling langouste and trying to send your completed units with the -send all flag:

Code:

./fah6 -send all

Take as many variables out of the equation as you can until you get units to download and upload correctly.

Thomas R · Apr 5, 2013

This morning I tried to send the WUs by stopping fah in the terminal,
renaming WU 5 to 1 and starting fah, then stopping fah, renaming WU 6 to 1 and starting fah, and so on.
No luck with this.
(The tip about ./fah6 -send all came too late.)

And langouste starts with the -D flag.

Then I again deleted machinedependent.dat, queue.dat, the work folder and the a3 core.
I rebooted and started fah again, and what happens?
Downloading core_a3 !!!

I've shut down the machine and think F@H is not for me.

I'll try formatting the HDD and installing Windows, then run some other project with BOINC.
F@H and Linux are too hard for me. They are wasting my time.

Qinsp · Apr 5, 2013

I have 7 BigAdv machines. Out of 7 machines, 6 picked up an SMP job at first.

Then after a job or two, it kicks into BigAdv.

Qinsp · Apr 5, 2013

IIRC, normal Windows isn't going to see more than 2 CPUs.

Have you tried just loading the FAH 7.3.6 Linux client from folding.stanford.edu? That's what I would start with, and once you get the machine running well, then do the CLI Musky install. For 7.3.6 you will have to EDIT the CPU slot in FAHControl and ADD the -bigadv flag.

Don't give up. FAH is a worthy project, but it can be a PITA.

Qinsp · Apr 5, 2013

Another problem I've encountered is that if you are running BigAdv projects, then try to go SMP, it will choke and you must reinstall. It looks for a server that won't respond.

Qinsp · Apr 5, 2013

PS - The Musky Install WILL work. I just did 5 quad CPU machines with no problem, other than them loading SMP units at first. But all the puzzle pieces must be in the box.

But I understand the frustration when you have a lot of time, effort, and money tied up and aren't getting results. That's why I suggested the 7.3.6 install. Once you start making big points, you will smile again.

Thomas R · Apr 5, 2013

First:
I've never read anything about Musky's CLI.
Can you give me a link please?

Second:
Must I uninstall the FAH client 6.XX before installing client 7?

tjmagneto · Apr 5, 2013

Thomas R said:
First:
I've never read anything about Musky's CLI.
Can you give me a link please?

Here's a link to all of the [H] DC guides. http://hardforum.com/forumdisplay.php?f=133

Thomas R · Apr 6, 2013

And where can I find Musky's CLI?

tear · Apr 6, 2013

Tear walks in with a silver plate on his hand -- Right here, Sir.

Thomas R · Apr 6, 2013

Oh - thank you tear. That's really lovely...

Can anyone tell me what "CLI" means?

tear · Apr 6, 2013

Command Line Interface -- just another name for "doing it in a terminal"

Thomas R · Apr 7, 2013

musky said:
If you start langouste with the -D flag, it runs in daemon mode, which is why you don't see the first terminal output.

Right now, I would probably recommend disabling langouste and trying to send your completed units with the -send all flag:

Code:

./fah6 -send all

Take as many variables out of the equation as you can until you get units to download and upload correctly.

How can I disable langouste without damaging any other process?

Thomas R · Apr 7, 2013

OK - I've found it!
If I disable langouste and start F@H again, the user ID cannot be found.
The connection keeps failing for minutes.
When I start langouste again, the user ID is found, core_a3 is downloaded and an SMP WU runs.

What should I do?

tear · Apr 7, 2013

If you do want to use Langouste -- just keep it running and make sure you start it
in /etc/rc.local (above exit 0 line), like this (replace username with your Linux user
name):

Code:

sudo -u username langouste3 -l 8880 -D

If you do not want to use Langouste -- just stop the client, run ./fah6 -configonly
and disable proxy use.

For details: http://hardforum.com/showpost.php?p=1037125476&postcount=5

Thomas R · Apr 8, 2013

Thank you tear.

Since yesterday it looks like it's running normally.
Between an 8101 and now an 8103, only one SMP WU came down.

Thomas R · Apr 9, 2013

Hmmm... I'm not sure what frequency my CPUs are running at.

I've set: "sudo tpc -psmax 1"
The frequency should be 2500MHz.

If I use "cpufreq-info", it shows that the current frequency is 2000MHz with a possible maximum of 2200MHz.

Can you show me how I can read the actual running frequency?

EDIT:
Found it!
Installing OCNG-Utils
sudo clockspeed

CPUs are running at 2500.32MHz

Thomas R · Jun 2, 2013

New problems again... I can't connect to the server!
Two finished units are in the work folder. I get this message:
[11:55:52] + Could not connect to Work Server (results)
[11:55:52] (128.143.199.97:80)
[11:55:52] Could not transmit unit 01 to Collection server; keeping in queue.
[11:55:52] + Sent 0 of 1 completed units to the server
[11:55:52] - Autosend completed
[12:00:06] Completed 197500 out

@musky
To use your tip, do I have to disable langouste, or can the -send all command be used while langouste is running?
If not, how can I disable langouste?
And first I have to exit ./fah6, then open a new terminal and change to the fah folder, and then just enter ./fah6 -send all?

musky said:
If you start langouste with the -D flag, it runs in daemon mode, which is why you don't see the first terminal output.

Right now, I would probably recommend disabling langouste and trying to send your completed units with the -send all flag:

Code:

./fah6 -send all

Take as many variables out of the equation as you can until you get units to download and upload correctly.

tear · Jun 2, 2013

Stop the client.
Run ./fah6 -configonly
Disable proxy.
Start the client.

Thomas R · Jun 6, 2013

I had five finished WUs in the work folder.
OK - I could not solve the problem!
I did as tear wrote.
The rig was sending data to Stanford for hours. I watched this with etherape.
In total, 248MB was sent to Stanford.
Then, about seven hours after I started the process, the client stopped with this report:

[15:58:02] + Could not connect to Work Server (results)
[15:58:02] (128.143.199.97:80)
[15:58:02] Could not transmit unit 04 to Collection server; keeping in queue.
[15:58:02] + Sent 0 of 1 completed units to the server
[15:58:02] - Failed to send all units to server
[15:58:02] ***** Got a SIGTERM signal (15)
[15:58:02] Killing all core threads

Folding@Home Client Shutdown.

After this I deleted the work folder with the five finished WUs, queue.dat and machinedependent.dat, shut down the rig and started fresh with self-started langouste.
After it finished the first WU, it looked like the same problem, but after a few hours the WU was sent.
The second WU was finished eleven hours ago, and the same problem appears:

[13:34:43] + Attempting to send results [June 6 13:34:43 UTC]
[13:34:43] - Reading file work/wuresults_02.dat from core
[13:34:43] (Read 91706842 bytes from disk)
[13:34:43] Connecting to http://128.143.199.97:8080/
[13:34:43] - Couldn't send HTTP request to server
[13:34:43] + Could not connect to Work Server (results)
[13:34:43] (128.143.199.97:8080)
[13:34:43] + Retrying using alternative port
[13:34:43] Connecting to http://128.143.199.97:80/
[13:34:43] - Couldn't send HTTP request to server
[13:34:43] + Could not connect to Work Server (results)
[13:34:43] (128.143.199.97:80)
[13:34:43] Could not transmit unit 02 to Collection server; keeping in queue.
[13:34:43] + Sent 0 of 1 completed units to the server
[13:34:43] - Autosend completed

After six hours another try:

[19:34:43] - Autosending finished units... [June 6 19:34:43 UTC]
[19:34:43] Trying to send all finished work units
[19:34:43] Project: 8103 (Run 1, Clone 44, Gen 110)

[19:34:43] + Attempting to send results [June 6 19:34:43 UTC]
[19:34:43] - Reading file work/wuresults_02.dat from core
[19:34:44] (Read 91706842 bytes from disk)
[19:34:44] Connecting to http://128.143.231.201:8080/
[19:43:57] Completed 90000 out of 250000 steps (36%)

I can still see the "wuresults.dat" file from this WU in my work folder.
So I cannot believe that it has been sent to Stanford. (The stats say the same.)

Somehow I cannot believe that the problem is on my side.
What should or could I do?

tear · Jun 6, 2013

Try finding a pattern. Did the units you're trying to return come from the same server or from different ones?
Run ./fah6 -queueinfo to get more information about that.

If different servers -- I'd genuinely look into a firewall or similar issue (at your end).

Also, paste the complete log from ./fah6 -send all [make sure you're not folding as you run it].
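
One way to look for such a pattern is to tally the failed connection attempts per server address straight from the client log. The following is a minimal Python sketch, assuming the log lines have the same shape as the excerpts pasted earlier in this thread; the function name `failed_servers` is made up for illustration and is not part of the FAH client.

```python
import re
from collections import Counter

# Matches the address line the client prints right after
# "+ Could not connect to Work Server", e.g. "[13:34:43] (128.143.199.97:8080)".
ADDRESS_LINE = re.compile(r"^\[\d\d:\d\d:\d\d\]\s+\((\d+\.\d+\.\d+\.\d+):(\d+)\)\s*$")


def failed_servers(log_lines):
    """Count failed connection attempts per (ip, port) pair."""
    failures = Counter()
    previous_was_failure = False
    for line in log_lines:
        match = ADDRESS_LINE.match(line.strip())
        if previous_was_failure and match:
            failures[(match.group(1), int(match.group(2)))] += 1
        previous_was_failure = "Could not connect to Work Server" in line
    return failures


# Sample copied from the log excerpts in this thread.
sample = """\
[13:34:43] Connecting to http://128.143.199.97:8080/
[13:34:43] - Couldn't send HTTP request to server
[13:34:43] + Could not connect to Work Server (results)
[13:34:43] (128.143.199.97:8080)
[13:34:43] + Retrying using alternative port
[13:34:43] Connecting to http://128.143.199.97:80/
[13:34:43] - Couldn't send HTTP request to server
[13:34:43] + Could not connect to Work Server (results)
[13:34:43] (128.143.199.97:80)
""".splitlines()

for (ip, port), count in sorted(failed_servers(sample).items()):
    print(f"{ip}:{port} failed {count} time(s)")
```

If every failure lands on one address, that points at a single server or the route to it; failures spread over several addresses would fit a problem closer to home.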

Thomas R · Jun 6, 2013

I can't tell you whether the units come from the same server or from different ones.
How or where can I check this?
Only the server that the units are sent to is the same.

Or is it the server IP in the pic?
Because this is the IP the client tries to send to.

About the firewall:
Does Ubuntu come with an integrated firewall?
I haven't seen a firewall so far!

I can post the complete log from ./fah6 -send all tomorrow.

tear · Jun 6, 2013

Thomas R said:
How or where can I check this?

./fah6 -queueinfo

Yes. Server is in the pic. Are these all of the problematic units?
The reason I'm asking is because I've seen another IP address in the snippets you've pasted: 128.143.199.97.

Ubuntu indeed includes a firewall but I'm fairly certain it's disabled by default.
I'm talking about [possibly] some device in your network [like your access router]
that's doing this [== not embedded Ubuntu firewall].

Do you have multiple systems in this location? Do all of them suffer from this issue?

Thomas R · Jun 6, 2013

tear said:
./fah6 -queueinfo

Yes. Server is in the pic. Are these all of the problematic units?

As I wrote, I deleted the five finished units.
Currently these three units are all of them.

tear said:
The reason I'm asking is because I've seen another IP address in the snippets you've pasted: 128.143.199.97.

Yes - the client tried to use an alternative port.

tear said:
Ubuntu indeed includes a firewall but I'm fairly certain it's disabled by default.
I'm talking about [possibly] some device in your network [like your access router]
that's doing this [== not embedded Ubuntu firewall].

But why did the first of the three units go to Stanford? The five before did too, but the one after did not?

tear said:
Do you have multiple systems in this location? Do all of them suffer from this issue?

Yes - there are five systems in this location. One of them (the one I am writing on now) has problems too. But it is the same: one day it is no problem to send the units, the next day it isn't possible to send a unit.
The other two systems that run F@H have no problems!

Thomas R · Jun 7, 2013

And then this:

[23:34:47] - Couldn't send HTTP request to server
[23:34:47] + Could not connect to Work Server (results)
[23:34:47] (128.143.231.201:80)
[23:34:47] - Error: Could not transmit unit 02 (completed June 6) to work server.
[23:34:47] - 5 failed uploads of this unit.

[23:34:47] + Attempting to send results [June 6 23:34:47 UTC]
[23:34:47] - Reading file work/wuresults_02.dat from core
[23:34:47] (Read 91706842 bytes from disk)
[23:34:47] Connecting to http://128.143.199.97:8080/
[23:39:51] Completed 137500 out of 250000 steps (55%)
[23:52:16] Completed 140000 out of 250000 steps (56%)
[00:04:42] Completed 142500 out of 250000 steps (57%)
[00:17:08] Completed 145000 out of 250000 steps (58%)
[00:29:30] Completed 147500 out of 250000 steps (59%)
[00:41:56] Completed 150000 out of 250000 steps (60%)
[00:54:21] Completed 152500 out of 250000 steps (61%)
[01:06:47] Completed 155000 out of 250000 steps (62%)
[01:19:13] Completed 157500 out of 250000 steps (63%)
[01:31:38] Completed 160000 out of 250000 steps (64%)
[01:34:49] - Couldn't send HTTP request to server
[01:34:49] + Could not connect to Work Server (results)
[01:34:49] (128.143.199.97:8080)
[01:34:49] + Retrying using alternative port
[01:34:49] Connecting to http://128.143.199.97:80/
[01:44:00] Completed 162500 out of 250000 steps (65%)
[01:56:24] Completed 165000 out of 250000 steps (66%)
[02:08:49] Completed 167500 out of 250000 steps (67%)
[02:21:13] Completed 170000 out of 250000 steps (68%)
[02:33:37] Completed 172500 out of 250000 steps (69%)
[02:46:01] Completed 175000 out of 250000 steps (70%)
[02:58:23] Completed 177500 out of 250000 steps (71%)
[03:00:36] Posted data.
[03:00:36] Initial: 0000; + Results successfully sent
[03:00:36] Thank you for your contribution to Folding@Home.
[03:00:36] + Number of Units Completed: 30

[03:00:36] Successfully sent unit 02 to Collection server.
[03:00:58] + Sent 1 of 1 completed units to the server
[03:00:58] - Autosend completed
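
To put a number on how long that upload took, the timestamps and the byte count in the log above can be turned into an effective transfer rate. This is a minimal Python sketch, assuming the client timestamps are plain HH:MM:SS with no date (so a span that crosses midnight has to wrap), and assuming the transfer itself began when the port 80 connection was opened at 01:34:49. The helper names are made up for illustration.

```python
def seconds_of_day(stamp):
    """Convert an 'HH:MM:SS' log timestamp to seconds since midnight."""
    hours, minutes, seconds = (int(part) for part in stamp.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def elapsed_seconds(start, end):
    """Elapsed time between two timestamps, wrapping past midnight if needed."""
    return (seconds_of_day(end) - seconds_of_day(start)) % 86400


RESULT_BYTES = 91706842  # "(Read 91706842 bytes from disk)" in the log

# From the start of this send attempt until "Posted data."
whole_cycle = elapsed_seconds("23:34:47", "03:00:36")
# From the retry on port 80 until "Posted data." (assumed start of the transfer)
transfer = elapsed_seconds("01:34:49", "03:00:36")

rate_kb_s = RESULT_BYTES / transfer / 1000  # decimal kilobytes per second

print(f"whole cycle: {whole_cycle} s")
print(f"transfer:    {transfer} s at about {rate_kb_s:.1f} kB/s")
```

Under those assumptions the unit needed roughly 86 minutes on the wire, at a bit under 18 kB/s.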

tear · Jun 7, 2013

That is awfully long.

You should work on narrowing down the cause of the issue using something other than FAH,
as it's unlikely that FAH backoffice is the problem.

Check quality of connection in your LAN.

These two should really be first order of business:
1. Transferring big (say... 1GB) file in or out of the problematic machine.

  See if you get appropriate speeds. With 100Mbps it should be about 10-11MB/s. With GigE
  you may be limited by your HDDs -- these days I would expect no less than 30MB/s.

2. Running ping -f from the problematic machine to your LAN router, then checking for packet losses
  -- there shouldn't be any (EDIT: well, perhaps a handful at the most).

You could also check 'dmesg' output for excessive link up/down messages.

You could also speculatively switch to another LAN port on the board (if you have more than one port).

You could replace the cable, too.

Thomas R · Jun 7, 2013

Is there any argument for ping -f that I must supply?

thomas@bigadv:~$ ping -c 5
Usage: ping [-LRUbdfnqrvVaAD] [-c count] [-i interval] [-w deadline]
[-p pattern] [-s packetsize] [-t ttl] [-I interface]
[-M pmtudisc-hint] [-m mark] [-S sndbuf]
[-T tstamp-options] [-Q tos] [hop1 ...] destination
thomas@bigadv:~$ ping -c 5 ((IP from Router))
PING 192.168.2.1 (192.168.2.1) 56(84) bytes of data.
64 bytes from 192.168.2.1: icmp_req=1 ttl=64 time=2.09 ms
64 bytes from 192.168.2.1: icmp_req=2 ttl=64 time=2.10 ms
64 bytes from 192.168.2.1: icmp_req=3 ttl=64 time=2.11 ms
64 bytes from 192.168.2.1: icmp_req=4 ttl=64 time=2.10 ms
64 bytes from 192.168.2.1: icmp_req=5 ttl=64 time=2.09 ms

--- 192.168.2.1 ping statistics ---
5 packets transmitted, 5 received, 0% packet loss, time 4003ms
rtt min/avg/max/mdev = 2.098/2.104/2.114/0.058 ms
thomas@bigadv:~$ ^C
thomas@bigadv:~$ ping -f 1((IP from Router))
PING 192.168.2.1 (192.168.2.1) 56(84) bytes of data.
ping: cannot flood; minimal interval, allowed for user, is 200ms
thomas@bigadv:~$

Started a test download with a 1GB file:
Maximum speed was 252Kb/s (this is the maximum of my Internet connection).

I cannot start an upload because I have no email or FTP program installed here (and I don't know how to do this in Linux...).

When I enter the command "dmesg -d", it takes a millisecond and produces a lot of output. I cannot copy and paste it, it is too long.

An Internet speed test for DSL shows perfect results.

tear · Jun 7, 2013

ping -f should be run as root -- prepend the command with sudo if you're on Ubuntu.
Let it run for 30-60 seconds, then interrupt with Ctrl+C.

When I said download/upload test, I meant testing of your LAN == for instance, pulling a file from
a(nother) machine in your LAN.

About dmesg -- just examine it and see if there are excessive messages
suggesting Ethernet link going up/down, for instance, something similar to:

Code:

ADDRCONF(NETDEV_UP): eth0: link is not ready
ADDRCONF(NETDEV_UP): eth0: link becomes ready
ADDRCONF(NETDEV_UP): eth0: link is not ready
ADDRCONF(NETDEV_UP): eth0: link becomes ready
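
Scanning a long dmesg dump by eye is tedious, so a small script can count the link state messages per interface instead. This is a minimal Python sketch, assuming the messages look like the sample above; the function name `count_link_events` and the pattern it searches for are illustrative and may need adjusting for other drivers or kernel versions.

```python
import re
from collections import Counter

# Matches kernel lines such as "ADDRCONF(NETDEV_UP): eth0: link is not ready".
LINK_EVENT = re.compile(r"(\w+): link (is not ready|becomes ready)")


def count_link_events(dmesg_lines):
    """Count 'is not ready' / 'becomes ready' messages per network interface."""
    events = Counter()
    for line in dmesg_lines:
        match = LINK_EVENT.search(line)
        if match:
            events[(match.group(1), match.group(2))] += 1
    return events


# The sample lines from above; real input could come from saving the
# dmesg output to a file and reading that file line by line.
sample = [
    "ADDRCONF(NETDEV_UP): eth0: link is not ready",
    "ADDRCONF(NETDEV_UP): eth0: link becomes ready",
    "ADDRCONF(NETDEV_UP): eth0: link is not ready",
    "ADDRCONF(NETDEV_UP): eth0: link becomes ready",
]

for (interface, state), count in sorted(count_link_events(sample).items()):
    print(f"{interface}: '{state}' seen {count} time(s)")
```

A few such messages around boot or after replugging a cable are to be expected; many repeated pairs during normal operation would be the "excessive" case.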

Thomas R · Jun 7, 2013

OK - sudo ping -f:

thomas@bigadv:~$ sudo ping -f 192.168.2.1
[sudo] password for thomas:
PING 192.168.2.1 (192.168.2.1) 56(84) bytes of data.
.^C
--- 192.168.2.1 ping statistics ---
35832 packets transmitted, 35831 received, 0% packet loss, time 62058ms
rtt min/avg/max/mdev = 1.576/1.702/2.774/0.102 ms, ipg/ewma 1.731/1.689 ms
thomas@bigadv:~$
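
ping rounds the loss figure to a whole percent, so the exact value has to be worked out from the packet counts. This is a minimal Python sketch that parses the summary line above; the function name `packet_loss` is made up for illustration, and the regular expression assumes the summary format shown here.

```python
import re

SUMMARY = re.compile(r"(\d+) packets transmitted, (\d+) received")


def packet_loss(summary_line):
    """Return (lost_packets, loss_percent) from a ping summary line."""
    match = SUMMARY.search(summary_line)
    if match is None:
        raise ValueError("not a ping summary line")
    sent, received = int(match.group(1)), int(match.group(2))
    lost = sent - received
    return lost, 100.0 * lost / sent


line = "35832 packets transmitted, 35831 received, 0% packet loss, time 62058ms"
lost, percent = packet_loss(line)
print(f"{lost} packet(s) lost, {percent:.4f}% loss")
```

One packet out of 35832 is well within "a handful at the most", and it may simply be the request that was still in flight when Ctrl+C was pressed.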

LAN Upload/Download:
From Windows PC to Linux PC:
2.7GB file in 4min 36sec
From Linux PC to Windows PC:
2.7GB file in 5min 45sec
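
For comparison with the 10-11MB/s expected on a 100Mbps link, the two transfers above can be converted to MB/s. This is a minimal Python sketch, assuming the file size is 2.7 decimal gigabytes (2700 MB); the function name is made up for illustration.

```python
def throughput_mb_s(size_gb, minutes, seconds):
    """Average throughput in MB/s for a transfer of size_gb decimal gigabytes."""
    total_seconds = minutes * 60 + seconds
    return size_gb * 1000 / total_seconds


windows_to_linux = throughput_mb_s(2.7, 4, 36)  # 4 min 36 s
linux_to_windows = throughput_mb_s(2.7, 5, 45)  # 5 min 45 s

# Raw ceiling of a 100 Mbps link before protocol overhead: 100 / 8 MB/s.
link_ceiling = 100 / 8

print(f"Windows -> Linux: {windows_to_linux:.1f} MB/s")
print(f"Linux -> Windows: {linux_to_windows:.1f} MB/s")
print(f"100 Mbps ceiling: {link_ceiling:.1f} MB/s")
```

That works out to roughly 9.8 MB/s in one direction and 7.8 MB/s in the other, against a raw ceiling of 12.5 MB/s.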

dmesg:
Anything like this?

[17701.080085]
[17701.080085] Call Trace:
[170780.900335] igb: eth0 NIC Link is Down
[170785.206672] ADDRCONF(NETDEV_UP): eth0: link is not ready
[170786.004501] igb: eth1 NIC Link is Up 100 Mbps Full Duplex, Flow Control: RX/TX
[170786.007031] ADDRCONF(NETDEV_CHANGE): eth1: link becomes ready
[170796.432014] eth1: no IPv6 routers present
[171139.032039] device eth1 entered promiscuous mode
[171931.622406] device eth1 left promiscuous mode
[188515.472017] eth1: no IPv6 routers present
thomas@bigadv:~$

Update:
The last unit went out without any issue.
I don't know if this is helpful...?!

Update Sunday:
The last three units went out without any issue!

Thomas R · Jun 12, 2013

Since last Saturday no problems with the upload!

I'm confused:
Was the problem on my side or on Stanford's?

tear · Jun 12, 2013

No idea. Try soliciting some information in the <grind> FoldingForum </grind> next time maybe?
No one else reported issues in [H]...

Thomas R · Jun 12, 2013

OK.
Thanks anyway !!
