Big Progress in FAH Daily Production

OK, this is a dumb question, but I've been out of most of this for a bit. I've been running my 970s on my computer off and on when I can. Is there a way to set them to like 80-90% so I can bump up the fans and then just leave them on? I can't do extended periods at 100% because that PC is now near where I sleep, since my office is now a nursery lol. Any way I can 80 or 90 it so the fan noise doesn't kill me or the wife?
 
If you're on Windows, you can set the manual fan speed with MSI Afterburner.

If you're on Linux, you can use this script from GitHub after you enable Coolbits.

I'm not sure about OS X fan speed control.
 
This stupid bad state thing is killing my production. Apparently my house lost power yesterday while I was at work. This means that the clock changes I did with inspector changed back to defaults and I started getting a bunch of bad states again killing production. I think I solved that issue now by setting up a startup link to launch when the computer boots.

However, I discovered that one of the GPUs needed to be lowered by more than 100 MHz, so I dropped it down by 200 MHz for the time being. That killed production for most of the day today since I only noticed it an hour ago.

I lost the #5 spot because of this. :(
 
I was wondering about throttling the actual client so I don't have to have the fans so loud. If I can put the client to 80 or 90, then I can leave the fans at 55-60 and still fold quietly to keep the wife off my back lol
 
I'm not sure if you can do it that way, but you could probably underclock the GPUs low enough that they won't heat up as much.
 
Is that on an Nvidia 9xx card? If it is, then try this solution I posted earlier using Nvidia Inspector: http://hardforum.com/showpost.php?p=1041935612&postcount=71
 
I already did that. That's what I was talking about doing. :D

Remember when you told me to go -100 on both cards and then slowly start adding +10 until I get the error? Well even at -100 I still get an error on one of the cards. So I went down to -200. Just now getting on the PC to check and I am STILL getting errors on that same card. So maybe I need to go -300 on it?
 
What temp is the core on the card with the errors? Is the affected card well-ventilated? It is possible to have a core running within tolerances but other card components scorching hot. I had a thorough course on video card cooling when I was an active multi-GPU participant at Folding@Home (ran quite a number of 9800 GX2s and GTX 295s). Cooling video cards can be very tricky.
 
Both cards are around 46c to 54c.

I have one 980 that needs -350 on some WUs, so it can vary. One thing about it, though: I have not noticed much difference in performance when underclocking the memory.
 
Strange thing is my 980 doesn't get that error at all. Haven't touched it. I've got the other 970 card -300 now and so far so good.
 
That is odd. My 970 is not as problematic as my 980s are.
 
Time to update your signature granpa! :p

Do you think an i3 will hold back a 970 GTX folding?
 
OK,

I'm done with part time folders pushing me out of the daily top 10.

Time to bring the heat......

http://www.newegg.com/Product/Product.aspx?Item=N82E16814127911

I have one behind my recliner (for my son's rig which folds part time in the winter) and one for my year round rig.

On the other hand, I think I have to add my GTX970 to the "died in the line of duty" thread. I'll be talking with MSI tomorrow to see if I can RMA.
 
I had periodic issues with the bad frames on my GTX970. I just restarted boxen.... and the unit would pick up and finish no problem.
 
"This is not a core21 problem; it is an Nvidia problem that affects the upper-end Maxwells, mainly GTX 980s and some of the 970s. It may also affect 980 Tis, but I do not know that since I do not have a 980 Ti to test. I have a feeling Nvidia knows about it since they took steps to reduce the problem. When you run compute software on the Maxwells, it defaults to the P2 state, which has the same core clock but a reduced memory speed: it gets lowered from 7000 MHz to 6000 MHz. No other generation of Nvidia GPUs does this that I know of."


Actually this has affected my 960 since I got it earlier this year. Not just core 21s either. Core boosts to 1455 while folding but memory is only 6 GHz. Couldn't OC memory in Afterburner either. It mystified me for quite a while; I couldn't understand why the Afterburner mem OC wouldn't take. I found out by fluke one day by shutting off GPU folding before I wanted to play Witcher 3. Then, using Afterburner, the memory OC would take and I happily played away at 8 GHz, but as soon as I exited the game and resumed folding the mem clocks were 6 GHz again. This does not happen with my GTX750 tho, so it would seem maybe only the 900 series? I don't have a 950 to test tho. Anyone folding with a 950?
 
Too bad about the 970. I envy those Tis, though. See you back in the top 10 soon.

Are you underclocking the memory in the P2 settings, or are you using the P0 settings to underclock it?
 
This explains a lot.

I have had issues OCing memory and the setting not doing anything. I think it was on my 970, but possibly on a 960. I have generally not fiddled with memory clock, as older experience has said that it has little effect on FAH performance. So I just gave up. Now I have to wonder....
 
I am using the P2 settings in the drop down menu on both cards. Just got home from work and I am seeing more Bad State errors on both cards this time. Memory was already -300 on both of them. I am now at -400 on both of them. :(

Temps are sitting around 49c and 56c right now.

edit
18:21:37:WU01:FS02:Upload complete
18:21:37:WU01:FS02:Server responded WORK_ACK (400)
18:21:37:WU01:FS02:Final credit estimate, 63026.00 points
18:21:38:WU01:FS02:Cleaning up
18:24:57:WU02:FS02:0x18:Bad State detected... attempting to resume from last good checkpoint
18:26:26:WU02:FS02:0x18:Completed 160000 out of 16000000 steps (1%)
18:30:19:WU02:FS02:0x18:Completed 320000 out of 16000000 steps (2%)
18:34:13:WU02:FS02:0x18:Completed 480000 out of 16000000 steps (3%)

Weird that it happened before the first percent was completed. No more happened from 1% through 100%.
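
For anyone who wants to tally these without scrolling the whole log, here is a minimal Python sketch that counts Bad State lines per slot and core. It assumes the log lines look like the excerpt above (time:WUxx:FSxx:0xNN:message); the function name and the sample text are just for illustration, not part of the FAH client.

```python
import re
from collections import Counter

# Assumed line shape, taken from the log excerpt above:
#   HH:MM:SS:WUxx:FSxx:0xNN:Bad State detected...
BAD_STATE = re.compile(
    r"^(?P<time>\d{2}:\d{2}:\d{2}):WU(?P<wu>\d+):FS(?P<slot>\d+)"
    r":0x(?P<core>[0-9a-fA-F]+):Bad State detected"
)


def count_bad_states(lines):
    """Return a Counter keyed by (slot, core) for every Bad State line."""
    counts = Counter()
    for line in lines:
        match = BAD_STATE.match(line.strip())
        if match:
            counts[(match["slot"], match["core"])] += 1
    return counts


# Sample text copied from the log excerpt above.
SAMPLE = """\
18:21:38:WU01:FS02:Cleaning up
18:24:57:WU02:FS02:0x18:Bad State detected... attempting to resume from last good checkpoint
18:26:26:WU02:FS02:0x18:Completed 160000 out of 16000000 steps (1%)
"""

if __name__ == "__main__":
    for (slot, core), n in sorted(count_bad_states(SAMPLE.splitlines()).items()):
        print(f"slot FS{slot}, core 0x{core}: {n} bad state(s)")
```

To run it against a real log, pass the lines of the client's log file instead of SAMPLE.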
 
-400 on both cards and I still get the occasional bad state error. I give up.
 
Not that I'm even 1% of the bump, but my main is now a 3930 OC'd to 4.1, and I have broken into the top 200. High water for a dude that builds out of spare / secondhand parts.
 
Top 5 requires 7 digits now.
 
Grrrrrrr........ Bad State Hell, this is killing my GTX980Tis.

Even my GTX960s are having issues, though much more occasionally, and not usually too detrimental to points.

I am really confused as to how the drivers even do this. The GPU clocks are at P3, but the RAM is at P2 clocks. Though this is probably what causes the issue in the first place. Backing off the P2 RAM clocks right now, maybe this will be better. Has anyone ever tried cranking the P2 clocks up to P3 speeds? I would give it a try, but I won't have a day off to fiddle with this until Friday. I've been working 72 hours a week. The pay is nice (it helped me pay for those 980Tis), but I just don't have free time to experiment.

I also have seen some core 18 WUs that do not fully utilize the 980Tis.... they run at 85% GPU load. Not really sure what the cause of this one is. I have seen this behavior from projects 9412 and 9413.

p.s. it is hard to believe that I am complaining about folding when I am cranking out over a million points a day. I never would have seen that coming.
 
Decrease the clocks on P2 using Nvidia Inspector. I have two of my 970s at -400 MHz now. I still get bad states, but nowhere near as often anymore. My PPD has gone way up.
 
Reading the whole post > me today apparently.

I have not tried to clock them higher, but I did start out at -100, then -200, then -300 and now I am at -400. It only seems to happen occasionally, always on core 21 and usually at the very beginning.

I've got some more stuff coming in soon, too. Aiming for the #4 spot.
 
there was some ninja editing, but discussing reducing the clocks was in there......

no worries, I've been guilty of it too.
 
As for progress, we seem to have a new #1 PPD member. Who is Grandma? Any relation to Grandpa? ;) Welcome to the [H]orde!
 
I seem to be having an issue with one of my boxen. The points don't seem to be getting counted. I just noticed the time wasn't correct so I updated it. Would that have an effect on anything?

Also how can I tell when points are accounted for on F@H's side? I mean, I can see where the log estimates the points and I can see on OCN when new points are added every 3 hours which is how I am coming to the conclusion that points don't seem to be adding up correctly.

According to FAH Control where I have all my boxen linked to my main computer I should be over 1.2M PPD and even sometimes I hit 1.3M and 1.4M PPD. However, I am barely at 1M PPD. Surely that thing is fairly accurate to within a few thousand points. Before adding my new boxen it was very accurate. Now it's almost as if I've never added them.
 
#1 Are all of your boxen on Windows or Linux, and have you been checking your logs for Bad States Detected, failed WUs, or slowdowns?

#2 The FAH v7 estimate has been known to be inaccurate. Do you use HFM?

#3 v7 downloads the next WU by default before the previous WU finishes. If it is 3 or 4 min. before it starts, that can cause a pretty big variation between estimated and actual PPD; a 4 min. delay would add 2.4 sec. to your estimated frame time.
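
A rough sketch of that arithmetic in Python, assuming a WU is reported as 100 frames (1% each), which is how the v7 log shows progress. The function names and the 234 s example frame time are only illustrative.

```python
def added_frame_time(delay_s: float, frames_per_wu: int = 100) -> float:
    """Seconds added to the average frame time when a WU starts late.

    Assumes the delay is spread evenly over all frames of the WU.
    """
    return delay_s / frames_per_wu


def slowdown_fraction(frame_time_s: float, delay_s: float,
                      frames_per_wu: int = 100) -> float:
    """Fraction of throughput lost to the start-up delay."""
    extra = added_frame_time(delay_s, frames_per_wu)
    return extra / (frame_time_s + extra)


if __name__ == "__main__":
    delay = 4 * 60  # a 4 minute gap before the next WU starts
    print(f"extra time per frame: {added_frame_time(delay):.1f} s")
    # 234 s per frame is an assumed example value, not a measurement.
    print(f"throughput lost: {slowdown_fraction(234.0, delay):.1%}")
```

Points also depend on how quickly a WU is returned, so the PPD hit is not exactly proportional to this, but a few seconds per frame is a small effect on a frame time of several minutes.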

My guess is you are getting some bad states on some of the recently released core 21 WUs. That can be fixed, if you are running Windows, by lowering the P2 memory speed with Nvidia Inspector, but I still have not figured out a way to fix it on Linux.
 
#1 Are all of your boxen on Windows or Linux, and have you been checking your logs for Bad States Detected, failed WUs, or slowdowns?

They're all Windows 7 Pro.

#2 The FAH v7 estimate has been known to be inaccurate. Do you use HFM?

I do not use HFM. However, my PPD has not seemed to increase much since I've added two more 970 GPUs to my farm. With just two 970s and a 980 I was getting around 800k - 1M PPD on it. Now that I've added two more 970s to my farm I am still only getting around 900k PPD.

#3 v7 downloads the next WU by default before the previous WU finishes. If it is 3 or 4 min. before it starts, that can cause a pretty big variation between estimated and actual PPD; a 4 min. delay would add 2.4 sec. to your estimated frame time.

A ~300k+ PPD variation though?

My guess is you are getting some bad states on some of the recently released core 21 WUs. That can be fixed, if you are running Windows, by lowering the P2 memory speed with Nvidia Inspector, but I still have not figured out a way to fix it on Linux.

Searched through the log and no bad states. I already went through and lowered the P2 memory state on those two new 970s when I first set them all up. Went -400 just to be sure this time.


edit
Got HFM.net installed and gonna see what PPD it gives me. Right now it's saying 372k PPD, but it's only showing a PPD > 0 on one slot, the 980. Guess the others need time to generate an estimate or something.

edit 2
Seems to not show PPD because they're showing up as unknown work units.
 
Changed the URL. The HFM log shows it downloaded a lot of projects, but none of the projects it downloaded are ones my GPUs have. I am still missing:

P9413
P10495
P9206
P9704
P9413

Can't find those projects in the ProjectInfo.tab file either.
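
In case it helps anyone checking the same thing, here is a small Python sketch that lists which project numbers are absent from a ProjectInfo.tab. It assumes the file is tab-separated with the project number in the first column; that layout is an assumption, not confirmed against HFM's documentation, so adjust if your file differs. The function name and the sample rows are made up.

```python
import csv
import io


def missing_projects(tab_text, wanted):
    """Return the wanted project numbers that do not appear in tab_text.

    Assumes one project per row, tab-separated, project number first.
    Rows whose first field is not a number (headers, blanks) are skipped.
    """
    known = set()
    for row in csv.reader(io.StringIO(tab_text), delimiter="\t"):
        first = row[0].strip() if row else ""
        if first.isdigit():
            known.add(int(first))
    return sorted(set(wanted) - known)


if __name__ == "__main__":
    # Placeholder rows standing in for the real file contents.
    sample = "9201\tplaceholder\n9202\tplaceholder\n"
    print(missing_projects(sample, [9413, 10495, 9206, 9704]))
```

For a real check, read the file's contents with open(...).read() and pass that in place of the sample string.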

edit
Here is a portion of my log
Code:
[11/12/2015-4:27:29 PM] - (FAH1 Slot 01) Slot Status: Running
[11/12/2015-4:27:29 PM] - (FAH1) Retrieval finished in 16 ms
[11/12/2015-4:27:42 PM] - (NAS Slot 00) Slot Status: Running
[11/12/2015-4:27:42 PM] - (NAS) Retrieval finished in 12 ms
[11/12/2015-4:28:30 PM] - Downloading new project data from Stanford...
[11/12/2015-4:28:31 PM] - Project ID '9206' not found.
[11/12/2015-4:28:31 PM] X Input string was not in a correct format.
[11/12/2015-4:28:31 PM] X System.FormatException: Input string was not in a correct format.
   at System.Number.ParseDouble(String value, NumberStyles options, NumberFormatInfo numfmt)
   at HFM.Proteins.HtmlSerializer.ParseProteins(String html)
   at HFM.Proteins.HtmlSerializer.Deserialize(String fileName)
   at HFM.Core.ProteinDictionary.Get(Int32 projectId, Boolean allowRefresh)
[11/12/2015-4:28:31 PM] - (Localhost Slot 01) Slot Status: Running
[11/12/2015-4:28:31 PM] X Input string was not in a correct format.
[11/12/2015-4:28:31 PM] X System.FormatException: Input string was not in a correct format.
   at System.Number.ParseDouble(String value, NumberStyles options, NumberFormatInfo numfmt)
   at HFM.Proteins.HtmlSerializer.ParseProteins(String html)
   at HFM.Proteins.HtmlSerializer.Deserialize(String fileName)
   at HFM.Core.ProteinDictionary.Load(String fileName)
   at HFM.Forms.MainPresenter.ToolsDownloadProjectsClick()
[11/12/2015-4:28:32 PM] - Project ID '9704' not found.
[11/12/2015-4:28:32 PM] X Input string was not in a correct format.
[11/12/2015-4:28:32 PM] X System.FormatException: Input string was not in a correct format.
   at System.Number.ParseDouble(String value, NumberStyles options, NumberFormatInfo numfmt)
   at HFM.Proteins.HtmlSerializer.ParseProteins(String html)
   at HFM.Proteins.HtmlSerializer.Deserialize(String fileName)
   at HFM.Core.ProteinDictionary.Get(Int32 projectId, Boolean allowRefresh)
[11/12/2015-4:28:32 PM] - (Localhost Slot 02) Slot Status: Running
[11/12/2015-4:28:32 PM] - (Localhost) Retrieval finished in 530 ms
[11/12/2015-4:28:32 PM] - Project ID '10495' not found.
[11/12/2015-4:28:32 PM] X Input string was not in a correct format.
[11/12/2015-4:28:32 PM] X System.FormatException: Input string was not in a correct format.
   at System.Number.ParseDouble(String value, NumberStyles options, NumberFormatInfo numfmt)
   at HFM.Proteins.HtmlSerializer.ParseProteins(String html)
   at HFM.Proteins.HtmlSerializer.Deserialize(String fileName)
   at HFM.Core.ProteinDictionary.Get(Int32 projectId, Boolean allowRefresh)
[11/12/2015-4:28:32 PM] - (HTPC Slot 01) Slot Status: Running
[11/12/2015-4:28:32 PM] - (HTPC) Retrieval finished in 24 ms
 
Further research shows that this points to the same psummary that Grandpa linked to...... sorry to waste time.

Nathan -

A large variety of more "modern" GPU projects are on a super secret psummary. I will dig for a link if I still have one. I ultimately gave up and just stopped using HFM, as I have been only folding on one box and/or only folding v7 GPU, so I can just have the client measure all the other clients I might be running.

John

Edit:

http://assign.stanford.edu/api/project/summary

http://assign2.stanford.edu/api/project/summary

Article is here https://folding.stanford.edu/home/new-psummary-page/
 
Did you use the link I gave you? All of those are in there; they should have downloaded. You have to close HFM and then open it back up after updating it.
 
jfb9301, tried those links with no success.

Grandpa_01, I did change the URL and restarted the application. It did not parse those projects though. They're still missing.

I downloaded a modified ProjectInfo.tab that has a LOT more projects than mine did, but it too didn't contain any of the current projects my GPUs are working on.

Grandpa_01, can you send me your ProjectInfo.tab if it's complete?
 