Fall of a Titan

chileman · Jul 15, 2005

Some parts that worried me:

"are going slowly"

"we've been working on for nearly 2 years"

Sounds to me like they really need to prioritize their projects over there. They need to figure out what will help the community the most and get that done. And it sounds like it's the memory issue that most people are worried about. If it took another 6 months, I bet a great many machines would be lost from the project. For their sake I hope they clear it up soon. I wish he could have given a time frame for a fix.

[Spectre] · Jul 15, 2005

p[H]ant0m said:
But he does offer a suggestion, shut off -advmethods. At least I was under the impression, if you don't want problems, don't run that. (And I'm not trying to knock anyone)

Err...read OC-X's reply...he again makes a valid point. That, and the requirements are getting burdensome for people who don't build boxes specifically for folding, and that is NOT how the program is advertised. It is supposed to run in the background without interfering.

Not saying I am going to stop folding (well, rather, start folding again when I can get my borgs back up after rebuilding), but there are issues that need to be addressed if an exodus is to be avoided as the requirements skyrocket.

I am not trying to rag on anyone either, just pointing out the issues in a constructive manner.

Shadowchild · Jul 15, 2005

Well, what seem like the most pressing issues to me, in order of priority, are:

1. Memory problem.
2. A true SMP client.
3. Everything else.

And that's the way it goes. It doesn't matter which one has the most potential to help the community, like the GPU project; it's what affects the most people, and the memory problem is first on that list, and then getting a true SMP client out. Both of those affect the largest share of F@H users. Hell, combined you've got pretty much all F@H users. But one thing I learned when I was in college is that a researcher is always more willing to go and develop something really cool than to stick around and fix the problems in the last development. And that's why they're just that, researchers. Once the groundwork's laid, they need other people to flesh it out and make it work, which in this case seems to be taking a bit too long to happen.

marty9876 · Jul 15, 2005

Well just to be my usual dorkish self, I'm trying to do the opposite of everyone else. I'm trying to get QMD's around here.... The same boxen that were flooded with them before can't get a one. Looks like they upped the RAM requirements to get them: 512 Meg RAM/Onboard Video with BIOS video settings set to lowest will not get one. 1 Gig RAM in the same boxen grabs them.

Just a reverse example: I know QMD's fold on my 512 RAM systems quite well. However, there is nobody using them at all (being burnt in), so load those puppies up and suck every last resource you can out of them. Yea, more points basically.

For myself, I'd like to see a more configurable client. Give the end users (us) a chance to define more of what type of resources we want to donate. I like OC-AMD's words on this- "but not donating my CPU's until Stanford decides they want to get the most from what we are giving them." Help us with a client that we can tailor to each machine for our own ratio of production vs. performance hit.

Edit: And WU assignment logic seems flawed. I've seen a P2 450/512 RAM box get a 600 pointer WU and a fleet of P4 2.66/512 get timeless WU's. Maybe the timeless needed to be done before the 600 pointer, it just does not seem right.

Gotta admit, SMP client sounds cool. Now all I need is a SMP machine.

Fan_Atic · Jul 15, 2005

We've never posted here, but currently we fold for the Tech Report and average about #7 in the world with our 24 hour production totals if you exclude anonymous and default. As it is, we have already scaled back the number of systems running folding because of the issues with QMDs and not having the ability to limit the amount of memory per work unit. We used to average around 20k a day and are now around the 16k mark.

We have decided to take a stand along with OC-AMD and remove folding from all our machines. Please understand how different folding in a corporate environment is as opposed to a home environment. We push the client out with Group Policy and will remove it the same way. We are not in the business of babysitting individual machines for an application that is considered extraneous at best. What is frustrating is that we've never had to baby-sit our machines until the QMDs came out. So, instead of babysitting them, we simply remove the client altogether.

We understand that we can not run -adv, but as has already been stated, one isn't exactly putting racks of dual hyperthreaded Xeons to good use that way. This is all further compounded by our heavy investment in technology that is not of benefit these days: back when the 600 point work units existed, we started investing heavily in Athlon 64 and Opteron based systems.

So, as of this evening, we will start taking machines offline. Hopefully with two and potentially three of the top 10 out of the picture, the Pande Group will realize that there is a real problem with this and give users some choice.

[Spectre] · Jul 15, 2005

Fan_Atic, what name do you fold under?

Edit: Never mind, figured it out.

marty9876 · Jul 15, 2005

Wow. Fan_Atic posted this same message at the official forums also.

From what I've read, How?? is right on the doorstep of stopping too. That's ~+200,000 Points per day loss to Stanford, basically 1/2 our entire team in 24 hours.

I keep asking myself if this thread title was fitting, wild just wild.

[Spectre] · Jul 15, 2005

marty9876 said:
Wow. Fan_Atic posted this same message at the official forums also.

From what I've read, How?? is right on the doorstep of stopping too. That's ~+200,000 Points per day loss to Stanford, basically 1/2 our entire team in 24 hours.

Yeah it was my understanding that How???? may be leaving as well:
http://www.hardforum.com/showpost.php?p=1027951914&postcount=8

I keep asking myself if this thread title was fitting, wild just wild.

Should be Titans?

Strikemaster · Jul 15, 2005

Eye-opening thread, explains a LOT of what I'm seeing in the office borgs.

I have a 3ghz P4-HT, 1 gig RAM server box that should be flying along, but is dog-assed slow. I took a look at the process list, to find that the SQL server and QMD were *together* eating all available RAM. This box is in HEAVY swap, which for even a test server is utterly unacceptable.

Monday, I reconfigure them for Tinker / Gromacs, which is a waste of 5 ghz of Folding P4s. This keeps up, I have to bow to the users and rip it out.

Ronbo · Jul 15, 2005

For whoever asked a page back: I'm running the console client, and none of the net stop/net start batch files I've tried will properly shut down/restart the client. I built the Asus A8N-SLI/A64 [email protected]/Nvidia6800GT/1gig of corsair 3200xlpro to play the latest and greatest games on at high speed, and it seems that F@H is turning my ultrafast gamer into a laggy slow-loading dawg. I'm about to turn off the client and play BF2; it may be the last time it runs on this PC.

I had real issues with the graphical client dropping WU's, changing to the console stopped that. I still have 4 boxen dedicated to folding, and more that I could put online, but most have 128 meg or less so it's timeless for them.

marty9876 · Jul 16, 2005

Just more interesting reading. Hey, at least we found another forum for Spectre to get in trouble in and Kodiak Star banned from

OC-AMD Quote:

Hey guys,

They have taken notice... as has the rest of the folding world. But that wasn't the reason I pulled my machines... My intent isn't to hit Stanford where it hurts, that just hurts the project... but the issue of memory and why I stopped was known about months ago... Stanford has a fix, but didn't release it... yet they continued to push out bigger and bigger WU's.

In order for me to change my configuration, I physically have to touch roughly 600 to 750 scheduled tasks, edit the command line for each task to remove flags and re-authenticate the username/password for the task. All my machines are installed 1 by 1 and none run as a service.

If something such as a new client came out, or a change that limited the memory intensive WU in the client, I can push the client and a client.cfg out to the machines within an hour or two, and the next time the tasks ran, they would pick up the changes.

My stopping my farm is twofold... I personally think Stanford needs to do a better job of managing their project priorities along with their customer support... without the end users, they have no project. If they don't care enough to make a minor client revision months ago when we raised the issue, I simply don't care to run the client until it's fixed. Memory, unlike CPU utilization, doesn't back down if a folding core is using it.

I also can't see the reason to take 50 to 80 dual Xeon's and let them run regular WU's like Stanford is recommending... they want WU's back fast, ask people not to use Hyper Threading (which I agree with), make these high end WU's that need SSE2/SSE3 (which Xeon/P4's support), but then cripple them by sending units that either flood the bus (QMD units) or can't read the number of CPU's vs. the amount of installed memory.

They are working on an SMP client with better memory management, but it's still a ways off.

I'm not leaving the folding community, I still will moderate over at folding-community.org, help people out if they ask for it... and when Stanford has a new client out, I'll be the first to beta test it for them.

Cheers!

-Jim (OC-AMD)

http://www.techreport.com/forums/viewtopic.php?p=417521#417521

gnewbury · Jul 16, 2005

Damn - Every time there is a major brouhaha with Stanford it seems I'm on travel.
OC-AMD is a true resource. However I believe that if one is going to run beta/advmethods one should expect problems. I've several duallies that I cannot run big WU on because they suck up all the memory, and they have 2GB of RAM.
The original concept behind this project was to use spare CPU cycles. Unfortunately Stanford has not programmed it to only use spare RAM.
The memory problem has been a major thorn in my side for years.
The solution would seem to be that they would write the software so that the memory was released when the program backed down. Swap it to hard drive, bring it back when the program comes back to full life.
I'd rather run TT's on workstations and suck up spare cycles than run Big WU and have to quit folding entirely.

Mattman · Jul 16, 2005

The short term solution to this problem is really simple - turn off large WU's if your machine can't handle it. I only run large WU's on machines that are stout enough and barely used, all others just run it the old fashioned way. It's clear that Vijay knows of our concerns, so if he wants more large WU's done, there's some incentive to get the program updated if you just turn them off. The only reason I can think of that people wouldn't want to do that is because of the points they'd be losing in the competition. Someone please tell me there's another reason.

BakedON · Jul 16, 2005

Mattman said:
The short term solution to this problem is really simple - turn off large WU's if your machine can't handle it. I only run large WU's on machines that are stout enough and barely used, all others just run it the old fashioned way. It's clear that Vijay knows of our concerns, so if he wants more large WU's done, there's some incentive to get the program updated if you just turn them off. The only reason I can think of that people wouldn't want to do that is because of the points they'd be losing in the competition. Someone please tell me there's another reason.

Some of what I'm hearing is that it's difficult for the big corporate farmers to touch 200+ machines by hand to turn off big units.... and there's a little work involved for them beyond that as well.
I'm a small corp farmer with currently only 40 CPU's up and folding..... making a config change on all my boxen isn't that big a deal to me.
There's more than that stated here but this was the issue that glared at me the most.
Oh well.... I hate my users anyhow.... FULL FOLD AHEAD!!!

Sorry to see some of the great ones cutting back... but I understand.
Gives little farmers like me the chance to step up and pick up the slack for a little while.
I'll add 4 more P4's this week at least.... maybe as many as 10.
If everyone else goes out and borgs 5 more machines for at least the summer the project won't suffer so major a blow.

Carnival Forces · Jul 16, 2005

well i just got back from a month of summer camp where i had no access to the internet, and i am absolutely shocked by what i'm reading. with two (possibly three) of the titans gone, i don't know what Vijay will do...

at this point it doesn't even matter what helps efficiency the most; what matters now is Vijay keeping the current titans folding.

Sparky · Jul 17, 2005

Spectre said:
Err...read OC-X's reply...he again makes a valid point. That, and the requirements are getting burdensome for people who don't build boxes specifically for folding, and that is NOT how the program is advertised. It is supposed to run in the background without interfering.

Not saying I am going to stop folding (well, rather, start folding again when I can get my borgs back up after rebuilding), but there are issues that need to be addressed if an exodus is to be avoided as the requirements skyrocket.

I am not trying to rag on anyone either, just pointing out the issues in a constructive manner.

It will run in the background without you noticing much if you keep the default settings. When you add the flags for beta/large WU's all bets are off. Any beta tester knows there are sometimes problems.

[Spectre] · Jul 17, 2005

Sparky said:
It will run in the background without you noticing much if you keep the default settings.

Really? Well what about these?
http://forums.overclockers.com.au/showthread.php?t=387523
http://forums.overclockers.com.au/showthread.php?t=376062
http://forums.overclockers.com.au/showthread.php?t=370923
http://forums.overclockers.com.au/showthread.php?t=353384
http://forums.overclockers.com.au/showthread.php?t=340534
http://forums.overclockers.com.au/showthread.php?t=282590

http://www.hardforum.com/showthread.php?t=891500
http://www.hardforum.com/showthread.php?t=856643
http://www.hardforum.com/showthread.php?t=854677
http://www.hardforum.com/showthread.php?t=815605

I call that noticing.

When you add the flags for beta/large WU's all bets are off. Any beta tester knows there are sometimes problems.

Even without those two configs, RAM release is an issue, as are the rising requirements that are outscaling the rising system specs. Adding to that, what happens when people get dual core and the client sees two CPU's but still only a fixed amount of RAM?

You need to step back and see the forest not just the trees.

These are all problems that can be fixed and will be much less painful to fix now rather than later, which is ultimately good for everyone.

Sparky · Jul 17, 2005

Spectre said:
Really? Well what about these?
http://forums.overclockers.com.au/showthread.php?t=387523
http://forums.overclockers.com.au/showthread.php?t=376062
http://forums.overclockers.com.au/showthread.php?t=370923
http://forums.overclockers.com.au/showthread.php?t=353384
http://forums.overclockers.com.au/showthread.php?t=340534
http://forums.overclockers.com.au/showthread.php?t=282590

http://www.hardforum.com/showthread.php?t=891500
http://www.hardforum.com/showthread.php?t=856643
http://www.hardforum.com/showthread.php?t=854677
http://www.hardforum.com/showthread.php?t=815605

I call that noticing.

I can't read the OCAU forums, but the other 4 links you posted don't prove anything.
They had graphical clients and switched to console. None of them had console with no flags and had problems.

Sparky · Jul 17, 2005

Even without those two configs, RAM release is an issue, as are the rising requirements that are outscaling the rising system specs. Adding to that, what happens when people get dual core and the client sees two CPU's but still only a fixed amount of RAM?

You need to step back and see the forest not just the trees.

These are all problems that can be fixed and will be much less painful to fix now rather than later, which is ultimately good for everyone.

You need to go back to debate class.
What does this have to do with anything I posted and you quoted from me?

gnewbury · Jul 17, 2005

Carnival Forces said:
<snip>
at this point it doesn't even matter what helps efficiency the most; what matters now is Vijay keeping the current titans folding.

What matters most is that maximum resources are applied to things that need to be fixed.
Alzheimer's, Mad Cow (BSE), CJD, ALS, Huntington's, Parkinson's disease, Iraq, AIDS, poverty.
I (and perhaps WE) fold to donate cycles to assist in the science. Keeping the "current titans" folding should not be Vijay's priority unless he deems it so. Especially since some of these titans are complaining about folding using "beta" and "advmethods". I'm surprised that Vijay did not tell Jim "sorry - don't run beta" and leave it at that.
When I choose to walk the narrow ledge I am prepared to fall. When I run beta I'm prepared for lost WU, hosed systems, and about everything except the house catching on fire.
It's too bad that Jim and others were disappointed with memory problems.
That's why I fold - my dad had memory problems, I don't want my children to have memory problems.

Carnival Forces · Jul 17, 2005

gnewbury said:
<snip>
What matters most is that maximum resources are applied to things that need to be fixed.
Alzheimer's, Mad Cow (BSE), CJD, ALS, Huntington's, Parkinson's disease, Iraq, AIDS, poverty.
<snip>

agreed. and the way to do that is to not engender disagreement with the project and disillusionment with the researchers, especially with the largest folders in the world. Quite frankly, my machines barely matter at all compared to people like OC-AMD. Complaining about a charity pandering to their largest donors is like saying a starving person shouldn't do a jig in order to get a filet mignon every day.

if i were starving i'd start dancing.

it's time Vijay did.

DamienThorn · Jul 17, 2005

gnewbury said:
What matters most is that maximum resources are applied to things that need to be fixed.
Alzheimer's, Mad Cow (BSE), CJD, ALS, Huntington's, Parkinson's disease, Iraq, AIDS, poverty.
I (and perhaps WE) fold to donate cycles to assist in the science. Keeping the "current titans" folding should not be Vijay's priority unless he deems it so. Especially since some of these titans are complaining about folding using "beta" and "advmethods". I'm surprised that Vijay did not tell Jim "sorry - don't run beta" and leave it at that.
When I choose to walk the narrow ledge I am prepared to fall. When I run beta I'm prepared for lost WU, hosed systems, and about everything except the house catching on fire.
It's too bad that Jim and others were disappointed with memory problems.
That's why I fold - my dad had memory problems, I don't want my children to have memory problems.

I don't think that it's necessarily that Jim is upset about losing WU; nor does he seem upset that he discovered a problem. He recognizes that he's running beta, and that problems crop up. What he's upset about is that Stanford, half a year later, has yet to offer any resolution, or real indication that they've been working on it. If you worked for a big tech company, and you had a critical flaw like this, then you'd bust your ass to get it to work. Rather than resolve the problem, Stanford is, seemingly, ignoring it. That, unless I'm mistaken, is what the corp folders are having the most issues with - the core idea behind F@H is that it's supposed to be unobtrusive, and it has become very obtrusive, and Stanford hasn't fixed their code to return F@H to its status of an unobtrusive program.

OC-AMD · Jul 17, 2005

I'm surprised that Vijay did not tell Jim "sorry - don't run beta" and leave it at that

That's more or less what he did, and why I pulled the machines.. if you read my updated posts at tech report, I've had more issues running your regular run of the mill tinkers than any other WU...

You can see my update here:

http://www.techreport.com/forums/viewtopic.php?p=417920#417920

And Vijay's side of things here:

http://forum.folding-community.org/viewtopic.php?p=106405#106405

Between the two, it covers everything pretty well.

Tigerbiten · Jul 17, 2005

This argument reminds me of the one that went on about AMD's freezing/crashing.

First we were told it was "Bad Memory", too much overclock, etc.
Then we were told to run without switches, with switches, etc.
It ended up being bad code for one register.

Now I fold for fun.
So I want to run at the max Points per Day possible to minimize the cost of the [H]ardware & electric bills.
So being told to run without the switches = less Points per Day = less fun.

I can see OC-AMD's viewpoint.
He wants to maximize the help to the whole community by running beta & big packets.

As most of my farm is built of duallies, this usage of memory is important to me.
200 meg work-units I can just cope with but anything bigger is going to bring my whole farm to its knees.

What I would like is some way to prioritize what type of work units you want on each boxen.
My Opterons would be Big/ Tinkers/ Small.
My mp's would be Tinkers/ Big/ Small.
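
As a rough sketch of what that kind of per-box preference could look like (purely hypothetical: the names `PREFERENCES` and `pick_work_unit` are made up for illustration, and nothing here is a setting the actual client offers):

```python
# Hypothetical sketch only: a per-box preference order for work-unit types.
# None of these names come from the real F@H client or its config.

PREFERENCES = {
    "opteron": ["big", "tinker", "small"],  # Opterons: Big / Tinkers / Small
    "mp": ["tinker", "big", "small"],       # MPs: Tinkers / Big / Small
}


def pick_work_unit(box_type, available):
    """Return the most-preferred WU type that is currently available, or None."""
    for wu_type in PREFERENCES[box_type]:
        if wu_type in available:
            return wu_type
    return None


if __name__ == "__main__":
    # The server only has tinkers and small units on offer right now.
    print(pick_work_unit("opteron", {"tinker", "small"}))
    # The server has big and small units on offer.
    print(pick_work_unit("mp", {"big", "small"}))
```

The point is just that each box walks its own list in order and takes the first type on offer, so a box falls back to its next choice when its first is not available.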

luck..........

Shadowchild · Jul 17, 2005

//Posted this in the other titan thread, figured it would do some good here, as well.

And the issue with telling people not to run beta clients is just that. If people don't run them, Stanford never finds out about the problems 'till it's too late and they hit mainstream, and then they lose support. OC-AMD did what he could, and more, and with his huge number of machines he is basically the ultimate beta tester. He's a single person who understands the core and what's happening, and can see it happening. He's one of the people that can catch AND diagnose the issues quickly. I don't have the experience and knowledge of programming and the way the client works that he does, so I don't run beta WU's. But if Stanford wants a good program, they really do need to keep people like OC-AMD in the game. Not because he does so much work, but because he finds bugs, and is able to tell exactly which cores cause a problem by himself, rather than needing to compile hundreds of reports from hundreds of different people that may be less knowledgeable. It sucks losing OC-AMD for the points, and research not being done, but more so because his skill can advance the client farther, faster, and better than most other places. Betas just don't work unless you have a lot of PC's running them, and having one person monitoring them all works quite well, because he can notice trends far faster. Losing him because of trivial issues that should be fixed hurts Stanford more than they realize, IMO, because people like him running it on absolutely massive distributions give the program credit. Honestly, I've never heard of people running, say, Seti@Home, on that many machines themselves. He's more than just a team member, he's a figurehead that people trying to recruit more folders can point to and go, "Look, he's running it on a corporate mainframe, on thousands of machines. If he trusts it that much, and has that much faith in it, it shows something about the project, and what we all believe, and KNOW, that it can do."

OC-AMD · Jul 17, 2005

thanks Shadowchild...

You are right about having so many machines and the beta program... heck, they put it on the front page news when I hit the 5 million point mark.

5/13/2004 Donator breaking the 5 million point mark

Top individual donator OC-AMD has broken the 5 million point mark. He has also been an active beta tester and helped a great deal with several troublesome WUs.

Congratulations and thanks for all of his contributions!

I have lost 10's of thousands of points to beta units, and I've never once asked to be credited when they were offering to credit others... I know the risks with the beta program. As I said in my Tech Report article, if I had the memory fix, I wouldn't have gotten those monster WU's, but could still test other units... if I wasn't running the flags, I wouldn't have gotten them either... which is better overall?

I don't mind running beta units, I know the risks, but when I see a risk coming before it happens...
