Disaster strikes...

musky

[H]ard|DCer of the Year 2012
Joined
Dec 14, 2009
Messages
3,154
I woke up this morning and checked HFM...and it showed all 7 machines as hung. My first thought was a network problem...they can't all be offline, can they? Turns out, yes they can. We must have had a brief power outage last night - the first one in over a year. It was more of a flicker, because all of the machines had just rebooted. It looks like everything was down for about 3 1/2 hours. The good news is that everything is up and running again now, and that all of the WUs resumed without issue. So what does 3 1/2 hours of downtime cost my farm? That is a tough question to answer - I would estimate somewhere in the 125K point range when you figure in lost processing time and lost bonuses.

Before someone says it, think about what it would cost to put ~2.5 kW behind a UPS...I can live with the downtime...
 
It'd be about $650. Wouldn't give ya a lot of time, but it would sure keep ya from things like this. It would also protect the hardware.
 
A 900 W CyberPower UPS goes for $150 per unit, so ~$450 for three units, with about a 200 W buffer over a 2.5 kW load.
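For anyone who wants to rerun that math with their own numbers, here is a rough sketch. It assumes the ~2.5 kW load and the 900 W, $150 units quoted in this thread, and it treats the load as evenly divisible across units, which a real farm may not be. The function name `size_ups` is just made up for illustration.

```python
import math

def size_ups(load_w, unit_capacity_w, unit_price):
    """Return (units, total_cost, headroom_w) for a load spread over identical UPS units.

    Assumes the load can be split freely across units (hypothetical simplification).
    """
    units = math.ceil(load_w / unit_capacity_w)      # round up to whole units
    total_cost = units * unit_price                  # purchase price only
    headroom_w = units * unit_capacity_w - load_w    # spare capacity left over
    return units, total_cost, headroom_w

# Figures from the thread: ~2.5 kW of folding rigs, 900 W units at $150 each
units, cost, headroom = size_ups(load_w=2500, unit_capacity_w=900, unit_price=150)
print(f"{units} units, ${cost}, {headroom} W headroom")
```

With those inputs it works out to 3 units, $450, and 200 W of headroom. Runtime on battery is a separate question and isn't modeled here.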
 
I have my systems on two APC 1500 VA units, one CyberPower 1350 VA, and one APC 1200 VA.

Yeah, it costs money, and it saves you from the little outages. But earlier this month when they did the rolling blackouts I just had to shut them off, so I feel your pain :)
 
3.5 hours of lost folding, a disaster? lol

I wouldn't say UPSs are needed; a better solution is to just have the servers boot up and resume work when power is restored.

If you really do need the uptime then UPSs will help, together with a redundant network connection to the web (simply using a 3G dongle would work). At least RAID1... redundant fans, PSU, ECC with mirrored DIMMs. Oh and don't overclock - that should help your uptime ;)
 
noob. wtf. put those things behind a UPS :rolleyes:

seriously though, when I saw the word "disaster", I was thinking you had a power spike and lost a couple SR2s or something. I call 3.5 hours of downtime a regular Monday :)
 
Each of my boxen are on a UPS too. It was a challenge when GPU folding, but with SMP the only limitation is finding enough free outlets in my condo.
 
Goodness man, you scared us! The minute I read the thread title, I thought that something terrible had happened and not a simple power outage. I do find it amazing how you didn't lose a single work unit though.
 
Each of my boxen are on a UPS too.

Same here....I put APC 1200's on just about everything I've got. It's not as bad these days, but a few years ago the power company was pretty flaky around here, especially in the winter. Better safe than sorry, I say.
 
Can't you just make the folding program auto run, or run it as a service? Then it will start when the power comes back on.
 
Try a shortcut placed in the Startup folder on the Windows Start menu. It certainly works for GPU, so it should work for SMP clients as well.
 
I have my clients set up this way, but I have to log in with my password first. So if my machines reboot, my uptime is hosed.



 
You can use Task Scheduler to run a program at startup (even before you have logged on, and with any credentials you like).
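As a rough sketch of what that could look like from a script: the snippet below only builds the `schtasks` command line for a boot-time task and prints it. The task name `FoldingClient` and the path `C:\FAH\client.exe` are placeholders, not anyone's actual setup, and running as `SYSTEM` is just one assumption; a regular account would need `/RU` plus `/RP` with its password.

```python
def build_startup_task_command(task_name, program_path, run_as="SYSTEM"):
    """Build an schtasks command that runs program_path at boot, before anyone logs on."""
    return [
        "schtasks", "/Create",
        "/TN", task_name,       # task name shown in Task Scheduler
        "/TR", program_path,    # program to launch (paths with spaces need extra quoting)
        "/SC", "ONSTART",       # trigger: system startup, not user logon
        "/RU", run_as,          # account the task runs under
        "/F",                   # overwrite an existing task with the same name
    ]

# Hypothetical task name and client path, for illustration only
cmd = build_startup_task_command("FoldingClient", r"C:\FAH\client.exe")
print(" ".join(cmd))
# To actually register the task, run the printed command from an elevated
# prompt, or pass the list to subprocess.run(cmd, check=True).
```

Keeping the builder separate from the part that executes it makes it easy to eyeball the command before it touches a machine, and easy to delete the task again when going back to other work.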
 
I was implying a bit of sarcasm when I posted this as a "disaster" - didn't mean to scare anyone. I have never had them all go down at the same time before.

UPSs are not going to happen - it is not that important to me that these machines have 100% uptime. They are all behind surge protectors, which is good enough for me.

I have been meaning to set up some sort of autostart system anyway, mainly to prevent me from forgetting to start Langouste and to guard against the occasional reboot I see anyway. Looks like a good reason to do it sooner rather than later.
 
Running as a service will work for sure. I have all my clients set up that way.
 
Some thoughts -

  • you don't get to be number one producer without the attitude that 3.5 hours downtime is absolutely catastrophic. :p
  • UPS for folding? - I suppose if you have flaky power - can't see the benefit for myself. How much extra power do they draw? Is it a fixed kWh, or does it scale with use?
  • Autostart? I have done it from time to time, but gave up, as it always bites me. Not good when tweaking an SR2 continuously. I quit because of the number of times I forget to remove it when going back to 3D rendering for a while. Reboot for software and whoops, got a new unit that takes 19 hours to clear. Doh.

But glad to hear the farm is ok Musky.
 
I wasn't worried :p
and I am with worthy... not had good luck with autostarts...
 
I run UPSs on my pathetic farm because I fear brownouts. I have lost more gear to brownouts than anything else. Surge protectors don't help you there.

On a separate but related topic, I also put in network surge protectors ahead of the systems to prevent strikes coming in through Ethernet.
 
I run UPSs on my main rigs at home and surge protection on the network...

we used to get brownouts on most stormy nights...

that said... I don't notice the lights dimming anymore... but my UPS will often beep and supplement voltage during storms...
 