Study Finds that 1/3 of Data Center Servers are Comatose

Terry Olaes

A study conducted by Stanford found that 30% of physical servers in data centers are comatose, meaning that they are using energy but not delivering any useful information. That number is unchanged from a previous study conducted in 2008. I know many of our readers work in or around data centers, so do you agree with this research?

The researchers used data collected by TSO Logic, an energy efficiency software vendor, from nearly 4,000 physical servers in customer data centers. A server is considered comatose if it hasn't done anything for at least six months.
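
For context, the study's criterion boils down to a simple cutoff check. A rough sketch of the idea, assuming you already track a last-activity timestamp per physical box (the hostnames, dates, and 183-day constant below are illustrative, not from the study):

```python
from datetime import datetime, timedelta

# Hypothetical inventory: server name -> timestamp of the last observed activity
# (logins, network traffic, meaningful CPU work, etc.).
last_activity = {
    "web-01": datetime(2015, 5, 20),
    "report-legacy": datetime(2014, 9, 3),
    "db-archive": datetime(2015, 1, 15),
}

COMATOSE_AFTER = timedelta(days=183)  # roughly the six-month cutoff the study uses
now = datetime(2015, 6, 30)

comatose = [name for name, seen in last_activity.items()
            if now - seen >= COMATOSE_AFTER]

print("Comatose:", comatose)                                          # ['report-legacy']
print("Rate: %.0f%%" % (100 * len(comatose) / len(last_activity)))    # 33%
```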

The high number of such servers "is a massive indictment of how data centers are managed and operated," said Jonathan Koomey, the Stanford researcher behind the study. "It's not a technical issue as much as a management issue."
 
Is it possible, if this statistic is true (which the article questions), that this could be a result of changes in data retention rules? There are a lot of financial, environmental, and other regulatory rules that require companies to hold data for up to a decade ... if they were to archive really old data on these servers, they might only access them annually to delete one year and add a new year's data.
 
I would think 15-20% is probably closer to reality worldwide. I suppose it also depends on the definition of "unused". Just think of gaming alone. There are all kinds of empty servers out there for games that are 5-15 years old that remain running because people want to have them available just in case they have an itch to play that particular game. The UT series games are an example I'm familiar with.

Then, to the point of data retention: I'm sure there are at least several thousand servers spinning away just in case someone needs access to some archived data.
 
Just think of gaming alone. There are all kinds of empty servers out there for games that are 5-15 years old that remain running because people want to have them available just in case they have an itch to play that particular game. The UT series games are an example I'm familiar with.
The article is talking about physical servers, not virtual servers. I highly doubt there are E5s running solo as UT servers.
 
I would say this is probably true. From my experience with my company being bought out (and numerous others), the planning for integrating systems is very poor. Because of this, there are a ton of redundant systems spun up across the US basically doing nothing but burning energy, aside from the occasional legacy user hitting the resource.
 
Is it possible, if this statistic is true (which the article questions), that this could be a result of changes in data retention rules? There are a lot of financial, environmental, and other regulatory rules that require companies to hold data for up to a decade ... if they were to archive really old data on these servers, they might only access them annually to delete one year and add a new year's data.

I'd think you could just back that stuff up to tape (assuming you have no need for it internally).
 
I'd think you could just back that stuff up to tape (assuming you have no need for it internally).

Just a guess ... given the risks of not having the data if it is required, they might get some liability protection by having a third party store it.
 
The article is talking about physical servers, not virtual servers. I highly doubt there are E5s running solo as UT servers.

Where do you think those virtual servers run? I know for a fact that a server box a friend built specifically for UT was sitting in a data center in TX as of a year ago. It's not an imaginary machine :confused: They did run some other stuff on it at times, but I think it sat mostly idle for a long time. I'm sure companies like Valve and others maintain a certain number of servers that sit idle just to maintain a presence for certain games. Combine that with private game servers, kept up by people with the money to maintain them, that just sit there most of the time.
 
Where do you think those virtual servers run? I know for a fact that a server box a friend built specifically for UT was sitting in a data center in TX as of a year ago. It's not an imaginary machine :confused: They did run some other stuff on it at times, but I think it sat mostly idle for a long time. I'm sure companies like Valve and others maintain a certain number of servers that sit idle just to maintain a presence for certain games. Combine that with private game servers, kept up by people with the money to maintain them, that just sit there most of the time.

I wonder if this includes servers that are for High Availability (HA). For the most part, you've got a server that's not doing anything unless something goes wrong. It could be virtualized, but if the main server isn't and it's highly utilized, I'd think you'd want the other side to be identical, so that you can switch over without having to worry about the backup being underpowered.

Not my area, but that's how I think it works.
 
I can't quote exact statistics, but I know every place I've worked has had some zombie servers. There are always those servers that nobody is really sure anyone is still using, but you're afraid to shut them down for fear of killing some critical little process that nobody remembers. Sure enough, as soon as you take one of them down, you find out it had some service running on it and you've got someone screaming that their reports aren't working. I know, I've seen me do it.
 
I can't quote exact statistics, but I know every place I've worked has had some zombie servers. There are always those servers that nobody is really sure anyone is still using, but you're afraid to shut them down for fear of killing some critical little process that nobody remembers. Sure enough, as soon as you take one of them down, you find out it had some service running on it and you've got someone screaming that their reports aren't working. I know, I've seen me do it.

We've had that. It's not that we're not notified. We just didn't know we had a process on it that hadn't been migrated to a newer server. Sometimes they have to bring it back up (temporarily) but even then, we're probably just moving the process to another server within a few days or a week.
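
For what it's worth, a rough pre-decommission sanity check along these lines can catch the obvious stuff, though it's only a snapshot and will miss infrequent jobs like monthly reports. This sketch assumes the psutil package is installed; everything else is generic:

```python
import psutil  # pip install psutil

# Quick "signs of life" audit before pulling the plug on a suspected zombie.
# This is only a point-in-time check: it will not catch jobs that fire monthly
# or quarterly, which is exactly how people get burned.

print("Logged-in users:", [u.name for u in psutil.users()])

# Listening sockets suggest something still expects to serve requests.
# (net_connections may need elevated privileges on some systems.)
listening = {c.laddr.port for c in psutil.net_connections(kind="inet")
             if c.status == psutil.CONN_LISTEN}
print("Listening ports:", sorted(listening))

# Sample CPU over a short window; sustained load means real work is happening.
print("CPU over 5s: %.1f%%" % psutil.cpu_percent(interval=5))
```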
 
I wonder if this includes servers that are for High Availability (HA). For the most part, you've got a server that's not doing anything unless something goes wrong. It could be virtualized, but if the main server isn't and it's highly utilized, I'd think you'd want the other side to be identical, so that you can switch over without having to worry about the backup being underpowered.

Not my area, but that's how I think it works.

Yeah, good point, there are mission-critical applications that have a duplicate sitting there just in case. I think in recent years the big companies, anyway, have gone to setups where mission-critical functions are distributed across several machines. When one goes down, the load is spread to the others.

In the end, once you start putting together all the functions servers have done over the last 20+ years and all the different entities that are responsible for them, I wonder how many end up being used by "hackers" to run various things because nobody is paying attention or maintaining their security.
 
I'd say easily half of the machines in my lab are idle on any given work day. However, I need to keep them on because of the RAID controller BBUs. If you let a lithium-ion battery run out of power and sit drained for more than a week, it really hurts its lifetime. In a month, they're likely unusable. At $80-400 each, they're expensive to replace.
 
I can't quote exact statistics, but I know every place I've worked has had some zombie servers. There are always those servers that nobody is really sure anyone is still using, but you're afraid to shut them down for fear of killing some critical little process that nobody remembers. Sure enough, as soon as you take one of them down, you find out it had some service running on it and you've got someone screaming that their reports aren't working. I know, I've seen me do it.

I do my best to keep track of this kind of thing in my lab. I've migrated 27 old, old servers over to small VMs on my infrastructure cluster just to get the old machines out of my lab. I still have a couple old license servers that have nothing to do except confirm that a license exists for a compiler, but it is really difficult to replace an IRIX machine with a VM. I just wish we'd stop supporting that horrible OS.
 
I do my best to keep track of this kind of thing in my lab. I've migrated 27 old, old servers over to small VMs on my infrastructure cluster just to get the old machines out of my lab. I still have a couple old license servers that have nothing to do except confirm that a license exists for a compiler, but it is really difficult to replace an IRIX machine with a VM. I just wish we'd stop supporting that horrible OS.

Reminds me of a stop-flow machine we had that ran OS/2 Warp and saved to Zip drives...
 
If they didn't include redundancy machines in this figure, then I'd find it shocking.
 
Not at my job. If a server isn't being used for something useful, get rid of it. Less work for me to maintain.

Sometimes I will just turn off a server to see if anyone notices; if nobody says anything in 3 months, I get rid of it.
 
Most companies need high uptime on their systems, so N+1 or N+2 scenarios are pretty common. I run quite a few Cisco UCS/VMware systems, and our utilization on some of the clusters is pretty low overall, but we need extra capacity to account for HA or failover scenarios so we're not running in an emergency situation if one host or piece of hardware fails.

33% is a pretty high idle percentage, though. One other thing to factor in is that some servers only exist to do things for monthly reporting, quarterly analysis, etc. I know with a lot of higher end Oracle applications, there are certain application roles that don't do much most of the time, but periodically hit high utilization levels once or twice a month. There are also a lot of Test/Dev/Stage environments that only exist to test software before rolling to production, so they don't see customer load, only developer or internal test loading.
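
To put rough numbers on why N+1/N+2 clusters look underutilized on paper, here's the back-of-the-envelope math (the cluster size and 80% ceiling below are made-up examples, not our actual numbers):

```python
# N+1 sizing: the cluster has to carry the full load with one host down,
# so day-to-day utilization per host must stay well below 100%.
hosts = 8                      # hypothetical cluster size
ceiling_after_failure = 0.80   # don't exceed 80% per host after losing one

max_steady_state = ceiling_after_failure * (hosts - 1) / hosts
print("N+1 safe steady-state utilization: %.0f%%" % (max_steady_state * 100))  # 70%

# N+2 pushes it lower still:
print("N+2: %.0f%%" % (ceiling_after_failure * (hosts - 2) / hosts * 100))     # 60%
```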
 
Might be close to 20% at my job ... between legacy systems from acquisitions, demo/POC boxes, and failed products and services that are rarely if ever used, I'm sure there is a bit of fat to trim (although not all of those are physical). Add in hot spares/failovers if you want to call it an even 30%, I guess.
 
Hell, the biggest one I've seen is point-of-sale systems that have been deprecated, but nobody could figure out how to get usable data out of them, so they were left online until somebody figured something out or they aged into the new system.
 
Not really that big of a deal with the newer servers.

The servers I have bought over the past couple of years (usually dual 6-core Xeons) draw much less power than the old servers they replaced, especially at idle. The run time on the UPSes is at least twice as long, even though the new servers have much more CPU power, memory, and drives.
 
The last old DC I worked at was full of zombified equipment - mostly created when entire teams were laid off/outsourced and their special projects were never shut down. The company was fond of doing zero-notice, escort-you-out-of-the-building layoffs. This went on for years and years. At one point there were thousands of people working onsite there - down to fewer than a hundred, last I heard. Management were non-techie people; they had no idea what their teams were doing half the time. Nobody documented anything. It was insane. I wouldn't be surprised if the percentage there was much higher.
 
8 years ago? More like 50%. Nowadays... 10-20% is likely. Big companies have been on the virtualization track for a little while now and have been eliminating stuff like this. The last big place I worked at took it a step further: if a machine measured xyz CPU percentage over xyz amount of time, it was considered a "zombie" and was decommissioned. They would only power it back up if someone complained about it, and then that person would get bitched at about why it wasn't being used or accounted for. If the conversation continued, it would turn into "Hey, we'll give you a FREE VM server instead!" 99% of departments tend to agree with free.
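
A policy like that is easy to script against whatever monitoring is already in place. A minimal sketch of the idea; the threshold, window, hostnames, and numbers are all invented stand-ins for the "xyz" values:

```python
# Flag "zombies": hosts whose CPU never rose above a threshold over the whole window.
CPU_THRESHOLD = 5.0  # percent -- whatever the policy's "xyz" value is

# Hypothetical monitoring export: host -> daily average CPU% over the review window.
samples = {
    "app-prod-07":   [42.0, 55.3, 38.9, 61.2],
    "legacy-rpt-02": [1.2, 0.8, 2.4, 1.1],
}

zombies = [host for host, cpu in samples.items() if max(cpu) < CPU_THRESHOLD]
print(zombies)  # ['legacy-rpt-02'] -> candidates for the "free VM instead" conversation
```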
 