Upgrading our primary storage server

wizdum

I wasn't really sure if this should go here or in the storage section, so feel free to move it.

I work for a school that operates entirely off of LTSP terminals. We have three schools, a school office, town office, fire station, and maintenance building all using terminals. At the center of this is a single Red Hat Enterprise Linux 3 (or maybe 4) server with a 12-bay SAS JBOD that stores all user files on an NFS share. I know the drives in the JBOD are 250GB SATA drives in a RAID 5 configuration, but that's about it; no one wants to touch it for fear that it will break. It connects to our network with a single gigabit NIC and has been happily humming along for a little over 10 years.

We had a lot of performance issues with the NFS server over the past school year, so I'd like to see if we could replace it with something a little nicer. We upgraded almost all of our terminals to thick clients, which means there are a lot more simultaneous connections to the NFS server now.

Due to our budget, we have been looking into a lot of the off-lease and used servers on eBay, like the Dell C2100, HP DL180, and Dell R610. We don't use a lot of space (the current size of /home is under 600GB). I'm thinking we could see a performance increase with something like the Dell R610 with six 500GB SSDs in RAID 10. The 4x gigabit ports in the R610 have me wondering if we could do some sort of link aggregation to increase total throughput.

Am I on the right track here?

My primary concerns are:
1. Whether we will lose any performance from dropping from 12x 7200RPM SATA drives in the JBOD to only 6x SSDs.
1a. Or should we go with a new server and a new JBOD? Our existing JBOD has a single 3Gb/s SAS connection.

2. SSD reliability. I have seen just as many people swear that SSDs fail too often in servers as swear that SSDs are the only way to go.

3. Is the 4-port link aggregation feasible/worth it? Or should we be looking into connecting all our servers via 10GbE or fiber?
 
1) The 6x SSDs will be way faster. Several times faster.

2) SSDs are fine in servers, so long as they are server or enterprise grade.

3) Link aggregation works so long as you have multiple hosts accessing the NFS server.
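
To illustrate why that is: most bonding/LACP modes pick the outgoing link with a hash of the packet's addresses, so any single client's traffic always lands on the same 1GbE port. A rough Python sketch of a layer2+3-style hash (illustrative only, not the exact Linux bonding algorithm; the MACs/IPs are made up):

```python
# Illustrative sketch of how an 802.3ad / layer2+3 style transmit hash pins
# each client to one slave NIC. Not the exact Linux bonding implementation.

def choose_slave(src_mac: str, dst_mac: str, src_ip: str, dst_ip: str, n_slaves: int) -> int:
    """Hash the MAC/IP pair of a flow onto one of n_slaves links."""
    mac_bits = int(src_mac.replace(":", ""), 16) ^ int(dst_mac.replace(":", ""), 16)
    ip_bits = (int.from_bytes(bytes(map(int, src_ip.split("."))), "big")
               ^ int.from_bytes(bytes(map(int, dst_ip.split("."))), "big"))
    return (mac_bits ^ ip_bits) % n_slaves

server = ("00:1b:21:aa:bb:01", "10.0.0.10")          # made-up server MAC/IP
clients = [("00:1b:21:cc:00:%02x" % i, "10.0.1.%d" % i) for i in range(1, 6)]

for mac, ip in clients:
    link = choose_slave(mac, server[0], ip, server[1], n_slaves=4)
    print(f"client {ip} -> slave NIC {link}")
# Each client always maps to the same NIC, so one client never exceeds 1GbE,
# but many simultaneous clients spread across all four links.
```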
 
2) enterprise grade.

That sounds expensive. We've been using consumer-grade mechanical HDDs (WD Blues, IIRC) "fine", so could we get away with something like the Samsung 840 EVO?

For brands, I have heard good things about Samsung, Intel, and Kingston (but only their enterprise SSDs, apparently). Are there any others that I should consider?
 
People saying that SSDs fail in servers fall into two groups:

1) Those spreading FUD who have never experienced or seen it and are regurgitating 'tech' articles from 4+ years ago.

2) Those who were using SSDs for massive transactional DBs with huge write volume.

Using an SSD for your type of application, even a consumer level group of them, is fine.
 
Be careful hooking up SSDs to RAID cards,

* Many cards don't like SSDs at all
* TRIM might not work
* The RAID controller might be a bottleneck in the end

If you're going consumer, use Samsung 840 Pro (non-EVO) SSDs or Intel.
I would also advise you to look at a ZFS solution. Keep in mind that RAID 5 may actually get slower if you use more than 6-7 drives, due to latency in a single array.
//Danne
 
I'd also ask if you are stuck on the existing scenario for storage, and/or if any estimated budget is set yet? What is your current usage of the max capacity? What are the expected growth rates? Assuming it's just a file server, you might consider a commercially supported NAS device. One that supports SSD caching would fit your speed requirements but fit a larger max capacity into the same budget. Samsung EVOs have been decent drives so far. I've been seeing lots of hype about Intel's S3500/S3700 series as well; a bit more expensive, but supposedly designed more for server duty.

To answer your original questions:
1. In most cases, even a single SSD would stomp the performance of your existing array. The ultra-low latency of an SSD compared to the elevated latency of a SATA RAID array alone would make up the difference, and in a sequential R/W scenario, both will come close to, if not max out, the SAS1 bandwidth.
1a. See my initial comments.
2. As a file server, I wouldn't worry that much about life expectancy unless you have a TON of users.
3. Given that the existing configuration only has a single GbE connection, anything is an upgrade :). That being said, with Link Aggregation, you'll have more overall bandwidth available to the network, but with most protocols, each client will still only get a 1GbE connection.
 

I have been looking at the Dell R610, and the spec sheet specifically mentions SSDs. It does not mention whether it has TRIM support, though. The servers come with PERC H200, PERC H700, and PERC 6/i RAID controllers.

Our current solution uses RAID 5; I was planning on using RAID 10 with the new server. I am not too familiar with ZFS; I messed with it a bit on a FreeNAS server I had at home, but nothing in production. Do you mean that we should look into ditching hardware RAID entirely and going with ZFS instead?

Thanks for the input so far everyone!
 

I am looking into standalone NFS servers as well, but this box also does authentication for the terminals through NIS, which limits our options. I just took a quick look at /home and it appears that we have 2091 user accounts. If it goes down, we're dead in the water. No one in the district can log in. We had a drive in the JBOD die yesterday, and every user noticed it. There was a period of 3 to 5 seconds where no machines were responding.

With the cheap hardware that we have been looking at, the idea is that we can have extras, so if something fails we can just swap it out, rather than buying a really expensive solution and hoping that it fails in an easily recoverable way.

We're fine with only 1 gig per machine through link aggregation. Most terminals only have a 10/100 NIC anyway. We just keep running into a problem where a teacher waits until a class is in the room to boot up all 50 terminals simultaneously, and each of those machines has to download its OS image over the same single gigabit NIC on the NFS server.
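
For a rough sense of how much that boot storm gains from extra links, here's a back-of-envelope sketch; the ~2GB image size and 70% link efficiency are assumptions, not measured numbers:

```python
# Back-of-envelope for the classroom boot storm: 50 clients all pulling an OS
# image through the NFS server's NIC(s). The 2GB image and 70% efficiency
# figures are assumptions for illustration.

def boot_time_minutes(clients: int, image_gb: float, server_gbps: float,
                      efficiency: float = 0.7) -> float:
    """Time for all clients to finish, assuming the server NIC is the bottleneck."""
    total_bits = clients * image_gb * 8e9
    usable_bps = server_gbps * 1e9 * efficiency   # protocol + contention overhead
    return total_bits / usable_bps / 60

for nics in (1, 2, 4):
    print(f"{nics}x 1GbE: ~{boot_time_minutes(50, 2.0, nics):.0f} min for 50 clients")
# Roughly 19 minutes on a single 1GbE port vs ~5 minutes across four bonded
# ports -- aggregation helps here precisely because many clients hit at once.
```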

It would be nice if I could get more than 1gig between the NFS server and my backup server, though.
 
Based on your comment, it sounds like you have a significant single point of failure in the structure then. I'd examine how you can combat that as part of this process.

What's your max capacity?
How much are you using at present?
Any idea what your growth is?
Estimated Budget? (I understand that being a school district, Cheap is good, but we need something to work with :))
 

Max capacity of the array is 2TB. We have been using about 500GB of that for the past couple of years. Estimated budget: I'd start with under $3,000. Growth has been very minimal over the past couple of years, 50GB per year or less.
 
If you were using Windows Server 2012 you could leverage its new built-in NIC teaming feature.

Basically link aggregation without needing any switch support.

Personally I'd use a 16GB+ Server 2012 Dell R620 with an H710 and as many 10K drives as needed. If you need more I/O speed, add a single SSD and enable CacheCade. You get a free CacheCade 1.0 license with any Gen 8 RAID card.

You can use RAID 10 or RAID 5. Rebuild times for a 1TB 10K HD are just under an hour on the Gen 8 controllers.

Add a hot spare drive.

Put a 4-hour break/fix contract on the server and enable monitoring.

I have over 70 physical servers here and only one has ever just died...and it was 7 years old and was considered suspect a year before it failed.

This is another reason I love to virtualize.

(3) servers all 10G interconnected to a shared storage array. I run 150 servers out of that one cluster alone.
 
If you're considering PERC controllers, Dell documentation specifically calls out some settings for SSDs on their PERC cards that seem opposite of what you'd set for a mechanical drive. It _feels_ wrong when you set them the way Dell recommends, but ATTO tests prove them out.

I put a pair of consumer-grade SSDs in a (Dell) Citrix server and hid that server in a farm of 10 other Dell-based Citrix nodes. No one has screamed about it yet.
 

It's good that you mentioned this; we have 10 or so Citrix servers that will need to be upgraded next year. They're all running Server 2003. We had been told that we couldn't put SSDs in them because the high number of write operations would burn them out too quickly.
 
That's based on the fact that the XP/2003 generation doesn't understand how to best utilize SSDs as part of the OS. Win7/2k8R2+ generations can self-optimize to some degree to lessen unnecessary writes to the SSD.

And just to clarify, the CacheCade 1.0 licensing mentioned above is read caching only. No writes, thus a single SSD for cache. CacheCade 2.0 is available now for a license fee and does R/W, but needs redundant SSDs for safety.

Would you be willing to describe more of the environment for us? It might be more efficient to reconfigure more than just the NAS storage for future redundancy and performance.
 
Yes, some RAID cards can be crossflashed into "stupid" HBAs, which is preferable when using ZFS; along with RAID 0+1 or 5 it'll be a very good solution with good backup management.
This includes some of the Dell H*-series cards.
//Danne
 

I'll try to create a diagram and post it later (we really need to do that anyway; I spent 2 hours trying to locate a switch today). I'm sure you guys would have a field day with our network; it's made up of a multitude of bad ideas, strung together in a way that somehow allows it to function.

We have one Class A subnet for the LAN, with a DHCP server that uses MAC filtering to (poorly) keep people from plugging in their own equipment. The NFS/NIS server is located at the high school, which is our newest building and actually has a really nice datacenter. All the buildings are connected with fiber, but currently use 10/100 media converters. The fiber internet access comes into the high school, goes into a "DMZ Core" layer 3 switch, then into an ancient SonicWall firewall, then into a Cisco 5000 series router. The Cisco separates our internal LAN from our colocation partners and from the wireless internet access that we provide to the town.

The LAN port on the Cisco goes into a "LAN Core" layer 3 switch. The "Core" switch connects all our separate services to the LAN. We have a VMware switch, a wireless LAN switch, a couple of PoE switches for cameras, and an LTSP switch for each building (the LTSP switch connects LTSP servers to the LAN). Each building has its own "core" LTSP switch that is connected to the "LAN Core" switch via fiber optic media converter.

The NFS/NIS server is plugged into the "core" LTSP switch for the high school. There are also 5 LTSP servers plugged into the "core" LTSP switch at the high school. The LTSP servers are dual-homed. The second NIC is plugged into a dedicated LTSP terminal switch for each server and provides DHCP and NAT for the LTSP terminals to connect to and boot from.

There is only one subnet, 10.x.x.x, and no VLANs. A lot of our LTSP servers are very old: single-core CPUs, IDE hard drives, and less than 2GB of RAM. I'm thinking they may not be able to keep up with the network traffic, since we have never maxed out the NIC on the NFS server. It usually sits at about 40Mbps all day. I did manage to hit about 90Mbps when doing a backup of the NFS server to a "newish" server that was plugged into the same high school core LTSP switch.

Also, pretty much all of our switches are now out of their "lifetime" warranty period.
 
Time to create an IT budget before everything collapses.

They're freezing our budget on Monday so they can use it to pay for a deficit, so a lot of this is probably now going on my credit card. We're a victim of our own success. We live (very well) off of surplus hardware; each workstation (computer, monitor, keyboard, mouse) costs us about $30. I don't think anyone outside our department has any idea what it usually costs to run an IT department for a school district of this size.

Part of me wants the NFS server to fail, so when the administration comes to ask why this happened, we could just point to the money they took.
 
Casual observations only:

1. Mackintire has a good suggestion: you need to have a budget.

2. If you have an official, published, allocated budget, then you can review the next step.

3. If you do not have an official allocated budget, as stated in your reply above ("budget freeze"):

3.1 Request official permission from your school administration: do they authorize you to make an emergency arrangement to solve the technical problem? I ask this because, as you state very clearly in your post: "no one wants to touch it for fear that it will break. It connects to our network with a single gigabit NIC and has been happily humming along for a little over 10 years."

4. If you asked and there is no budget:

4.1 If the school also does not allow an emergency arrangement on your own, then you are obviously in a fairly tight scenario.

4.2 If there is no budget, but the school allows you to do something about it on your own, then you have to assess your own situation, since as you said in your post, "So a lot of this is probably now going on my credit card."

4.2.1 In that scenario, I recommend asking in this forum how to back up the currently running host. People will usually ask the budget question again...

5. If nothing is allowed by the school administration, and you still feel a deep urge to address the situation: calmly review your overall scenario, assess your own commitment level, and build some understanding with people at the workplace.

You still need to perform data backups. Learn how to back up the current host and its intricacies, and if nothing else is allowed, use your own resources to test backup and recovery as a starting point, then perhaps a sensible test of an emergency duplicate host. Remember not to touch the original hardware.

Occasionally, re-check with your school and re-apply for a budget allocation. This is a reality check: as stated in your first post, this box is running the current network services, so a certain budgetary allocation is fairly reasonable.
 
This problem resolution seems remarkably easy to justify to whatever tight-wad bean counter is freezing money. They put in an original solution for $X and it has run for $Y years, providing a total ownership cost of $Z per year. It is now at a point where its imminent failure will cost them $A dollars per day in lost productivity if they ignore the situation and there is a catastrophic failure of the equipment.

One of the hardest lessons YOU need to learn is that it's not on your personal shoulders if they neglect to spend the money to negate the risk. It _is_ on your shoulders, though, if you fail to provide them with the correct information so that they can manage the risk.

And try not to sound like an alarmist. It's unlikely there will be human loss of life if that shit dies. Keep it off your personal credit card.
 

I don't typically deal with the overall budget. I find problems, or am given problems, and make recommendations. As far as I can tell, we give them a budget plan showing what we need to make it through the year, and they raid it for money whenever they fall short somewhere else. It works great for everyone that isn't us. We were blindsided with the budget freeze on Friday, so I will have to talk with my boss and see what we can do. We're planning to make several large pitches on Monday.

We would have no problem getting the money to fix the NFS server if it actually failed, and I do have weekly backups of all the user files (hopefully, the first run starts tonight at 11pm), so we would just lose the configuration. The only issue would be downtime, as we are very rural and can't just go down the street to the "server store" to pick up a replacement. We would have to rush order something.

I do understand that it doesn't have to be my responsibility to fund it if the school doesn't want to. Unfortunately, we're all believers in the mission here so we would have no problem finding people willing to fund it.


Right now I'm thinking 6x Samsung 840 Pro SSDs in a Dell R610 or R710 would work quite well and not cost a lot. We have also considered getting rid of NIS and going with LDAP for better control and security.

I may experiment with RAID 10 or RAID 5 before I put the server into use. As long as we have the hardware on site, our risk is fairly low.
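
For what it's worth, the usable space works out comfortably above the current ~600GB of /home either way; a quick sketch with 6x 500GB drives, ignoring formatting overhead:

```python
# Usable capacity and fault tolerance for 6x 500GB SSDs, ignoring filesystem
# and controller overhead.

def raid_capacity(drives: int, size_gb: int, level: str):
    if level == "raid10":
        return drives // 2 * size_gb, "1 drive (up to 1 per mirror pair)"
    if level == "raid5":
        return (drives - 1) * size_gb, "any single drive"
    raise ValueError(level)

for level in ("raid10", "raid5"):
    cap, tolerance = raid_capacity(6, 500, level)
    print(f"{level}: {cap} GB usable, survives loss of {tolerance}")
# raid10: 1500 GB usable; raid5: 2500 GB usable but with slower writes and a
# riskier rebuild if a second drive drops out.
```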
 
This all sounds very worrying to me. Does this storage server hold all student work? If you lost that, wouldn't it be devastating to the students? If it's just Word docs, etc., why do you want to use SSDs?
 
[snip] As far as I can tell, we give them a budget plan showing what we need to make it through the year, and they raid it for money whenever they fall short somewhere else. It works great for everyone that isn't us.

Then perhaps you're doing your budget wrong.

For example: did Bob in accounting dictate he needs a new printer and you (IT) should get him one? Why the hell is that out of your budget? IT doesn't need a new printer for Bob. Take it out of Bob's budget. Just like "IT" doesn't need a new NFS server.

As an experiment for Monday you should count up the number of people in your IT team, think about the machinery you as an IT group need to operate. A pc or three? There's your budget. Push those money problems back out to the people that are abusing IT.
 

No, I mean the money was taken out of our budget to buy unrelated things like diesel for buses and chairs for special ed. But those other things you mentioned happen as well.

This all sounds very worrying to me. Does this storage server hold all student work? [snip]

The storage server holds all student accounts. Their work, their program settings, etc.
 
[snip] I do have weekly backups of all the user files, so we would just lose the configuration. [snip]

Adding a script to back up the configuration files wouldn't be too difficult and could save you some frustration and headache later when you have to restore it ;)
 

Yeah, currently I'm doing the backup by mounting the NFS share on another machine and using rsnapshot. I think I can set it up to back up the rest of the directories over SSH. The problem comes down to not wanting to install or modify anything on the NFS server.
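
If the goal is to stay completely agentless, one option is to pull the config directories from the backup box over SSH with plain rsync, so nothing has to be installed on the NFS server. A minimal sketch; the hostname, key path, and directory list below are placeholders:

```python
#!/usr/bin/env python3
# Agentless pull of config directories over SSH + rsync, run from the backup
# server so nothing has to be installed on the NFS server. The hostname, key
# path, and directory list are placeholders.
import subprocess
from datetime import date
from pathlib import Path

HOST = "root@nfs-server.example"         # hypothetical hostname
KEY = "/root/.ssh/backup_ed25519"        # dedicated backup key
CONFIG_DIRS = ["/etc", "/var/yp"]        # NIS maps commonly live under /var/yp
DEST = f"/backups/nfs-config/{date.today().isoformat()}"

for src in CONFIG_DIRS:
    target = Path(DEST + src)
    target.mkdir(parents=True, exist_ok=True)
    subprocess.run(
        ["rsync", "-aHAX", "--numeric-ids",
         "-e", f"ssh -i {KEY}",
         f"{HOST}:{src}/", f"{target}/"],
        check=True,
    )
print(f"config snapshot written to {DEST}")
```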
 
I'm sorry...but here's my take.

Offer the budget doing everything "correctly".

That means Cisco, Juniper, true enterprise-class devices and level of support, with 4-hour break/fix contracts.

Then when they chop your budget, drop back to Zyxel, Ubiquiti, second-hand supported gear... second-tier backups, "will work (kinda) in an emergency" hot-swap backups, etc.

I can make a system sing...and not use true mission-critical level equipment and support, but from a design point of view I'm going to start with "what is correct" and pull my Mr. Wizard hat out after they tell me "here's what we can give you, and you will have to make do".
 
In all honesty, since it's all fairly low traffic, you don't need to grossly overprovision performance, especially since you seem to be on a very slim budget. Just inform staff that the current hardware is so old that you'll have trouble getting spare parts when shit breaks loose. That will usually make a few alarm bells ring in administration. I would think you'll have a better shot going for value stuff (Zyxel, Ubiquiti, etc.), but still not the crap (tm), if you show that you're well aware of the budget issues in general instead of telling them you want Cisco/Juniper/*-class gear.

That said, do you have any performance graphs regarding the current performance of the network and file server in question? You'll probably be better off running UNIX/Linux/*BSD in terms of performance, but it all depends on what you want to accomplish. Samba is still a bit quirky when it comes to AD in Windows environments.

My take would be FreeBSD (preferably) or OpenIndiana using RAID-Z (4 drives, 1-2TB Toshiba DT01ACA drives), with two fairly large SSDs (256GB or so) as combined, mirrored ZIL/L2ARC devices and two mirrored OS drives. The RAID controller card would be flashed as an HBA. You could netboot this box if you had reliable storage for the OS. Get at least 32GB of RAM, preferably more, but I think that would do fine.
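
Roughly, the usable space of that layout works out as follows (a quick sketch; the 2TB drive size is an assumption and ZFS metadata overhead is ignored):

```python
# Rough usable space for the proposed pool: 4 data drives in RAID-Z1 plus
# mirrored SSDs for ZIL/L2ARC and a mirrored OS pair. The 2TB data drive size
# is an assumption; ZFS metadata overhead is ignored.
DATA_DRIVES, DRIVE_TB = 4, 2.0

raidz1_usable = (DATA_DRIVES - 1) * DRIVE_TB   # one drive's worth of parity
mirror_usable = DATA_DRIVES / 2 * DRIVE_TB     # two mirrored pairs instead

print(f"RAID-Z1:         {raidz1_usable:.1f} TB usable, survives 1 drive failure")
print(f"2x mirror vdevs: {mirror_usable:.1f} TB usable, better random I/O")
# Either way it's far more than the ~600GB currently in /home.
```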
//Danne
 
Personally I would not use SSDs for the storage arrays unless performance is extremely high on the priority list. That said, it is something that would be kind of fun to try some day if I had money to burn. :D

If you do use SSDs, still use a proper RAID level like 5 or 10; 10 probably makes more sense. Don't do 0, as tempting as that might be. SSDs also fail based on usage and not randomly, which means the odds of them all failing at once are much higher if they were all purchased and deployed at once. In RAID they'll get roughly equal writes. You don't want to wait for a failure to replace them.

You will want some kind of monitoring solution to watch the life-remaining parameter of each individual drive. Get an idea of how fast it's going down, and plan to replace the drives one at a time before they all fail. Keep in mind that a RAID rebuild may speed up this process. Start replacing them one by one when they get near the end of their life. The old drives can then be redeployed in individual systems, since they're still good; they're just going to fail relatively soon, and in an individual system they probably won't see as many writes as in a server.
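
Something along these lines could feed a simple wear report; a sketch that shells out to smartctl, with the caveat that the wear attribute names/IDs vary by vendor, so the set below is only an example:

```python
#!/usr/bin/env python3
# Sketch of a wear-level check that shells out to smartctl. SMART attribute
# names/IDs differ per vendor (e.g. Wear_Leveling_Count on many Samsungs,
# Media_Wearout_Indicator on many Intels), so adjust the set for your drives.
import subprocess

WEAR_ATTRS = {"Wear_Leveling_Count", "Media_Wearout_Indicator", "Percent_Lifetime_Remain"}

def wear_remaining(device: str) -> dict:
    """Return {attribute: normalized value} for wear attributes (100 ~= new)."""
    out = subprocess.run(["smartctl", "-A", device],
                         capture_output=True, text=True).stdout
    found = {}
    for line in out.splitlines():
        parts = line.split()
        # smartctl -A rows look like: ID# NAME FLAG VALUE WORST THRESH ...
        if len(parts) >= 6 and parts[1] in WEAR_ATTRS:
            found[parts[1]] = int(parts[3])
    return found

for dev in ("/dev/sda", "/dev/sdb"):     # placeholder device list
    print(dev, wear_remaining(dev) or "no wear attribute reported")
```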

When I build a server now I always use an SSD for the OS drive. It's more reliable than a single HDD, so I can pretty much skip RAID for the OS drive (which would require hardware RAID).
 
You're wrong; gmirror, which FreeBSD uses (for instance), works fine without hardware RAID, even during a failure. :)
//Danne
 
Why not go with something simple like a Synology? It integrates with AD, etc.
 

If we went with something like Synology, we would need to build a separate NIS or LDAP server somewhere.

In an LTSP environment, the clients have no local storage, so everything (programs, settings, user files, cache, etc.) is loaded from the server when the user requests it.

@Diizzy
I do have some performance data from Observium. Just server load, IOPS, network usage, etc.

@Mackintire
On the topic of Zyxel, I saw some recommendations for the GS1900-24, would that be a good fit for our network?
 
@wizdum
That would be fine, knowing the specs and setup.

If you want Zyxel, go for the GS1910-24 over the GS1900 by far; I have a GS1910-24 and it's been doing great since day one.
//Danne
 

Not to throw more irons in the fire, but at this point you're probably ripe to start looking into virtualization as well. That would make adding another server fairly trivial. I believe you stated that your existing server is performing NIS and storage duties now, right? At this point, standardized tech will be your district's friend.

At the same time as you are fighting this battle, think about the "hit by a bus" scenario and where the district would be if one or more of your department were suddenly gone from the picture. How's the documentation of this piecemeal system? How well could another engineer pick up where you left off? That's where standardized solutions benefit everyone. You get a more reliable system, helping you sleep at night. The district gets industry-standard tech with a much lower ramp-up time, and everyone is happy.

The path this thread has taken is exactly why I'd asked for a better view into your network. We've all been there; being asked to take care of an environment on a shoestring budget is taxing on the sanity. But until you lift them up from home-grown solutions to standardized tech, life will never change. You'll fight this battle again 5 years from now, only it'll be even harder to change.
 
In this case I honestly think that virtualization isn't justified at all; it'll just add more complexity and more things(tm) that can go wrong.
//Danne
 
I agree with Diizzy that the Zyxel 1910 is the better choice.

I disagree on NOT choosing virtualization, but you need decent centralized storage to make it work, which will cost extra $$$.

A cluster of two servers, each capable of running everything (barely), is what you need.

Size it so that your peak load is something like 60% of the overall resources. That way if one server dies you can still function (albeit at reduced capacity) until you get the system back online.
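
As a quick sanity check on that sizing (simple arithmetic, assuming two identical nodes):

```python
# Two-node sizing check: peak load at 60% of the combined resources means a
# single surviving node runs oversubscribed, but the cluster keeps limping.
nodes, peak_fraction = 2, 0.60

peak_load = peak_fraction * nodes   # measured in "one node's worth" of capacity
print(f"peak load: {peak_load:.1f} nodes' worth of capacity")
print(f"after one failure: {peak_load:.1f}x a single node ({peak_load * 100:.0f}% busy)")
# 1.2x one node at the worst moment -- degraded but functional, which is the
# trade-off described above.
```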

This also lets you upgrade one at a time as time progresses, or migrate to better servers without having to reinstall.

Technically it may allow you to have fewer servers... but I would suggest that you should have/need at least 4 physical servers before you consider clustered virtualization.

If uptime is NOT important and being inexpensive is the most important...then you might want to try a single-host VM setup.

I forget who, but someone out there has a product where you sync your VM shared storage with the cloud and can boot that VM in the cloud in an emergency. The monthly service is rather cheap, but the hourly service once you actually turn on a VM is fairly pricey.
 
Still we need to see/understand your network and servers in order to provide you with accurate guidance.
 

A cluster of two servers sounds like it would be the best option for us. We don't mind using eBay hardware, and the hardware is cheap. Power is essentially free for us, until we run out of capacity in the breaker box.

We do have a two-server VMware ESXi setup with a 16TB SAN, but it is rather congested. They did try virtualizing some of the LTSP servers before I got here, but had unacceptable performance issues. I never saw the setup, so I don't know what the problem was. I do have a single LTSP server running as a VM on an R610 at another school, and it seems to work fine, but they have a much smaller network.

Looking at the Observium logs for the server, data transfer through the NIC hits 50Mbps at 8am when everyone shows up, jumps to 100Mbps at 10am when the computer labs start being used, drops back down to 50Mbps around 1pm when everyone goes to lunch, and then between 2pm and 5pm slowly drops to 10Mbps. That's for the upload (server to client). The download sits at about 20Mbps from 8am to 4pm.

The load average is 1.52. It's using all of its 4GB of RAM, but none of its swap.

The IOWait fluctuates between 1% and 20%, hitting 16%-20% very frequently. The CPU is an Intel Xeon X3220.
 
With all that is riding on this system, they need to pony up some funds. It will fail, sooner rather than later.

Enterprise grade with support contracts is the only way to go in this scenario. If they need to lose everything for that to happen, so be it, I guess. Considering the damn fire department is using it, it's mission critical. You might want to get some formal things down on paper and start presenting the scenario to the people that matter. A lot is riding on a rickety bridge, and you can bet that when it fails, it will be you they blame.
 
With those numbers, you'd be fine with the 4 HDD + SSD ZIL/L2ARC setup (preferably mirrored). That said, I would much rather use two separate servers, or run everything on the same OS, than add overhead and complexity by running virtualization for capacity that will most likely never be used in the foreseeable future.

It would however be nice if you could get two servers for redundancy in case one dies.

//Danne
 