Complicated DFS question

cyr0n_k0r

I work for a public school district. Currently we use DFS for students' user files (My Documents). We are running into problems with slow directory listings and are constantly running out of disk space on the file servers. Quotas are not in use currently, but we plan to implement them along with an overhaul of DFS. Here is our current setup.

We currently have 10 schools. Current DFS paths follow this scheme:

\\ad.domain.edu\schoolname\userfiles\a-l\studentname
\\ad.domain.edu\schoolname\userfiles\m-z\studentname

Students are currently separated first by school, then by A-L or M-Z (last name).
However, the folders are getting rather large. The A-L folder at a high school, for instance, might have 1,500-2,000 students, so just parsing the directory takes about 20 seconds.

My boss wants to move to the scheme below. Instead of sorting by last name, he wants to add file servers at each site and set a limit on how many students (regardless of name) will reside on each file server.
Example:

\\ad.domain.edu\schoolname\userfiles\server1\studentname
\\ad.domain.edu\schoolname\userfiles\server2\studentname
etc.

Server1, for instance, might have a limit of 400 students. Once 400 students have been added to "server1", we must then begin adding students to server2, and so on. Each server would have a different limit on how many students could be added to it, depending on its hardware.
Personally, I find this approach VERY messy, and I think it will add way too much administrative overhead. However, we can't seem to come up with a better solution. We want to move away from sorting by alphabet, as that would only be a stopgap (for example, splitting A-Z into 4 folders instead of 2). We have 12,000 students and add about 1,000 per school year.
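
A minimal Python sketch of the fill-in-order rule described above, assuming students are numbered in the order they are added and nobody ever leaves. The limits (400, 250, 100) and the function name `server_for` are hypothetical; the post only says each server's limit would depend on its hardware.

```python
# Sketch of the fill-in-order rule: server1 takes the first N1 students, server2 the next N2, ...
# Limits are hypothetical examples; the post only says they would vary by hardware.
from bisect import bisect_right
from itertools import accumulate
from typing import List

def server_for(enrollment_index: int, limits: List[int]) -> str:
    """Server link for the k-th student added (0-based), assuming no one ever leaves."""
    boundaries = list(accumulate(limits))  # [400, 650, 750] for limits [400, 250, 100]
    i = bisect_right(boundaries, enrollment_index)
    if i == len(limits):
        raise ValueError("all server links are full; another server is needed")
    return f"server{i + 1}"

if __name__ == "__main__":
    limits = [400, 250, 100]
    print(server_for(0, limits))    # server1
    print(server_for(400, limits))  # server2 (the 401st student)
    print(server_for(700, limits))  # server3
```

The cumulative boundaries make the rule easy to compute from a running count, but only while no slot is ever freed; graduations and transfers break that simple ordering.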

Does anyone have another approach to this? I need some out-of-the-box thinking.
 
Who's parsing the folders and getting a 20 second delay?

If it's the students, you should just have them map directly to their own directory; there's no real reason for them to browse the folders to find it.
 
Agree with the above.

Would breaking it down to the individual letter really be such a terrible stopgap? It's logical (and could be scripted). Do you really anticipate 13x growth (the difference between dividing by 2 and by 26)?
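
As a rough illustration of how the letter split could be scripted, here is a minimal Python sketch that derives the folder from the last name. One letter gives at most 26 folders and two letters at most 26 * 26 = 676; real surnames are not spread evenly, so some folders will stay much larger than others. The root and school name are the placeholders from the example paths, and `bucket` and `home_path` are hypothetical helpers.

```python
# Sketch: derive a last-name bucket one or two letters deep and build the home path.
# Root and school name are placeholders from the example paths; helpers are hypothetical.
ROOT = r"\\ad.domain.edu"

def bucket(last_name: str, depth: int = 1) -> str:
    """First `depth` letters of the last name, upper-cased, skipping apostrophes etc."""
    letters = [c for c in last_name.upper() if c.isalpha()]
    return "".join(letters[:depth]) or "_"  # "_" is a hypothetical catch-all folder

def home_path(school: str, last_name: str, username: str, depth: int = 1) -> str:
    """Join root, school, userfiles, bucket and username with single backslashes."""
    return "\\".join([ROOT, school, "userfiles", bucket(last_name, depth), username])

if __name__ == "__main__":
    print(home_path("schoolname", "Smith", "jsmith"))        # \\ad.domain.edu\schoolname\userfiles\S\jsmith
    print(home_path("schoolname", "O'Brien", "dobrien", 2))  # \\ad.domain.edu\schoolname\userfiles\OB\dobrien
```

Because the folder is computed from the name alone, no extra lookup table is needed to find a student.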

What metrics lead you to believe that you need more servers? Are you running into CPU or IO issues with the existing servers (aside from the directory listing, which may just be a function of too many nodes to effectively list at once)?
 
Quotas are a must with students, especially that many, and they really shouldn't need that much space unless they're in a CAD class, graphic design, or something similar. We did 250MB for normal students at my old school, but that was distributed over half a dozen servers and only a couple thousand students.

Are you also keeping music and video files from being copied over? I know there are some GPO policies you can set to exclude certain file types from a redirected or roaming profile.
 
Windows 2003 R2

Traverse folder checking allows for mapping the student drive without the student having access outside of their own folder (i.e., they cannot go up the directory tree past their own folder). That saves the parsing, as they really don't need access to the "root folders" anyway.
 
The DFS must be changed; that comes from even higher up. Quotas must be enabled. Our district wants a student to have enough space to go from 1st grade to 12th grade and never need to delete an assignment. So we are figuring 500MB per student for 2008-2009, then moving to 1GB per student in a year or two.
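
A back-of-the-envelope Python sketch of what those quotas add up to, assuming every student eventually fills their whole quota and using decimal units (1 GB = 1,000 MB). The 12,000-student figure comes from earlier in the thread; the 2,000-student school is an illustrative example.

```python
# Worst-case storage for the quota plan: students x quota.
# Assumptions: every student fills their full quota; decimal units (1 GB = 1,000 MB).
def storage_needed_gb(students: int, quota_mb: int) -> float:
    """Worst-case space in GB if every student uses their whole quota."""
    return students * quota_mb / 1000

if __name__ == "__main__":
    print(storage_needed_gb(12_000, 500))   # 6000.0 GB district-wide at 500MB each
    print(storage_needed_gb(12_000, 1000))  # 12000.0 GB once the quota reaches 1GB
    print(storage_needed_gb(2_000, 500))    # 1000.0 GB for a 2,000-student high school
```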

We do not want to sort by the alphabet any longer. My boss is pretty firm about that.

We revised our structure this morning so that each server1, server2, etc. link has a hard limit of 100 students. But it still sounds messy to me.
 
It's not the students who parse the directory, it's us. When we go into the folders to move students or whatever, it takes forever just to load the A-L folder.
 
A hard limit of 100 students per server link will be messy, because you're going to have the overhead of mapping students to the correct server; you'll have even less organization. How will you know whether student Asdf should be mapped to server1/asdf or server2/asdf? You'll need some kind of additional metadata or table to keep track of it (or an attribute in AD). What about year to year? Are all servers going to have students from the same year? If not, you end up with holes when students graduate. What about when they're kicked out or transfer in or out? Alphabetical makes the most sense, and it's not a stopgap: you can subdivide it further, for example on the first TWO letters. If the problem is simply directory listing times (which it sounds like, and which is easily worked around), throwing more servers at it is pretty wasteful.
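
A minimal Python sketch of the bookkeeping that concern points to. The dict stands in for whatever would really hold the mapping (a table or an AD attribute); the class name, server names, and limits are hypothetical. It shows how a graduation leaves a hole that a newcomer then fills, so a student's server can't be worked out from the name or year.

```python
# Sketch of the lookup table a "fill server1, then server2" scheme needs.
# The dict stands in for a real table or AD attribute; names and limits are hypothetical.
from typing import Dict, List, Tuple

class Placement:
    def __init__(self, servers: List[Tuple[str, int]]):
        self.servers = servers           # ordered (link name, student limit) pairs
        self.where: Dict[str, str] = {}  # student -> server link

    def count(self, server: str) -> int:
        return sum(1 for s in self.where.values() if s == server)

    def add(self, student: str) -> str:
        """Put a new student on the first link with a free slot."""
        for name, limit in self.servers:
            if self.count(name) < limit:
                self.where[student] = name
                return name
        raise RuntimeError("no free slots; another server link is needed")

    def remove(self, student: str) -> None:
        """Graduation or transfer out: frees a slot (a "hole") on that link."""
        del self.where[student]

    def lookup(self, student: str) -> str:
        """The only way to find a student's folder; it can't be derived from the name."""
        return self.where[student]

if __name__ == "__main__":
    p = Placement([("server1", 2), ("server2", 2)])
    for s in ["ann", "bob", "cal"]:
        p.add(s)            # ann, bob -> server1; cal -> server2
    p.remove("ann")         # ann graduates, leaving a hole on server1
    print(p.add("zed"))     # server1: the newcomer fills the hole
    print(p.lookup("cal"))  # server2
```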

Does server1, server2 correspond to physical machines? If so, 100 users per file server seems extremely conservative.
 
I have Server 2003 R2 with over 5000 folders and it doesn't take 20s to populate the folder list.

Because we have over 3,000 users and no quotas, I don't use DFS for home directories. If you have to split them up beyond a single folder, DFS is useless. We redirect their My Documents to the homeDirectory AD attribute, and then I manage which server\share they are on through AD. DFS works great for group shares, but not so well for user data when the user data is double the size of our group shares.
 
Sorting alphabetically makes no sense at all unless you plan to balance how many students you allow into the school based on their names.

You need a scheme like odd/even registration number, etc.
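
A minimal Python sketch of the registration-number idea, generalized to N folders, assuming each student has a numeric registration/ID number (the field and function names here are hypothetical). Sequentially issued numbers split evenly by remainder regardless of surname.

```python
# Sketch: split by registration number instead of by name.
# Assumption: every student has a numeric registration/ID number (hypothetical here).
def id_bucket(student_id: int, folders: int = 2) -> int:
    """Folder index for a student; folders=2 is the plain odd/even split."""
    return student_id % folders

def moved_count(ids, old_folders: int, new_folders: int) -> int:
    """How many students would change folders if the folder count changed."""
    return sum(1 for i in ids if id_bucket(i, old_folders) != id_bucket(i, new_folders))

if __name__ == "__main__":
    print(id_bucket(20481), id_bucket(20482))  # 1 0 (odd, even)
    print(moved_count(range(1000), 2, 4))      # 500: going from 2 to 4 folders moves half
```

One trade-off with a plain modulo split: changing the number of folders later reassigns many students, so it helps to pick the folder count with growth in mind.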
 
The reason for the move from alphabet to server1, server2 is that we keep running out of space.

And it's not an OS issue that causes the long directory parses; it's a limitation of our hardware. We have old, OLD servers that we can't surplus until we've had them for at least 10 years. So we have all these old Pentium III servers with SCSI RAIDs that aren't any bigger than 50GB or so for an entire school.

We plan to give each server1, server2, etc. link a hard limit of 100 students and throw old hardware at the problem. If a school has a newer server, we might put 3 or 4 server folders on it, allowing it to handle 400 students instead. However, that will be transparent.

The overall problem we are trying to solve is how to scale in terms of disk space, as we can't just magically add 100GB to a server that is maxed out on storage.
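
A minimal Python sketch for sizing each server link's student limit from its usable disk space and the per-student quota. It assumes decimal units (1 GB = 1,000 MB) and an optional reserve for the OS and headroom; the function name and reserve parameter are hypothetical.

```python
# Sketch: how many full quotas fit on one server.
# Assumptions: decimal units (1 GB = 1,000 MB) and an optional, hypothetical reserve
# for the OS and free-space headroom.
def student_limit(usable_gb: float, quota_mb: int, reserve_gb: float = 0.0) -> int:
    """Number of students whose full quota fits after setting aside reserve_gb."""
    free_mb = max(0.0, usable_gb - reserve_gb) * 1000
    return int(free_mb // quota_mb)

if __name__ == "__main__":
    print(student_limit(50, 500))                # 100 students at 500MB
    print(student_limit(50, 1000))               # 50 students once quotas reach 1GB
    print(student_limit(50, 500, reserve_gb=5))  # 90 students with 5GB held back
```

Note that the limit is tied to the quota: doubling the quota halves how many students a fixed-size server can take.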
 
I have no clue what your network architecture looks like, but are you using all local storage? It would be a lot easier if you had something you could expand, like an entry-level SAN.
 
All servers are running local storage.

Newer servers use local storage and DAS units, but no SANs or networked storage of any kind.
 