Q?? on file transfer over network/Internet to MANY people - Server/HD issues

I'm really not sure how file transfers work when there are many requests for the same file at the same time. I'm looking at file sizes from 50MB to 5GB and transfers to 50-5,000 people at one time. I'll be running Linux or FreeBSD.

Example: 50 people request the same 5GB file, and each person transfers it over the Internet via FTP or HTTP. I'm trying to figure out how the OS handles something like this. Some HTTP download clients allow files to be split into parts (threads?), and FTP seems to only allow one thread per connection. So if the same file is being accessed by 50 users, will the HD be reading the disk at 50 different locations (or however many threads are open), one per user? That seems like A LOT of use on the HD. OR - could the file be read from disk once into RAM and then served to everyone from RAM? If it isn't served from RAM, is an SSD a necessity for transfers like this?

I've also used RAM drives a few times, where a segment of RAM is provisioned to act as "hard drive" storage, and it obviously copies/accesses/transfers much faster. Does anyone know if these are used in high-capacity servers these days?

So, does anyone know how this type of setup would work, and does anyone have suggestions on the OS/FS/server setup/software (Apache, nginx, IIS, etc.) to handle loads like this?
 
It depends on way too many factors. Each HTTP/FTP request will open the file again, so you will have 50 or more accesses to the file at once.

HTTP allows range requests unless you disable them; that's what download accelerators (and Acrobat) use. With ranges, a single user could open 50+ connections to the same file, each starting at a different offset, to get around the speed limits of a single TCP connection.
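
For what it's worth, a range request is just an extra header on a normal HTTP GET. Here's a minimal sketch in Python's urllib; the URL, file name, and byte range are made-up examples, and the server has to have ranges enabled for this to return a partial response:

```python
# Minimal sketch of an HTTP range request, the mechanism download
# accelerators build on. URL and byte range are made-up examples.
import urllib.request

req = urllib.request.Request("http://example.com/big.iso")
req.add_header("Range", "bytes=0-1048575")  # ask for the first 1 MiB only

with urllib.request.urlopen(req) as resp:
    # A server that honors ranges answers 206 Partial Content;
    # 200 means it ignored the header and sent the whole file.
    print(resp.status, resp.headers.get("Content-Range"))
    chunk = resp.read()
    print(len(chunk), "bytes received")
```

An accelerator just opens several of these in parallel, each with a different byte range, and stitches the pieces back together.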

How much of this gets cached in RAM depends on where in the file each user happens to be reading. To the hard drive it's going to look like almost totally random access. The good thing is you can use large readahead to help limit this, but at some point that stops helping too.
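
If you serve the file from your own code, you can also hint the kernel per file. A minimal sketch, assuming Linux and Python, with a made-up file path; the SEQUENTIAL advice tells the kernel this descriptor will be read front to back, which on Linux typically means a larger readahead window:

```python
# Sketch: hint the kernel that this file will be read sequentially so it
# can use more aggressive readahead. Linux-only; the path is a made-up example.
import os

fd = os.open("/srv/files/big.iso", os.O_RDONLY)
try:
    # offset 0, length 0 means "the whole file"
    os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_SEQUENTIAL)
    while True:
        chunk = os.read(fd, 1024 * 1024)  # read in 1 MiB pieces
        if not chunk:
            break
        # ...send chunk to the client here...
finally:
    os.close(fd)
```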

It's way too expensive to buy and power enough RAM to hold that single file compared to just having a bunch of disks or a single SSD. If you were doing a major release, say the Windows 8 ISOs coming out, a RAM disk could help, but it's likely not needed; just leave that RAM free and let the filesystem cache use it.

An easier way to get a feel for this is to look at what NAS makers claim: "Our NAS is guaranteed to stream 4 Blu-ray movies." That's basically the same thing you're asking about: four downloads from that single disk while still maintaining playback speed, or in your case, download speeds that keep users happy.
 
Thank you very much for your reply! I didn't know RAM used much power. Is that something I should be concerned about on my desktop (a 32GB VM host for testing)? I use probably 50-90% of it most of the time (I could add cooling, I guess, if that would help).

It's astounding that the disk needs to be accessed in so many places when everyone is reading the same file. I guess this is where SSDs really come in handy as a cache in front of spindle drives.

If anyone has any other info on how this process works, I'm interested in hearing anything else that may be relevant, but overall that was an excellent explanation and I thank you again.
 
At that scale it doesn't matter much; you're looking at 3-5 watts per memory stick.

But to load a machine up to 48 or 96 GB you're looking at 8 or 16 sticks, so it can add up fast per machine.

Plus, 5 watts is about what the green/red hard drives use these days, so for the same power budget: memory stick or disk?
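
Quick back-of-the-envelope with those numbers: 8 sticks at 3 W is about 24 W, and 16 sticks at 5 W is about 80 W of RAM running around the clock, versus roughly 5 W for a single green drive that can hold the same file.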
 
Typically the way it works is that the OS maintains a "page cache" in RAM of data that has been recently accessed.

So when it gets a request to read a block from the disk (from some random client) it will check to see if it is available in the page cache. If it is, then it sends the data from RAM and doesn't need to access the disk. If the data is not in the page cache then it will request a chunk of data from the disk (often more than was initially requested on the assumption that the client will soon ask for more). When that chunk becomes available it gets copied into the page cache, possibly evicting the least-recently-used data from the cache.

So if you have a lot of people mostly accessing the same files, it's possible that the working set will fit into your page cache and you'll have no performance issues. If everyone is trying to access different files, then you may run into issues and you'll want SSD or more spindles to provide the necessary performance.
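
To make that read path concrete, here's a toy model in Python. It's only a sketch of the idea (an LRU cache plus readahead), not how any real kernel implements its page cache, and the block, readahead, and cache sizes are made up:

```python
# Toy model of the read path described above: check the cache, read ahead
# from "disk" on a miss, and evict the least-recently-used blocks when the
# cache is full. Sizes are arbitrary, for illustration only.
from collections import OrderedDict

BLOCK = 4096          # cache granularity, like a 4 KiB page
READAHEAD_BLOCKS = 8  # extra blocks to pull in on a miss
CACHE_BLOCKS = 1024   # cache capacity (about 4 MiB here)

cache = OrderedDict()  # maps (path, block_no) -> bytes, in LRU order

def read_block(f, path, block_no):
    key = (path, block_no)
    if key in cache:                      # cache hit: serve from RAM
        cache.move_to_end(key)
        return cache[key]
    # Cache miss: read the requested block plus a readahead window.
    f.seek(block_no * BLOCK)
    data = f.read(BLOCK * (1 + READAHEAD_BLOCKS))
    if not data:
        return b""                        # past end of file
    for i in range(0, len(data), BLOCK):
        k = (path, block_no + i // BLOCK)
        cache[k] = data[i:i + BLOCK]
        cache.move_to_end(k)
        if len(cache) > CACHE_BLOCKS:     # evict least-recently-used block
            cache.popitem(last=False)
    return cache[key]
```

With 50 clients all pulling the same file, most of those lookups hit the cache and never touch the disk; it's when they're all reading different files, bigger than the cache, that the disk takes the full load.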

Lastly, you're going to be limited by your weakest link. If you only have gigabit networking then that means you only need to sustain 125MB/sec worst-case.
 