Best way to limit leechers

I run a web site that hosts a lot of files for a few games. We're talking over 500GB worth of data that is available for download to anyone. The problem I am having here is people using programs such as wget and HTTrack to "site rip" the entire site. While I do not have a problem with people downloading everything, I do not like it when they consume a lot of data all at once.

What I want is some way to monitor each IP and, if they download over xx GB in a single day, either rate limit them or force them to wait xx hours before they can download again.

I want this to be completely invisible to the users downloading. I do not like built-in browser download managers, making users wait in a queue, or forcing them to do extra work to get to the download (such as a captcha). I want this to work in the background.

Ideally, I want this:

If x.x.x.x has downloaded 20GB in less than 24 hours, they are redirected to a page letting them know they cannot download any more for 24 hours; after that they'll be given access to the downloads again. I do not want any kind of accounts/registration from the user; strictly monitor them via IP. While I realize this approach can easily be bypassed with a proxy, I do not think most users who do this will bother going that route.
 
You could pull this off very simply with mod_rewrite and some simple PHP. Basically you would set up rules to rewrite all requests to point towards your PHP script, mangling the filename so it's passed as an argument to the script. The script throws the IP into a MySQL database and checks the accumulated total downloaded. If it's within your limit, it serves the file, either by redirecting to a unique URL that bypasses the mod_rewrite rules or by using PHP to serve the file directly (which has some downsides if the file is big).
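As a rough sketch, something along these lines; the table layout, paths, and the 20GB figure are all placeholders:

Code:
<?php
// quota.php - rough sketch only; the table name, paths and the 20GB limit are placeholders.
// A rewrite rule along these lines would funnel download requests into it:
//   RewriteEngine On
//   RewriteRule ^files/(.+)$ /quota.php?file=$1 [L,QSA]

$limitBytes = 20 * 1024 * 1024 * 1024;              // 20GB per IP per day
$baseDir    = '/var/www/files';
$ip         = $_SERVER['REMOTE_ADDR'];
$file       = isset($_GET['file']) ? basename($_GET['file']) : ''; // crude sanitising, flat layout only
$path       = $baseDir . '/' . $file;

if ($file === '' || !is_file($path)) { header('HTTP/1.1 404 Not Found'); exit; }

$db = new PDO('mysql:host=localhost;dbname=downloads', 'user', 'pass');

// Total bytes this IP has pulled in the last 24 hours
$q = $db->prepare('SELECT COALESCE(SUM(bytes),0) FROM transfers
                   WHERE ip = ? AND ts > NOW() - INTERVAL 1 DAY');
$q->execute(array($ip));

if ($q->fetchColumn() >= $limitBytes) {
    header('Location: /limit-reached.html');        // the "come back in 24 hours" page
    exit;
}

// Log the transfer up front (counts the whole file even if the client aborts), then serve it
$db->prepare('INSERT INTO transfers (ip, bytes, ts) VALUES (?, ?, NOW())')
   ->execute(array($ip, filesize($path)));

header('Content-Type: application/octet-stream');
header('Content-Disposition: attachment; filename="' . $file . '"');
header('Content-Length: ' . filesize($path));
readfile($path);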
 
Well, what would be considered big? Some of the files are in the hundreds of MB, and a couple might be at or close to 1GB.
 
The downside was related to serving the file through PHP; you avoid it if you use the other method of redirecting to a unique URL through mod_rewrite. If you're serving files that are 1GB, I would recommend the latter route. The big issue with serving files directly through PHP is that every byte passes through PHP, which ties up a PHP process for the whole transfer and can push up the memory footprint. You also run into issues with the execution time limits on your scripts.
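For the redirect route, one rough way to sketch it is to drop a short-lived random symlink somewhere the rewrite rules don't touch and send the client there, so Apache streams the bytes instead of PHP; the directory names and cleanup policy here are placeholders:

Code:
<?php
// Alternative ending for the quota script above: instead of readfile(),
// hand out a short-lived random link and let Apache serve the file itself.
// Requires Options FollowSymLinks for the link directory.
$token   = md5(uniqid('', true));                // random-enough token for a sketch
$linkDir = '/var/www/html/tmp-links';            // must be excluded from the rewrite rules
symlink($path, $linkDir . '/' . $token . '-' . $file);

// Apache streams the file; PHP's memory use and max_execution_time drop out of the picture.
header('Location: /tmp-links/' . rawurlencode($token . '-' . $file));
exit;

// A cron job (or a cleanup pass at the top of the script) should delete
// symlinks older than a few minutes so the links cannot be passed around.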
 
Monitoring IPs is totally the wrong way to go. Lots and lots of legitimate Internet users appear to a server as coming from the same IP. Anyone who is on a corporate or institutional network will show up as coming from the same IP. For example, everyone within a university network shows up as having the same IP to the publicly routed Internet.

I have also heard (not confirmed) that Comcast users are often seen under a small number of IPs by the routed Internet.

Using the IP is not a reliable method to separate distinct users.

Conceptually, here is what I would do:
Place the files into a directory that's only accessible by your web server and cannot be accessed by a web client directly.
Have a single web page for each file, or perhaps small groups of files per page, but not a lot of files per page.
Use the session ID to dynamically create a link to the file, and have some magic that maps the session-keyed file URL to the real file path server-side.

I'm not sure how to accomplish this programmatically, though.
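Very roughly, the session-to-file mapping might look something like this (an untested sketch; every path and name below is made up):

Code:
<?php
// download.php - minimal sketch of the session-keyed mapping; all names are placeholders.
session_start();

$realBase = '/srv/game-files';                   // outside the docroot, so there is no direct URL

// The listing page would register each file under a random token, e.g.:
//   $token = md5(uniqid('', true));
//   $_SESSION['links'][$token] = 'Anticheat/ACE/ACEAutoBan_for_Ace08_v03a.zip';
//   echo '<a href="download.php?t=' . $token . '">download</a>';

$token = isset($_GET['t']) ? $_GET['t'] : '';
if (!isset($_SESSION['links'][$token])) {
    header('HTTP/1.1 404 Not Found');
    exit('Unknown or expired link.');
}

$path = $realBase . '/' . $_SESSION['links'][$token];
header('Content-Type: application/octet-stream');
header('Content-Disposition: attachment; filename="' . basename($path) . '"');
header('Content-Length: ' . filesize($path));
readfile($path);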
 
I'm not sure that would meet the OP's requirement of being transparent. From the sounds of it, he's looking to limit people mirroring the whole site, so as long as the limit is set to something generous, any normal user of the site should be fine even if they are sharing an IP with a bunch of other people.
 
That is correct. What you are suggesting would not be feasible. I do not care if people download/mirror the web site, but I do not want them to chew through the bandwidth in two days doing it. I also want the files to be direct links, or at least appear as direct links, to the end user.

A good number of the files on the site are downloaded by server operators using wget, and the method you are describing would make it difficult, though not impossible, for them to use wget, since many of them have limited technical knowledge and are just running a game server.

As for multiple users appearing as the same IP: I do not think that would be a problem. I highly doubt everyone behind the same IP will be mirroring the site in one go anyway. At most they'll be downloading the game's patches, specific maps, models, etc., and 20GB per IP per day is plenty; most of them probably won't even know there is any sort of limit on downloads.

The only people who will know are the ones who use wget to mirror the site or other offline web site programs such as HTTrack.

The games in question are the Unreal series, so they aren't all that popular to begin with. But with the announcement of a new UT in the works, I want to prepare for the game getting popular and my site/services being used more frequently.
 
So, are cookies out of the question? There are ways to determine and likely limit privileges from cookies.

Again, someone sharp would probably be able to figure a way around it, but your average user wouldn't bother.
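Very roughly, a cookie-based counter could look like this (the cookie name, limit, and reset window are placeholders, and since the value lives client-side, anyone who edits or discards the cookie resets it, so it only deters polite clients):

Code:
<?php
// Cookie-based byte counter - sketch only; names and numbers are placeholders.
$limitBytes = 20 * 1024 * 1024 * 1024;                        // 20GB per day
$used = isset($_COOKIE['dl_bytes']) ? (int) $_COOKIE['dl_bytes'] : 0;

$file = isset($_GET['file']) ? basename($_GET['file']) : '';
$path = '/var/www/files/' . $file;
if ($file === '' || !is_file($path)) { header('HTTP/1.1 404 Not Found'); exit; }

if ($used >= $limitBytes) {
    header('Location: /limit-reached.html');
    exit;
}

// Bump the counter (expires after a day), then serve the file as usual.
setcookie('dl_bytes', (string) ($used + filesize($path)), time() + 86400, '/');
header('Content-Type: application/octet-stream');
header('Content-Length: ' . filesize($path));
readfile($path);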
 
You could host the files with a cloud service such as Amazon S3 and then not worry about it when people do that. Then the only change you would need to make is pointing all of the file links to the new locations.
 

I guess I could use cookies if there is a way, but I don't think most "site ripper" programs even accept cookies.


As for hosting on S3: I have over 500GB worth of files, and I am currently sitting at around 1.4TB of transfer this month. While I know that's not a lot compared to my limit, I still want some way to control it, because people do site rip some of the sites. I'd say that out of the ~8 subdomains, two of them account for over 50% of the bandwidth.

I do run the downloads through a PHP script to index the files, but it's easily bypassed if needed. The main purpose of the site was to function as a download mirror for a content site with individual pages for each file, but since I do not have the time to create such a site, I offer the downloads to everyone, and anyone is welcome to hotlink the downloads on their own sites.
 
Given the layout of the site, I think my original recommendation is a good fit. mod_rewrite + some simple PHP.
 
I managed to find a script called Anti-hammer that I am testing out. While it doesn't really monitor users' bandwidth usage, it will slow site rippers down so they can't hit the site as fast as their connection will allow. I'm hoping to get it working, but it's been giving me some problems with errors and whatnot.

While this isn't an ideal fix, it's a nice band-aid that I think will help with the server load.
 
Assuming Apache, I'd explore this blog post, as it sounds really close to what you want: using mod_security to limit the number of requests per time period from an IP.
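The general shape of those rules is something like this (ModSecurity 2.x syntax; the rule IDs, threshold, and 60-second window are arbitrary, and note that it counts requests rather than bytes):

Code:
# Track each client IP in a persistent collection
SecAction "id:1000,phase:1,nolog,pass,initcol:ip=%{REMOTE_ADDR}"
# Count requests and let the counter decay after 60 seconds
SecAction "id:1001,phase:1,nolog,pass,setvar:ip.reqs=+1,expirevar:ip.reqs=60"
# More than 120 requests in the window from one IP gets refused
SecRule IP:REQS "@gt 120" "id:1002,phase:1,deny,status:429,log,msg:'Request flood'"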
 
Hey, thanks for that. Currently I am using anti-leech, and it seems to be working okay as long as they pull the files through the PHP file. If they get directly into the directory they can bypass it really easily, and to my surprise, that's the first thing this little shit did after I implemented it. So now I have turned off directory indexing, and it'll take a bit more work to figure out how to get around that. Although if you know the file name, you can still download it directly from the directory without going through PHP.

Anyone is welcome to test it out and let me know if it's working, but most importantly if you get around it somehow let me know and I'll see if I can prevent it.

I wish I could get it to pass through ANY file on the server and not just files going through PHP so I can use it on the redirect sites as well.
 
I wonder if it's possible to use .htaccess to force files to transfer through the PHP file using mod_rewrite.

The syntax for downloading files through the PHP script is this:

Code:
http://www.site.com/index.php?dir=Anticheat/ACE/&file=ACEAutoBan_for_Ace08_v03a.zip

You can see that index.php?dir= is in front of the first directory, then after the last directory there is a &file= parameter followed by the name of the file. I wonder how hard it would be to force the client to grab files through index.php?dir= and &file=.

Anyone can easily figure out that if you just drop those sections from that line you'll get this:

Code:
http://www.site.com/Anticheat/ACE/ACEAutoBan_for_Ace08_v03a.zip

That bypasses the script, but if a .htaccess file can somehow force the client back through the index.php file, that would be spectacular.
 
Very, very easy using mod_rewrite. You would match based on whether the request starts with index.php; if it doesn't, you would rewrite the request to index.php.
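Something along these lines in the .htaccess at the web root (a sketch; the archive extension list is a guess):

Code:
RewriteEngine On
# Leave requests that are already going through the script alone
RewriteCond %{REQUEST_URI} !^/index\.php
# Send any direct hit on an archive back through index.php, rebuilding the
# dir=/file= query string from the path, e.g.
#   /Anticheat/ACE/ACEAutoBan_for_Ace08_v03a.zip
#   -> /index.php?dir=Anticheat/ACE/&file=ACEAutoBan_for_Ace08_v03a.zip
RewriteRule ^(.*/)?([^/]+\.(zip|rar|7z))$ index.php?dir=$1&file=$2 [L,NC]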
 
I wonder if iptables can accomplish this. I've been Googling all day, it seems, and found nothing so far.

However, I just discovered that iptables might be able to do this. It just needs to limit bandwidth only on port 80, since I host various game servers and a voice server.
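From what I can tell, iptables by itself doesn't really meter cumulative bytes per source IP (that looks more like a job for tc or the web server), but the hashlimit match can at least cap how fast one source opens connections to port 80. A sketch with arbitrary numbers:

Code:
# Drop new connections to port 80 from any single source IP that exceeds
# 30 new connections per minute (burst of 20). This throttles connection
# rate, not total bandwidth, and leaves the game/voice server ports alone.
iptables -A INPUT -p tcp --dport 80 -m conntrack --ctstate NEW \
  -m hashlimit --hashlimit-name http-throttle --hashlimit-mode srcip \
  --hashlimit-above 30/minute --hashlimit-burst 20 -j DROP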
 
Have you looked into mod_rewrite?

Did you even read what you quoted?


No, I do not think that is an option, but that link mentions a CBand module they were using previously on Apache.

Upon further reading, it seems to limit the download rate (throughput) on a per-IP basis. While this isn't exactly what I wanted, I could set the rate low enough that it would perhaps deter them from ripping.
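From what I've read so far, the per-IP piece looks roughly like this (an untested sketch; the directive arguments should be checked against the mod_cband docs, and the numbers and URL are placeholders):

Code:
<VirtualHost *:80>
    ServerName downloads.example.com
    # Per-remote-IP cap: speed, requests per second, simultaneous connections
    CBandRemoteSpeed 200kb/s 5 4
    # Where to send clients once they are over the limit
    CBandExceededURL http://downloads.example.com/limit-reached.html
</VirtualHost>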

Gonna go do some research on CBand now, thanks!
 
Use pfSense.

Create the rule:
[screenshot: Screen_Shot_2014_05_28_at_10_00_11_PM.png]

Add the advanced filter:
[screenshot: Screen_Shot_2014_05_28_at_10_00_23_PM.png]

Definition of the advanced filter:
[screenshot: Screen_Shot_2014_05_28_at_9_59_16_PM.png]

This uses the traffic shaping and limiter features. It should limit all requests to HTTP port 80, to any host, from that IP.
 