Need help setting up an automatic proxy configuration that bypasses the proxy for specific sites.

Nazo · Apr 17, 2018

So I have a fairly cheap VPN. It's not blazing fast, but beggars can't be choosers, as they say. However, I've found that if I go through a Squid proxy, it suddenly performs almost as well as if there were no VPN at all. So I've set up a Raspberry Pi 2 that wasn't doing anything else as a simple box on the LAN: it connects to the VPN automatically on startup, and I run my browser through Squid on it. This keeps things simple and efficient, and it ensures that non-browser traffic like my games bypasses the VPN. (I only want the browser going through it; I don't want the slow speeds and high latency it would cause in games.) I had actually done all of this once before, but sadly the memory card went bad and I had to start over from scratch.

Unfortunately there is a caveat. Some websites block my VPN's address range on principle alone. I'm aware some people misuse VPNs (though plenty of people misuse their local connections too, so I think it's kind of silly to block whole ranges outright on that assumption). In some cases they'll let me in if I verify that I'm human via a captcha or whatever, but a few won't, and a couple of those are services I have to use. These particular sites are fine to connect to directly anyway, but normally it's all or nothing.

However, a system for automatic proxy configuration scripts has been in place for a very, very long time. Last time I googled around, figured it out, and had it working. This time I'm apparently either using the wrong search terms or just not finding the right material, because I can't come up with a fully working script. I have the basics in place, and I've managed to get it working for sites when a subdomain is specified (e.g. www.whatever.com), but not without the subdomain (e.g. whatever.com), and I'd really like both to work. First of all, here is what I've cobbled together after a fair bit of googling:

Code:

var direct = "DIRECT";
var proxy = "PROXY 192.168.1.100:3128";
var deny = "PROXY 127.0.0.1:65535";

function FindProxyForURL(url, host) {
        if (isInNet(host, "192.168.0.0", "255.255.0.0") ||
                isInNet(host, "10.0.0.0", "255.0.0.0") ||
                isInNet(host, "127.0.0.1", "255.255.255.0"))
        {
                return direct;
        }


        // Bypass proxy for these sites
        if (
                dnsDomainIs(host, ".site1.com") ||
                dnsDomainIs(host, ".site2.com") ||
                dnsDomainIs(host, ".site3.com"))
        {
                return direct;
        }
        else
        {
                return proxy;
        }

        { return proxy; }
}

I've tried a bunch of different ways of writing the part that bypasses specific sites. One version worked, but only when the subdomain was included; this one also seems to work OK with subdomains but not without them. (I once tried removing that leading dot, but then it bypassed the proxy for everything...) BTW, I presume it doesn't actually need the else, since there's the final catch-all return proxy at the end, but all the code samples I copied from had it, and when things weren't working I put it in just to dot my i's.
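For reference, `dnsDomainIs(host, domain)` is a plain string-suffix test. The sketch below is a Node-runnable stand-in (`dnsDomainIsSketch` and `isSiteOrSubdomain` are hypothetical names), assuming the behavior of the classic Netscape/Mozilla reference implementation, in which `host` simply has to end with `domain`. It illustrates why `.site1.com` misses the bare domain, and why dropping the dot can over-match look-alike names.

```javascript
// Stand-in for the PAC built-in dnsDomainIs(), assuming the classic
// reference behavior: true when `host` ends with the string `domain`.
function dnsDomainIsSketch(host, domain) {
  return host.length >= domain.length &&
         host.substring(host.length - domain.length) === domain;
}

// Bare domain OR any subdomain, without matching names that merely
// end in the same letters (e.g. notsite1.com).
function isSiteOrSubdomain(host, site) {
  return host === site || dnsDomainIsSketch(host, "." + site);
}

console.log(dnsDomainIsSketch("www.site1.com", ".site1.com")); // true
console.log(dnsDomainIsSketch("site1.com", ".site1.com"));     // false: no leading dot
console.log(dnsDomainIsSketch("notsite1.com", "site1.com"));   // true: over-matches
console.log(isSiteOrSubdomain("site1.com", "site1.com"));      // true
console.log(isSiteOrSubdomain("notsite1.com", "site1.com"));   // false
```

In an actual PAC file you would call the built-in `dnsDomainIs` instead of the stand-in; `host == "site1.com" || dnsDomainIs(host, ".site1.com")` is the "one check for the bare domain, one for subdomains" pairing.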

I seem to recall that what I had before needed duplicate entries for each site (one to make it work with subdomains and one to make it work without them), and that much is OK. Unfortunately I just can't remember the specifics. If at all possible, I do want to keep each site on a separate line like that, so it's easy to come in and add or remove one, since I don't even know yet which sites need this. (For one, I've lost the list of sites I needed it for, since of course that died with the old memory card. For another, some sites are very tricky and give you an error that completely throws you off. One even implied that it wouldn't load pages because of server issues, as if it had nothing to do with my side at all... Very few outright state that they're blocking the IP range, and if it weren't for those few I wouldn't even have known what was going on. Sometimes I have to add a site to the bypass list just to test whether that's the problem.)
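One way to keep each site on its own line, while a single entry covers both `site1.com` and `www.site1.com`, is to put the sites in an array and loop over it. This is a minimal sketch, not a tested drop-in: it uses only plain JavaScript string operations (no PAC built-ins), keeps the list at top level the way the original script kept its variables, reuses the proxy address from the script above, and treats `site1.com` etc. as placeholders.

```javascript
// Sites that should bypass the proxy; one entry per line.
// Each entry covers the bare domain and all of its subdomains.
var bypassSites = [
  "site1.com",
  "site2.com",
  "site3.com"
];

var direct = "DIRECT";
var proxy = "PROXY 192.168.1.100:3128";

// True if host is exactly `site` or ends with "." + site.
function matchesSite(host, site) {
  var suffix = "." + site;
  return host === site ||
         (host.length > suffix.length &&
          host.substring(host.length - suffix.length) === suffix);
}

function FindProxyForURL(url, host) {
  host = host.toLowerCase();
  for (var i = 0; i < bypassSites.length; i++) {
    if (matchesSite(host, bypassSites[i])) {
      return direct;
    }
  }
  return proxy;
}
```

Adding or removing a site is then a one-line change to `bypassSites`, with no separate "with subdomain" and "without subdomain" lines, and look-alikes such as `notsite1.com` or `site1.com.net` still go through the proxy.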

PS. This time if I get it working again I'm backing this thing up...

Cmustang87 · Apr 17, 2018

So, if I'm understanding you correctly - you have a Squid proxy for HTTP/HTTPS traffic, and then Squid uses your VPN as a next-hop gateway to proxy the requested web traffic - correct?

Your PAC file has errors in it; make sure you always use a PAC tester when you update it. Here's an easy online one (just paste in the text of the PAC and try a sample URL/host): https://proxyforurl.thorsen.pm/

The first problem I see is that you don't have the variables within the function.

What you are looking for is if(shExpMatch(host, "*.site1.com")). This allows you to use things such as wildcards, whereas dnsDomainIs can only be used for exact hostnames.

example:
if(shExpMatch(host, "*.site1.com"))
if(shExpMatch(host, "site1.com"))
if(shExpMatch(host, "site2.com/*"))

You can use wildcards and other syntax to get much finer granularity. That matters because sites that block VPNs but serve their content through CDNs can make your version very tough to fix without a wildcard.
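To see how `shExpMatch` patterns behave, here is a stand-in that turns a shell-style pattern into a regular expression that must match the whole string, modeled on the classic Mozilla reference implementation (`*` = any run of characters, `?` = exactly one character). `shExpMatchSketch` is a hypothetical name, and a real browser's built-in may differ in edge cases. Because matching is anchored at both ends, `*.site1.com` does not match the bare `site1.com`, which is why both patterns are listed.

```javascript
// Stand-in for the PAC built-in shExpMatch(), modeled on the classic
// reference implementation: the glob becomes a regex that must match
// the WHOLE string.
function shExpMatchSketch(str, pattern) {
  var re = pattern
    .replace(/[.+^${}()|[\]\\]/g, "\\$&") // treat regex metacharacters literally
    .replace(/\*/g, ".*")                  // *  -> any run of characters
    .replace(/\?/g, ".");                  // ?  -> exactly one character
  return new RegExp("^" + re + "$").test(str);
}

console.log(shExpMatchSketch("www.site1.com", "*.site1.com"));    // true
console.log(shExpMatchSketch("cdn.eu.site1.com", "*.site1.com")); // true
console.log(shExpMatchSketch("site1.com", "*.site1.com"));        // false: no "something." prefix
console.log(shExpMatchSketch("site1.com", "site1.com"));          // true
console.log(shExpMatchSketch("site1.com.net", "site1.com"));      // false: anchored at both ends
```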

So, I would update your PAC file to look like this:

Code:

function FindProxyForURL(url, host)
{
//
// Variables
//
var direct = "DIRECT";
var proxy = "PROXY 192.168.1.100:3128";
var deny = "PROXY 127.0.0.1:65535";
//
// Return RFC1918 and Local direct
//
    if
        (
        isInNet(host, "192.168.0.0", "255.255.0.0") ||
        isInNet(host, "10.0.0.0", "255.0.0.0") ||
        isInNet(host, "127.0.0.1", "255.255.255.0") ||
        shExpMatch(host, "*.local") ||
        shExpMatch(host, "localhost.*") ||
        shExpMatch(host, "*.localhost.*") ||
        shExpMatch(host, "*.localhost") ||
        shExpMatch(host, "*.local")
        )
        {return direct;}
//
// Bypass proxy for the below
//
    if
        (
        shExpMatch(host, "site1.com") ||
        shExpMatch(host, "*.site1.com") ||
        shExpMatch(host, "site1.com/*")
        )
        {return direct;}
//
// Return all other requests to proxy
//
return proxy;}
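`isInNet(host, pattern, mask)` compares the host's IPv4 address with `pattern` after applying `mask` to both. Below is a stand-in for IPv4 literals only; the real built-in also resolves hostnames through DNS, which this sketch deliberately skips (`ipToInt` and `isInNetSketch` are hypothetical names). It also checks the third RFC 1918 block, 172.16.0.0/12, which the "RFC1918" section of the script above doesn't list.

```javascript
// Convert a dotted-quad IPv4 string to an unsigned 32-bit integer,
// or return null if it isn't one (e.g. a hostname like "site1.com").
function ipToInt(ip) {
  var parts = ip.split(".");
  if (parts.length !== 4) return null;
  var n = 0;
  for (var i = 0; i < 4; i++) {
    if (!/^\d{1,3}$/.test(parts[i])) return null;
    var octet = Number(parts[i]);
    if (octet > 255) return null;
    n = n * 256 + octet;
  }
  return n;
}

// Stand-in for isInNet(), IPv4 literals only (no DNS lookup):
// true when (host AND mask) equals (pattern AND mask).
function isInNetSketch(host, pattern, mask) {
  var h = ipToInt(host), p = ipToInt(pattern), m = ipToInt(mask);
  if (h === null || p === null || m === null) return false;
  // ">>> 0" turns the signed bitwise results back into unsigned numbers
  return ((h & m) >>> 0) === ((p & m) >>> 0);
}

console.log(isInNetSketch("192.168.1.20", "192.168.0.0", "255.255.0.0")); // true
console.log(isInNetSketch("10.8.0.1", "10.0.0.0", "255.0.0.0"));          // true
console.log(isInNetSketch("127.0.0.1", "127.0.0.1", "255.255.255.0"));    // true
// The third RFC 1918 block, 172.16.0.0/12:
console.log(isInNetSketch("172.20.5.9", "172.16.0.0", "255.240.0.0"));    // true
console.log(isInNetSketch("8.8.8.8", "192.168.0.0", "255.255.0.0"));      // false
```

If that range is used on the LAN, it would be one more line in the same style in the PAC: `isInNet(host, "172.16.0.0", "255.240.0.0") ||`. Since the real `isInNet` may trigger a DNS lookup for hostnames, cheap string checks placed before it can save a lookup per request.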

Nazo · Apr 18, 2018

Cmustang87 said:
So, if I'm understanding you correctly - you have a Squid proxy for HTTP/HTTPS traffic, and then Squid uses your VPN as a next-hop gateway to proxy the requested web traffic - correct?

You do indeed understand correctly. Perhaps I gave too much information, really. As my signature would imply, brevity is not my strong point. Anyway, doing this ensures most of my web browsing goes through the VPN for privacy, while only the things I very specifically need to be direct are not.

The first problem I see is that you don't have the variables within the function.

I wondered about that myself. Like I said, this is a conglomeration of a bunch of PAC samples I found with Google's help, so I presumed it was intentional. The system did seem to accept those variables regardless, so I assumed it was something like a "define" in some other languages. Well, there's certainly no harm in moving them into the function since that's the correct place, and I will do so.

What you are looking for is if(shExpMatch(host, "*.site1.com")). This allows you to use things such as wildcards, whereas dnsDomainIs can only be used for exact hostnames.

For some reason I had it in my head that shExpMatch could only do actual expressions (e.g. that wildcard version). An exact match as you've demonstrated is perfect.

Do I really need the third line with the /*, though? As nearly as I can determine, the path isn't actually part of the "host". Like I said, I really wish there were a way to debug the handling of the PAC. Not just something that tests for obvious errors, which is basically all that site does, but something that walks through each step showing the current values and results. I'm not a big coder, but most things have an easy way to at least put in an echo/print/whatever to show a value, or even an actual debug mode. As far as I can tell, this has nothing but the final result to go on, so short of rigging up a separate proxy that produces a different page depending on the result, it just seems insanely hard to diagnose. Am I missing something?
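Since a PAC file is plain JavaScript, one workaround for the lack of step-by-step output is to run it outside the browser with stand-ins for the built-ins that log every call. A minimal sketch assuming Node.js; the stand-ins are simplified (no DNS, IPv4 literals only for `isInNet`), and the `FindProxyForURL` body and test URLs are placeholders to replace with your own. Browsers pass just the hostname as `host`, and the harness does the same, so a pattern like `site1.com/*` never sees a path and can't match.

```javascript
// Node.js harness: simplified stand-ins for PAC built-ins that log
// each call and its result. No DNS lookups are performed.
var trace = [];

function shExpMatch(str, pattern) {
  var re = pattern
    .replace(/[.+^${}()|[\]\\]/g, "\\$&") // regex metacharacters are literal
    .replace(/\*/g, ".*")
    .replace(/\?/g, ".");
  var result = new RegExp("^" + re + "$").test(str);
  trace.push("shExpMatch(" + str + ", " + pattern + ") = " + result);
  return result;
}

function toInt(ip) {
  var p = ip.split(".");
  if (p.length !== 4) return null;
  var n = 0;
  for (var i = 0; i < 4; i++) {
    if (!/^\d{1,3}$/.test(p[i]) || Number(p[i]) > 255) return null;
    n = n * 256 + Number(p[i]);
  }
  return n;
}

// IPv4 literals only; for a hostname this stand-in just returns false.
function isInNet(host, pattern, mask) {
  var h = toInt(host), pt = toInt(pattern), m = toInt(mask);
  var result = h !== null && pt !== null && m !== null &&
               ((h & m) >>> 0) === ((pt & m) >>> 0);
  trace.push("isInNet(" + host + ", " + pattern + ", " + mask + ") = " + result);
  return result;
}

// ---- PAC logic under test (paste your own FindProxyForURL here) ----
function FindProxyForURL(url, host) {
  if (isInNet(host, "192.168.0.0", "255.255.0.0")) return "DIRECT";
  if (shExpMatch(host, "site1.com") ||
      shExpMatch(host, "*.site1.com")) return "DIRECT";
  return "PROXY 192.168.1.100:3128";
}

// Evaluate one URL, passing only the hostname as `host`, and print the trace.
function tryUrl(url) {
  var host = new URL(url).hostname;
  trace = [];
  var result = FindProxyForURL(url, host);
  console.log(url + "  (host = " + host + ")  ->  " + result);
  trace.forEach(function (line) { console.log("    " + line); });
  return result;
}

tryUrl("https://site1.com/some/page");
tryUrl("https://www.site1.com/");
tryUrl("https://example.org/");
tryUrl("http://192.168.1.1/");
```

A PAC that calls other built-ins, such as `dnsDomainIs`, would need a stand-in for those too, or Node will report them as undefined.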

Anyway, thanks for the help. For now I just have the two lines, and as far as I can tell it's working great. I think this is actually the way I did it before, except with no variables. (I liked that aspect of what I found this time around, even if it is unnecessary. Overall my new PAC is a lot cleaner and more organized.) I think all of the IP-checking websites are just about ready to ban me now, though, lol.

PS. You have *.local twice. Also, localhost.* seems like a bad idea to me. Couldn't an actual website be localhost.com or something? It's not as if security is disabled when the proxy is bypassed, but still, I don't want to bypass the proxy by accident rather than by intent.

EDIT: I just found one small potential issue. shExpMatch(host, "site") seems to accept anything of the form site.something. For instance, shExpMatch(host, "site123.com") would also accept site123.com.net. Is there a way to correct this? It's not something that comes up much, but I did notice that, for example, since "localhost" is defined as direct, that expression also sends "localhost.com" direct. I probably don't have to worry much about this, but perhaps it's something I should correct now if I can. Actually, I don't think I need the localhost bit anyway, but I'm not sure whether this theoretical issue could have a real-life effect on one of the other addresses.
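With whole-string (anchored) matching, which is how the classic reference implementation of `shExpMatch` behaves, a pattern without wildcards such as `site123.com` should match only that exact host. The `localhost.com` case follows from the `localhost.*` pattern itself, whose trailing `*` allows anything after `localhost.`. One way to tighten the local checks is to drop trailing wildcards and test for plain (dotless) names and specific suffixes instead. A minimal sketch in plain JavaScript; `isLocalHost`, `hasSuffix` and the suffix list are hypothetical names chosen for illustration.

```javascript
// Suffixes treated as local. There is deliberately no trailing wildcard,
// so public names such as localhost.com are not matched.
var localSuffixes = [".local", ".localhost"];

// True if str ends with suffix and has something before it.
function hasSuffix(str, suffix) {
  return str.length > suffix.length &&
         str.substring(str.length - suffix.length) === suffix;
}

function isLocalHost(host) {
  host = host.toLowerCase();
  // Plain names with no dot at all, e.g. "localhost" or "nas"
  // (the same test as the PAC built-in isPlainHostName(host)).
  if (host.indexOf(".") === -1) {
    return true;
  }
  for (var i = 0; i < localSuffixes.length; i++) {
    if (hasSuffix(host, localSuffixes[i])) {
      return true;
    }
  }
  return false;
}

console.log(isLocalHost("localhost"));       // true
console.log(isLocalHost("printer.local"));   // true
console.log(isLocalHost("localhost.com"));   // false
console.log(isLocalHost("site123.com.net")); // false
```

Inside `FindProxyForURL`, a single `if (isLocalHost(host)) return direct;` would then replace the `localhost`/`*.local` pattern lines.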

cheap50 · Apr 21, 2018

If you can't get this to work, PIA (Private Internet Access) has a Chrome extension that does what you're trying to do. I use it and it works well.

Nazo · Apr 21, 2018

Well, like I said, I did get it to work. It's just not perfect, but it's still better than that option. Firstly, I want it to run through my own personal VPN, not a paid one. Secondly, I want to use Squid to improve performance. I did look at Chrome extensions before, but even putting aside the Squid thing, only one even seemed to support using your own VPN, and even it did so very poorly from what people said.

This method may not be perfect, but it definitely works quite well. Actually, the only potential problem likely isn't even a real problem in real-life use. (And it's not as if bypassing the proxy and VPN causes any security issues; it's just less private. So "exploiting" this won't give me a virus or something.)

Jake · May 3, 2018

I inherited the job of maintaining a proxy.pac file at work. It was just done as one long function with an IF on each line.
Here is our syntax with some sample entries:

Code:

function FindProxyForURL(url,host)
  {
     if (!shExpMatch(host,"*\.*")) return "DIRECT"; // single string hosts //
     if (shExpMatch(host,"*.wildcard1.com")) return "DIRECT";
     if (shExpMatch(host,"*.wildcard1.org")) return "DIRECT";
     if (host == "127.0.0.1") return "DIRECT";
     if (host == "named.host.com") return "DIRECT";
     if (host == "goatpr0n.net") return "DIRECT";
     if (host == "hardocp.com") return "DIRECT";
     if (host == "hardforum.com") return "DIRECT";
         return "PROXY muhproxy.localhost:3128"; // ip or name of proxy //
  }

IIRC this has to be hosted somewhere your clients can reach it via HTTP. I'm assuming your Squid box has Apache too; just serve it up from there.
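If the Pi isn't running Apache, any small web server will do. As a sketch, assuming Node.js is installed on the Pi (an assumption; the suggestion above was Apache), the following serves the PAC with the `application/x-ns-proxy-autoconfig` MIME type conventionally used for PAC files. The path `/etc/proxy.pac`, the URL path `/proxy.pac` and port `8080` are placeholders.

```javascript
const http = require("http");
const fs = require("fs");

const PAC_PATH = "/etc/proxy.pac"; // placeholder: wherever the PAC file lives
const PORT = 8080;                 // placeholder port

// Serve the PAC file at /proxy.pac with the PAC MIME type; 404 otherwise.
// The file is re-read on every request, so edits apply without a restart.
function handler(req, res) {
  if (req.url !== "/proxy.pac") {
    res.writeHead(404, { "Content-Type": "text/plain" });
    res.end("not found\n");
    return;
  }
  fs.readFile(PAC_PATH, "utf8", function (err, body) {
    if (err) {
      res.writeHead(500, { "Content-Type": "text/plain" });
      res.end("could not read PAC file\n");
      return;
    }
    res.writeHead(200, { "Content-Type": "application/x-ns-proxy-autoconfig" });
    res.end(body);
  });
}

// Start listening only when asked: node serve-pac.js serve
if (process.argv[2] === "serve") {
  http.createServer(handler).listen(PORT, function () {
    console.log("PAC available at http://<this-machine>:" + PORT + "/proxy.pac");
  });
}
```

With the Pi at 192.168.1.100 (the address used in the scripts above), the browser's automatic proxy configuration URL would then be `http://192.168.1.100:8080/proxy.pac`.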

ETA: Forgot to say: load your PAC file in Notepad++ and set the language to JavaScript. It helps highlight easy-to-miss syntax errors. (It helps me, since I'm not a coder.)
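Along the same lines, a quick syntax check can be scripted. A sketch assuming Node.js: `vm.Script` compiles the source without running it and throws a `SyntaxError` if the PAC has a syntax mistake such as a missing brace. It only catches syntax errors, not logic mistakes like a pattern that never matches. `checkPacSyntax` is a hypothetical helper name.

```javascript
const vm = require("vm");
const fs = require("fs");

// Compile (but do not run) PAC source. Returns null if it parses,
// otherwise the SyntaxError message.
function checkPacSyntax(source, filename) {
  try {
    new vm.Script(source, { filename: filename || "proxy.pac" });
    return null;
  } catch (err) {
    if (err && err.name === "SyntaxError") {
      return err.message;
    }
    throw err;
  }
}

// Usage: node check-pac.js proxy.pac
if (process.argv[2]) {
  const file = process.argv[2];
  const problem = checkPacSyntax(fs.readFileSync(file, "utf8"), file);
  console.log(file + ": " + (problem ? "syntax error: " + problem : "syntax OK"));
}
```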
