Difficult DNS(?) issue

VeeDubbs (Limp Gawd), joined Dec 9, 2005, 398 messages
Hi all -

I work at a small private college. We've been experiencing a weird DNS issue for a while. Hopefully I can explain it well enough.

We have a lot of servers. Some servers reside in our DMZ (Blackboard, Mailman, etc...) and some are on our internal (10.2.x.x) network.

Servers that reside in the DMZ NEVER have any issues. I can go to blackboard.school.edu in my web browser and get to it 100% of the time. Servers on our internal network are a different story. For example, our knowledgebase is albert.school.edu, and I cannot guarantee that we can get to it 100% of the time. If you can't get to it, you can release/renew your IP address, and then you'll most likely be able to get to it.

Also, this seems to happen only on Windows machines. I have a Windows, Mac and Linux box on my desk. This issue NEVER appears on the Mac or Linux...only on the Windows box.

We have an ASA5510 for our firewall. When we have a server on our internal network but need it accessible from the outside, we set up a static route on the ASA:

Code:
static (inside,outside) 198.150.XXX.XXX 10.2.1.48 netmask 255.255.255.255 tcp 100  70

and also open the appropriate holes in the ASA.
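For reference, in 8.2-era syntax the "hole" is an access list bound to the outside interface that permits traffic to the mapped (public) address of the static, not the 10.2.x.x address. A minimal sketch, assuming the server only needs TCP 80 from anywhere; the ACL name outside_access_in is hypothetical, and the public address stays elided as in the original:

Code:
! Permit web traffic to the public (mapped) address of the static
access-list outside_access_in extended permit tcp any host 198.150.XXX.XXX eq www
! Apply the ACL to traffic arriving on the outside interface
access-group outside_access_in in interface outside

An interface takes only one inbound ACL, so if one is already bound to outside, the permit line belongs in that existing ACL instead.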

But we also do the same for servers in the DMZ:

Code:
static (ColDMZ,outside) 198.150.XXX.XXX 198.150.XXX.XXX netmask 255.255.255.255

We have tried multiple DNS servers over time: an appliance that does DNS (Cobalt RAQ), a Novell server, a Linux server, and most recently a Windows 2003 server. The issue has persisted across all of them, so I'm starting to think it's not DNS. At this point, I'm thinking it may be something in our firewall, but I'm not 100% sure. I am really at a loss...
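Since the firewall is the suspect, the ASA's packet-tracer command can simulate a single flow and report which NAT, ACL, and inspection phases it hits and whether it would be allowed. A sketch, assuming a hypothetical inside client at 10.2.50.25 browsing to the public address of albert (still elided as 198.150.XXX.XXX):

Code:
! Simulate a TCP connection from a (hypothetical) inside client to the public address on port 80
packet-tracer input inside tcp 10.2.50.25 1025 198.150.XXX.XXX 80 detailed

This only says something about flows that actually cross the ASA; traffic routed entirely by the 3750 never reaches it.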

So, in short...

You can't get to resources on internal servers 100% of the time.
Release/renew of your IP will most likely fix your issue.
Only happens on Windows machines.
Have tried multiple DNS servers, but the issue still exists.

Any suggestions or input would be much appreciated!!!
 
Does the issue seem limited to XP, Vista or 7? When you "can't get to it", what does that mean? The address doesn't resolve? How many DNS servers are we talking about? When you experience this issue, can you use nslookup to hit another server and resolve the address? Do you have multiple DHCP servers? Are the DNS server(s) located on different subnets? If so, is the firewall the router for these different subnets?
 

I have personally seen it on XP and 7. I don't use Vista -- but I think it's been reported to happen in Vista. But again, I can be having the issue on my Windows 7 machine, roll over to my Ubuntu machine, and not have the problem.

When I say we "can't get to it", I mean it's a web-based application and we cannot load its webpage. Doing an nslookup returns the correct info for the server I am trying to reach. At that point, either a release/renew or just typing in the server's IP address gets me there...

We have one DHCP server and one DNS server. All of our internal servers are on the same subnet (10.2.x.x). We have a Cisco 3750 that does our routing between subnets.

Thanks for all the questions! I hope I'm being clear with everything. The more I write about the problem, the more confusing it sounds...
 
Can you ping the server in question, even when you can't access the webpage?
 
Look at the structure of your static statements...

The one you are having an issue with is limited to 100 TCP connections with an embryonic limit of 70 (these are TCP sessions that have not been set up completely). You aren't specifying this same limit for your DMZ server static, which isn't having the same issue.

You can do the following to get an idea of how many connections there are for the firewall total and then for the host:

Code:
sh conn det
sh conn det | grep 10.2.1.48

Check the "FLAGS" portion of the output to see if you have a lot of half-open connections. You'll need the command reference to interpret the flags. IIRC when a bunch start with a little "s", that's bad lol.
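To compare against the limits in the static, show conn count gives the firewall-wide total and show local-host reports per-host counts, including embryonic connections. A minimal sketch for the server from this thread:

Code:
! Firewall-wide number of connections in use
show conn count
! Per-host view: current TCP flow and embryonic counts for 10.2.1.48
show local-host 10.2.1.48

If the embryonic count sits near 70, or the TCP count near 100, while users are failing, the limits are a likely culprit.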

If you just want to remove the TCP limit from the static just do (in a maintenance window):

Code:
no static (inside,outside) 198.150.XXX.XXX 10.2.1.48 netmask 255.255.255.255 tcp 100  70
clear xlate
static (inside,outside) 198.150.XXX.XXX 10.2.1.48 netmask 255.255.255.255
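After re-entering the static, it's worth confirming the running config no longer carries the limits. A one-line check, filtering on the real address from the thread:

Code:
! The static for 10.2.1.48 should no longer end in "tcp 100 70"
show running-config static | include 10.2.1.48
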
 
I was also thinking that since you have a TCP connection limit you'll need to check the "timeout conn" value in the config.

The default is 1 hour and half-closed is 10 min. You could always shorten those values a bit. It depends on the firewall load and what apps are running through it, though, so be careful.

...and if you change them you'll need to do a clear conn to get everyone to take the new value (in a maint window)
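A sketch of what that looks like, using example values of 30 minutes idle and 5 minutes half-closed (both hypothetical; pick values that suit your apps):

Code:
! Check the current values first (defaults: conn 1:00:00, half-closed 0:10:00)
show running-config timeout
! Shorten the idle TCP timeout and the half-closed timeout
timeout conn 0:30:00
timeout half-closed 0:05:00
! Per the post above, clear existing connections in a maintenance window
clear conn
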
 

Hi guys,

Trying to head home now, so this will be brief. I'll try to update more tomorrow...

First off, XOR != OR: I don't know the answer to your question. I guess I'll have to try it the next time I experience the issue. I want to say yes, I can ping it, but I don't know 100%.

mattjw916 -- I did not mention this in any of my other posts, but a while ago I took the TCP and embryonic connection limits (100 70) off one of my static routes because I had noticed that difference as well. Unfortunately, the issue still happened both to the servers with the connection limits and to the one without.

Gotta go! Hope to have a better reply/post tomorrow!!

Thanks guys!
 
Those are static NAT statements, not static routes, but okay... You still need to do a "clear xlate"; otherwise the changes you made haven't actually taken effect unless the xlate aged out normally.
 

Sorry bout that. I've always called them static routes! Thanks for fixing me!

So even though I've taken out the 100 70 on a server it will not take effect until I do clear xlate? What exactly is that command doing? Will it kill any connections or anything like that?

Just want to clarify that with you.
 
The quick answer is the xlate table contains all the firewall's local_ip to global_ip mappings. Any time you modify nat, static, global, and route(?) statements you must issue this command.

The command doesn't "disconnect" anything, but it does break the sessions through the firewall momentarily. Whether this is a problem depends on how the apps passing through it work, which is why you should do it off hours or in a maintenance window.

Hit up the command reference for a much more involved answer. I'm all "Cisco'd" out for today since I'm on vacation for the weekend.
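Since a full clear xlate breaks every session through the firewall, one option is to inspect and clear only the translations for the server whose static changed. A sketch using the address from the thread:

Code:
! List the current translation(s) for the inside server
show xlate local 10.2.1.48
! Clear only those translations so the modified static takes effect for this host
clear xlate local 10.2.1.48

Sessions to other hosts are left alone, but sessions to 10.2.1.48 itself still drop, so a maintenance window is still a good idea.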
 

Awesome! Thanks for all the info!

I will definitely be trying this the next time I am able to!

Thanks again!
 
Well, I was able to run clear xlate, and I'm still having the same issues with all internal servers, with or without the TCP/embryonic connection limits.

Looks like there is something bigger going on here...
 
Have you looked at the "dns" option for your static line(s)? It translates the addresses in DNS replies as they pass through the ASA.
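For context, the dns keyword enables what Cisco calls DNS doctoring: when a DNS reply that crosses the ASA carries one of the static's addresses, the ASA rewrites the A record between the mapped (198.150.XXX.XXX) and real (10.2.1.48) address, so the client gets an address it can reach directly. It relies on DNS inspection, which the default global_policy enables. A sketch for the albert static, assuming the connection limits were already removed (match your existing line exactly in the no form):

Code:
! Replace the existing static with one that also rewrites DNS A records
no static (inside,outside) 198.150.XXX.XXX 10.2.1.48 netmask 255.255.255.255
static (inside,outside) 198.150.XXX.XXX 10.2.1.48 netmask 255.255.255.255 dns
! Doctoring needs DNS inspection; confirm "inspect dns" appears in the policy
show running-config policy-map

Only replies that actually pass through the ASA are rewritten; if inside clients query the internal Windows 2003 server directly, their replies never cross the firewall. As with any static change, clear the xlate afterwards.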
 

Xipher -

This is what I see for the option:

dns Use the created xlate to rewrite DNS address record


I'm not too sure I understand what that would actually do.

And just to reiterate, we DO NOT have this problem with any servers in the DMZ, only with internal servers.

I did just notice that servers in the DMZ use this:

Code:
static (ColDMZ,outside) etc, etc, etc...

whereas internal servers use this:

Code:
static (inside,outside) etc, etc, etc,....

Does anybody think that that would make a difference?
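On the interface question: in 8.2 syntax the pair in parentheses is (real_interface,mapped_interface), the first address is the mapped (public) one, and the second is the real one. So the DMZ statics are identity mappings (the same public address on both sides), while the inside statics translate a 10.2.x.x address. To see the interface names and their security levels:

Code:
! List interface names (inside, ColDMZ, outside, ...) with their security levels
show nameif
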

I'm really lost as to what to troubleshoot at this point...
 
I've been trying to follow what you've been saying throughout the thread, so please correct me if I'm wrong.

All your DMZ servers have statically assigned public IPs
You are using the ASA to perform NAT for the 10.2.1.x inside server network so those servers can reach the DMZ and the outside world.
Only Windows computers are having this issue.
Ping does work while the Windows computers have the issue.
nslookup does return the correct IP when the Windows computers are having the issue.
Accessing the website via IP directly in a browser does work while the computer is having the issue.

A packet capture while the issue is happening may hold the key to your issue. Post a packet capture if you happen to get one.
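A minimal ASA-side capture sketch, assuming the web app is albert (10.2.1.48) on TCP 80 and the traffic crosses the ASA's inside interface; the ACL and capture names and the TFTP server 10.2.1.99 are hypothetical:

Code:
! Match web traffic to and from the server, in both directions
access-list CAP-ACL extended permit tcp any host 10.2.1.48 eq www
access-list CAP-ACL extended permit tcp host 10.2.1.48 eq www any
! Capture matching packets on the inside interface into a 1 MB buffer
capture CAP-ALBERT access-list CAP-ACL buffer 1048576 interface inside
! View the packets, export them as a pcap for Wireshark, then remove the capture
show capture CAP-ALBERT
copy /pcap capture:CAP-ALBERT tftp://10.2.1.99/albert.pcap
no capture CAP-ALBERT

Running Wireshark on the affected Windows box at the same time gives the client's side of the same exchange.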

What software version is the ASA running? Cisco isn't infallible... A friend of mine accidentally brought a network to its knees by picking an IOS release with a bug that ignored DSCP markings. It wasn't a listed bug, but after he reported it to Cisco, it was listed as fixed in the next release.
 

awesomeo -

Yes, all DMZ servers have static public IPs.
Yes, ASA is performing NAT for internal 10.2.x.x servers.
I can't say 100% that it is only Windows. But I have only personally seen it (and have only heard about it) on Windows machines.
Yes, I can ping the server even when a machine is having an issue.
nslookup does bring back the correct info
Yes, accessing an application by IP instead of DNS record seems to work.

ASA is running software version 8.2.2 -- which I *think* is the latest...

I will attempt a packet capture next time this issue occurs.

Thanks for all the help everyone!!
 
I would upgrade to 8.3(1), see if your problem remains, and cross your fingers that no new problems are created. The caveats fixed in the last few releases include quite a few TCP handling problems and a few xlate issues. I'm sure you already know this, but it doesn't hurt to be reminded: keep copies of your current software image and your startup config in two other places before you upgrade the software on the ASA.
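A sketch of those backups, assuming a hypothetical TFTP server at 10.2.1.99 and an image named asa822-k8.bin (check the real file name in the show version output). Also worth knowing: 8.3 replaced the static/nat/global commands with object-based NAT and migrates the existing config on first boot, so the saved startup config is the only copy of the statics in their current form.

Code:
! Confirm the running version and the image file name
show version
dir disk0:
! Copy the startup config and the current image off the box
copy startup-config tftp://10.2.1.99/asa5510-startup.cfg
copy disk0:/asa822-k8.bin tftp://10.2.1.99/asa822-k8.bin
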
 