Enterprise Admins: How do you handle local DNS?

Brak710

Hey all! Hoping for some input if possible.

Recently at my company, our primary DNS servers (all BlueCat Adonis boxes) got stuck in a repeating fail-over loop that caused some serious issues: DNS would work for 5 seconds and then go offline for 45 seconds while it failed over again.

BlueCat never figured out our issue with the heartbeat sync on our one cluster, so we're stuck with possibly flaky appliances in our architecture across all our cities and datacenters. While we've ironed out some issues we had with servers/desktops not being able to use the secondary, executive management wants assurance that we "won't have an issue again." Today, while testing fail-over conditions, we somehow triggered another bug: when both members of an HA pair came online, they picked a primary/secondary, but neither would actually take over the VIP for the shared DNS server address... Huge red flag again for BlueCat's appliance.

My first thought was to buy some more F5 BIG-IP LTMs and put multiple independent DNS clusters behind them. The LTMs would have listener addresses that serve as the primary/secondary DNS servers configured on desktops/servers. Those listener addresses would simply forward requests to one or more healthy DNS clusters behind them.
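
To make "healthy" a bit more concrete, here is a rough Python sketch of the kind of check a load balancer could run against each DNS cluster: send a minimal A-record query and only keep backends that answer with NOERROR. This is not F5's monitor API; the function names (`build_query`, `is_healthy_response`, `probe`, `healthy_backends`) and the fixed query ID are assumptions made up for illustration.

```python
import socket
import struct


def build_query(name: str, query_id: int = 0x1234) -> bytes:
    """Build a minimal DNS query for an A record (RFC 1035 wire format)."""
    # Header: ID, flags (only RD set), QDCOUNT=1, other counts 0.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # QNAME: each label prefixed by its length, terminated by a zero byte.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    # QTYPE=1 (A), QCLASS=1 (IN).
    return header + qname + struct.pack("!HH", 1, 1)


def is_healthy_response(data: bytes, query_id: int = 0x1234) -> bool:
    """True if data is a reply to our query with RCODE 0 (NOERROR)."""
    if len(data) < 12:
        return False
    resp_id, flags = struct.unpack("!HH", data[:4])
    is_response = bool(flags & 0x8000)  # QR bit
    rcode = flags & 0x000F
    return resp_id == query_id and is_response and rcode == 0


def probe(server: str, name: str, timeout: float = 1.0) -> bool:
    """Send one UDP query to server:53; any timeout or error counts as down."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.sendto(build_query(name), (server, 53))
            data, _ = sock.recvfrom(512)
        except OSError:
            return False
    return is_healthy_response(data)


def healthy_backends(servers, name, probe_fn=probe):
    """Return only the backends that currently answer the probe."""
    return [s for s in servers if probe_fn(s, name)]
```

The point of probing with a real query rather than a ping is that a box can be up on the network while its DNS service is wedged, which is roughly the failure described above.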

As I dug in more with our group that handles DNS, it turned out that they pretty much create DNS entries manually for static IPs (servers, routers, switches, etc.) and have to copy the configuration from one cluster to the next to propagate a new DNS record across the whole company. The only thing done automatically is DHCP hostname registration.

At this point, I knew we were in trouble and would need to really reconsider our whole setup. We have ~16,000 clients per city we operate in, and have another 10,000 spread out at home and at remote offices...

What do you all run for enterprise setups? And is using load-balancer listeners as the primary/secondary DNS addresses a terrible HA idea?

Thanks for any input!
 
I don't know whether this helps you or not, but all of ours is AD-integrated DNS. Everything points to it. Pretty simple and straightforward.

Do you have DCs at each location for authentication?
 
Yeah, there has been some talk that we want to move to AD DNS, but there was concern about high availability. Maybe I am completely ignorant of Microsoft AD/DNS solutions, but wouldn't we pretty much be pointing clients to 2 AD-DNS servers? At that point, couldn't we still use a load balancer and have many AD-DNS servers?
 
Why are they not using the Primary/Secondary setup that DNS supports?

I would set it up one way at an ISP, and I might do the same thing in an enterprise. If you are using AD, use it.

If not, and you want to stick with BIND, I would set up a Primary or two, then set up secondaries. Everyone would hit the secondaries, and the Primaries would answer no requests except from the secondaries.

The secondaries would pull from the Primaries automatically, so only the Primaries would need to be updated.
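
For anyone wondering how the "pull automatically" part decides when to pull: a secondary compares the zone's SOA serial on the primary against its own copy and transfers the zone when the primary's serial is newer. A minimal sketch of that comparison, using RFC 1982 serial number arithmetic (the function names are mine, not BIND's):

```python
SERIAL_MODULUS = 2 ** 32      # SOA serials are unsigned 32-bit values
HALF_RANGE = 2 ** 31


def serial_is_newer(candidate: int, current: int) -> bool:
    """RFC 1982 comparison: is candidate 'greater than' current, allowing wraparound?"""
    for value in (candidate, current):
        if not 0 <= value < SERIAL_MODULUS:
            raise ValueError("SOA serial must fit in 32 bits")
    diff = (candidate - current) % SERIAL_MODULUS
    # Newer means ahead by less than half the number space.
    return 0 < diff < HALF_RANGE


def needs_transfer(primary_serial: int, secondary_serial: int) -> bool:
    """A secondary should request a zone transfer only if the primary is ahead."""
    return serial_is_newer(primary_serial, secondary_serial)
```

This is also why forgetting to bump the serial after editing a zone on the primary is the classic reason a change never shows up on the secondaries.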

On the Primaries I might run something like Puppet (if they are Linux, though I guess there is a Windows Puppet client as well) so I could update one file and have it go out to both.
 
We are minuscule compared to what you require, but we use Infoblox appliances to handle our DNS.

Two DNS appliances run in HA as "grid members", both centrally managed by another appliance, the "grid master". The grid master handles synchronization across grid members. We are small enough that I don't have to play with these things daily and we don't need the most complex setup. Our appliances are a few years old and I know they have come out with newer stuff. Their Infoblox Grid product may have something that would work/scale for you.

We used to run MS Active Directory DNS and that worked fine for us and probably would still work okay as well.
 
Appliances that only do DNS and DHCP don't work very well. In fact, I'd say they don't work at all.

Recursion should be done locally at each POP; you really want to avoid sending queries over the wire for internal stuff.

AD can integrate pretty easily with regular DNS - you can replicate with normal zone updates with a master/slave config.

Having loadbalanced DNS is pretty common.

You really want a pair of dedicated recursive servers/systems at each site, not on the same hardware (different VMs, hosts, racks). Each one should be a slave to whatever your replication master is, and they should not depend on each other at all. Most DHCP responses hand out two DNS servers. This really isn't as complicated as it's been made out to be.

Also, if your site recursive goes down, you might want to drop the site (unless you're an ISP, in which case you haven't explained yourself very well here).

Also, DNS is a really freakin' small protocol. You don't need a lot of horsepower to run ridiculously fast DNS.
16K users per POP can probably be handled by two decent machines, four at absolute most, and that's at like 10% capacity. The extra 10K at home and abroad can hook up to whichever node they're assigned. This is assuming you run BIND or Unbound on machines with enough memory to satisfy them.
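
If you want to sanity-check that against your own numbers, the arithmetic is simple enough to sketch. Everything numeric in the usage note below is a placeholder assumption, not a measurement: plug in the per-client query rate you actually see and the queries per second your resolvers can actually sustain.

```python
import math


def site_utilization(clients, qps_per_client, servers, qps_capacity_per_server):
    """Fraction of total resolver capacity used at one site (0.1 means 10%)."""
    if servers < 1:
        raise ValueError("need at least one server")
    total_qps = clients * qps_per_client
    return total_qps / (servers * qps_capacity_per_server)


def servers_needed(clients, qps_per_client, qps_capacity_per_server,
                   target_utilization, spare=1):
    """Servers required to stay under target_utilization, plus spares for failures."""
    total_qps = clients * qps_per_client
    base = math.ceil(total_qps / (qps_capacity_per_server * target_utilization))
    return max(base, 1) + spare
```

As an example with made-up inputs: 16,000 clients at an assumed 0.5 queries per second each is 8,000 qps, so `site_utilization(16000, 0.5, 2, 40000)` comes out at 0.1 if each of two machines could sustain 40,000 qps. The `spare` argument is there so the site still has headroom when one box is down.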
 
Yeah, there has been some talk that we want to move to AD DNS, but there was concern about high availability. Maybe I am completely ignorant of Microsoft AD/DNS solutions, but wouldn't we pretty much be pointing clients to 2 AD-DNS servers? At that point, couldn't we still use a load balancer and have many AD-DNS servers?
No.
In AD every domain controller is also a DNS server
In your DHCP, just set each scope to point to different domain controllers as primary and secondary thus spreading out the load across the entire enterprise.
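
As a rough illustration of spreading scopes across domain controllers, here is a small Python sketch that rotates through a list of DCs so each scope gets a different primary/secondary pair. The scope and DC names are hypothetical, and in practice you would normally restrict the list to DCs in or near the scope's own site.

```python
def assign_dns_servers(scopes, domain_controllers):
    """Map each DHCP scope to a (primary, secondary) pair of DNS servers.

    Rotates through the DC list so that query load is spread out and no
    scope ends up with the same server in both slots.
    """
    if len(domain_controllers) < 2:
        raise ValueError("need at least two domain controllers")
    count = len(domain_controllers)
    assignments = {}
    for index, scope in enumerate(scopes):
        primary = domain_controllers[index % count]
        secondary = domain_controllers[(index + 1) % count]
        assignments[scope] = (primary, secondary)
    return assignments
```

With three scopes and three DCs, each DC ends up as primary for one scope and secondary for another, so losing any single DC only costs clients a fall-back to their secondary.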
 
Under typical/recommended scenarios, absolutely! And because DNS is a relatively light protocol, it commonly resides with AD and combined with DHCP completes the trifecta. DHCP residing on a DC in each site brings tight integration with DNS for finer resolution of dynamic devices. Add that in with redundancy by design, and you've got a winner.
 
Sounds like the ideal setup right here. Way I've done it in the past with very good results.
 
No.
In AD every domain controller is also a DNS server
In your DHCP, just set each scope to point to different domain controllers as primary and secondary thus spreading out the load across the entire enterprise.
It's worth adding some data about AD DNS here, as it might be relevant to large environments.

While AD DNS can mimic traditional DNS functionality (primary/slave relationships between zones), what you'd probably want is an AD-integrated zone. This would mean a few things:

1) Zone data is replicated along with normal AD data. This also means that this is subject to your replication schedule and any site/topology considerations you may have with your environment.

2) All servers participating in serving AD-integrated zones are considered "Primary" in that they can be updated locally. AD replication handles sync and conflict issues.

3) In order to participate in an AD-integrated zone, the server must be a domain controller. If you have a site without a domain controller, you'll simply need to set up a slave DNS server (it doesn't have to be Windows; you can use BIND on Linux).

( http://technet.microsoft.com/en-us/library/cc978010.aspx )
 
Kind of laughing, kind of scared reading this.

I am currently working on a deployment of BlueCat, where we are ripping out Infoblox for similar issues.
 