Hey all! Hoping for some input if possible.
Recently at my company, our primary DNS servers (all BlueCat Adonis boxes) got stuck in a fail-over loop: the service would work for 5 seconds, then go offline for 45 seconds while it failed over again. That caused some serious issues.
BlueCat never figured out the heartbeat sync issue on that one cluster, so we're stuck with possibly flaky appliances in our architecture across all our cities and datacenters. While we've ironed out some issues we had with servers/desktops not being able to use the secondary, executive management wants assurance that we "won't have an issue again." Today, while testing fail-over conditions, we managed to trigger another bug: when both nodes of an HA pair came online, they picked a primary and a secondary, but neither actually took over the VIP for the shared DNS server address... Huge red flag again for BlueCat's appliance.
My first thought was to buy some more F5 BIG-IP LTMs and put multiple independent DNS clusters behind them. The LTMs would have listener addresses that serve as the Primary/Secondary DNS servers configured on desktops/servers. From there, those listener addresses would simply forward requests to one or more healthy DNS clusters behind them.
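To make the idea concrete, here's a rough Python sketch of the health-check logic I'd expect the listeners to apply before forwarding. This is not F5 configuration, just an illustration: the function names, the probe hostname `health.example.internal`, and the backend addresses are all made up, and "healthy" is assumed to mean "answers a plain A query with a matching ID and RCODE 0".

```python
import random
import socket
import struct
from typing import Callable, Iterable, List


def build_query(name: str, txid: int) -> bytes:
    """Encode a minimal DNS A-record query (RFC 1035 wire format)."""
    # Header: ID, flags (only RD set), 1 question, no other records.
    header = struct.pack("!HHHHHH", txid, 0x0100, 1, 0, 0, 0)
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    return header + qname + struct.pack("!HH", 1, 1)  # QTYPE=A, QCLASS=IN


def is_healthy_reply(txid: int, reply: bytes) -> bool:
    """True if the reply is a response to our query with RCODE 0 (NOERROR)."""
    if len(reply) < 12:  # shorter than a DNS header
        return False
    reply_id, flags = struct.unpack("!HH", reply[:4])
    is_response = bool(flags & 0x8000)  # QR bit
    rcode = flags & 0x000F
    return reply_id == txid and is_response and rcode == 0


def udp_probe(server: str, name: str = "health.example.internal",
              timeout: float = 1.0) -> bool:
    """Send one query over UDP/53; the probe name is a hypothetical record."""
    txid = random.randrange(0x10000)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.sendto(build_query(name, txid), (server, 53))
            reply, _ = sock.recvfrom(512)
        except OSError:  # timeout, unreachable, refused, ...
            return False
    return is_healthy_reply(txid, reply)


def healthy_backends(backends: Iterable[str],
                     probe: Callable[[str], bool] = udp_probe) -> List[str]:
    """Return only the backends whose probe succeeds, in the original order."""
    return [backend for backend in backends if probe(backend)]
```

Something like `healthy_backends(["10.0.0.10", "10.0.0.20"])` (placeholder addresses) would then give the set of clusters a listener is allowed to forward to. The point is that health is judged by an actual DNS answer, not by whether the appliance thinks it owns the VIP.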
As I dug in more with the group that handles DNS, it turned out that they pretty much create DNS entries for static IPs (servers, routers, switches, etc.) by hand, and they have to copy the configuration from one cluster to the next to propagate a new DNS record across the whole company. The only thing done automatically is DHCP hostname registration.
At this point, I knew we were in trouble and would need to really reconsider our whole setup. We have ~16,000 clients per city we operate in, and have another 10,000 spread out at home and at remote offices...
What do you all run for enterprise setups? And is using load-balancer listeners as the primary/secondary DNS addresses a terrible HA idea?
Thanks for any input!