Pythian Blog: Technical Track

DNS Setup for Effective 11i DR Failover

One of the main goals in architecting a Disaster Recovery (DR) solution is to make a DR failover transparent to the end users. Too often, users must reboot their desktops, clear their browser cache and the JInitiator jar cache, and so on, even when we have made sure that the post-failover URL of the 11i instance is the same. Only if users can keep working after a failover of an 11i instance from the primary site to the DR site, without changing anything on their desktops, can we say that the goal is achieved.

In most cases there are two culprits: forgetting the DNS setup for the hostnames of the middle tiers (or the load balancer, if one is used), and the caching of DNS entries at different levels in the network. A quick look at the caching section of Wikipedia’s page on DNS gives some idea of what I’m talking about. Because of the default settings, the old IP address gets cached on the user’s desktop and in caching DNS servers in the network. As a result, the user’s desktop is still trying to reach the old server, which is now offline.

The best fix for this kind of DNS side effect is to change the TTL (Time To Live) parameter of the DNS entry for the hostname from the default value to a smaller one. I prefer setting it to a value a little smaller than the time you take to fail over. That is, if you take 60 minutes to fail over from the primary to the secondary datacenter, then set the TTL to 50 minutes.

Let’s take an example here. Let’s say our 11i instance has the URL https://apps.example.com:8000, with the primary instance at windsor and the secondary at ottawa. We have two load balancers, one at the primary site and one at the secondary, with hostnames lb.windsor.example.com and lb.ottawa.example.com respectively. If the DNS is set up with default values, it will look like this:

hostname                 TTL     Type    value
----------------------------------------------
apps.example.com         86400   CNAME   lb.windsor.example.com
lb.windsor.example.com   86400   A       192.168.1.100
lb.ottawa.example.com    86400   A       192.168.2.100

apps.example.com is an alias (CNAME) to lb.windsor.example.com and the TTL value is set to 86400 seconds, i.e., 24 hours. That means this record can be cached for up to 24 hours on the user’s desktop and on any caching DNS servers being used by the client. So at the time of failover, even though we change the DNS record of apps.example.com to point to the ottawa load balancer instead of windsor, the user’s browser will still be trying to reach the primary site load balancer, because the old record stays cached on their desktop for up to the next 24 hours.
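To make the caching behaviour concrete, here is a minimal sketch of a TTL-based cache. It is a toy model, not a real resolver: the CachingResolver class, the in-memory zone dictionary, and the timestamps are all hypothetical, chosen only to mirror the example records above.

```python
from dataclasses import dataclass


@dataclass
class CachedRecord:
    target: str
    expires_at: int  # seconds since the start of the simulation


class CachingResolver:
    """Toy stand-in for a desktop or caching DNS server (hypothetical)."""

    def __init__(self, authoritative):
        # authoritative: dict mapping name -> (ttl_seconds, target)
        self.authoritative = authoritative
        self.cache = {}

    def resolve(self, name, now):
        record = self.cache.get(name)
        if record is not None and now < record.expires_at:
            # Still within the TTL: answer from cache, never ask upstream.
            return record.target
        ttl, target = self.authoritative[name]
        self.cache[name] = CachedRecord(target, now + ttl)
        return target


# Default setup: 24-hour TTL pointing at the primary load balancer.
zone = {"apps.example.com": (86400, "lb.windsor.example.com")}
desktop = CachingResolver(zone)

before = desktop.resolve("apps.example.com", now=0)

# Failover: the record is repointed, but the desktop already cached the old one.
zone["apps.example.com"] = (86400, "lb.ottawa.example.com")
two_hours_in = desktop.resolve("apps.example.com", now=2 * 3600)
next_day = desktop.resolve("apps.example.com", now=86400)

print(before, two_hours_in, next_day)
```

In this model the desktop keeps answering with the windsor load balancer until the cached entry expires, and only then picks up the ottawa alias.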

As I suggested earlier, if we set the TTL of apps.example.com to 50 minutes (3000 seconds) and make the DNS change the first step in the failover procedure, then by the time we finish (which is supposed to be 60 minutes), the old DNS records in the user’s desktop cache and the caching DNS server will have expired, and they will start seeing the new alias for apps.example.com: lb.ottawa.example.com. Note that the lower TTL has to be in place well before the failover, since records already cached with the 24-hour TTL will not expire any sooner.

hostname                 TTL     Type    value
----------------------------------------------
apps.example.com         3000    CNAME   lb.ottawa.example.com
lb.windsor.example.com   86400   A       192.168.1.100
lb.ottawa.example.com    86400   A       192.168.2.100
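The arithmetic behind the 50-minute choice can be sketched as follows. This is a simplified worst-case model under two assumptions of mine: the DNS change is made at the very start of the failover, and a cache may have fetched the old record just before the change. The function name is hypothetical.

```python
def worst_case_stale_seconds(ttl_seconds, failover_seconds):
    """Seconds after the failover completes during which a cache may
    still hold the old record (0 means every cache has expired by then).

    Assumes the DNS change happens at the start of the failover and the
    given TTL was already in effect when the old record was cached.
    """
    return max(0, ttl_seconds - failover_seconds)


FAILOVER_SECONDS = 60 * 60  # the 60-minute failover from the example

stale_default = worst_case_stale_seconds(86400, FAILOVER_SECONDS)
stale_lowered = worst_case_stale_seconds(3000, FAILOVER_SECONDS)

print("TTL 86400:", stale_default, "seconds of possible staleness")
print("TTL 3000: ", stale_lowered, "seconds of possible staleness")
```

With the default TTL a desktop could keep the old alias for up to 23 hours after the failover is done; with the TTL at 3000 seconds the window closes before the failover finishes.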

Some of you might already be thinking, why not set it to an even lower value, like 5 minutes? The main problem with such a low value is that it increases the load on the DNS server. And if you have a single DNS server and very low TTLs, any kind of outage on that DNS server will affect your users almost immediately, as their desktops will be making DNS lookups much more frequently than before. So in cases where you have low TTL settings, make sure you have at least two DNS servers at two different locations.
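For a rough feel of the load trade-off, here is a back-of-the-envelope estimate. It assumes every desktop uses the hostname continuously and re-queries as soon as its cached entry expires, which is an upper bound; the figure of 1000 desktops is made up for illustration, and intermediate caching DNS servers would reduce the numbers further.

```python
SECONDS_PER_DAY = 86400


def max_lookups_per_day(ttl_seconds, desktops):
    """Upper bound on daily lookups for one hostname, assuming each
    desktop re-queries the moment its cached record expires."""
    return desktops * SECONDS_PER_DAY / ttl_seconds


DESKTOPS = 1000  # hypothetical user population

estimates = {
    ttl: max_lookups_per_day(ttl, DESKTOPS)
    for ttl in (86400, 3000, 300)  # default, 50 minutes, 5 minutes
}

for ttl, lookups in estimates.items():
    print(f"TTL {ttl:>5}s -> up to {lookups:,.0f} lookups/day")
```

Under these assumptions, going from 50 minutes to 5 minutes multiplies the lookups by ten, and a DNS outage is felt within minutes rather than within the hour.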

Please feel free to post your experiences related to DNS in the comments section. Any comments or suggestions are welcome!
