Selecting the Best Site
There are many other issues with this approach. First, ping is not an accurate way to measure response time. Whenever there is traffic congestion, switches or routers may drop the ping packets. Second, ping may never reach the local DNS if it is behind a firewall. Many enterprises use firewalls to control incoming traffic into their network. If the local DNS is behind a firewall in the secure area of an enterprise network, the firewall may stop the ping requests. Third, response times in the Internet vary over time as the traffic patterns change. We may see a ping response of 50 milliseconds now, but a 500-millisecond response 10 minutes later. So, measuring the response time at one instant is not good enough to predict the response time over time.
DNS Reply Race
In this method, the GSLB load balancer sends the DNS reply to each local load balancer instead of sending it directly to the local DNS. Each local load balancer modifies the DNS reply to place its own VIP first in the address list, then forwards the reply to the local DNS. The local DNS uses whichever DNS reply arrives first and discards the subsequent replies. But the first reply received by the local DNS is not necessarily from the site with the lowest Internet delay to the local DNS. Which reply arrives first depends on three factors: the Internet delay between the GSLB load balancer and the local load balancer, how quickly the local load balancer modifies the reply and sends it to the local DNS, and the Internet delay between the site and the local DNS. Since our objective is to select the site based on the third factor alone, we can minimize the effect of the first two by calibrating an approximate time that each local load balancer should wait before sending the DNS reply. The goal is to get every local load balancer to send its reply at the same instant, so that the first reply received by the local DNS comes from the site with the lowest Internet delay to the local DNS. Because the calibration is only an approximation, we can only hope that all local load balancers actually send at the same instant. And just as with ping response time, this method selects the best site based on Internet delay at a given instant, as opposed to an average delay over time.
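The effect of the calibrated wait can be illustrated with a small simulation. This is only a sketch under stated assumptions: the site names, the delay figures, and the helper functions `calibrated_waits` and `race_winner` are hypothetical, and the three delays are treated as exactly known, which a real deployment can only approximate.

```python
from dataclasses import dataclass


@dataclass
class Site:
    name: str
    gslb_to_lb_ms: float    # delay from the GSLB load balancer to the local load balancer
    processing_ms: float    # time the local load balancer needs to rewrite the reply
    site_to_dns_ms: float   # delay from the site to the local DNS (what we want to compare)


def calibrated_waits(sites):
    """Wait per site so that every local load balancer sends at the same instant."""
    ready = {s.name: s.gslb_to_lb_ms + s.processing_ms for s in sites}
    send_instant = max(ready.values())
    return {name: send_instant - t for name, t in ready.items()}


def race_winner(sites, waits=None):
    """Name of the site whose reply reaches the local DNS first."""
    waits = waits or {}

    def arrival(s):
        wait = waits.get(s.name, 0.0)
        return s.gslb_to_lb_ms + s.processing_ms + wait + s.site_to_dns_ms

    return min(sites, key=arrival).name


# Hypothetical example: New York is close to the GSLB but far from the local DNS.
sites = [
    Site("new-york", gslb_to_lb_ms=5, processing_ms=1, site_to_dns_ms=80),
    Site("london", gslb_to_lb_ms=70, processing_ms=1, site_to_dns_ms=30),
]
uncalibrated = race_winner(sites)
calibrated = race_winner(sites, calibrated_waits(sites))
print(uncalibrated, calibrated)
```

Without the wait, the site nearest the GSLB load balancer wins the race even though it is farther from the local DNS. With the wait, only the third factor decides the winner.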
TCP Response Time
In this method, the GSLB load balancer does not use response time to pick the site the first time it receives a query from a local DNS. Instead, it simply uses any other policy to pick a site for that first query. As the users access the Web site, the local load balancer measures the delay between the TCP SYN and TCP ACK exchanged between the client and the Web site. The local load balancer is not generating any explicit traffic here. Instead, it simply measures the response time in-band, based on the natural traffic flow from a client to the site. This method works even if the local DNS is behind a firewall because the local load balancer is not initiating any traffic to the client. This is much like the in-band monitoring of server health that we discussed in Chapter 2, and it’s very efficient. Since each user establishes multiple TCP connections while accessing different Web pages, the local load balancer can measure response time over time, each measurement being one data point. To avoid collecting too many data points, the local load balancer can sample the response time for every tenth or twentieth connection. This method allows us to collect a good set of response times between an end user and a site that reflects the Internet delay. However, the measurements become available only after the GSLB load balancer has already selected a site for that end user.
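The sampling idea can be sketched in a few lines. The class `InBandSampler`, its method names, and the timestamps below are hypothetical; a real load balancer would take these measurements in its packet-forwarding path rather than through explicit calls, so treat this as an illustration of the bookkeeping only.

```python
class InBandSampler:
    """Records the SYN-to-ACK delay for every Nth connection seen by a local load balancer."""

    def __init__(self, sample_every=10):
        self.sample_every = sample_every
        self.connections_seen = 0
        self.pending = {}   # connection id -> (client IP, time the SYN was seen)
        self.samples = []   # (client IP, delay in milliseconds)

    def on_syn(self, conn_id, client_ip, now_ms):
        self.connections_seen += 1
        # Track only every Nth connection to limit the number of data points.
        if self.connections_seen % self.sample_every == 0:
            self.pending[conn_id] = (client_ip, now_ms)

    def on_ack(self, conn_id, now_ms):
        started = self.pending.pop(conn_id, None)
        if started is not None:
            client_ip, syn_ms = started
            self.samples.append((client_ip, now_ms - syn_ms))


# Hypothetical traffic: 30 connections from one client, with slowly growing delay.
sampler = InBandSampler(sample_every=10)
for conn_id in range(1, 31):
    sampler.on_syn(conn_id, "192.0.2.7", now_ms=conn_id * 1000)
    sampler.on_ack(conn_id, now_ms=conn_id * 1000 + 40 + conn_id)
print(sampler.samples)
```

Only connections 10, 20, and 30 are measured here, so the sampler keeps three data points out of thirty connections and generates no traffic of its own.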
Once the local load balancer collects these response times, it needs to transport this data to the GSLB load balancer, since that’s the one making the site selection. This requires some kind of protocol between the local and GSLB load balancers. When an end user is directed to a particular site, this method gives us response-time data between that site and the end user. In order to compare this response time with that of other sites, we need the response time between the same end user and each of the other sites. We can only get this data by sending the same end user to every other site. This can be very time-consuming, and it only happens if the client accesses the Web site again at a later time, which triggers a new DNS query. If we can find another end user who is in the same network area as the first user, we can direct the second user to a different site and measure the response time. Thus, we can collect response times from a given user network area to different sites and determine which site is the best one.
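One way to picture this bookkeeping on the GSLB load balancer is a table of response times per network area and site. The sketch below is an assumption, not a description of any product: the site names, the `report` and `next_site` functions, and the use of a simple average are all hypothetical choices.

```python
from collections import defaultdict
from statistics import mean

SITES = ["new-york", "london", "tokyo"]   # hypothetical site names

# response_times[area][site] -> list of measured response times in milliseconds
response_times = defaultdict(lambda: defaultdict(list))


def report(area, site, response_ms):
    """Called when a local load balancer reports one measurement to the GSLB."""
    response_times[area][site].append(response_ms)


def next_site(area):
    """Send the next user from this area to an unmeasured site, else to the best one."""
    measured = response_times[area]
    for site in SITES:
        if not measured[site]:
            return site
    return min(SITES, key=lambda s: mean(measured[s]))


# The first user from the area went to New York; later users fill in the other sites.
report("area-a", "new-york", 120)
second_choice = next_site("area-a")
report("area-a", "london", 45)
third_choice = next_site("area-a")
report("area-a", "tokyo", 200)
report("area-a", "london", 55)
final_choice = next_site("area-a")
print(second_choice, third_choice, final_choice)
```

Until every site has at least one measurement for the area, the function keeps steering users to the sites that are still unmeasured. After that, it picks the site with the lowest average response time.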
It’s important to note that this method measures the TCP response time between the actual end user and a site. However, the GSLB load balancer receives DNS requests from the local DNS, not from the actual end user. We are dealing with two different IP addresses: that of the local DNS and that of the actual end user. We need a way to correlate the two so that the GSLB load balancer can use response times based on actual end-user traffic to pick the best site for a given local DNS IP address. When a user makes TCP connections to a site, the local load balancer has no knowledge of the user’s local DNS. Similarly, when the GSLB load balancer receives the DNS request from the local DNS, it has no knowledge of the actual end user behind the local DNS. However, if we can group a set of users and local DNS servers together, we can apply the response time we learned for one end user to the entire group. If the group has 5,000 users and 20 local DNS servers, we need at least one client from the group to access each site in order to collect the complete set of TCP response times for this group. The GSLB load balancer can then compare the response times to pick the best site. Defining a group that consists of a set of end users and local DNS servers is one of the biggest challenges in this method. Obviously, the effectiveness of this method depends on how well we can define a group of users that is in the same network area and has the same Internet delay characteristics to the different sites.
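The correlation between end users and local DNS servers can be sketched as a lookup from IP address to group. The sketch assumes that a group can be described by a list of address blocks, which is only one possible definition and sidesteps the hard problem of choosing those blocks well. The group name, the address ranges, and the functions `group_of`, `record_client_sample`, and `pick_site` are hypothetical.

```python
import ipaddress

# Hypothetical group: one block of end users and one block of local DNS servers.
GROUPS = {
    "group-1": [
        ipaddress.ip_network("198.51.100.0/24"),   # end users
        ipaddress.ip_network("203.0.113.0/28"),    # local DNS servers
    ],
}

latest_ms = {}   # (group name, site) -> most recent response time in milliseconds


def group_of(ip):
    """Return the name of the group containing this IP address, or None."""
    addr = ipaddress.ip_address(ip)
    for name, networks in GROUPS.items():
        if any(addr in net for net in networks):
            return name
    return None


def record_client_sample(client_ip, site, response_ms):
    """Credit a response time measured for one end user to the user's whole group."""
    group = group_of(client_ip)
    if group is not None:
        latest_ms[(group, site)] = response_ms


def pick_site(local_dns_ip, default_site):
    """Pick the fastest known site for the group that the local DNS belongs to."""
    group = group_of(local_dns_ip)
    candidates = {site: ms for (g, site), ms in latest_ms.items() if g == group}
    if group is None or not candidates:
        return default_site
    return min(candidates, key=candidates.get)


record_client_sample("198.51.100.23", "new-york", 120)
record_client_sample("198.51.100.77", "london", 45)
known_dns_choice = pick_site("203.0.113.5", default_site="new-york")
unknown_dns_choice = pick_site("192.0.2.1", default_site="new-york")
print(known_dns_choice, unknown_dns_choice)
```

Measurements taken from two different end users decide the answer for a local DNS server that never appeared in any TCP connection, which is exactly what the grouping is meant to allow. A local DNS outside every group falls back to whatever other policy supplies the default site.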