Ten years ago most organizations still had their data center located next to headquarters. Then came 9/11, the East Coast blackout, and Katrina, along with sharp increases in energy and real estate prices, Sarbanes-Oxley storage requirements and HIPAA security requirements, and suddenly it no longer made much sense to keep the data center in close proximity to headquarters. As a result, over the past 10 years the IT world has seen a growing trend of companies migrating their data centers to remote locations (the southern and central US seem to be popular destinations for hosting data centers for North American companies). I have written a lot about the impact such a move has on application performance; there is even a whitepaper here: http://www.shunra.com/predicting-the-impact-of-data-center-moves-on-application-performance-whitepaper.php
However, this series of posts is about the day after a data center move. Now that the data center is remote, how does this paradigm shift change the way we should develop, test, deploy, monitor and troubleshoot applications? I will try to cover as many topics as possible, but the main focus will remain the role application performance management plays in this new paradigm.
I will start by covering the key reasons behind the performance impact that is experienced when applications are hosted in a remote data center. These reasons are fairly intuitive, but it is important to understand them in depth in order to plan adequately for the new conditions. Two main things determine how applications perform when application servers are hosted in a remote data center, far away from their application clients:
1. The performance of the network link between the client and the remote data center. This performance is defined by a set of network performance metrics that are application independent (for now we will ignore application-aware networks, although the basic concepts below hold in that scenario as well).
2. The application's efficiency, specifically how efficiently it transfers data between the client and the remote server (and other tiers, if applicable). This is an attribute of the application (and sometimes of a specific business process within the application); it is application specific and independent of any underlying network. A minimal sketch of how these two factors combine appears right below.
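To make the separation between these two factors concrete, here is a minimal, purely illustrative Python sketch. The class names, the numbers and the simple additive formula are my own assumptions for demonstration, not a real prediction model: network conditions live in one object, the application's transfer profile in another, and a crude estimate combines them.

```python
from dataclasses import dataclass

@dataclass
class NetworkConditions:
    """Factor 1: attributes of the link, independent of any application."""
    latency_ms: float        # one-way latency
    bandwidth_mbps: float    # link capacity

@dataclass
class TransferProfile:
    """Factor 2: attributes of the application, independent of any network."""
    payload_bytes: int       # data moved by one operation

def naive_transfer_time(net: NetworkConditions, app: TransferProfile) -> float:
    """Crude single-request estimate: one round trip plus serialization time."""
    rtt_s = 2 * net.latency_ms / 1000.0
    serialization_s = app.payload_bytes * 8 / (net.bandwidth_mbps * 1_000_000)
    return rtt_s + serialization_s

lan = NetworkConditions(latency_ms=0.5, bandwidth_mbps=1000)   # user next to the data center
wan = NetworkConditions(latency_ms=40,  bandwidth_mbps=45)     # cross-country user
page = TransferProfile(payload_bytes=250_000)

print(f"local user:  ~{naive_transfer_time(lan, page):.3f} s")
print(f"remote user: ~{naive_transfer_time(wan, page):.3f} s")
```

The same application profile produces very different numbers purely because the network object changed; the rest of this post focuses on that first object, the network itself.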
Let’s start with understanding the network performance metrics. Consider the following scenario:
An application is hosted in a NYC data center, with users in two locations: some at the NYC headquarters next to that data center, and some in a remote branch located in San Francisco. The question is: “Will the application perform the same for both types of users (local users at headquarters and remote users in SF)? In other words, will the application be as responsive for the San Francisco user as it is for the NYC user?”
Well, the obvious answer is no: in most cases the NYC user will enjoy a faster, more responsive application. What is less obvious is why. What is it about the network that causes remote users to experience a slower application than local users? The rest of this post covers that question; future posts will address the application-specific attributes. Once we have covered both, we will be ready to examine best practices for building applications for a remote data center.
When I ask this question during my training seminars, I get a variety of answers, many of them right, but I would like to address one wrong answer that keeps coming up for some reason.
Collisions – there is a common misconception that collisions are an everyday phenomenon on the network and can explain any bad thing that happens to applications. The truth is that collisions are almost a thing of the past (on enterprise LANs, anyhow), and even when they do happen they can’t explain why a remote user has a worse experience than a local user, since both face a similar chance of collisions: collisions are a phenomenon of local area Ethernet networks. If there are collisions on the enterprise LAN it usually points to a configuration issue on a network device (such as a duplex mismatch), but that is still unrelated to the answer to our question.
Now to the right answers to the question: what is it about the Wide Area Network that causes applications to slow down?
There are five key conditions that predominantly exist on Wide Area Networks and impact application performance, each in its own way:
1. Network latency – the time it takes a packet to traverse the network from source to destination, measured in milliseconds [msec]. A typical WAN link will introduce latency in the range of 10 msec – 500 msec.
2. Bandwidth constraints – the maximum rate at which data can be carried by the network link, measured in bits per second [bps, Kbps, Mbps, Gbps].
3. Bandwidth utilization (background traffic) – the percentage of the bandwidth that is already consumed by other traffic on the link (background traffic).
4. Jitter – the variation in the inter-packet gap of sequential packets across a network link; it is a result of variation in network latency and is sometimes used interchangeably with that standard deviation, measured in milliseconds [msec].
5. Packet loss – the probability that a packet is dropped somewhere along the end-to-end network path, measured in %. It is sometimes presented as the inverse metric, packet delivery rate.
The above are called network impairments; you can click on each of the links to learn more about them and their causes. A small sketch of how these metrics can be sampled in practice follows below.
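To make these definitions a bit more tangible, here is a small, hedged Python sketch that samples round-trip time to a server by timing TCP connections (the host, port and sample count are arbitrary examples). Averaging the samples approximates latency, their standard deviation approximates jitter, and failed probes serve as a crude stand-in for packet loss; a real measurement tool would use ICMP or dedicated probes, so treat this only as an illustration of the metrics.

```python
import socket
import statistics
import time

def sample_link(host="example.com", port=443, samples=20, timeout=2.0):
    """Time TCP connects to approximate latency, jitter and loss for a path."""
    rtts_ms, failures = [], 0
    for _ in range(samples):
        start = time.perf_counter()
        try:
            with socket.create_connection((host, port), timeout=timeout):
                rtts_ms.append((time.perf_counter() - start) * 1000.0)
        except OSError:
            failures += 1                      # failed probe counted as "lost"
        time.sleep(0.1)                        # small gap between probes
    latency = statistics.mean(rtts_ms) if rtts_ms else float("nan")
    jitter = statistics.stdev(rtts_ms) if len(rtts_ms) > 1 else 0.0
    loss_pct = 100.0 * failures / samples
    return latency, jitter, loss_pct

if __name__ == "__main__":
    latency, jitter, loss = sample_link()
    print(f"latency ≈ {latency:.1f} msec, jitter ≈ {jitter:.1f} msec, loss ≈ {loss:.0f}%")
```

Run it once from headquarters and once from a remote branch and you will already see the first two impairments diverge; bandwidth and background traffic need link-level measurements that are outside this little sketch.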
Network impairments are performance conditions that inhibit the flow of data across a network. Each impairment type affects the performance of business applications and network services in its own way. Some applications may be very sensitive to network impairments while others are almost network agnostic. Sorting applications by their network sensitivity is one of the important steps in performance engineering; a sketch of that sorting step follows below.
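As an illustration of that sorting step, here is a hedged sketch that ranks a few hypothetical applications by how much a WAN-like profile inflates their estimated response time relative to a LAN-like baseline. The application names, the numbers and the simple "turns times round trip plus serialization" formula are assumptions for demonstration only, not measured data.

```python
def estimate_seconds(turns, payload_bytes, latency_ms, bandwidth_mbps):
    """turns = request/response exchanges per business transaction (toy model)."""
    rtt_s = 2 * latency_ms / 1000.0
    return turns * rtt_s + payload_bytes * 8 / (bandwidth_mbps * 1_000_000)

# Hypothetical application profiles (illustrative numbers only).
apps = {
    "chatty client/server screen": dict(turns=120, payload_bytes=60_000),
    "single web page request":     dict(turns=4,   payload_bytes=300_000),
    "nightly bulk file copy":      dict(turns=1,   payload_bytes=50_000_000),
}

lan = dict(latency_ms=0.5, bandwidth_mbps=1000)   # baseline: local data center
wan = dict(latency_ms=80,  bandwidth_mbps=45)     # remote data center profile

sensitivity = {
    name: estimate_seconds(**profile, **wan) / estimate_seconds(**profile, **lan)
    for name, profile in apps.items()
}
for name, ratio in sorted(sensitivity.items(), key=lambda kv: -kv[1]):
    print(f"{name}: ~{ratio:.0f}x slower on the WAN profile")
```

Even with made-up numbers, the ranking shows the point: the same impairments hit different applications very differently, which is exactly why sorting them by sensitivity is worth doing before a data center move.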
In the next post we will discuss how application design can impact performance across the network. But in the meantime I would like to pose a question to the group:
“We identified network latency as one of the key factors that impact application performance; we also said that a typical WAN link will introduce 10 – 500 msec of latency. The question is: why does network latency have such a big impact on application performance? Surely a user doesn’t notice an increase of a few msec in response time, and even 500 msec = ½ second goes by in the blink of an eye. So think about it and let me know what you have found based on your experience: why does network latency have such a big impact on application performance?”
Until next time,
Amichai Lesser