Public cloud providers have built data centers (regions) around the world. We measured the round-trip time (RTT) latency between VMs spawned in these data centers. The numbers are useful when choosing disaster-recovery data centers and also provide insight into the networks that connect these globally distributed regions.
Short RTTs between cloud data centers are valuable. End users appreciate the increased responsiveness of short RTTs, and distributed systems such as geo-distributed key-value stores deliver higher read/write throughput on low-latency networks. A low RTT is also mandatory for many high-availability (HA) architecture patterns (e.g., hot standbys) and for ensuring low recovery point objective (RPO) numbers.
RTT is correlated with the physical distance between the communicating hosts (bounded by the speed of light) and with the quality of the network (e.g., number of hops, packet-switching quality, network congestion). Cloud providers have set up dozens of data centers around the world and connected them with high-quality links to mitigate the problem of high inter-continental RTTs for customers in different geographical regions.
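As a back-of-the-envelope illustration (not part of our measurements), the physical-distance floor on RTT can be estimated by assuming signals travel through fiber at roughly two-thirds the speed of light, about 200,000 km/s:

```shell
# Rough lower bound on RTT (in ms) between two hosts d kilometers apart,
# assuming ~200,000 km/s propagation in fiber and ignoring all switching,
# queuing, and routing overhead:
#   rtt_ms = 2 * d / 200000 * 1000 = d / 100
min_rtt_ms() {
  awk -v d="$1" 'BEGIN { printf "%.1f\n", d / 100 }'
}

# Example: two regions ~7,000 km apart can never see an RTT
# below ~70 ms, no matter how good the provider's network is.
min_rtt_ms 7000
```

No real network reaches this bound; the gap between the bound and the measured RTT is a rough indicator of network quality.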
In this article we analyze the inter-region data-center latency of Amazon AWS and Google Cloud, and also measure inter-region latency between 23 Azure data centers. We answer these questions:
1. What are the average ping RTTs between VMs in different data centers of each cloud provider?
2. Which subsets of regions have “low RTT” neighbors that can be used for high-availability and disaster-recovery setups?
3. What are the RTTs between Azure, Google Cloud and AWS data centers?
We created virtual machines in a zone in each of the cloud provider regions and then ran ping tests from each VM to every other VM. Thirty pings were exchanged between each VM pair, and the data presented in this article is the average of those 30 pings.
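The per-pair measurement can be sketched as follows. The helper below parses the summary line printed by Linux iputils ping; the destination address in the usage comment is a placeholder, not one of our actual VMs:

```shell
# Report the average RTT in ms from a `ping -c 30 <host>` run.
# Assumes the Linux iputils summary line format:
#   rtt min/avg/max/mdev = 12.345/14.567/20.123/1.234 ms
avg_rtt() {
  # Splitting the summary line on '/' and spaces, field 8 is the avg value.
  grep '^rtt' | awk -F'[/ ]' '{ print $8 }'
}

# Example (10.0.0.5 is a placeholder VM address):
# ping -c 30 10.0.0.5 | avg_rtt
```

Repeating this from every VM to every other VM yields the full RTT matrix plotted in the heat-maps below.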
Fig. 1 and Fig. 2 show the ping RTTs (in ms) between different Google Cloud and Amazon AWS regions respectively. The names of the regions are shown on the x- and y-axes. The darker the shade of a heat-map entry, the better (lower) the RTT. The absolute values of the RTTs are also noted on the heat-map. The ping RTTs from a region to all other regions can be read off the heat-map by looking at its row and the corresponding destination-region columns.
Unsurprisingly, geographically close regions have the lowest RTTs between them. For example, European data centers have very low RTTs to one another. These would be perfect candidates for setting up, say, a distributed data store or a highly available hot-standby database.
Asian regions have significantly higher RTTs between themselves than their European and North American counterparts. Perhaps the underlying infrastructure needs to be improved to bring them on par with their western counterparts.
Worse, some regions have extremely high RTTs to every other region, for example the cloud regions in India (gce-asia-south1 and ec2aws-ap-south-1, Mumbai) and in Brazil (gce-sa-east1 and ec2aws-sa-east-1, Sao Paulo). We term these RTT orphans: their RTTs to all other regions are too high, which is problematic for many high-availability setups. We hope that more regions are built near these orphan data centers to give their users better HA setup options.
We also measured the inter-cloud-provider ping RTTs between all 29 AWS and Google data centers. The heat-map for these measurements is presented in Fig. 3. Interestingly, the RTTs between some mixed subsets of AWS and Google regions in the same geographical areas are extremely low (see the European data centers in Fig. 3, for example). A user looking for extremely high levels of availability may want to consider using both cloud providers for the extra redundancy (we suspect that Google Cloud and AWS data centers share very little infrastructure!).
ICMP pinging between Azure regions is not possible because Azure blocks all outgoing and incoming ICMP traffic. We instead used the paping tool to measure the TCP connection time between 23 different Azure regions. Note that this metric differs from the ICMP RTTs measured for Google Cloud and Amazon AWS.
The list of Azure regions is available via this Azure CLI command:
az account list-locations | grep name
While 30 regions are listed by the above command, our Azure subscription could only provision in 23 regions (the other 7 regions failed with error codes such as LocationNotAvailableForResourceGroup and SkuNotAvailable; we would be happy to get input from Azure experts on what the issue might be).
We spun up 23 Linux VMs (one per region) and ran the paping command between each pair of VMs. The heat-map below shows the TCP connection times between the VMs.
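The all-pairs sweep can be sketched as below. The helper enumerates ordered (source, destination) pairs; the paping invocation in the usage comment uses its `-p` (port) and `-c` (count) options, though the exact command line may differ from ours, and the VM addresses are hypothetical:

```shell
# Emit every ordered (source, destination) pair from a list of VM
# addresses, skipping self-pairs; each pair gets one measurement run.
pairs() {
  for src in "$@"; do
    for dst in "$@"; do
      [ "$src" = "$dst" ] || echo "$src $dst"
    done
  done
}

# Hypothetical VM addresses, one per Azure region:
# pairs 10.0.1.4 10.0.2.4 10.0.3.4 | while read -r src dst; do
#   ssh "$src" paping "$dst" -p 22 -c 30   # 30 TCP connects to port 22
# done
```

With 23 regions this yields 23 × 22 = 506 ordered pairs, one heat-map cell each.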
Here are the results:
The darker the shade of a heat-map entry, the better (lower) the TCP connection time. The absolute values of the connection times are also noted on the heat-map. The connection times from a region to all other regions can be read off the heat-map by looking at its row and the corresponding destination-region columns. The heat-map reveals that each Azure region has at least one neighboring region with very low TCP connection latency (except the Brazil South region). This makes Azure a desirable destination for enterprises looking to build highly available IT solutions across different geographies. There are no “orphan” regions as was the case with Google Cloud and Amazon AWS.
We should note here that the number of regions is not indicative of the total size of the cloud providers — perhaps another cloud provider has many more hypervisors per region.
We do have a hypothesis on why Microsoft chose to build out so many regions: Microsoft’s flagship product — Office 365 — is a key use case for Azure data centers. Given its latency-sensitive nature, Microsoft may have decided to put down data-centers in many more geographies. Even if the demand for the Azure product is not strong in a particular region, local Office 365 subscribers will appreciate snappier Office performance!
Cloud providers have gone to great lengths to solve the high latency issues of remote data centers by building dozens of global regions for their world-wide client base. Some of these regions have very low inter-region RTTs and can be harnessed to create cross-region highly available IT solutions. However, many regions outside the western world suffer from relatively poor connectivity to other regions.
Sachin Agarwal is a computer systems researcher and the founder of BigBitBus. BigBitBus is on a mission to bring greater transparency to public cloud and managed big data and analytics services.