Everything fails all the time — build for resiliency
Any sufficiently advanced technology is indistinguishable from magic. — Arthur C. Clarke
This is the second part of my series on building a multi-region, active-active architectures. In the previous post, we talked about the Quest for Availability since it seats at the heart of such design.
In this post, I will discuss the why and the how of designing multi-region, active-active architectures.
Bear in mind, that building and successfully running multi-region, active-active architecture is hard, so this post does not pretend to cover everything involved with doing that. Instead, it should be treated as an introduction to this art.
Why bother with multi-region architectures?
Good question and glad you asked! There are basically three reasons why you would want to have a multi-region architecture.
- Improve latency for end-users
- Disaster recovery
- Business requirements
1. Improve latency for end-users
The idea is very simple and is related to the speed of light, which no one has yet managed to crack. The closer your backend origin is to end-users, the better the experience.
However, even if CloudFront solves the problem for much of your content, some more dynamic calls still need to be done on the backend, and it could be far away, adding precious milliseconds to the request.
For example, if you have users in Europe but your backend is in US or Australia, the added latency is respectively approximately 140ms and 300ms. Those delays would be unacceptable to start with for many popular games, banking requirements or interactive applications.
Indeed, latency plays a huge part in customer’s perception of a high-quality experience and was proved to impact the behaviour of users to some noticeable extent: with lower latency generating more user engagements.
This observation has also been confirmed several times by few large companies:
- Amazon: 100 ms of extra load time caused a 1% drop in sales (Greg Linden, source here).
- Google: 500 ms of extra load time caused 20% fewer searches (Marissa Mayer, source here).
- Yahoo!: 400 ms of extra load time caused a 5–9% increase in the number of people who clicked “back” before the page even loaded (Nicole Sullivan, source here).
As technology improves and especially with the advent of AR, VR and MR, requiring even more immersive and lifelike experiences, developers now need to produce software systems with stringent latency requirements. Therefore, having locally available applications and content is becoming more and more important.
2. Disaster recovery
What followed was, in my opinion one of the most significant and inspiring engineering work done by any AWS customer— achieving regional resiliency.
Netflix is designed to handle failure of all or part of a single availability zone in a region as we run across three zones and operate with no loss of functionality on two. We are working on ways of extending our resiliency to handle partial or complete regional outages. Adrian Cockcroft. (link)
Indeed, if your application, like Netflix, is composed of multiple different services and if one of these services, critical to your application, experiences issues, you might want to shift the traffic to a healthy region to prevent angry customers.
It is not possible to prevent threats to availability, but it is possible to mitigate them.
Netflix was not the only one to learn from this ELB failure of 2012— AWS too has learned from that failure and has taken steps into reducing the blast radius of potential failures and today’s implementation of the control plane is more robust.
Failures always happen and when they do, it is important to work both on reducing the occurrence of problems and also to work on mitigating the severity of impact of problems.
Educate by example: in this meeting I talk about "everything fails all the time" at which time the power fails for the building we're in...
3. Business requirements
Finally, some customers may have business requirements to store data in distinct regions, separated by several hundreds of kilometres. Therefore, those customers have to store data in multiple regions. This is becoming more and more common since AWS has now 18 regions globally, spread between the Americas, Asia Pacific, Europe, Middle East and Africa.
How to build multi-region active-active architecture
Simply put, a multi-region, active-active architecture gets all the services on the client request path deployed across multiple AWS Regions. In order to do so, several requirements have to be fulfilled.
- Data replication between regions must be fast and reliable.
- Having a global network infrastructure to connect your different regions.
- Services should not have local state — they must be stateless, and state should be shared between regions.
- Synchronous cross-regional calls should be avoided when possible. Applications should use regional resources.
- DNS routing should be used to permit for different scenarios.
Let’s take a closer look at these requirements.
1 — Reliable Data Replication
Let’s talk a little bit about the CAP theorem. The CAP theorem states that it is impossible for a distributed system to simultaneously provide more than two out of the following three guarantees: Consistency, Availability and Partition Tolerance. But especially that in the presence of a network partition, one has to choose between consistency and availability.
This means that we have two choices: giving up consistency will allow the system to remain highly available, prioritising on consistency means that the system might not always be available.
Since we are in building a multi-region architecture and are optimising for availability, we have to give up consistency — by design; This also means we need to embrace asynchronous systems and replication.
For distributed data stores, asynchronous replication decouples the primary node from its replicas at the expense of introducing replication lag or latency.
This means that changes performed on the primary node are not immediately reflected on its replicas — the result of this lag created what is often referred to as eventual consistency. When a system achieves eventual consistency, it is said to have converged, or achieved replica convergence.
To achieve replica convergence, a system must reconcile differences between multiple copies of distributed data. It can do so by doing the following reconciliations:
- Comparing versions of the data.
- “Smart” comparison of the data itself.
- Choosing an arbitrary final state.
The most common approach to reconciliation, and also the one used in most systems, including DynamoDB Global Tables, is called the “last writer wins”.
The effect of asynchronous replication must be taken into consideration when designing applications since besides having architectural consequences, it also has some implications for the client user-interface design and experience.
Such implications are that interfaces should be completely non-blocking. User interactions and actions should resolve instantly without the need to wait for any backend response — everything should resolve itself in the background, asynchronously and transparently to the user.
No loading messages or spinners staying forever on the screen. Requests to the server should be entirely decoupled from the user interface. This “trick” will also make users believe the application is fast, even if in reality, it isn’t — hiding network latency and even full-service failure.
This is often referred to as graceful degradation and it also used by Netflix to mitigate certain failures.
“We might fail a service that generates the personalized list of movies that are shown to the user, which is determined based on their viewing history. When this service fails, the system should return a default (i.e., nonpersonalized) list of movies.” Netflix — Chaos Engineering
2 — Global Network Infrastructure
A few years ago, when deploying multi-regions architecture, it was standard practice to setup secured VPN connections between regions in order to replicate the data asynchronously.
While deploying and managing those connections has become easier, the main problem was that they went over the Internet, therefore were subject to sudden change in routing and especially latency — making it difficult to maintain a consistently good replication.
To overcome that problem, James Hamilton, Vice President & Distinguished Engineer at AWS, announced that AWS was now providing a high bandwidth, global network infrastructure powered by redundant 100GbE links circling the globe.
This means that AWS Regions are now connected to a private global network backbone, which provides lower cost and more consistent cross-region network latency when compared with the public internet — and the benefits are clear:
- Improved latency, packet loss, & overall quality.
- Avoid network interconnect capacity conflicts.
- Greater operational control.
“If you’ve got a packet, the more people that touch it, the less likely it is to get delivered.” James Hamilton, re:Invent 2016.
3 — Stateless Applications
I previously wrote about the local state being a cloud anti-pattern. This is even more true for multi-region architecture. When clients interact with an application they do so in a series of interactions called a session.
In a stateless architecture, the server must treat all client requests independently of prior requests or sessions and should not stores any session information locally. So given the same input, a stateless application should provide the same response to any end-user.
Stateless applications can scale horizontally since any request can be handled by any available computing resources (e.g., instances, containers or functions).
Sharing state with any instances, containers or functions is possible by using in-memory object caching systems like Memcached, Redis, EVCache, or distributed databases like Cassandra or DynamoDB (more later) depending on the structure of your object and your requirements in terms of performances.
Netflix, already in 2013, famously talked and wrote about testing Cassandra in multi-region setup; writing 1 million records in one region of a multi-region cluster, followed by a read, 500ms later, in another region, while keeping a production level of load on the cluster — all that without any data loss.
4 — Use local resources and avoid cross-regional calls
As mentioned previously, preventing increased latency is critical for applications. Therefore, it is important to avoid synchronous cross-region calls and always make sure resources are locally available for the application to use, thus optimising latency.
For example, objects stored into an Amazon S3 bucket should be replicated in multiple regions to allow for local access from any region. Luckily, Amazon has recently implemented the feature called cross-region replication for Amazon S3. Cross-region replication is a bucket-level configuration that enables automatic, asynchronous copying of objects across buckets in different AWS Regions.
This local access of resources also applies for databases. To support this scenario, AWS launched, already in 2012, Cross-Region Read Replicas for Amazon RDS for MySQL followed by MariaDB, PostgreSQL and eventually Amazon Aurora.
Separating the writes from the reads across multiple regions will improve your disaster recovery capabilities, but it will also let you scale read operations into a region that is closer to your users — and make it easier to migrate from one region to another.
The main restriction with this pattern is that all critical writes traffic must go to one single master, in the region of origin.
Please remember that in order to work with cross-region read replicas, you must embrace eventual consistency as discussed above — due to the replication of data being asynchronous.
Note: Using RDS, you can monitor this replication lag by using Amazon CloudWatch and raise an alert if it reaches a level that is unacceptably high for your application.
To prevent having cross-region writes on the database, AWS recently announced at re:Invent 2017 the launch of a new feature: multi-master capability to Amazon Aurora, first within a single region but multi-region by the end of 2018.
Multi-Master clusters improve Aurora’s already high availability. If one of your master instances fail, the other instances in the cluster will take over immediately, maintaining read and write availability through instance failures or even complete AZ failures, with zero application downtime.
Note: Amazon Aurora Multi-Master is now available in Preview for the MySQL-compatible edition of Amazon Aurora. You can sign up to request participation.
But that’s not all, something more was announced at re:Invent 2017 — DynamoDB Global Table — a fully managed, multi-region, and multi-master database, allowing for the first time to build globally distributed applications.
A DynamoDB global table consists of multiple replica tables, one per region of choice (currently 5 regions are supported), that DynamoDB treats as a single unit. Every replica has the same table name and the same primary key schema.
Applications can write data to any of the replica tables; DynamoDB automatically propagates the write to the other replica tables in the other AWS regions.
Note: DynamoDB is the same database that is used at Amazon.com during Prime Days, and let me give you some numbers:
Amazon DynamoDB requests from Alexa, the Amazon.com sites, and the Amazon fulfillment centers totalled 3.34 trillion, peaking at 12.9 million per second. (source)
5 — DNS routing
In order to route traffic between regions, we need to use a Domain Name System (DNS) which support configurable routing policies.
Amazon Route 53 provides highly available and scalable Domain Name System (DNS), domain name registration, and health-checking web services. But most importantly for our use case, it supports traffic flow through a variety of routing policies, all of which can be combined with DNS failover.
- Geolocation routing policy — Use when you want to route traffic based on the location of your users.
- Geoproximity routing policy — Use when you want to route traffic based on the location of your resources and, optionally, shift traffic from resources in one location to resources in another.
- Latency routing policy — Use when you have resources in multiple locations and you want to route traffic to the resource that provides the best latency.
- Multi-value answer routing policy — Use when you want Route 53 to respond to DNS queries with up to eight healthy records selected at random.
- Weighted routing policy — Use to route traffic to multiple resources in proportions that you specify.
In the post, we learned that in order to build a multi-region, active-active architecture, all the services on the client request path must be deployed across multiple AWS Regions, that we must embrace asynchronous designs and architectures and that we must build applications that are fully stateless.
Of course, we should leverage services like Amazon S3 or DynamoDB that are highly available and that benefit from the global network build by Amazon around the globe to have reliable replication of data. Finally, we also discussed the use of traffic flow in order to support different routing policies between AWS Regions.
Please do not hesitate to give feedback, share your own opinion or simply clap your hands. In the next part, I will go on and build a multi-region, active-active backend — but will do that serverless style :) Stay tuned!