Recently I was releasing a new build, as usual using a blue-green deployment: I switch the DNS record to point to the load balancer of the previously “spare” group. Before switching the DNS, I checked the logs of the newly launched version and noticed something strange: continuous HTTP errors from our web framework (Spring MVC) saying that a certain endpoint does not support the HTTP method.
The odd thing was that I didn’t have such an endpoint at all. I enabled further logging and it turned out that the request URL was not for my domain at all. The spare group, not yet having traffic directed at it, was receiving requests addressed to a completely different domain, which I didn’t own.
I messaged the domain owner, as well as AWS, to inform them of the issue. The domain owner said they had no idea what that was and that they didn’t have any unused or forgotten AWS resources. AWS, however, responded as follows:
The ELB service scales dynamically as traffic demand changes, therefore when scaling occurs, the ELB service will take IP addresses from the AWS unused public IP address pool and assign them to the ELB nodes that are provisioned for you. The foreign domain name you see here in your case, likely belongs to another AWS customer whose AWS resource is no longer using one of the IP addresses that your ELB node now has as it was released to the AWS unused IP pool at some stage, their web clients are very likely excessively caching DNS for these DNS names (not respecting DNS TTL), or their own DNS servers are configured with static entries and are therefore communicating with an IP address that now belongs to your ELB. The ELB adding and removing IPs from Route53 is briefly described in [Link 1] and the TTL attached to the DNS name is 60 seconds. Provided that clients respect the TTL, there should be no such issues.
I can simply ignore the traffic, but what happens if I end up in the opposite role? After a traffic burst, one of my load balancer’s IPs gets released, but some client (or some intermediate DNS resolver) has cached the address for longer than instructed. Then requests meant for my service, including passwords, API keys, etc., will be sent to someone else.
Using HTTPS helps in the case of browsers, as the certificate of the new load balancer will not match my domain. But for other tools that don’t perform this validation, or have it cached, HTTPS won’t help unless certificate pinning is implemented.
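To make the pinning idea a bit more concrete, here is a minimal sketch in Java of the comparison a pinning client performs: hash the server's public key and check it against a set of known pins. The names PinCheck, sha256Base64, matchesPin and isPinned are hypothetical, and the pin format (Base64 of the SHA-256 of the encoded public key) is an assumption on my part. The sketch is not wired into a TrustManager or any HTTP client; it only shows the check itself.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.security.cert.X509Certificate;
import java.util.Base64;
import java.util.Set;

// Hypothetical helper: compares a server key against a set of known pins.
final class PinCheck {

    // Base64-encoded SHA-256 digest of the given bytes.
    static String sha256Base64(byte[] data) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            return Base64.getEncoder().encodeToString(digest.digest(data));
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is required on every Java platform, so this is unexpected.
            throw new IllegalStateException(e);
        }
    }

    // True if the hash of the encoded public key is one of the expected pins.
    static boolean matchesPin(byte[] encodedPublicKey, Set<String> expectedPins) {
        return expectedPins.contains(sha256Base64(encodedPublicKey));
    }

    // Same check, starting from the certificate the server presented.
    static boolean isPinned(X509Certificate cert, Set<String> expectedPins) {
        return matchesPin(cert.getPublicKey().getEncoded(), expectedPins);
    }
}
```

With a check like this, a client that is still talking to a stale IP would refuse the connection, because the certificate presented by the new owner of that IP carries a different key.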
AWS say they can’t fix that at the load balancer, but they actually can, by keeping a mapping between IPs, owners and Host headers. It won’t be trivial, but it’s worth exploring in case my experience is not an exceptional scenario. Whether it’s worth fixing if HTTPS solves it – probably not.
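The same Host-header idea can also be applied on the receiving side, in my own application, so that stray requests for someone else's domain are rejected early instead of reaching the framework. Below is a minimal sketch, assuming a fixed allowlist of my own hostnames. HostCheck, normalizeHost and isExpectedHost are hypothetical names, the port stripping is deliberately naive, and this is not how AWS handles it; in a Spring MVC application the check would typically sit in a servlet filter or interceptor.

```java
import java.util.Locale;
import java.util.Set;

// Hypothetical helper: decides whether a request's Host header is one of ours.
final class HostCheck {

    // Lower-cases the header value and strips an optional port and trailing dot.
    static String normalizeHost(String hostHeader) {
        if (hostHeader == null) {
            return "";
        }
        String host = hostHeader.trim().toLowerCase(Locale.ROOT);
        int colon = host.lastIndexOf(':');
        // Naive port stripping; a bare IPv6 literal such as "[::1]" is left alone.
        if (colon >= 0 && !host.endsWith("]")) {
            host = host.substring(0, colon);
        }
        if (host.endsWith(".")) {
            host = host.substring(0, host.length() - 1);
        }
        return host;
    }

    // True only if the normalized host is in the allowlist of my own domains.
    static boolean isExpectedHost(String hostHeader, Set<String> allowedHosts) {
        return allowedHosts.contains(normalizeHost(hostHeader));
    }
}
```

A filter could call isExpectedHost with the value of the Host header and answer anything else with an error status, without logging the request body. That keeps foreign credentials out of my logs, but it does nothing for my own clients if they end up talking to someone else's load balancer.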
So this is yet another reason to always use HTTPS and to force HTTPS if a connection is made over HTTP. But it is also a reminder not to do clever client-side IP caching (let the DNS resolvers handle that) and to always verify the server certificate.
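For Java clients specifically, one place where addresses can get cached for too long is the JVM's own lookup cache. A minimal sketch of capping it follows, using the standard security properties networkaddress.cache.ttl and networkaddress.cache.negative.ttl. The class name DnsCacheConfig, the method name capDnsCache and the 10-second negative TTL are my own choices for illustration. As far as I know, the JVM reads these values when its address cache is first initialized, so the call should happen early at startup, before any name lookup.

```java
import java.security.Security;

// Hypothetical startup helper: limits how long the JVM caches DNS lookups.
final class DnsCacheConfig {

    // Caps the cache lifetime of successful lookups at the given number of seconds.
    static void capDnsCache(int seconds) {
        Security.setProperty("networkaddress.cache.ttl", Integer.toString(seconds));
        // Failed lookups are cached separately; 10 seconds is an arbitrary choice here.
        Security.setProperty("networkaddress.cache.negative.ttl", "10");
    }
}
```

Calling DnsCacheConfig.capDnsCache(60) at the top of main would line the JVM cache up with the 60-second TTL mentioned in the AWS response.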