Today, we experienced an unscheduled outage of a little over 2 hours. That is highly unusual for Pure Chat, and we think it’s worth providing a little more explanation…

Some Background

Over the past 5 months, we have been carefully and methodically planning the migration of Pure Chat’s entire infrastructure from dedicated servers to virtual machines hosted on Amazon Web Services (AWS). On December 29th, we finally pulled the trigger and made this huge move. The transition was naturally quite a large undertaking. Pure Chat has oodles of services, sites, and supporting infrastructure to serve up our home page, dashboard, mobile apps, visitor tracking, and APIs. All of this technology works together to deliver an awesome, seamless experience. Our transition to AWS opens up completely new possibilities for scalability, reliability, and new features for Pure Chat, so it’s super exciting!

What Happened Today

Managing infrastructure and software inside of AWS has some unique challenges that we are still working through. Today, we encountered a perfect storm of them. We deployed a new version of our software to increase database query performance on a critical piece of code. Even though the new code had been tested in our staging environment, AWS Elastic Beanstalk was not happy with our change in production, to say the least. It deployed the new version of our code to some of the servers, noticed that they were not healthy, and then tried to roll back, unsuccessfully. That left our servers in various states of unhealthiness, each running one of three versions of our code. This started a chain reaction: servers marked unhealthy were terminated, while the servers that were still healthy became overloaded by handling all of the requests that were no longer being routed to the unhealthy ones. That overload turned healthy servers into unhealthy ones, which were then terminated as well.
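For the curious, the chain reaction described above can be sketched as a toy simulation. This is only an illustration and not a model of our actual fleet or of how AWS evaluates health: the function name, the server counts, the load figures, and the rule that one overloaded server is terminated per health-check round are all assumptions made up for the example. It also ignores replacement servers being launched.

```python
def simulate_cascade(total_servers, failed_servers, total_load, capacity_per_server):
    """Return the number of healthy servers after each health-check round.

    Hypothetical model: traffic is split evenly across healthy servers, and
    whenever the per-server load exceeds capacity, one more server is marked
    unhealthy and terminated in that round.
    """
    healthy = total_servers - failed_servers
    history = [healthy]
    while healthy > 0:
        load_per_server = total_load / healthy
        if load_per_server <= capacity_per_server:
            break  # the remaining servers can absorb the traffic
        healthy -= 1  # an overloaded server fails its health check
        history.append(healthy)
    return history


# Made-up numbers: 10 servers, 700 requests/s total, 100 requests/s per server.
print(simulate_cascade(10, 2, 700, 100))  # losing 2 servers is survivable
print(simulate_cascade(10, 4, 700, 100))  # losing 4 tips the fleet over
```

With these made-up numbers, losing two servers leaves enough headroom, while losing four pushes every remaining server past its capacity, and each termination makes the next one more likely.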

Working through this chain of issues left Pure Chat unavailable for a little over 2 hours. 🙁 This certainly pains us at Pure Chat! We want you to be able to depend on our products and services 100% of the time!

On behalf of the Pure Chat dev team, please accept our apologies!