It seems common for planned server upgrades to result in hours of unplanned downtime, followed by lengthy post-incident reports explaining why large chunks of infrastructure disappeared off the web. We’re very pleased to report that none of this proved necessary during our recent server upgrade.
World Text recently spent many weeks preparing for an upgrade of the main API server. Backups were made, and both the transfers and the migration itself were tested time and again. Considerable care went into choosing a migration window with minimal impact on volume customers, which is rather challenging with a 24/7 service.
Significant efforts were made to optimise the migration scripts and process to minimise downtime, and so achieve migration in the shortest time practicable.
When the time came for the actual switchover, everyone was prepared and waiting, and the data migrated smoothly. Nagios briefly turned red across the board.
Actual downtime was approximately two minutes, and backup and failover systems continued to function as normal throughout.
A minor issue was found with email, but that automatically failed over to the secondary server, so no messages were delayed more than a few seconds.
One customer noticed the server wasn’t there during the brief two-minute outage, but their messages were delivered successfully via the failover systems.
So, there you have it. A server migration of our core infrastructure, and nothing broke. Two weeks later, everything is still functioning smoothly without interruption.
Sorry for the dull read! Though we were rather relieved it was so.