Tuesday, May 08, 2018

A migration WE were put through......

This is a bit of a departure for us.

Most of the time when we write about migrations it is us at Sumatra enabling a migration from a legacy calendar server into Microsoft Exchange for someone else.

Recently though WE were the recipients of a migration.  In this case of our web site and associated on-line presence from one hosting company to another.  

Initially this was not our choice as IX Webhosting got bought by Site5.

We had a completely miserable experience! We want to share the points of failure.  Doing our own migrations since 2001 (really?  It's been that long?) we've battle-hardened our own processes and documentation to deal with all of these issues -- but the fact it happens in a scenario that was much simpler than a full-state calendar server migration shows how easy  it can be to slide into the Upside Down.  We'll first talk about the timeline, then the lessons learned.

Timeline  If you want to see a rough timeline as it played out in Twitter -- check out our feed.

This email was our first warning this was happening:

Note the up-beat tone, the promise that all of our content should be there for us, and the lack of any warning to back-up or how to escalate if/when things go wrong

Yeah....  already the Spider-sense was tingling.

But I figured I'd get another communication telling me WHEN they were transferring our site since that's kind of a big deal.


"Ruh Roh Raggy!"

Our first indication was when our email, site, and FTP access went down on Tuesday April 17, a week after this missive.

We were able to get email up pretty quickly thanks to our knowledge that the MX records and DNS servers needed updating, and we were almost... kinda... (*sort of...*)  not not really anywhere near getting our web site running again.  We just ignored other things like FTP by this point.  

To be fair our data and settings DID stay the same (as promised) -- they just weren't set in any of their systems properly.  I'm sure years of weasel-like business practices went into that intro letter without considering the customer impact.
We waited three business days for them to resolve the problems.  But by Friday morning when they misconfigured our SSL certificate, causing visitors our site to be redirected to a site in Peru, we pulled the rip cord, fired Site5 and went with TMD Hosting who got us up and running in really good time.

How bad was Site5?  Seven days after we'd severed relations with them, gotten our site up and running on TMD, and propagated our new DNS, Site5 support contacted us to tell us they were still working on the problem.  Yeah, guys, you keep us in that loop!

Lessons Learned (from what went wrong):

  • There was no honest communication from the get-go -- and I mean zero.
  • There was no public time-line
  • There was no back-out plan
  • There was no opportunity to test.
  • There was no parallel hardware available (literally copy data to new machine then decommission the off old machine)
  • There was no ability for quick response and communication
Happy to say that this debacle did not affect the main migration going on at the time.  Why? Because we're big in up-front communication, warn people that things can (and do) go wrong, create a plan that deals with things when they do.

1 comment:

zyg said...

AND as a coda to the Keystone Kops organization of Site 5: Today, November 24 I got a notice they were suspending our service for non-payment. No d'uh, guys! Didn't telling you to f-off , moving our site, and cancelling our account tell you you weren't gonna get more money???