Recovered from our worst outage in 11 years

We’ve had some issues over the past couple of weeks, including some dashboard downtime, brief periods where links could not be created, and two weeks of click data not showing up. This was caused by changes at our web host, Rackspace. We have recovered from it now, and missing data will fill in later this week.

Here’s what happened:

We have been hosted with Rackspace cloud sites for most of our history. They have a cloud database service and a cloud site service that allow us to function and scale without paying ongoing server administration costs. This arrangement makes our service viable at the prices we charge.

Rackspace recently sold the cloud sites portion of its business to LiquidWeb, which is where the dashboard now lives. However, it did not sell the cloud databases, where the data lives. We had plenty of warning about the sale, and since everything was supposed to work the same and they were migrating all our sites, we just let the experts do their thing.

Unfortunately, things didn’t work after they migrated the dashboard and client domains.

First, the dashboard went down. Because the dashboard and databases were now on different networks, we needed a load balancer to allow outside connections. That was no big deal, and we got it fixed shortly after discovering the issue.

Second, once the dashboard was connected, clicks weren’t appearing. Support staff at both companies gave us different information depending on who we talked to, with some telling us the sites hadn’t even been moved yet. Rackspace mostly just washed their hands of it, saying it wasn’t their problem. LiquidWeb’s support was significantly better, but it still took several days just to figure out what was going on.

The problem was the distance between the cloud sites on LiquidWeb and the cloud databases still on Rackspace. It was taking seven seconds to write a single click to the database. That may not seem like a long time, but writes should be measured in milliseconds, and when you’re sending thousands of clicks at seven seconds a click, none of them get there at all.

That’s why the analytics haven’t been showing up since this move, just over two weeks ago.
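To put the seven-second figure in perspective, here is a rough back-of-the-envelope sketch. It assumes writes happen one at a time over a single connection, which is a simplification for illustration and not a description of how our system actually issues writes. The click volume and the 5 ms "healthy" write time are made-up numbers.

```python
SECONDS_PER_HOUR = 3600


def max_writes_per_hour(seconds_per_write: float, connections: int = 1) -> float:
    """Upper bound on completed writes per hour when each connection writes serially."""
    return connections * SECONDS_PER_HOUR / seconds_per_write


def backlog_after(hours: float, clicks_per_hour: float,
                  seconds_per_write: float, connections: int = 1) -> float:
    """Clicks still waiting after `hours` of steady arrivals (0 if writes keep up)."""
    capacity = max_writes_per_hour(seconds_per_write, connections)
    return max(0.0, (clicks_per_hour - capacity) * hours)


if __name__ == "__main__":
    HYPOTHETICAL_CLICKS_PER_HOUR = 5000  # made-up volume, not a real figure
    print(round(max_writes_per_hour(7.0)))    # writes/hour at 7 s per write
    print(round(max_writes_per_hour(0.005)))  # writes/hour at an assumed 5 ms per write
    print(round(backlog_after(24, HYPOTHETICAL_CLICKS_PER_HOUR, 7.0)))  # backlog after a day
```

At seven seconds per write, a single connection finishes only about 514 writes an hour, so any site receiving thousands of clicks falls further behind every hour. In practice, requests queued that long tend to time out rather than wait, which fits what we saw.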

We were left with two choices. One was to move all the databases to an unknown database system, which could introduce all sorts of new problems. That seemed too risky, so we decided to go with the more difficult choice. We had to set up and configure a new server on Rackspace, and move a few hundred client domains, 8 databases that weren’t on cloud databases, and much more.

So, for the past two weeks we’ve been working on that with part-time help from two system administrators as they were available. Yesterday we flipped the switch and two of us spent nine hours straight putting out a few fires. And now things appear to be working correctly.

The good news is that we designed the system to handle downtime on the dashboard without losing clicks. The clicks that didn’t get written to the database over the last two weeks are sitting on the LiquidWeb servers and we’ll be moving them over during this week, so you should see them appear in the dashboard soon.
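For the curious, the general idea of keeping clicks safe while the database is unreachable can be sketched roughly like this. This is a minimal illustration and not our actual code: the `ClickSpool` class, the file format, and the `write_to_db` callback are all hypothetical names invented for the example.

```python
import json
import os
from typing import Callable, Dict, List

Click = Dict[str, str]


class ClickSpool:
    """Hypothetical append-only local buffer for clicks the database did not accept."""

    def __init__(self, path: str) -> None:
        self.path = path

    def record(self, click: Click, write_to_db: Callable[[Click], None]) -> bool:
        """Try the database first; if the write fails, keep the click on local disk."""
        try:
            write_to_db(click)
            return True
        except Exception:
            with open(self.path, "a", encoding="utf-8") as f:
                f.write(json.dumps(click) + "\n")
            return False

    def replay(self, write_to_db: Callable[[Click], None]) -> int:
        """Write spooled clicks to the database; keep any that still fail."""
        if not os.path.exists(self.path):
            return 0
        with open(self.path, encoding="utf-8") as f:
            pending: List[Click] = [json.loads(line) for line in f if line.strip()]
        remaining: List[Click] = []
        written = 0
        for click in pending:
            try:
                write_to_db(click)
                written += 1
            except Exception:
                remaining.append(click)
        # Rewrite the spool so it holds only the clicks that still need a retry.
        with open(self.path, "w", encoding="utf-8") as f:
            for click in remaining:
                f.write(json.dumps(click) + "\n")
        return written
```

The point of a design like this is that the redirect itself never depends on the database being reachable: a failed write costs a line in a local file, and `replay` can be run later once the connection is healthy again.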