My Goodbye Letter to Engine Yard

That response sounds familiar, but it isn’t really comforting or
realistic. You’re suggesting it’s rare for instances to disappear, yet
it happened twice in less than six months. You’re suggesting I upgrade
to a more expensive server, which may or may not solve the problem. You
don’t know what causes the instances to disappear, but for some reason
a more expensive instance solves the problem. I’m not sure how those
dots are connected, but it certainly doesn’t add up to me. You want me
to spend more money per month without any real justification,
quantitative data, or explanation.

To summarize:

* You don’t know why instances disappear.
* It’s rare for instances to disappear.
* The solution is to spend more money per month.

From my perspective, you still haven’t resolved the underlying
problem. You suggest it may be hardware, but hardware failing twice in
six months? That doesn’t sound realistic. What concerns me is that you
haven’t addressed why the instances disappeared in the first place.
You’ve only offered a solution that has no connection to the actual
problem: get a better server. Of course, if I were going to get another
server, I could just go straight to Amazon and bypass EY altogether.

Not understanding why the instances are disappearing points to a
bigger problem at EY than just this issue. It points to a lack of
concern for your clients, a lack of understanding of your own
technology, and a fundamental problem in how you attempt to solve the
problems presented to you. It’s a ‘sweep it under the rug’ approach
to solving your problems, which doesn’t sit well with me. Your
solution is, simply put, one without any technical merit. Anyone in
sales could have given me the same answer. If I were to refer clients
to EY and they had other problems you couldn’t figure out, would
your solution be that they need to upgrade their servers?

I appreciate your belated investigation into the issue, but I
definitely won’t be moving more of my clients to EY anytime soon. I’m
actually glad you reached out to me recently, because it’s given me the
motivation to revisit what I still see as an outstanding issue. As
such, I’ve decided to cancel my EY account.

> From: EY
>
> X and Y asked me to follow up with you on this.
> First of all, if someone on my team dropped the ball on getting back to you,
> we apologize. I will follow up with the engineers on my end.
> Re. instances disappearing – I found and reviewed the ticket for when this
> happened. It appears that it was no longer possible to SSH into your
> instance and when our engineer tried to terminate it, it was stuck in a
> shutting down state. I can’t answer why this happened in this particular
> case, but out of the several thousand AWS instances we manage on the EY
> AppCloud, occasionally we have observed that an instance can disappear or
> become unresponsive. This can happen, for example, if the physical server
> it is mounted on fails. The few times we have seen it, it has been for
> Small instances, which is what you had. Our default AWS instance size for
> EY AppCloud is now Medium, which we to date have not observed issues with.
> Hope that answers your question – and feel free to contact me for any
> additional questions about this.

  • David R

    Not sure if moving to larger instance size is an actual solution.

    A “High CPU Medium” EY Cloud instance failed on us Tuesday, on a high-profile website. It was the Application Master but only partly failed, so there was no failover (HAProxy continued to run, but the Rails app failed). The server became increasingly unresponsive and eventually we could not SSH into it, but it still would not fail over.

    Because of the nature of the issue, I can only assume hardware failure; since it’s AWS there’s no way we’ll ever know.

    I did find, on the EY forums, that EY has a solution to this theoretically *very rare* problem: they recommend killing the whole environment and restarting it (from snapshots, if you can get a snapshot). That way you get a new instance, hopefully one without a hardware problem!

  • http://www.brianmcquay.com admin

    Great comment. That was my entire point, and they seemed not to understand it. They obviously haven’t addressed the root cause and are blaming it on hardware.

    http://blog.rightscale.com/2008/02/02/top-reasons-amazon-ec2-instances-disappear/

    This post from 2008 suggests that a lot of the problems are network related. That post is over two years old, and the very same problems still haven’t been adequately addressed. It makes me wonder about the long-term viability of using Amazon’s cloud at all. They obviously haven’t solved some of the more critical problems it’s facing, like disappearing instances. What sort of business can risk losing its web instances multiple times a year?