rosehosting sucks, redux
Last year, I posted about my experiences with RoseHosting.com. A bunch of people have read it (for some definition of “a bunch”). In fact, Google Analytics shows that the number one source of search traffic for my blog is people searching for “rosehosting sucks.” (What’s that? Oh, what’s the next most popular search term? How to make a high resolution Life Poster.)
As of now, it’s the first link that comes up when you google for rosehosting sucks (and on yahoo too!).
I assume that’s why I had an IM conversation with RoseHosting a few weeks ago.
I’ve been deciding whether to post the conversation, and finally decided it was okay to do so. They were responding to a blog post that posted email correspondence, so I think it’s reasonable to argue that they were “on the record.”
Here’s the conversation. I’ll let you decode the names. This took place on August 31, 2007 at approximately 8:17 PM ET.
RH: Hi, are you there?
ME: yes
RH: I am sorry to bother you, do you have a few minutes?
ME: who asre you?
RH: I am contacting you on behalf of RoseHosting.com
ME: ah
ME: ok
ME: whazzup?
RH: We saw your experience you had with us at http://jasonyanowitz.com/articles/tag/rosehosting
ME: ah
RH: and we were wondering if you are willing to remove that article and try our service once again free of charge?
ME: i’m not in the market for VPS right now (I’m using EC2 now).
ME: i appreciate the offer.
RH: ec2?
ME: (amazon cloud)
RH: I see
RH: Well, I assure you that you will not have that kind of experience again.
ME: what happened?
8:20 PM
ME: you can understand why we cancelled and were upset
RH: Well, it was a simple mistake on our part and the employee that caused it is long gone due to some other mistakes like that one.
RH: We have thousands of customers and we only had a few that complained with issues like yours, so as you can notice there are pretty much no other bad reports like yours anywhere on the Internet.
RH: Which means that the issues have been resolved.
ME: that sounds great
RH: As I said you can have a VPS, any size we offer absolutely free of charge as soon as you remove that article. you can verify my claim in person that way.
ME: ok. let me get back to you. who should i email? (i’ve got a newborn who’s crying right now, so i gotta go do a feeding)
RH: Just email [REDACTED—jy]
ME: ok
RH: And mention that you spoke with [REDACTED—jy].
I had three reactions to this exchange:
1. I guess it would suck if someone posted a negative review of your service and you fixed it and the review was still out there.
2. I didn’t get a lot of confidence that things had improved. On the other hand, I don’t feel like testing out their service again.
3. I’m not about to sell out to The Man. My reviews are not for sale!!! So don’t even try it, Canon.
Also, it is hard to find negative reviews. Or any reviews at all. I don’t know if these reviews are real or an Astroturf campaign.
But I’m not alone in my complaints.
Of course, I think hosting providers are like cell phone companies—you hate whomever you’re with.
Anyway, I am posting this (and updating my original post) to remind people that time has passed since my original post and things may have changed at Rosehosting. You could even request a free month to gain confidence (since getting them to honor their SLA was non-trivial). Tell them that punk at jasonyanowitz.com sent you.
rosehosting sucks
Update: After reading this article, make sure you check out the followup.
Before purchasing major products or online services, I often google for “foobar sucks” first. Although it’s not decisive, I throw the results into the decision mix. This post is a brief contribution to that effort.
To wit, rosehosting.com is a very very very bad choice for those shopping for a Linux virtual private server. You may be tempted to use them because of their low prices. In this case, you definitely get what you pay for.
Purchasing and set up were a breeze. Run-time performance was abysmal. The VPS regularly (at least once a day) stops responding and your web sites are unavailable.
They provide no out of the box services for monitoring the health of your system. We were hosting a few sites on a vps (slideshowr.org, photocastr.com, etc.). We set up monitoring using HostTracker (which works very well for free).
For the first month or so, everything was good. Then we started getting pages every couple of days that showed downtimes ranging from 3 minutes to 2 hours.
So we installed sar to track what was going on. sar has some limitations inside a VPS, but those turned out not to matter for our purposes. We had sar write out a snapshot of system activity every 30 seconds.
Then we got another page. Hazah! We checked out the sar output. At the time of the page, the sar log entries had stopped. (And the sar output file was mildly corrupted at that point!). sar output resumed at the same time the web servers became available again.
In other words, our VPS was getting no CPU time from the host system for minutes (or hours) at a time. At this point, we turn to rosehosting tech support. Wackiness ensues.
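To make that reasoning concrete, here is a minimal Python sketch (not what we ran at the time) that scans sar timestamps for missing samples. The timestamps are copied from the 9/12/06 excerpt in the email below. The function name `find_gaps`, the 2x tolerance, and the idea of flagging any spacing over twice the 30-second interval are assumptions for illustration. It also assumes the timestamps are in order and fall on the same day, so the corrupted rows in the other excerpts would have to be dropped first.

```python
from datetime import datetime

def find_gaps(timestamps, interval_s=30, tolerance=2.0):
    """Return (start, end, seconds) for each pair of consecutive samples
    that are further apart than interval_s * tolerance.

    Hypothetical helper: assumes HH:MM:SS strings, same day, in order.
    """
    times = [datetime.strptime(t, "%H:%M:%S") for t in timestamps]
    gaps = []
    for prev, cur in zip(times, times[1:]):
        delta = (cur - prev).total_seconds()
        if delta > interval_s * tolerance:
            gaps.append((prev.strftime("%H:%M:%S"),
                         cur.strftime("%H:%M:%S"),
                         int(delta)))
    return gaps

# Timestamps from the 9/12/06 sar excerpt (sar was set to sample every 30 s).
SAMPLE = [
    "14:00:16", "14:00:46", "14:06:15", "14:07:35", "14:18:55",
    "14:21:57", "14:32:18", "14:34:26", "14:34:56", "14:35:02",
]

gaps = find_gaps(SAMPLE)
total_seconds = sum(seconds for _, _, seconds in gaps)

for start, end, seconds in gaps:
    print(f"no samples between {start} and {end} ({seconds} s)")
print(f"total: {total_seconds} s, about {total_seconds / 60:.0f} minutes")
```

On that excerpt it flags six gaps adding up to 2020 seconds, which lines up with the roughly 34 minutes we quoted in the email.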
We send this email (incidentally, rosehosting has a very strong SLA for uptime—well, it’s strongly worded):
Hi Guys,
We are seeing some serious downtime that is happening with disturbing frequency. Here are three samples of our sar output from the last week where we have a grand total of 82 minutes of downtime.
A. What is causing the downtime?
B. What can be done to make it stop?
C. Doesn’t this trigger a refund to us?
Thanks!
Sar output
9/12/06 ( Downtime between 14:00:46 – 14:34:26 or ~34 minutes )
14:00:16 all 22.03 12.17 7.85 39.99 0.00 17.96
14:00:46 all 10.54 1.52 5.23 60.21 0.00 22.50
14:06:15 all 0.33 0.01 0.54 98.92 0.00 0.21
14:07:35 all 11.29 0.07 4.62 83.92 0.00 0.09
14:18:55 all 0.56 0.02 1.09 93.56 0.00 4.77
14:21:57 all 1.56 0.02 4.94 80.26 0.00 13.21
14:32:18 all 7.87 0.08 2.53 5.95 0.00 93.94
14:34:26 all 11.75 0.87 5.39 76.44 0.00 5.55
14:34:56 all 14.12 0.59 6.18 46.20 0.00 32.90
14:35:02 all 3.13 0.72 7.15 37.99 0.00 51.02
9/10/06 (Downtime between 15:24:06 – 15:53:38 or ~29 minutes )
15:23:36 all 5.06 0.62 2.94 58.43 0.00 32.94
15:24:06 all 2.57 0.04 1.65 26.77 0.00 68.97
15:36:19 all 0.48 0.00 0.76 96.47 0.00 2.29
16:26:27 all 1362.23 1458.30 1428.11 11602.42 14592.95 0.00
15:50:50 all 7.14 0.07 2.29 461053591055.50 461053590835.50 24.82
15:53:38 all 8.16 0.03 3.22 88.59 0.00 0.00
15:54:08 all 14.69 0.25 4.38 72.30 0.00 8.38
15:54:38 all 15.96 0.02 3.80 60.05 0.00 20.16
15:55:01 all 11.77 0.00 5.21 68.92 0.00 14.10
15:55:08 all 4.44 0.00 12.31 38.87 0.00 44.37
9/07/06 (Downtime between 06:15:18 – 06:39:00 or ~19 minutes )
06:14:48 all 4.01 0.57 5.54 40.34 0.00 49.54
06:15:18 all 1.19 0.10 4.61 79.13 0.00 14.97
06:19:33 all 0.44 0.00 0.82 98.71 0.00 0.03
02:30:04 all 976.49 1043.46 1022.46 8304.21 10441.30 0.00
06:37:22 all 7.16 0.07 2.29 474935742649.19 474935742422.85 24.98
06:39:00 all 17.15 0.03 4.80 68.88 0.00 9.14
06:39:30 all 4.78 0.07 3.63 67.03 0.00 24.49
Since we were emailing sysadmins, we didn’t want to insult them by explaining the data. Our mistake. We got back this response:
We did not have any downtime lately. You should not be using sar as it will always produce incorrect results in a virtualized environment and is putting strain on your CPU utilization needlessly.
A. There was no downtime as far as we know.
B. There was no downtime so I am not sure what can be done.
C. No. We guarantee 99.5% uptime which in a month comes down to about a little more than 3.5 hours of downtime.
Admin RoseHosting.com
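For what it's worth, the arithmetic in answer C does check out. Here is a quick sketch, assuming a 30-day month; the function name is made up for illustration and has nothing to do with how RoseHosting actually measures its SLA.

```python
def allowed_downtime_hours(uptime_pct, days=30):
    """Hours of downtime an uptime guarantee permits over `days` days.

    Hypothetical helper; the 30-day month is an assumption.
    """
    return (100 - uptime_pct) / 100 * days * 24

budget_hours = allowed_downtime_hours(99.5)
reported_hours = 82 / 60  # the 82 minutes from our three sar samples

print(f"allowed: {budget_hours:.1f} h, reported: {reported_hours:.2f} h")
print("over budget" if reported_hours > budget_hours else "within budget")
```

So the 82 minutes we reported, taken alone, sits under the monthly allowance of about 3.6 hours. Those three samples only covered one week, though.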
Okay, fair enough, we didn’t give the full context for the sar output, namely that we had independent evidence of a problem (host-tracker results, etc.). But at this point, we assumed they would take one look at our email, check their own logs, and say, “Oops, we do have a problem.” So we regrouped and sent this:
Hi,
I’m sorry, I neglected to mention before that the reason we started looking at the sar output was because we are using host-tracker.com and it has been reporting downtime at various intervals. So, we started up sar to see what the machine was doing when our websites were unreachable. I agree that sar is not a perfect medium, but the fact that it is reporting exactly the same phenomena at exactly the same time as host-tracker seems to indicate that YES THERE IS A PROBLEM.
Do you run system monitoring software on the box that looks across all the VMs? Is someone’s VM going crazy at times that correspond with the sar output included below? Can you guys take a look at the various IP interfaces to our machine and see if they are passing a strange amount of traffic at the times I have mentioned. In general, is there anything you can do to help debug this?
Thanks for your help,
Four hours later we got a reply. To shorten this post a bit, I will include it along with our reply:
On Sep 13, 2006, at 12:29 PM, RH Helpdesk wrote:
We will investigate this problem further.
Thanks!
Yes, we have internal system monitoring software, but that is not available to the end user.
Cool. Can you dump info about our machine for those periods of time?
We have absolutely no known issues on our network. Please do a traceroute and let us know where the packets are being dropped. If it’s not on our network, there is pretty much nothing we can do.
Yes, if there is packet loss outside your network, we certainly don’t expect you guys to fix it by yourselves.
Thankfully, it looks like the problem is not a networking problem. host-tracker.com uses many machines over many networks to check for outages. It is reporting a complete loss of connectivity which seems to indicate that the machine has gone brain dead.
The sar output also looks as if the machine went totally brain dead for that period of time. If there was a networking problem, I would expect to see the sar output idle time go to 100% instead of reporting close to 0% idle and that it would have a very easy time dumping info into the /var/sysstat file instead of not having enough cycles to even dump a sar line every 30 seconds.
What exactly are you suggesting we should do? The situation on our network is perfectly normal. Your complaint is the only one in the queue. We have had 0 (zero) complaints in the past few weeks for the physical server hosting your virtual server.
I’m not sure what you are asking. If I were you guys, I would have system status information on each of these machines similar to the sar output so that you could tell when one of the virtual machines was using the whole computer. Are you saying that you don’t have the ability to tell if one of our virtual machine neighbors has taken over the whole computer and if so, which neighbor?
I can’t speak to the lack of complaints. I don’t know how many other virtual machines are on the same computer as us and I don’t know what their applications are or if they are even monitoring their own uptime. We have two separate data points describing a brain dead machine. One of them is from a completely independent source: host-tracker.com. The events we are describing haven’t happened just once. They are a common occurrence on our machine. In fact, we installed sar in order to get better insight into what was going on.
Is the QOS you guys have signed up to only applicable to network availability or does it also apply to some minimal number of computer cycles?
Thanks again for looking into this,
Now, at this point, if I was rosehosting, I’d actually be concerned that something odd was happening on our systems and we weren’t capturing that information. Or at least I’d have put together a scenario to explain why host-tracker.com and sar register simultaneous problems but in reality everything was fine. Instead, we got another response explaining that they don’t share their monitoring data, an offer to switch us to another machine (why would that one be better run?), and a bogus technical explanation for what’s occurring:
There could have been an extreme situation or two where the IO was so high that the CPU was in “wait state” waiting on read or write to the hard drives. This means that even though there are plenty of CPU cycles available, the CPUs are unable to do anything and are idling waiting to go out of the “wait state”.
(This is bogus because it doesn’t explain the frequent occurrence of the problem, the uniformity of our system behavior, the brain dead nature of the sar output, and so on…)
Our solution: switch providers. I suspect this was a good solution for rosehosting too: we were eating up more in tech support time than our service cost. I am guessing their pricing structure assumes a near 0 cost-of-support.
We went with Linode instead. They cost more, but they work. They appear to take steps to prevent over-subscription of their services. And the web UI for controlling your machine is fantastic. So, in summary: rosehosting drools, linode rules.
[The main source of traffic to this blog is from my friend Bijan but on the off chance you are reading this and using it to make a purchasing decision, I’d be grateful if you left a comment. And yes, I am aware of how rigorous and scientific this approach to polling is.]