2008 August 23

Octane.3v0.net HTTP Upgrade

By Toby H.

We are upgrading Apache / PHP on Octane tonight to add PDO support. This will cause a very short outage while the update completes. Sorry for any inconvenience this may cause.

2008 August 1

USDC (Resolved)

By Tim M.

21.37: (Official note from the USDC) Duration: August 1 2008 04:25AM - 10:45AM EST - As you may have noticed this morning, starting around 3:30AM we had multiple BGP sessions drop with one of our upstream providers (Level3). This caused some instability in the network while routes were re-routed. This issue was also compounded by a large DDoS attack targeted at our core networking system. As a result of the attack, troubleshooting of the initial route-related issue was made much harder, thus extending the time to get things sorted out. The network team was able to resolve the issue completely and everything should be running now. We will be looking into strengthening our internal policies to help alleviate issues such as these in the future. Thank you for your patience during this time.

12.36: The USDC is still on/off at random points. Unfortunately this is out of our control. It will stabilize over time; we’re seeing outages of up to a minute every 5 - 60 minutes (it’s quite random).

11.24: Ohh what a day ;) The US data centre just had 100% ping loss for approx 10 minutes; unfortunately this is out of our control. It’s back now, however there is approx 10% ping loss on and off. This should tidy itself up very quickly.

2008 August 1

Firestar (Resolved)

By Tim M.

07:47 - Just to let you know, we’ve finished working on the firewall, we tweaked a few settings and will keep an eye on it. If it recurs (hopefully not) we know immediately what the issue is so can fix it quickly.

07:24 - We had some problems with a few ports between the last post and now, this has been resolved. Everything seems to be working okay, however we continue to work on the server to stop this from repeating.

07:06 - Just to let you know, the server is back, it locked itself out so we’ll be doing work today on the Firewall which might cause a couple of minute long (hopefully no longer) outages at random periods. Nothing to worry about, just letting you know in case your connection drops for a moment.

06:41 - No reboot necessary, we’re in and back online - Firewall issue.

06:29 - Problem seems to be firewall setting, however engineer on the floor cannot access the box successfully via KVM so we might have to reboot.

05:50 - We are experiencing problems with Firestar this morning. We will let you know as soon as there is news on the situation. We have seen multiple occurrences of 100% ping loss.

2008 July 29

Rampage (Resolved)

By Tim M.

22:38: Monitoring has reported that the load on Rampage is very high and we are currently investigating. This is causing a temporary drop-out of every service on that server.

22:44: Load has dropped. It was caused by several thousand HTTP connections being directed at a specific site. The site has been suspended and the server is returning to its usual load. All services were restored at 22:41 - total downtime 10 minutes 50 seconds. (Load hiked to 220, should be about 1.5).

We will rectify the issue with the customer involved so this doesn’t repeat.

2008 July 17

Firestar (Resolved)

By Tim M.

22:00 - Tim: We are commencing the scheduled PHP Recompile & PHPSuExec install on Firestar, maintenance window begins now and will hopefully end by 00:00. We will update the blog if we have any issues which incur server down time. If you experience Internal Server Error / Error 500 please read the email we sent out, alternatively open a ticket.

22:16 - Tim: Server is pretty busy, monitoring has detected HTTP errors, sites we’re checking are working fine right now.

22:21 - Tim: Monitoring says server has recovered.

22:30 - Toby: Everything is going as normal, minor issue seen by monitoring as above was to be expected and had no effect on customer websites.

23:20 - Toby: We have now completed recompiling PHP on the server and will now be testing to make sure everything is still working.

18/07 @ 10:53 - Tim: As expected we have a lot of tickets coming in with Internal Server Errors - we’re currently fixing issues on average within 60-90 minutes, please refer to the email we sent out if you want to tackle it yourself - quick summary below:

Go to your cPanel and click Error Logs. This will give you a clue as to what the problem is. Look for key words: often it will mention permissions issues, in which case change any file or folder that’s 777 to 755 or 644 (or lower) using your FTP client or the cPanel file manager.
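If you have a lot of files, the permissions fix can be scripted. Here is a minimal sketch in Python, assuming you can run scripts against a copy of your site (or over SSH). The function name tighten_permissions and the choice of 755 for folders and 644 for files are our assumptions for illustration; some scripts (CGI for example) need 755 on the file itself, so check your error log afterwards.

```python
import os
import stat

def tighten_permissions(root):
    """Reset world-writable entries under root: folders to 755, files to 644.

    Returns a list of (path, old_mode, new_mode) for everything changed.
    The root folder itself is not touched.
    """
    changed = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                continue  # leave symlinks alone
            mode = stat.S_IMODE(os.lstat(path).st_mode)
            if mode & stat.S_IWOTH:  # world-writable, e.g. 777 or 666
                new_mode = 0o755 if os.path.isdir(path) else 0o644
                os.chmod(path, new_mode)
                changed.append((path, oct(mode), oct(new_mode)))
    return changed
```

Calling tighten_permissions("public_html") returns the list of paths it changed, so you can see exactly what was touched.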

If you want to speed up the process when opening a ticket please paste some of the last entries from your error log inside your ticket, this will save us looking and get your issue dealt with faster - you don’t have to, just worth mentioning :)

You may also see issues with your .htaccess - move any php_value lines out of .htaccess and recode them in php.ini.
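As a rough illustration of that recoding, the sketch below pulls php_value and php_flag lines out of the text of a .htaccess file and rewrites them in the "name = value" form that php.ini uses. The function name split_htaccess is made up for this example, and it only handles simple one-line directives, so treat it as a starting point rather than a complete converter.

```python
import re

# Matches lines such as "php_value upload_max_filesize 10M"
# or "php_flag display_errors off".
PHP_DIRECTIVE = re.compile(r"^\s*php_(?:value|flag)\s+(\S+)\s+(.+?)\s*$",
                           re.IGNORECASE)

def split_htaccess(htaccess_text):
    """Return (remaining_htaccess, php_ini_lines) as two strings."""
    kept, ini = [], []
    for line in htaccess_text.splitlines():
        match = PHP_DIRECTIVE.match(line)
        if match:
            name, value = match.groups()
            ini.append(f"{name} = {value}")  # php.ini syntax
        else:
            kept.append(line)  # everything else stays in .htaccess
    return "\n".join(kept), "\n".join(ini)
```

For example, a .htaccess containing "php_value upload_max_filesize 10M" would produce the php.ini line "upload_max_filesize = 10M" and leave your rewrite rules where they were.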

Some people will also see file ownership problems - this is because some scripts uploaded direct via the server have the ownership of ‘nobody’, you need to open a ticket to fix that as you don’t have sufficient permissions to change it.

The above will fix 90% of issues with regards to Internal Server Errors. This is all perfectly normal and a one-time thing. We are doing this to make sure your server is secure; it is vital for the long-term uptime of your website and our servers.

Best regards,

Tim.

2008 July 12

Venom (Resolved)

By Tim M.

@ 16:11

Everything seems to be stable now, all servers are back online.

@ 14:59

I talked with the lead engineer - part of the datacenter over in the US had a brownout and a few racks seem to be without power, including the one with Venom in it. I am constantly hassling them and will update this post when I have more info - as you can imagine it is slightly hectic at that datacenter with techs running around fixing things.

Further update as soon as I get it — Tim.

@ 14:45

Hello.

The datacenter in the US experienced some issue a short while ago, all boxes went offline for approx 5mins, 2 of 3 are back (Namely Inferno and Vector) but Venom is still MIA.

Unfortunately the datacenter’s websites are down too, their phone lines are busy and nobody is available on AIM to get things straightened out for Venom customers.

We will continue to repeatedly attempt to contact the datacenter to get Venom online and will report back here every hour. (Hopefully a 2nd update won’t be needed!).

I’ll keep you in the loop as ever.

Best wishes,

Tim.

2008 July 2

FREE HOSTING!!!

By Tim M.

Get your hosting account for *FREE* until August 25th!

Enter code FREE when you sign-up and pay £NOTHING inc VAT for your shared or reseller hosting account, it will then rebill at the normal rate from August 25th.

The longer you wait the less time you’ll get free so hurry up! :)

By the way, there is NO SETUP FEE at Evohosting.

This offer is valid for any Shared or Reseller hosting plan when paying Monthly.

Offer ends soon!

2008 June 22

40% Discount until July 1!

By Tim M.

For a very limited time (until 00:00 July 1 2008) to celebrate the fact that Evohosting is now FOUR YEARS OLD we are giving all new customers a 40% one time discount on Home, HomePro, Business and BusinessPro accounts!!!!!!

This includes monthly, quarterly, semi-annual and annual payments.

Enter the code: BIRTHDAY08 when you sign-up!

You’ll still get your free domain when you sign up, no catches, just us being nice, make us one promise though - do something great with your website and make us proud, ok? :)

2008 June 2

Prepare your website

By Tim M.

Most web hosts don’t touch on this subject as downtime is our least favourite term in this industry and it puts off clients when they’re viewing the company site.

You may have heard about the explosion at a data centre in Houston. Lots of my favourite sites were knocked offline due to this, such as b3ta.com and… erm… ok… I live on b3ta when I’m not replying to tickets … Anyway, around 9,000 servers and who knows how many websites were taken offline due to this, many of which are facing 50/60+ hours downtime at the time of writing this.

I have been keeping up to date on the happenings, reading a lot of posts on the forums of some of the companies affected. I notice many, many people complaining about the outage and how it has affected their business.

Many of these people (not all admittedly) didn’t have to suffer downtime, they just hadn’t made any form of disaster recovery plan and when their sites wouldn’t load they took absolutely no responsibility for their own negligence or lack of knowledge. If you are making money from your site then it is in your best interest to learn how everything works and to make plans for the worst. You do this for every business, right? I worked for Mattel for some time as I was starting Evo. They had their own backup office in case their UKHQ burnt down and all their data is sent offsite weekly, so why aren’t you doing similar for your web server? It’s common sense.

I noticed today how even staff at one company mention to their customers on the forums that they should have had a backup plan if their site is valuable. I wholeheartedly agree with this stance from a business owner’s point of view; unfortunately for those customers it is too late this time and they’ll just have to learn from their mistake. The worst didn’t happen for them, their sites are merely offline (I say merely, I know this is life or death for some people), but all data is intact.

Now imagine if that building had burnt to the ground, all data was lost and all they had to show was some crispy fried servers.

If that had happened I would imagine some of those hosting customers could go out of business from this purely due to poor/no disaster planning, and of course I wouldn’t be able to check out the awesome drunk cheeseburger eating Hoff animated GIFs and LOLCATS style pictures at b3ta any more, that would indeed make me feel quite sad.

Accidents happen no matter how good a data centre is, no matter how good the equipment is, no matter how good the staff are, no matter how much things are checked and no matter how well we as hosts practice our disaster recovery procedures. It is inevitable that at some point something will go wrong, especially when your building uses as much power as a small town to stay running and needs generators the size of a plane to operate when the power goes out.

The explosion at H1 is by no means the first data centre problem in the world. Every single web host has some problem which occurs at one point or another, whether it be those pesky hackers, server configuration issues or lack of power / network / air-con.

We’ve had a couple of instances of 7 - 12 hour FSCKs, rare as they are, you can read about them on our blog, they can and do happen to every hosting company at some point, no matter what the marketing spiel says.

If your business relies on your web site / email to stay alive and you haven’t got a disaster recovery plan yet, you should take some time out today to sort this out. I can’t emphasize enough how important this is.

Here are some hints on starting out with your disaster notification & recovery plan, these are by no means exhaustive but should give you some form of insight into some of the things you should be thinking about.

Monitor your website - As a web designer, isn’t it rather embarrassing when your main customer phones up and asks you why their web site is down when you didn’t realise yourself? We use Wormly here and we love it, it monitors all the services on each server, it lets us know the same minute via ICQ & SMS when something is dying so we can go fix before any of our customers have even noticed. If you had Wormly then you’d know if your customer’s web site was pinging away happily or not, you’d also know that we’d be fixing it already too because we have a minute monitor.
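To give an idea of what a service monitor does under the hood, here is a minimal sketch in Python that checks whether the ports for a few services accept a connection. This is not how Wormly works internally (we don’t know that); the function names and the list of services are assumptions for illustration only.

```python
import socket

def port_is_open(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and DNS lookup failures.
        return False

def check_services(host, services):
    """services maps a label to a port, e.g. {"http": 80, "smtp": 25}."""
    return {name: port_is_open(host, port) for name, port in services.items()}
```

Run something like check_services("www.example.com", {"http": 80, "smtp": 25, "pop3": 110}) every minute from a machine outside your hosting account, and send yourself an alert whenever a value comes back False.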

Monitor your home page - Similar to what Wormly does, but home page monitoring will make a call to your website every few minutes to make sure it is loading the data you want your customers to see rather than a “THIS HAS BEEN HACKED BY …” text or “Internet Explorer cannot display this page”. You should be using home page monitoring if you care about your website, it’s your website we’re hosting and you should know the second something happens. It’s our responsibility to make sure the servers are stable and working fine but it’s your responsibility to make sure your web site is working. We don’t do home page monitoring because we don’t know when you update your website, if we did home page monitoring then the second you changed your homepage with a new design or different text we’d be alerted and have to call you, and for a £5/month average hosting plan that isn’t feasible.
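A home page check is easy to sketch yourself. The Python below loads a page and confirms it contains text you expect and none of the text you dread. The function names, the fetcher parameter (there so the check can be tried without a network) and the default "hacked by" phrase are all assumptions for this example, not a description of any particular monitoring product.

```python
import urllib.request

def fetch(url, timeout=10):
    """Download a page and return its body as text."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")

def homepage_ok(url, must_contain, must_not_contain=("hacked by",), fetcher=fetch):
    """Return True only if the page loads, contains must_contain,
    and contains none of the must_not_contain phrases (case-insensitive)."""
    try:
        body = fetcher(url)
    except (OSError, ValueError):
        return False  # page did not load at all
    lowered = body.lower()
    if must_contain.lower() not in lowered:
        return False
    return not any(bad.lower() in lowered for bad in must_not_contain)
```

Pick a must_contain phrase that only changes when you redesign the page (your company name in the footer, say), and remember to update it when you do.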

Put your data in at least 2 completely different geographical locations - Sounds like a waste of £5 a month for a second shared hosting account somewhere, doesn’t it? But then on the flip side, if an outage occurs you have to work out what uptime vs £££ means to you. If you have a replica of your site elsewhere, coupled with the next item I’m going to mention, then there is no more downtime problem.

Set the name servers on your domain to use an external DNS provider so you can either flick the switch to your backup provider manually or have automated DNS failover - DNS is the thing that resolves your domain name to a server’s IP address, when you type www.whatever.com into your browser your computer then goes and asks a DNS server where to go. If your name servers are pointed at your multi-homed DNS provider then you can just press a button to instantly point your domain at another server. You can do this manually or with automated failover.
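The decision behind automated failover can be sketched in a few lines of Python. This only shows the "which server should the domain point at?" logic; actually changing the record depends on your DNS provider’s own control panel or API, which is not shown. The function name, the three-failures threshold and the example IP addresses (taken from the ranges reserved for documentation) are assumptions.

```python
def choose_target(primary_ip, backup_ip, recent_checks, failures_needed=3):
    """Decide which IP the domain's A record should point at.

    recent_checks is a list of health check results for the primary,
    oldest first (True = up, False = down). We only fail over after
    failures_needed consecutive failures, so one blip doesn't flip DNS.
    """
    last = recent_checks[-failures_needed:]
    if len(last) == failures_needed and not any(last):
        return backup_ip
    return primary_ip
```

With choose_target("192.0.2.10", "198.51.100.20", [True, False, False, False]) the answer is the backup IP. Bear in mind that resolvers cache your records for as long as the record’s TTL says, so a switch is only as quick as that TTL allows; keep it low on any record you plan to fail over.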

Take nightly or weekly offsite backups of your data - Whatever your host says about backups doesn’t matter. Whoever you host with, YOU should take backups too, it is as simple as that. The onus is on you to make sure your or your customers’ data is safe. You are our customer and we take nightly backups of our shared servers, we take nightly, weekly and monthly backups on our business class servers, and we run RAID so we’re protected against single drive failure, something we didn’t have in 2004 when we started. However this doesn’t protect us against total RAID array failure (unlikely, but then again a data centre explosion is unlikely and that happened… so…). Soon we’ll have entire servers backing up between data centres in case of RAID array failure or even data centre fire, which means we can roll out entirely new servers, pre-built with all customer sites, in 4 - 12 hours. Not many shared hosts bother with this. Even with all this in place, you need to take your own backups too.
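If you have a copy of your site’s files on a machine you control, a dated archive takes only a few lines of Python. This is a minimal sketch: the function name make_backup and the "site-" file naming are our own assumptions, it covers files only (export your databases separately), and you still need to copy the result somewhere offsite.

```python
import os
import tarfile
import time

def make_backup(site_dir, backup_dir):
    """Write a timestamped .tar.gz of site_dir into backup_dir; return its path."""
    os.makedirs(backup_dir, exist_ok=True)
    stamp = time.strftime("%Y%m%d-%H%M%S")
    archive = os.path.join(backup_dir, f"site-{stamp}.tar.gz")
    # Store paths relative to the site folder's own name, not the full path.
    top_level = os.path.basename(os.path.normpath(site_dir))
    with tarfile.open(archive, "w:gz") as tar:
        tar.add(site_dir, arcname=top_level)
    return archive
```

Schedule it nightly or weekly, and keep more than one archive so a bad backup doesn’t overwrite your only good one.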

Practice restoring backups on your backup server before you actually need to - So the nightmare has happened, your site is down, you don’t already have a ready/rolled out version of your site on your backup, but at least you downloaded your backup and you have a nice tar.gz file of your site sitting on your Windows desktop, right? Great, but you’ve only done half the job. Exactly how are you going to know it works unless you’ve tested it beforehand? There are always little issues, there are always configuration issues between servers too, most of ours have phpSuExec, your backup might not, so you’ll have to change permissions in all your folders in rather a hurry, or your database was corrupt when exported from the downed server so your backup is useless. For those reasons you need to practice what you are doing before it is needed.
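A first step towards practising a restore is simply proving the archive unpacks and contains what you think it does. The Python sketch below extracts a tar.gz into a scratch folder and reports any expected files that are missing or empty. The function name and the idea of keeping a list of "must have" files are assumptions for illustration; it does not test that the site actually runs on the backup server, which you still need to do by hand.

```python
import os
import tarfile
import tempfile

def missing_after_restore(archive_path, expected_files):
    """Extract archive_path into a temporary folder and return the entries
    from expected_files that are missing or empty after extraction.
    Only unpack archives you created yourself."""
    with tempfile.TemporaryDirectory() as scratch:
        with tarfile.open(archive_path, "r:gz") as tar:
            tar.extractall(scratch)
        missing = []
        for rel in expected_files:
            path = os.path.join(scratch, rel)
            if not os.path.isfile(path) or os.path.getsize(path) == 0:
                missing.append(rel)
    return missing
```

An empty list back from missing_after_restore("site.tar.gz", ["public_html/index.php", "public_html/db.sql"]) means the basics are there; anything else tells you which file to go and look for before the emergency rather than during it.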

Keep your customers up to date - We will always do our best to keep our customers up to date in the event of an emergency, and with that information you should be able to do the same. Give them ETAs, best case/worst case scenarios, everything you can, but never over-promise. Make a plan for what you are going to say to your customers: are you going to make the first move in notifying them, or are you going to wait for them to phone you?

I hope this gives you some ideas of the kinds of things you should be looking into right now, I’ll let you and Google fill in the blanks, but next time an outage occurs somewhere, anywhere, hopefully not here, be prepared. The last thing we need to hear is that you’re losing money through something that could have been avoided by being a little bit proactive as we’re sitting here frantically fixing the shiny expensive Dell PowerEdge server you’re hosted on and keeping you in the loop.

2008 May 26

Billing & Support Systems

By Tim M.

11.00: Pete & myself are working on the support and billing systems today. I imagine we’ll be working into the evening, we’ll continue to answer tickets though.

19.04: Update! Kayako & WHMCS are both upgraded to the latest versions, I’ll skin WHMCS tomorrow. I’ve fully tested WHMCS and all functions are working - there are some shiny new ones too, log in and have a look :)

Update: We’re finished with both systems, they’re working great. We won’t be reskinning WHMCS as a new version comes out soon and we’ll have to reskin it from scratch again.