Vortex (Resolved)
By Tim M.
We are investigating issues with Vortex. This is a repeat of Monday; a full report will be emailed to Vortex customers during the following week.
Latest news:
@ 00:39 Sun (Tim) - Kernel recompile completed on this and all UK servers!
@ 03:54 Sat (Tim) - We’re upgrading another server first. Unfortunately, post-upgrade it won’t find the LAN card, so we’re rescheduling for tomorrow at midnight. The other server is being worked on before Vortex.
@ 00:53 Sat (Tim) - Kernel upgrade has begun.
@ 22:25 Fri (Tim) - We’re going ahead with the Kernel upgrade as scheduled from around midnight. The server will be rebooted a couple of times during the night, nothing to worry about. I’ll post a report here if there is an issue.
@ 05:52 Fri (Tim) - I had a problem with the kernel update (it’s a manual rebuild rather than just typing ‘yum update’ then ‘reboot’) so I’m going to do this at 00:00 (Friday night / Sat morning). It’s coming back into daylight hours and I cannot allow more downtime on this box if it’s avoidable, so it has to wait until tomorrow night. This won’t involve much downtime (enough to reboot the server, usually 3 minutes, however I’m taking no chances).
Issue is resolved for the moment: no hardware failure, server is stable, file system is okay, all sites are up, and all services are running. 95% of tickets have been replied to and we’re just working on the last few now. Tomorrow is likely to be very busy on support tickets due to this, but we’ll do our best to action your request within our usual 1 - 6 hour SLA.
Once again, I’m really sorry for the inconvenience caused. If you haven’t already claimed a free month for the downtime, you can open a ticket with sales and Mark/Jan-Erik/Pete will extend your hosting period by one month. If we have already given you a free month on Monday we cannot do this again.
I’m also copying the backup files to another server so if this happens again in the next few days (it shouldn’t) then we can set up all the sites on another box and switch the nameserver IPs.
Last thing: thank you to YOU, our customers, for being so understanding and cheering us up with your comments during this stressful time. We don’t like to let you down, and the team and I know the seriousness of this and how it has caused you significant problems. A personal thank you also to our team for working some long hours this week too. Great team, good job guys.
@ 04:13 Fri (Tim) - Kernel update is progressing well, it has been compiling for some time now.
@ 02:17 Fri (Tim) - IT LIVES! However we’re not quite out of the woods yet. We are to complete a manual Kernel update as a precaution. This is happening right now, we’ll then need to reboot the box. If nothing bad happens then we’ll have it back by 03:30.
@ 01:02 Fri (Tim) - We’re checking on the server every 5 minutes (I’d make Toby sit in there constantly but apparently it’s against Health and Safety due to extreme noise). Currently the server is on Pass 1D (that’s good) of the FSCK check and hasn’t bailed. Hopefully it shouldn’t be much longer. While we’re waiting we’re tidying up the support desk (150+ open tickets due to this). Once the FSCK is done we’re performing a kernel upgrade and rebooting the server again. ETA is now 03:00 - 06:00. We’ll do our ABSOLUTE BEST to get it back online for 06:00.
@ 23:21 Thur (Tim) - Same as an hour ago, but an hour closer. Just playing the waiting game right now. Some people might be interested to actually SEE what we’re seeing … For those who want to see, here you go:
@ 22:21 Thur (Tim) - Disk check failed, performing manual disk check - we started the manual one at 22:06 and expect it to finish at approx 01:00. Uptime ETA 01:00 - 06:00.
@ 21:01 Thur (Tim) - Sitting here in the Bluesquare datacentre chill out room with Toby waiting for the disk check to finish - we’re hoping it’ll be done by midnight although ETA as Mark said below is still the same.
@ 19:17 Thur (Mark) - We have discovered that there is no hardware failure, we are currently running another mandatory disk check on the hard drives. (Current ETA: Midnight - 6am as long as there are no further errors).
@ 16:49 Thur (Tim) - We’re still looking at the ETAs below as Dell Diags haven’t finished.
@ 15:56 Thur (Tim) - Disk check finished, Dell diagnostics is running, likely until 6pm.
1550 - 1800 (APPROX) Dell Diagnostics will run.
When diagnostics finish: if no faults are found then we reboot the server and see if it wants another disk check or wants to start up. Disk check will then take another 4 - 6 hours.
If faults are found then we call Dell and await engineer arrival.
@ 15:42 Thur (Tim) - Still the same, see the 13:33 update for ETA.
@ 14:42 Thur (Tim) - Still the same as the previous post due to disk check. Nothing new to report as yet.
@ 13:33 Thur (Tim) - Disk check still progressing, we’re estimating approx 6pm before it finishes. Dell are also on call to swap out any hardware, with a 1 - 4 hr SLA for them to arrive on scene. Best case scenario will mean server is online 4pm - 6pm, worst case 10pm - tomorrow morning.
@ 12:06 Thur (Tim) - We saw some IO errors before it shut down, possible hard disk failure. It is running a disk check now which (as you will know from Monday) takes 4 - 6 hours.



February 14th, 2008 at 1:04 pm
We have 2 sites hosted on the Vortex server: http://www.malagataxi.co.uk and http://www.airmalaga.com. On Monday morning the server was down until approximately 16:00 and now it’s down again. What happened?
(Tim Responds) Please read the top of this post to stay up to date.
February 14th, 2008 at 1:06 pm
Hi,
I see the server is back down again. I hope it gets back up again soon as I have important installations to perform. I know you’ll do your best and these things can’t be helped, good luck.
(Tim Responds) Thanks, we’ll do our best.
February 14th, 2008 at 1:59 pm
Twice in a week :((
Was it 12 in the afternoon that the resolution work started or earlier ? (trying to figure out the ETA of return)
We all appreciate that you’re doing your utmost to get the server back online, but it’s still disappointing
GL
Nikki
(Tim Responds) First notification was sent to us at 11:33AM and I was on it that same minute. Datacenter staff took 1 minute to get on scene, however I didn’t get a chance to update the blog for a short time due to needing to fault find and being on the phone to Dell and the datacenter. We’re doing everything we can.
February 14th, 2008 at 2:41 pm
ok guys thanks for the update hope you get it fixed soon
(Tim Responds) Thanks, we’ll do our best.
February 14th, 2008 at 4:05 pm
I just wanted to say thank you for doing the best job you can under the circumstances.
I’ve been with EVO for just over 3 years now and this is the first time anything like this has happened.
If this is the worst it gets, it’s still 100% better than any other hosting company I have ever done business with.
Keep fighting the good fight! I’m still here knowing the quickest recovery is on its way!
(Tim Responds) Thankyou for the kind words, we will always do our best to go the extra 1000 miles for our customers however this problem is just really unfortunate
February 14th, 2008 at 4:42 pm
Been with Evo for over 2 years now, first time I’ve seen the server down twice in a week. I’m sure the staff will have it all sorted in no time, just have some patience. Thanks for the frequent updates guys.
Peter
(Tim Responds) Thanks for your kind words.
February 14th, 2008 at 5:44 pm
Whilst it is slightly frustrating having several sites down, I understand that evo are doing everything within their power to rectify the problem.
I myself have also been with evo hosting for a number of years now and this is the first “major” outage we’ve had. I’d like to take the opportunity to thank evo for such reliable service over the time I’ve been with them and I hope that we can continue working with each other in the future.
Thanks
Russell Saunders
(Tim Responds) Thanks Russell, we’ll post a better update about 18:00 - 19:00 once Dell Diags have finished, the whole team really appreciates your comments on this bad day.
February 14th, 2008 at 5:56 pm
Evo you are the best.
never seen the server go down and twice in one week is seriously bad luck.
Thank you for all your hard work on getting us all back online.
February 14th, 2008 at 7:38 pm
I finished a ‘big’ FTP upload to my space on Vortex just before it died today. It wasn’t me wot broked it was it!!
February 14th, 2008 at 9:05 pm
upset.. today was the launch of the live dj stream .. we have been running the playlist stream for a couple of months with minor issues, but the launch of the station has been impossible… am disappointed. Do hope you will resolve this.
February 14th, 2008 at 10:14 pm
Yep must be Simon that killed Vortex
Much respect to Tim and the others who must be in need of a cool beer by now to chill out with.
/me hands the guys a cold one…
February 14th, 2008 at 10:31 pm
Very frustrating to see it down again in such a short space of time, but appreciate the updates you are providing and the obviously long hours you are putting in to resolve it. Hope you have plenty of coffee to last the night!!!
February 14th, 2008 at 11:13 pm
Not impressed to be honest. OK, yeah, Evo has been very reliable in the past and I have recommended them to many people, but this is a joke. The server has been playing up for a few weeks now at least, and if this is how things are going to be on here I will soon just go and find somebody else for hosting that can be reliable.
February 14th, 2008 at 11:45 pm
John there may be a problem with the server but as I said in my post above I’ve been with evo for a few years now and this is the first “major” issue and I will stand by evo simply because of the cheap cost and the great support offered.
I agree it’s frustrating but to me they are doing all that they can do to get things back online and I would just ask for some patience.
February 14th, 2008 at 11:48 pm
I’ve been with evo in excess of three years and this is the first major problem I have known them to have, and I know for a fact they are doing everything they possibly can at the moment!
Just looks like lightning has struck twice this week
Bit of rotten luck we all get it sometimes!
February 14th, 2008 at 11:50 pm
John2
I have been with Evo since 2005
I have never seen a problem like this occur with evo - EVER.
Don’t think about leaving just because there is a 1 in a million glitch. I haven’t seen any problems with the server over the last few weeks at all.
The info that Evo are giving us is amazing, I am incredibly impressed with how this is being dealt with.
Tim and the rest of the team
You’re all stars! THANK YOU for doing everything you can as quickly as you can
February 14th, 2008 at 11:54 pm
I have lost a lot of sales (this being my main income) with the server being down, but with all respect to Tim & co they have kept us informed & up to date with the status of the server.
Hope this gets resolved ASAP
Ju
February 15th, 2008 at 12:26 am
I have been with evo for 3+ years and this is the 4th major downtime (and I monitor the server 24/7). The 1st major downtime was a hard drive failure, after which the webspace given was doubled as the hard drive was doubled when replaced.
The 2nd major downtime was when the server was being moved from the US to the UK when the hardware was being retired.
3rd was a few days ago, and 4th is now.
There have been shorter downtimes, but not huge amounts.
What is pissing me off the most isn’t that the server is down, shit happens, simple as. If you want 100% SLA then pay the $$$’s. What’s pissing me off is that my domains are showing like the domain has expired or just vanished. I would think that there would be some way to point the DNS to a standard holding page that said ‘Sorry Technical Difficulties, Please Check back soon’, then at least people would know the site hasn’t closed or been killed and was just temp
My solution I think is a backup hosting account, which I used to have up to a few months ago with purple-paw. I cancelled it as all was going smoothly, but I’ll re-open one so I can stay online.
(Tim Replies) We’re currently looking into a system whereby we run two servers together: if server A fails, server B takes over. However this doubles the cost and adds extra implications if the link between the two goes down, but we’re looking into it. It will also be low contention ratio (meaning 20 people per server, instead of up to 300). Sorry Steve, we’re doing all we can.
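For anyone curious what "if server A fails, server B takes over" could look like, here is a rough sketch in Python. It is not the system being evaluated above: the class name `FailoverMonitor`, the labels `server-a` and `server-b`, and the threshold of three failed checks are all assumptions made up for illustration. Waiting for several failures in a row is one simple way to avoid a takeover when only the link between the two boxes hiccups.

```python
from dataclasses import dataclass


@dataclass
class FailoverMonitor:
    """Decides which server should be live, based on health checks of server A.

    All names and the default threshold are hypothetical.
    """

    failures_before_takeover: int = 3  # assumed value, tune to taste
    consecutive_failures: int = 0
    active: str = "server-a"

    def record_check(self, primary_ok: bool) -> str:
        """Record one health check of server A and return the live server."""
        if primary_ok:
            # A good check resets the counter, so one dropped heartbeat
            # on the link does not trigger a takeover by itself.
            self.consecutive_failures = 0
            return self.active

        self.consecutive_failures += 1
        if (
            self.active == "server-a"
            and self.consecutive_failures >= self.failures_before_takeover
        ):
            # Promote server B. There is deliberately no automatic
            # failback: switching back to A is left as a manual step.
            self.active = "server-b"
        return self.active
```

Calling `record_check(False)` three times in a row on a fresh monitor switches the live server to `server-b`; a later `record_check(True)` leaves it there until someone switches back by hand.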
February 15th, 2008 at 12:53 am
I know Tim
Isn’t there a simpler way of redirecting ALL DNS coming into ns34 or whatever Vortex is to a ‘server maintenance’ page in case of crashes?
It would be easier and cheaper than some sort of server RAID thing??
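The holding page idea can be sketched in a few lines of Python using only the standard library. This is an illustration rather than anything Evo runs: it assumes a spare box that DNS can still be pointed at while the main server is down, and the names `build_holding_page`, `HoldingPageHandler` and `serve`, the port and the `Retry-After` value are all made up for the example. It answers every request with a 503 status so visitors and search engines can tell the outage is temporary.

```python
from html import escape
from http.server import BaseHTTPRequestHandler, HTTPServer

HOLDING_MESSAGE = "Sorry, technical difficulties. Please check back soon."


def build_holding_page(host: str) -> bytes:
    """Return a small HTML holding page that names the requested site."""
    safe_host = escape(host)  # the Host header is untrusted input
    html = (
        "<html><head><title>Temporarily unavailable</title></head>"
        f"<body><h1>{safe_host}</h1><p>{HOLDING_MESSAGE}</p></body></html>"
    )
    return html.encode("utf-8")


class HoldingPageHandler(BaseHTTPRequestHandler):
    """Serves the same holding page for every path on every domain."""

    def do_GET(self):
        body = build_holding_page(self.headers.get("Host", "this site"))
        self.send_response(503)  # Service Unavailable: a temporary condition
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.send_header("Retry-After", "3600")  # assumed: retry in an hour
        self.end_headers()
        self.wfile.write(body)


def serve(port: int = 8080) -> None:
    """Run the holding page server until interrupted (port is an assumption)."""
    HTTPServer(("", port), HoldingPageHandler).serve_forever()
```

Running `serve()` on the standby box and pointing the affected domains at it would show the message instead of an error. It only helps if the nameservers themselves stay reachable during the outage, which this sketch takes for granted.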
February 15th, 2008 at 2:25 am
I’ve been using Evo for a couple of years and this is the first really big period of downtime I’ve had with them to be honest. Most other times it was just me mucking around! Ha ha!
February 15th, 2008 at 2:28 am
Not being a customer of Evo myself, but knowing one of the chaps out there on the front line personally, I must say I’m impressed. You wouldn’t find this sort of dedication at many, if any, hosting companies, unless you’re paying seriously large amounts for it.
You guys are lucky to have such dedicated and committed engineers (or should that be engineers that need committing?) working on your servers!!!
Top marks for Tim & Toby, just for being out there at this time in the morning if nothing else!
Keep up the good work chaps, we’re all behind you here!
February 15th, 2008 at 2:46 am
Have you all got matchsticks propping the eyes open yet?
What is the likelihood of data loss on my site? I am panicking a little as I have no backups.
February 15th, 2008 at 3:06 am
Hi there. Just want to say that although I have only ever used hosting services from EVO I don’t imagine many would post updates every hour about how they’re getting on with fixing downtime issues. These guys seem genuinely passionate about maintaining a working server, good on you!
^ Feel free to use this as my testimonial guys - http://www.tmdesigns.org
February 15th, 2008 at 3:13 am
Great to see my sites back up so fingers crossed that the problems have been fixed.
Great Job Tim and Toby - thanks!
February 15th, 2008 at 3:34 am
Yippie!! 3 cheers for the boys (and girls?) oh man I bet they are asleep with head against the keyboard now. Watch ‘em wake up with qwertyuiop tattooed in their foreheads.
February 15th, 2008 at 4:04 am
Data loss wise, for the chap above: I have the files from updates committed at 02/13/2008 10:52:06 PM and I have database activity from 02/14/2008 10:50:48 AM. The only loss is in email so far, but hopefully the mailservers won’t have given up trying to deliver it.
I’d say probability of data loss is pretty low.
February 15th, 2008 at 4:05 am
Thanks for keeping us up to date with exactly what’s going on guys. It takes a lot of the worry out of the situation. Good luck with the kernel update.
February 15th, 2008 at 5:55 am
Darn it! It was working then as well.
February 15th, 2008 at 8:57 am
Thanks Guys!.. Back on and running.. Think Tim needs to get some sleep! - Make sure the team have a beer or two on Saturday night !!!
Btw.. Did you guys send me a purple’ish/pink glitter lava lamp just before xmas ??
(Cos I got one, and have no idea where or who sent it !!)
February 15th, 2008 at 10:51 am
Well, all seems back up, but my ‘big’ upload from yesterday obviously wasn’t on the backup, so I need to upload it again! Wish me luck!
February 15th, 2008 at 11:24 am
Thanks guys for burning the midnight oil and getting this resolved!
February 15th, 2008 at 1:20 pm
All I can add is thank you!
February 15th, 2008 at 6:58 pm
Hi All,
Thanks for all the kind words here, to answer a few questions I have seen here….
@ Carl: As far as I have seen so far we’ve seen no data loss, or at least, no reports of any. We’ve seen a couple of locked databases that have just required a repair from phpMyAdmin and that is about it. Oh and also, no we didn’t use matchsticks, a mix of Red Bull, KFC and sugary rubbish kept us all going through the night
@ Paul: thanks for the kind words, although I doubt any of us will be awake enough by tonight to feel like going down the pub
Yes we would have sent you a lava lamp at around Christmas if you signed up during that promotion - http://www.evohosting.co.uk/blog/index.php/2007/11/01/november07-specials/
Thanks again everyone
February 15th, 2008 at 7:07 pm
Thanks for all the hard work that has already been done and for the work that’s still to be done to resolve the issues
February 15th, 2008 at 8:48 pm
What caused the server to shut down?
*gets gun out* Hackers ?
February 16th, 2008 at 11:35 pm
Thanks to the efforts of Tim and the boys, my crappy website is back to normal. I have been with Evo since pretty much the start and this is a very rare occurrence. Last time was a HDD failure and we all got double space and bandwidth, even though it was down for a tiny amount of time.
I feel for the businesses that have been affected but knowing Tim (and reading some of the comments above) he would have stopped at nothing to get Vortex back up and running as fast as possible.
They have always helped me fast and effectively and I for one thank them for their efforts.