Vortex (Resolved)
By Tim M.
I will keep this post up to date with events occurring on Vortex this evening/morning.
@ 02:40 - The filesystem on Vortex has gone read only. We are remounting the filesystem but may need to reboot and do a manual FSCK which will take some time.
@ 03:03 - Still remounting the file system.
@ 03:18 - NOC staff will reboot the server shortly. If it goes into FSCK we could be here a while.
@ 03:57 - Unfortunately the server requires an FSCK and will be down until this completes; the FSCK was already started a little while ago.
@ 04:32 - FSCK is progressing; usually this takes 6 hours. The best-case scenario would mean the server is back at 10am.
@ 05:06 - As mentioned in a comment, I am prepping a new server (a shiny 8 x 2.5GHz one this time). Once Vortex is back up we will copy all accounts to the new server, switch the nameserver IPs and rsync the databases. (Still waiting for the FSCK, by the way...)
@ 05:38 - Still FSCKing away.
@ 06:16 - FSCK is ‘almost’ done, but it might need another run. It’s currently complaining about several “multiply-claimed blocks”; this basically involves the NOC engineer repeatedly hitting ‘Y’ until it stops. It then reboots, and we see whether it needs another FSCK or whether we can bring it back up and start the migration. In the meantime I’m continuing to prep the new server, which is almost done now.
@ 06:41 - Same as before…
@ 07:01 - Same as before. The new server is prepped, so I’m heading down to Bluesquare now to plug it in. I’m only a few minutes away and will update the blog after I’ve plugged in the new server and checked out Vortex myself (will try to report back by 08:00).
@ 08:14 - Replacement server built, and Vortex is 20% through a 2nd FSCK, which we kicked off at around 07:35. We will update with the latest status by 09:00.
@ 08:33 - Vortex is currently at 50% of the FSCK.
@ 09:07 - 70%
@ 09:29 - Still the same; nothing further to report. Presuming this is the final FSCK, it should hopefully be finished by 11am. If we have to do another one, then we’re starting to look at mid-to-late afternoon.
@ 11:01 - FSCK failed at 85%. Server rebooted, going for a third FSCK. Currently at 5% (restarted at 10:55).
@ 11:21 - FSCK at 13.7%
@ 11:45 - 44.5%
@ 12:13 - 70.1%
@ 12:39 - The percentage meter has gone off the screen; it is currently fixing multiply-claimed blocks. It did this for a long time earlier, so we simply have to see how long it takes and how far it gets.
@ 13:01 - Still the same. The server is churning through data; the screen hasn’t updated, so it hasn’t found any more of the above as yet.
@ 13:30 - It has churned through a LOT of data, just waiting to see if it completes okay this time.
@ 14:19 - It got a lot further, then failed again, so I’ve had to kick it off again (ETA to finish around 5:30pm based on the previous runs).
@ 15:30 - Nothing to report. There is no % meter this time round; it is currently on pass 1 and has been since 14:19. I’ll keep checking the server every 5 minutes and report back when there is actually something to report. Hang in there, everyone.
@ 17:26 - The FSCK is on pass 1D, possibly another 60-90 minutes until it has finished.
@ 18:56 - It Lives! Vortex is now coming back online. It may take a while before the databases and everything else are up and running; if you see any issues, please open a support ticket. We will update here when we are moving everything over to the new server. - Toby H.


March 23rd, 2008 at 5:06 am
Hey Guys
Is there a problem with the hardware or software causing these problems?
Obviously you don’t know exactly what it is, otherwise it wouldn’t happen, but is there a list of suspects that keep making the storage do silly things? Like a bad RAID card/table? A faulty hard drive? Dodgy cables?
March 23rd, 2008 at 5:39 am
Hi Steve.
Nothing has picked up a hardware issue, however once it’s back all sites are to be moved to a different server.
The server magically rebooted at 10am on Saturday (it came straight back up) and we couldn’t diagnose an issue, so we beefed up security to the absolute maximum and put it on suicide watch, the idea being that if anything else happened within a 24-hour period we would swap out the hardware entirely to put an end to it long term.
March 23rd, 2008 at 8:54 am
Thought it was my fault again! Glad it’s not me. Sorry you got a messed-up Bank Holiday weekend, guys. Happy Easter. Want me to ‘lay on hands’ on that server?
March 23rd, 2008 at 9:17 am
Didn’t we already have issues with this one last month?
March 23rd, 2008 at 9:36 am
Thanks Kathy
Rob, yep. Hence, when it’s back, we’re swapping out the hardware and moving all customers off it to a fresh one, because this can’t happen again.
March 23rd, 2008 at 12:03 pm
Not happy!!! You try to build a regular client base only for it to get destroyed every week by your dodgy servers, thanks guys!
March 23rd, 2008 at 12:03 pm
Can’t be helped... oh well, “Happy Easter”.
March 23rd, 2008 at 12:07 pm
Aaaargh! Not again.
Tim,
How long before you’re able to do a restore to new hardware? Are you going to wait until the FSCK finishes or roll back to a previous day?
I for one would be happy with a previous day rollback, if only to at least get our DNS back up.
Cheers.
March 23rd, 2008 at 12:07 pm
Thanks for everything guys, hope you can sort it out soon. Happy Easter anyways.
March 23rd, 2008 at 12:13 pm
Oh yeah! Thanks Paul, happy Easter…
March 23rd, 2008 at 12:35 pm
@ Grant - We’re going to copy over live data rather than backup data. We can begin when the server is running. The copy will take all day, and your sites will be online during that time.
Then during the night we will rsync a number of folders from one server to the other; these will include all the data inside MySQL databases etc. No rollbacks or anything.
@ Barry - I understand your frustration. I’m doing all I can, as you can see by reading this blog post.
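For the curious, here is a rough sketch of why that overnight rsync is quick. By default rsync uses a "quick check" and skips any file whose size and modification time already match on both sides, so only what changed since the daytime copy gets sent. The Python below mimics just that check. files_to_resync is a hypothetical helper and the paths, sizes and timestamps are made up. Real rsync also uses a delta-transfer algorithm for changed files and only removes files deleted on the source when run with --delete. Live MySQL data files are normally copied with the database stopped or locked so the copy is consistent; that step is assumed here, not described in the post.

```python
def files_to_resync(source, dest):
    """Return the paths a size+mtime "quick check" would flag for copying.

    source and dest map a relative path to a (size_bytes, mtime_seconds) tuple.
    A file is flagged if it is missing from dest or its size or mtime differ.
    Files that exist only in dest are ignored (rsync removes those only with --delete).
    """
    changed = []
    for path, meta in sorted(source.items()):
        if dest.get(path) != meta:
            changed.append(path)
    return changed


if __name__ == "__main__":
    # Hypothetical snapshot of the old server at the time of the final sync.
    source = {
        "public_html/index.html": (100, 1206000000),
        "mail/inbox": (512, 1206290000),
        "mail/new/msg1": (300, 1206310000),           # email that arrived after the bulk copy
        "mysql/forum/posts.MYD": (2048, 1206300000),  # forum table that has grown since
    }
    # Hypothetical state of the new server after the daytime bulk copy.
    dest = {
        "public_html/index.html": (100, 1206000000),
        "mail/inbox": (512, 1206290000),
        "mysql/forum/posts.MYD": (1024, 1206200000),
    }
    print(files_to_resync(source, dest))
    # ['mail/new/msg1', 'mysql/forum/posts.MYD']
```

Only the new email and the grown forum table would be copied again. The unchanged files are skipped, which is why the final sync is so much smaller than the full copy.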
@ William - thanks for your support through this hard day.
March 23rd, 2008 at 12:41 pm
Hey guys, these things happen, but I trust in your hard work in getting everything running again
March 23rd, 2008 at 12:41 pm
Thanks for all the hard work guys - hopefully you’ll have it up and running soon without many problems
March 23rd, 2008 at 12:58 pm
Thanks for the ongoing updates. I appreciate being kept informed, it avoids *a lot* of frustration at this end!
Hope you get things fixed soon.
March 23rd, 2008 at 1:10 pm
@ Alan, Richard and Kaz - thank you for your support, and I’ll continue to give updates as often as I can. I’m literally sitting here in the datacenter hall waiting for the FSCK to finish, replying to comments here and tickets on the helpdesk.
March 23rd, 2008 at 1:13 pm
Hope this doesn’t take as long as the last time
Never got my FREE month either
I know you guys are working hard to resolve this
Happy Easter
March 23rd, 2008 at 1:17 pm
Oh, this one’s an unstable one
Thanks for keeping us updated, although it would be nice if we were emailed about things like this, I had to find out from a user of my site.
March 23rd, 2008 at 1:24 pm
This is having a big impact upon us again. We are a gaming team and require the forums on our site to be up for team announcements. We have our most important match of the season this evening, and as it stands it’s going to be mayhem to organise because the site is down.
I really hope the site can be back up by late afternoon. I understand it is Easter Sunday, and please pass on my personal thanks to the people who are working to resolve this issue.
Andy
http://www.team-mguk.com
http://www.mature-gamer.com
March 23rd, 2008 at 1:27 pm
@ Julie - did you pop a ticket into Customer Services? Can you send one in today? I did promise that to anyone that emailed us last time - it will be dealt with on Tuesday.
@ Clint - not for much longer, it’s being decommissioned due to this. Fair point regarding email, one will be sent out once the server is back regarding the migration.
March 23rd, 2008 at 1:34 pm
Alright guys...
Not gonna go blah blah blah. My site wasn’t working, so I thought I would check the blog before logging a call lol.
Hope you get the new server going soon and bin the troublesome one..
March 23rd, 2008 at 1:35 pm
@ Andy - I feel your pain. I am hoping for a lot sooner than late afternoon; we just have to see how far this FSCK gets. Each time I run it, it gets quicker, if you look at the times and percentages. Unfortunately we just have to play the patience game.
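A rough way to turn those timestamped percentages into an ETA is to draw a straight line through two readings and extend it to 100%. The Python sketch below does exactly that. linear_eta is a hypothetical helper, and the readings are the ones posted at 11:21, 11:45 and 12:13 during the third FSCK. It gives about 12:28 from the first pair and about 12:45 from the second. However, at 12:39 the meter went off the screen while fsck worked through multiply-claimed blocks, and that run later failed. So an estimate like this only covers the part of the check that the percentage meter actually tracks.

```python
from datetime import datetime, timedelta


def linear_eta(t1, p1, t2, p2):
    """Extrapolate when progress would reach 100% at the rate seen between two readings.

    t1/t2 are "HH:MM" strings on the same day; p1/p2 are the percentages shown then.
    """
    fmt = "%H:%M"
    a = datetime.strptime(t1, fmt)
    b = datetime.strptime(t2, fmt)
    rate = (p2 - p1) / (b - a).total_seconds()  # percent per second
    eta = b + timedelta(seconds=(100 - p2) / rate)
    return eta.strftime(fmt)


if __name__ == "__main__":
    # Readings from the 11:21, 11:45 and 12:13 updates in the post above.
    print(linear_eta("11:21", 13.7, "11:45", 44.5))  # 12:28
    print(linear_eta("11:45", 44.5, "12:13", 70.1))  # 12:45
```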
@ Simon - tempted to burn it and YouTube it once we have the live data, although it won’t help our carbon footprint much.
March 23rd, 2008 at 2:08 pm
I vote Evo sets up ‘serverblend.com’, buys a Blendtec and remakes this http://www.youtube.com/watch?v=UU_AJfZVnYA&NR=1
March 23rd, 2008 at 2:12 pm
Bahahaha too right! I bet it will blend too.. Although I’ll need to hack it up a bit to get it into the Blendtec. Not sure Dell will replace it under warranty afterwards though? Only one way to find out, I guess.
March 23rd, 2008 at 2:39 pm
Well Tim, excellent work with the updates on this issue…. Regarding the Blendtec, I’m sure you might be able to ask them to make you a unique version ideal for 1U/4U system cases. Anyway, you and your team @ EVO deserve a round of applause; it’s a bank holiday and you’re running on 30 hours without sleep…
Three cheers for the team
Joe
March 23rd, 2008 at 2:43 pm
I’m sure if you contact Blendtec and ask them for a server-blade-sized blender because you wanna blend Vortex for being so sadistic, they will help ya out if they get a YouTube video out of it..
I think I found out why vortex is a nightmare.
http://en.wikipedia.org/wiki/Vortex_(Transformers)
“Vortex is the most out-and-out sadistic of the Combaticons, and functions as their interrogator.”
He clearly went off the deep end.
Can the new server be called Hydra, since no one else has asked to name it?
http://en.wikipedia.org/wiki/Hydra_%28Transformers%29
March 23rd, 2008 at 2:49 pm
Tim M. Says:
March 23rd, 2008 at 1:27 pm
@ Julie - did you pop a ticket into Customer Services? Can you send one in today? I did promise that to anyone that emailed us last time - it will be dealt with on Tuesday.
Tim, I did put a ticket in, but to tell you the truth I am not worried about it. Just hope you can sort Vortex out ASAP
Keep up the good work
JU
March 23rd, 2008 at 3:52 pm
@ Steve
Hmm that may explain a few things, I second the blender vote!
March 23rd, 2008 at 4:00 pm
Say Tim, this seems to be a recurring issue with the hard drive and data storage on the Vortex server. I was wondering if you have simply used the included server tools or if you have tried any outside tools to examine and repair the disk?
I have had tremendous success using a product called Spinrite from http://www.grc.com/sr/spinrite.htm
Not only can it recover data, but it works at a low level and properly repairs/reports bad sectors. It’s saved my behind numerous times, and as it works outside the OS it properly reports disk drive condition. It’s very easy to use and extremely thorough.
March 23rd, 2008 at 4:24 pm
@Steve: well, with a name like Vortex, is there any wonder it’s gone down the plughole!
RichK
March 23rd, 2008 at 4:53 pm
[sarcasm]
why not call the new server HOPE, as in ‘I HOPE the bloody server stays up!’
[/sarcasm]
I think it’s all the chocolate from the give-away easter eggs Tim bought melting into the server. Tim and the boys are really licking the sugary goodness off of the hard drives and cpu!
Nice specs on the replacement by the way.
March 23rd, 2008 at 5:00 pm
@ John - Well, we are replacing the server, so that should fix any hardware issues. Then we’ll test it until we either totally break something and can get Dell to repair it, or replace the server totally.
March 23rd, 2008 at 5:45 pm
Glad to see people are laughing as well as grumbling. It’s easter - we get to sit here and eat choccie eggs (since we can’t do anything with our websites) while poor Tim nurses a server!
I have a nice hatchet to sort the server out if a blender doesn’t. I do suggest a name change though; sadistic is about right.
BTW, don’t bother emailing me about downtime - my email is on the server…
Happy Easter everyone!
March 23rd, 2008 at 6:26 pm
Thanks for all the updates
At least it’s happened on a Sunday, and as it’s Easter, many people are away, so the overall effect is not as great as it might have been
(well, that’s my thought anyway)
I’ve looked through and could we name the new server Wheeljack?
“The Autobots’ resident inventor and gadgeteer. He often produces devices when needed, though his inventions were notorious for exploding in his face while he was still testing/working on them. Optimistic.” :P
March 23rd, 2008 at 6:41 pm
As per John regarding Spinrite - it’s saved my ass a few times too.
Good luck reloading / transferring Tim
Yes it’s a pain being off air but, these things happen. Keeping us informed as you do really does help to avoid a lot of frustration - thanks.
March 23rd, 2008 at 6:55 pm
Thanks for the updates, Tim
March 23rd, 2008 at 7:26 pm
To all those who bashed Evo above - at least Tim and the team are working hard to get you up and running! There are hundreds if not thousands of hosting companies on this planet of ours who would not have the drive, determination and dedication Tim and the guys have!
I’ve had my fair share of woes with hosting companies, and Evo is the best by far. Give Tim a break, and suffer like the rest of us. There’s more to life than your gaming forum. Go and get some fresh air
March 23rd, 2008 at 7:52 pm
And there was light… but no databases heh.
March 23rd, 2008 at 7:53 pm
All your databases are belong to us :P
March 23rd, 2008 at 7:54 pm
Maybe, Richard, but have you thought about other companies, their credibility and loss of earnings? I think not!
Any news on what time everything will be back on-line?
Cheers Guys
March 23rd, 2008 at 8:05 pm
It LIVES! the server is now back online!
March 23rd, 2008 at 8:06 pm
Alleluia!!
You know what? It’s been rather nice meeting some of the people I share the server with. Well done on keeping cheerful. And a HUGE thank you to Tim and the team. Hope you all get a day off tomorrow!
*Passes virtual choccie eggs to all*
March 23rd, 2008 at 8:20 pm
A large glass of whiskey on ice for Tim
March 23rd, 2008 at 8:26 pm
It’s indeed nice to see everyone that sits on Vortex… I appreciate the fact that hardware issues can occur, and I’ve done my research before joining Evo… however, what happened to redundancy? We build servers in the office all the time, and surely for a hosting company a server going offline for 12 or so hours is totally unacceptable.
Thanks for getting it up and running. Although this shouldn’t have happened in the first place, I’m glad of the information supplied here. It kept me from panicking.
March 23rd, 2008 at 9:00 pm
@ Wout - Redundancy protects against data loss and total system failures, but it does not prevent a software failure of the operating system. Having seen issues on this server several times now, we are going to swap out the whole server overnight tonight and test it to determine exactly what has been causing the problems; then it will be returned to Dell for repair.
What happened today is that the file system became read-only. This does not have to be caused by a hardware issue; there are many, many reasons for it, ranging from a hardware fault to use of swap space for a prolonged period. Once the file system becomes read-only you can try to remount it read-write, but this rarely works, so you are generally forced to perform an FSCK, which may need to be repeated numerous times, as it was today. Each FSCK run takes around 3-4 hours, but this varies with the hardware: the more data there is, the longer it takes and, normally, the more runs are required.
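For anyone who wants to spot this on their own Linux boxes, the kernel lists every mounted filesystem in /proc/mounts. The mount options are in the fourth field, and a filesystem that has dropped to read-only shows ro there. The Python sketch below just parses text in that format. read_only_mounts is a hypothetical helper and the sample lines are made up. It also does not decode the octal escapes (such as \040 for a space) that /proc/mounts uses in paths. One common way for an ext3 filesystem to end up read-only is the errors=remount-ro mount option, which tells the kernel to remount it read-only when it detects an error. The post doesn’t say whether that was the trigger on Vortex.

```python
def read_only_mounts(mounts_text):
    """Return mount points whose option list contains "ro".

    Each /proc/mounts line has six space-separated fields:
    device, mount point, filesystem type, options, dump, pass.
    """
    result = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        mount_point = fields[1]
        options = fields[3].split(",")
        # Exact match, so "errors=remount-ro" on a healthy rw mount is not flagged.
        if "ro" in options:
            result.append(mount_point)
    return result


if __name__ == "__main__":
    # Made-up sample; on a real host you could use open("/proc/mounts").read().
    sample = (
        "/dev/sda1 / ext3 rw,errors=remount-ro 0 0\n"
        "/dev/sda3 /home ext3 ro,data=ordered 0 0\n"
        "proc /proc proc rw 0 0\n"
    )
    print(read_only_mounts(sample))  # ['/home']
```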
I have my website on vortex also, so have been without email all day just like everyone here.
March 23rd, 2008 at 9:14 pm
Tim.
Many thanks for getting the server and all of our sites/services back online, I hope / trust that no one has had any loss of data.
Please pass on our thanks to all involved in the recovery.
Thanks
Grant.
March 23rd, 2008 at 9:40 pm
Thanks to Tim, Toby and the rest for getting the server sorted.
I only have one problem… my site is still crap!
Oh well at least people can now read it and mock me mercilessly
Are we still on Vortex? Does the transfer to the new shiny still have to be performed? I just ask because I want to raise a glass to the memory of Vortex, as, I assume, do the majority of the posters above.
If it has been transferred: R.I.P. Vortex. It was a wild ride full of ups and downs, there were good times and bad times, and you served us so well up until the end, where you had a brain fart and passed into the great landfill in the sky.
March 23rd, 2008 at 9:53 pm
Hi Tucker
We’re starting to do the transfer in the next hour, this is what happens:
- First we do a cPanel copy of live data (200GB @ 100Mbit; this is going to take ages).
- By the time the copy finishes, people’s databases will be out of date, i.e., if they have forums there will be new posts etc., so we then do an rsync on the MySQL folders, the folders for your email etc. to drag the latest data over to the new server (probably 10GB @ 100Mbit, so pretty quick).
- At the same time as doing the rsync, we change the IP addresses associated with the nameservers (you know, you use ns27.3v0.net / ns28.3v0.net); these will be routed to new IPs. We’ve already done some DNS magic to reduce the time that these IPs are cached at your end, so if we can time this right and do it overnight, most people will just switch seamlessly to the new box.
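The timing behind that ‘DNS magic’ comes down to a little arithmetic. A resolver that cached the old record just before the TTL was lowered can keep serving it for up to the old TTL. The shorter TTL therefore has to be published at least that long before the switch, and after the switch stale answers should last no longer than the new TTL. In the Python sketch below, ttl_plan is a hypothetical helper, and the TTLs and cutover time are made-up examples, not Evo’s real settings. It also assumes resolvers honour TTLs, which not all of them do.

```python
from datetime import datetime, timedelta


def ttl_plan(cutover, old_ttl_s, new_ttl_s):
    """Return (latest time to publish the lowered TTL,
    time by which well-behaved caches should all be using the new IP)."""
    # A cache that fetched the record just before the TTL was lowered
    # keeps it for up to old_ttl_s, so the TTL must be lowered at least that early.
    lower_by = cutover - timedelta(seconds=old_ttl_s)
    # After the switch, a cache still holding the old IP keeps it for at most new_ttl_s.
    settled_by = cutover + timedelta(seconds=new_ttl_s)
    return lower_by, settled_by


if __name__ == "__main__":
    # Hypothetical values: switch at midnight, old TTL of one day, new TTL of 5 minutes.
    lower_by, settled_by = ttl_plan(datetime(2008, 3, 25, 0, 0), 86400, 300)
    print(lower_by)    # 2008-03-24 00:00:00
    print(settled_by)  # 2008-03-25 00:05:00
```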
I hope that makes sense. There’s a whole lot more to it, but that’s the gist.
It’s likely that the live data copy will take up to 24 hours, maybe even longer; I can’t say for sure. But I’ll make sure the nameserver switch and final rsync are done around midnight (not tonight, probably tomorrow night).
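As a sanity check on those numbers, here is the raw arithmetic for pushing 200GB and 10GB over a 100Mbit link. It assumes decimal units and a link running flat out with zero overhead, both of which are simplifications. The ideal figure for the live copy comes out at about 4.4 hours. The up-to-24-hours estimate presumably allows for what the arithmetic ignores, such as packaging each cPanel account, disk I/O and the link not being saturated; the post doesn’t break that down. transfer_seconds is a hypothetical helper.

```python
def transfer_seconds(size_gb, link_mbit_s):
    """Ideal transfer time in seconds, using decimal units
    (1 GB = 10**9 bytes, 1 Mbit = 10**6 bits) and no protocol overhead."""
    bits = size_gb * 10**9 * 8
    return bits / (link_mbit_s * 10**6)


if __name__ == "__main__":
    live_copy = transfer_seconds(200, 100)   # the daytime cPanel copy
    final_sync = transfer_seconds(10, 100)   # the overnight rsync
    print(f"live copy:  {live_copy / 3600:.1f} hours")   # live copy:  4.4 hours
    print(f"final sync: {final_sync / 60:.1f} minutes")  # final sync: 13.3 minutes
```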
I’ll send out an email ASAP with full details.
March 23rd, 2008 at 10:01 pm
What’s your website tucker?
March 23rd, 2008 at 10:02 pm
Tim, you must be knack… errr shattered! When do you sleep?
The girls at ChristopherEccleston.net are doing the happy dance in your honour. And I shall raise a glass tomorrow night for Vortex: may it rust in peace.
March 23rd, 2008 at 10:06 pm
Tim doesn’t sleep. He has forgotten what sleeping is; we’re trying to fix that, but every time we think it’s sorted, something like this happens. Oh, and he is still seeing an army of Easter eggs chasing him in his sleep, having spent 3 days packaging everyone’s up last week!
March 23rd, 2008 at 10:15 pm
Post him down to Cornwall for a week with his family. I can make sure he walks on the beach and avoids thinking about servers. And hit him over the head with a mallet every night if necessary.
And I withdraw all the comments I’ve been making about Easter eggs if they added to the nightmare. I don’t get allowed any till today.
*Struggles with a mental picture of Tim being chased by a giant smartie egg with ‘Vortex’ written on the side…*
March 24th, 2008 at 12:40 pm
You know, I just realised why I said I didn’t want to work with computers… Err…
March 24th, 2008 at 6:00 pm
Well I’m again impressed with the level of attention the evo staff pay to their servers and customers. A bank holiday spent fixing a server must be a nightmare but they are willing to do it because they value every last customer on this and every server.
Of course you’d have to be mad not to be a bit annoyed with the server messing around again and again.
I do have an issue with accessing my email via Horde (I prefer Horde, you see!)
Oliver
htp://designblocks.co.uk - 8 days left!