Wikipedia back up – recovery from power outage

Categories: Wikipedia
Published on: April 10, 2006

Wikipedia reborn
Wikipedia is up again after several hours of downtime. I found the Wikipedia/Wikitech Server Admin Log, which provides some insight into what happened.

It seems that they had a major power failure. Even though they seem to have gotten power back fairly quickly, it took a lot more time to get all the servers up and running properly again.


Here is the excerpt from the Server Admin Log. Please note that the dates and times are in GMT, and that the newest entries come first.

April 10
04:26 jeluf: ixia, thistle, lomaria, db1 have broken replication settings, webster has database page corruption. Taking db2 out of rotation to create copies from it.
04:20 jeluf: mounted /home on all DB servers
04:03 brion: ran mass-correction of bad-timestamped entries on enwiki (1529 revision records)
03:05 brion: srv71-srv79 had wrong clock, apparently set to local time instead of UTC.
01:45 brion: irc feeds online. had to rescue udprec from kate’s old home dir
01:38 brion: taking thistle and db1 out of rotation; broken replication.
01:32 brion: turning read_only off on adler. seems to be set to go on always on boot.
01:28 brion: things look mostly good; tried to take site read/write but someone has put adler into read-only? examining
01:23 brion: got fs-squids on the right ip. seems to work now.
01:20 brion: had to start lighty on amane
01:18 brion: trying to get fileserver squids+lvs up. (avicenna as lvs master)
01:10 brion: run-icpagent.sh didn’t take previously; seems to have helped now
01:04 brion: trying to add 10.0.5.5 on dalembert also. no idea if this is correct. 10.0.5.3 works internally, but squids still don’t show anything. there’s no explanation for this that is obvious to me.
00:55 brion: added the lvs master ip on dalembert; http’ing to it internally seems to work, but still nothing from outside
00:49 brion: trying starting LVS monitor thingy on dalembert. no clue if it’s working
00:45 brion: turning on apaches

April 9
23:45 brion: srv33, srv36 should now replicate properly.
External storage borkgage, 2006-04-09
23:20 brion: looking at srv33, srv36 external storage; jens reports replication seems borked
22:00 brion: added izwinger ip to suda; it wasn’t automatic.
21:52 brion: finally got into srv1 and albert. maybe working
21:49 brion: ldap depends on dns; dns is still broken. we can’t reach srv1 or albert.
21:32 brion: still trying to get some core machines online (suda booting; albert ?? srv1 ??). kyle should be available in 30 minutes
20:55 brion: bw is onsite and available to poke at machines. there was a power problem; some machines seem to still eb booting
20:42 brion: phoned kyle (message)
20:38 brion: network mostly back up, still trying to get in
19:20 brion: PowerMedium offline?
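
Since the log times are in GMT, here is a minimal Python sketch for converting an entry to your own local time and for measuring the time between two entries. The function names and the UTC-7 offset (US Pacific daylight time) are my own illustrative assumptions; nothing here comes from the Wikimedia tooling.

```python
from datetime import datetime, timedelta, timezone


def parse_log_time(date_str, time_str):
    """Parse a log entry's date and time and mark it as GMT/UTC."""
    naive = datetime.strptime(f"{date_str} {time_str}", "%Y-%m-%d %H:%M")
    return naive.replace(tzinfo=timezone.utc)


def log_time_to_local(date_str, time_str, utc_offset_hours):
    """Convert a GMT log timestamp to a fixed-offset local time.

    utc_offset_hours is an assumption supplied by the reader,
    e.g. -7 for US Pacific daylight time.
    """
    local_zone = timezone(timedelta(hours=utc_offset_hours))
    return parse_log_time(date_str, time_str).astimezone(local_zone)


# First entry in the excerpt: "19:20 brion: PowerMedium offline?" on April 9.
first_entry = log_time_to_local("2006-04-09", "19:20", -7)
print(first_entry.strftime("%Y-%m-%d %H:%M"))  # 2006-04-09 12:20

# Time between the first entry and "00:45 brion: turning on apaches".
elapsed = parse_log_time("2006-04-10", "00:45") - parse_log_time("2006-04-09", "19:20")
print(elapsed)  # 5:25:00
```

A fixed offset keeps the example free of any time zone database; for daylight-saving-aware conversion, the standard library's zoneinfo module would be the usual choice.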

By the way, none of my changes got lost, and I was able to finish my edits to the ASCII art article. Check it out.

I also created a new ASCII and ANSI. Yes, a new one. I created it for deviantART. Enjoy.

deviantART ANSI
deviantART ASCII

Ciao Carsten a.k.a. Roy/SAC



…cu at dA
