Published on: April 10, 2006

Wikipedia reborn
Wikipedia is up again after several hours of downtime. I found the Wikipedia/Wikitech Server Admin Log, which provides some insight into what happened.

It seems that they had a major power failure. Even though they seem to have gotten power back fairly quickly, it took a lot more time to get all the servers up and running properly again.

Here is the excerpt from the Server Admin Log. Please note that the dates and times are in GMT, and the newest entries come first.

April 10
04:26 jeluf: ixia, thistle, lomaria, db1 have broken replication settings, webster has database page corruption. Taking db2 out of rotation to create copies from it.
04:20 jeluf: mounted /home on all DB servers
04:03 brion: ran mass-correction of bad-timestamped entries on enwiki (1529 revision records)
03:05 brion: srv71-srv79 had wrong clock, apparently set to local time instead of UTC.
01:45 brion: irc feeds online. had to rescue udprec from kate’s old home dir
01:38 brion: taking thistle and db1 out of rotation; broken replication.
01:32 brion: turning read_only off on adler. seems to be set to go on always on boot.
01:28 brion: things look mostly good; tried to take site read/write but someone has put adler into read-only? examining
01:23 brion: got fs-squids on the right ip. seems to work now.
01:20 brion: had to start lighty on amane
01:18 brion: trying to get fileserver squids+lvs up. (avicenna as lvs master)
01:10 brion: didn’t take previously; seems to have helped now
01:04 brion: trying to add on dalembert also. no idea if this is correct. works internally, but squids still don’t show anything. there’s no explanation for this that is obvious to me.
00:55 brion: added the lvs master ip on dalembert; http’ing to it internally seems to work, but still nothing from outside
00:49 brion: trying starting LVS monitor thingy on dalembert. no clue if it’s working
00:45 brion: turning on apaches

April 9
23:45 brion: srv33, srv36 should now replicate properly.
External storage borkgage, 2006-04-09
23:20 brion: looking at srv33, srv36 external storage; jens reports replication seems borked
22:00 brion: added izwinger ip to suda; it wasn’t automatic.
21:52 brion: finally got into srv1 and albert. maybe working
21:49 brion: ldap depends on dns; dns is still broken. we can’t reach srv1 or albert.
21:32 brion: still trying to get some core machines online (suda booting; albert ?? srv1 ??). kyle should be available in 30 minutes
20:55 brion: bw is onsite and available to poke at machines. there was a power problem; some machines seem to still eb booting
20:42 brion: phoned kyle (message)
20:38 brion: network mostly back up, still trying to get in
19:20 brion: PowerMedium offline?
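
Since the log times are in GMT, here is a small Python sketch of how an entry could be shifted to a local time. The helper name `log_time_to_local`, the year default, and the UTC-7 offset are my own assumptions for illustration; they are not part of the log. It also measures the time between the first entry and the "turning on apaches" entry, which is only a rough indication of how long things were down.

```python
from datetime import datetime, timedelta, timezone


def log_time_to_local(day, hhmm, utc_offset_hours, year=2006):
    """Read a log day ('April 9') and time ('19:20') as GMT, then shift by a fixed offset.

    The fixed offset is a simplification (hypothetical); it ignores daylight saving rules.
    """
    gmt = datetime.strptime(f"{year} {day} {hhmm}", "%Y %B %d %H:%M")
    gmt = gmt.replace(tzinfo=timezone.utc)
    return gmt.astimezone(timezone(timedelta(hours=utc_offset_hours)))


# First entry of the excerpt, shown at an assumed UTC-7 offset.
first = log_time_to_local("April 9", "19:20", -7)
print(first.strftime("%Y-%m-%d %H:%M"))  # 2006-04-09 12:20

# Time between "PowerMedium offline?" and "turning on apaches".
apaches = log_time_to_local("April 10", "00:45", -7)
print(apaches - first)  # 5:25:00
```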

By the way, none of my changes got lost, and I was able to finish my edits to the ASCII art article. Check it out.

I also created a new ASCII and ANSI. Yes, a new one. I created it for deviantART. Enjoy.

deviantART ANSI, deviantART ASCII
       Ciao Carsten a.k.a. Roy/SAC

…cu at dA

