2012-01-03
|
23:00 |
<Tim> |
on spence: restarting gmetad |
[production] |
22:58 |
<reedy> |
synchronizing Wikimedia installation... : Pushing [[rev:107953|r107953]], [[rev:107955|r107955]], [[rev:107956|r107956]], [[rev:107957|r107957]] |
[production] |
22:47 |
<LeslieCarr> |
stopping and then starting apache2 on spence to try and lower load |
[production] |
22:29 |
<RobH> |
added the lo address to lvs4; now it's working and generating thumbnails |
[production] |
22:09 |
<reedy> |
synchronizing Wikimedia installation... : Push [[rev:107938|r107938]] [[rev:107948|r107948]] |
[production] |
21:45 |
<RobH> |
ganglia graphs will have missing data for past 30 to 40 minutes |
[production] |
21:45 |
<RobH> |
spence back online, ganglia and nagios confirmed operational |
[production] |
21:38 |
<RobH> |
resetting spence and dropping to serial to try to fix it |
[production] |
21:25 |
<RobH> |
nagios and ganglia down due to spence reboot, system still coming back online |
[production] |
21:21 |
<RobH> |
spence is unresponsive to ssh and serial console, rebooting |
[production] |
21:14 |
<LeslieCarr> |
resetting DRAC 5 on spence for management connectivity |
[production] |
21:05 |
<binasher> |
that fixed it. but how did that happen? |
[production] |
21:05 |
<binasher> |
ran ip addr add 10.2.1.22/32 label "lo:LVS" dev lo on lvs4 |
[production] |
19:36 |
<reedy> |
synchronized php-1.18/skins/common/images/ '[[rev:107930|r107930]]' |
[production] |
17:36 |
<mutante> |
killing more runJobs.php / nextJobDB.php processes on a bunch of servers (/home/catrope/badjobrunners) |
[production] |
17:26 |
<RoanKattouw> |
Stopping job runners on the following DECOMMISSIONED servers: srv151 srv152 srv153 srv158 srv160 srv164 srv165 srv166 srv167 srv168 srv170 srv176 srv177 srv178 srv181 srv184 srv185 |
[production] |
15:55 |
<RobH> |
torrus back, took forever to recompile |
[production] |
15:53 |
<reedy> |
synchronized wmf-config/InitialiseSettings.php 'Bug 33485 - Enable WikiLove in si.wikipedia' |
[production] |
15:52 |
<Reedy> |
Created wikilove tables on siwiki |
[production] |
15:46 |
<RobH> |
torrus deadlocked, kicking |
[production] |
14:00 |
<RoanKattouw> |
Restarting job runners on srv242 and mw25, those are the last ones that are stuck |
[production] |
13:57 |
<RoanKattouw> |
Restarting all job runners that are stuck |
[production] |
13:48 |
<RoanKattouw> |
Restarting job runner on srv236, seems to be stuck |
[production] |
02:02 |
<LocalisationUpdate> |
completed (1.18) at Tue Jan 3 02:05:21 UTC 2012 |
[production] |
2012-01-02
|
23:36 |
<Reedy> |
Seems to potentially be an issue with job runners, enwiki backed up to over 90k over the last week or so. Needs investigating |
[production] |
23:18 |
<tstarling> |
synchronized php-1.18/includes/parser/Parser.php '[[rev:107856|r107856]]' |
[production] |
22:58 |
<tstarling> |
synchronizing Wikimedia installation... : |
[production] |
18:08 |
<nikerabbit> |
synchronized wmf-config/InitialiseSettings.php 'Bug 33368: WebFonts on bpywiki' |
[production] |
18:05 |
<nikerabbit> |
synchronized php-1.18/languages/messages/ 'i18ndeploy [[rev:107843|r107843]]' |
[production] |
18:04 |
<nikerabbit> |
synchronized php-1.18/extensions/WebFonts/WebFonts.i18n.php 'i18ndeploy [[rev:107843|r107843]]' |
[production] |
16:58 |
<mutante> |
installed SiteMap extension on Bugzilla - soon bugs should be googleable |
[production] |
16:33 |
<mutante> |
upgraded Bugzilla from 4.0.2 to 4.0.3 (http://www.bugzilla.org/releases/4.0.3/release-notes.html#v40_point) (RT #2194) |
[production] |
14:47 |
<mutante> |
cleaned out gammu spool to stop SMS bomb - sorry. daemon runs again now though. |
[production] |
14:36 |
<mutante> |
fixed gammu-smsd on spence per wikitech "Nagios#Fixing_the_USB_dongle" (sending out queued SMS now) |
[production] |
14:30 |
<mutante> |
puppet ran on spence, ganglia also seems ok despite the errors I logged before. gammu-smsd can't find device again though |
[production] |
14:03 |
<mutante> |
spence / gmetad - RRD_update .. illegal attempt to update using time .. last update time is .. (minimum one second step) |
[production] |
13:57 |
<mutante> |
gmond complains about missing kernel modules on spence when trying to start on boot |
[production] |
13:54 |
<mutante> |
spence down, no ssh, no mgmt output, powercycling it .. |
[production] |
02:01 |
<LocalisationUpdate> |
completed (1.18) at Mon Jan 2 02:04:47 UTC 2012 |
[production] |
00:08 |
<tstarling> |
synchronized php-1.18/includes/media/SVGMetadataExtractor.php '[[rev:107792|r107792]]' |
[production] |
2012-01-01
|
21:28 |
<Ryan_Lane> |
restarted pdns-recursor on dobson |
[production] |
21:26 |
<Ryan_Lane> |
restarted pdns on ns2 about an hour ago |
[production] |
09:46 |
<apergos> |
restarted lucene search on srch 10, 11, then later on 3, 4, 9, 1 |
[production] |
09:35 |
<apergos> |
removed log.1 from /a/search/logs on search6, it was 35 GB |
[production] |
03:55 |
<Tim> |
fixed broken package on search7 and search11 |
[production] |
02:01 |
<LocalisationUpdate> |
completed (1.18) at Sun Jan 1 02:04:30 UTC 2012 |
[production] |
01:36 |
<Tim> |
adjusted FD limit in /etc/init.d/lsearchd on all search servers with sed |
[production] |
01:34 |
<Tim> |
increased FD limit on search6 and restarted lsearchd |
[production] |
00:46 |
<Tim> |
removed some logs on search6 to fix /a disk space exhaustion |
[production] |