2011-12-16
§
|
22:30 |
<RobH> |
reclaimed space on db9, restarted mysql, services seem to be recovering |
[production] |
22:24 |
<maplebed> |
restarting mysql on db9; brief downtime for a number of apps (bugzilla, blog, etc.) expected. |
[production] |
22:03 |
<RobH> |
db9 space reclaimed back to 94% full, related services should start recovering |
[production] |
21:57 |
<RobH> |
db9 disk full, related services are messing up, fixing |
[production] |
21:56 |
<RobH> |
kicking apache for bz related issues on kaulen |
[production] |
19:14 |
<catrope> |
synchronized php-1.18/resources/startup.js 'touch' |
[production] |
19:07 |
<catrope> |
synchronized wmf-config/InitialiseSettings.php 'Set AFTv4 lottery odds to 100% on en_labswikimedia' |
[production] |
18:48 |
<LeslieCarr> |
removed the ssl* yaml logs on stafford to fix the puppet not running error |
[production] |
16:13 |
<apergos> |
thumb cleaner to bed for the night. definitely need an alarm clock for this... good thing it's only got about 4 days of backlog left |
[production] |
15:41 |
<RobH> |
es1002 being actively worked on for hdd controller testing |
[production] |
15:39 |
<RobH> |
lvs1003 disk dead per RT 1549, will troubleshoot on site later today or Monday |
[production] |
15:32 |
<RobH> |
lvs1003 unresponsive to serial console, rebooting |
[production] |
15:18 |
<RobH> |
reinstalling dataset1 |
[production] |
14:45 |
<mutante> |
puppet was broken on all servers including "nrpe" due to package conflict with nagios-plugins-basic I added to base, revert+fix |
[production] |
13:29 |
<RoanKattouw> |
Dropping and recreating AFTv5 tables on en_labswikimedia and enwiki |
[production] |
13:26 |
<catrope> |
synchronized php-1.18/extensions/ArticleFeedbackv5/ 'Updating to trunk state' |
[production] |
13:25 |
<mutante> |
tweaked Nagios earlier today: external command_check_interval & event_broker_options (see comments in gerrit Id3b4a458) |
[production] |
13:01 |
<mark> |
Found lvs5 and lvs6 with offload-gro enabled, even though it's set disabled in /etc/network/interfaces... corrected |
[production] |
09:21 |
<apergos> |
restarted lighttpd on ds2, it had stopped (and why didn't nagios tell us?) |
[production] |
08:38 |
<mutante> |
spence - had killed additional notifications.cgi and history.cgi procs, waited 5 minutes, load went down a lot, restarting nagios |
[production] |
08:23 |
<mutante> |
spence - almost unusable, Nagios notifications.cgi and history.cgi use a lot of memory, stopping Nagios, watching swap |
[production] |
08:15 |
<mutante> |
spence slow again, side-note: tried to use "sar" to investigate but "Please check if data collecting is enabled in /etc/default/sysstat" (want to?) |
[production] |
07:54 |
<nikerabbit> |
synchronized php-1.18/extensions/WebFonts/resources/ext.webfonts.js 'JS fix [[rev:106418|r106418]]' |
[production] |
07:09 |
<apergos> |
thumbs cleaner awake for the day |
[production] |
01:57 |
<LocalisationUpdate> |
failed (1.18) at Fri Dec 16 02:00:14 UTC 2011 |
[production] |
2011-12-15
§
|
23:19 |
<LeslieCarr> |
pushing rule to planet.wikimedia.org which should redirect all https to http |
[production] |
23:00 |
<LeslieCarr> |
puppetized planet.wikimedia.org on singer |
[production] |
22:41 |
<LeslieCarr> |
removing https support from planet.wikimedia.org |
[production] |
21:43 |
<awjrichards> |
synchronized php/extensions/LandingCheck/SpecialLandingCheck.php '[[rev:106377|r106377]]' |
[production] |
21:42 |
<awjrichards> |
synchronized php/extensions/LandingCheck/LandingCheck.php '[[rev:106377|r106377]]' |
[production] |
21:34 |
<awjrichards> |
synchronized php/extensions/ContributionTracking/ContributionTracking_body.php '[[rev:106375|r106375]]' |
[production] |
21:34 |
<awjrichards> |
synchronized php/extensions/ContributionTracking/ContributionTracking.processor.php '[[rev:106375|r106375]]' |
[production] |
21:33 |
<awjrichards> |
synchronized php/extensions/ContributionTracking/ContributionTracking.php '[[rev:106375|r106375]]' |
[production] |
20:21 |
<K4-713> |
synchronized payments cluster to [[rev:106360|r106360]] |
[production] |
19:16 |
<LeslieCarr> |
removed mw* and virt* yaml files from stafford in order to clear up broken files and make puppet run again |
[production] |
18:59 |
<LocalisationUpdate> |
completed (1.18) at Thu Dec 15 19:02:58 UTC 2011 |
[production] |
18:58 |
<RobH> |
cp1018 also offline, stealing cables from it for testing |
[production] |
18:52 |
<reedy> |
synchronized php-1.18/includes/Block.php 'Testing fix for [[bugzilla:33101|bug 33101]]' |
[production] |
18:43 |
<catrope> |
synchronized php-1.18/extensions/LocalisationUpdate/ '[[rev:106352|r106352]]' |
[production] |
18:25 |
<RobH> |
es1002 and cp1019 offline for harddisk controller testing |
[production] |
18:07 |
<RoanKattouw> |
Running LU /again/ to hopefully fix issues |
[production] |
16:31 |
<mark> |
Fixed database entries in william's exim.conf |
[production] |
16:24 |
<catrope> |
synchronized wmf-config/InitialiseSettings.php 'Disabling Contest extension because of XSS' |
[production] |
15:31 |
<apergos> |
thumb cleaner to bed for the night |
[production] |
14:59 |
<LocalisationUpdate> |
completed (1.18) at Thu Dec 15 15:02:05 UTC 2011 |
[production] |
14:52 |
<RoanKattouw> |
Running l10nupdate by hand |
[production] |