2014-07-15 §
20:12 <bd808> log volume up after logstash restart [production]
20:10 <bd808> restarted logstash on logstash1001; log volume looked to be down from "normal" [production]
19:55 <Reedy> Applied extensions/UploadWizard/UploadWizard.sql to rowiki (re bug 59242) [production]
18:53 <manybubbles> bouncing elastic1018 to pick up new merge policy. hopefully that'll help with io thrashing [production]
17:58 <ori> _joe_ deployed jobrunner to all job runners [production]
17:40 <manybubbles> my last attempt to lower the concurrent traffic for recovery was a failure - tried again and succeeded. that seems to have fixed the echo service disruption from taking elastic1017 out of service [production]
17:37 <ori> updated jobrunner to bef32b9120 [production]
17:29 <manybubbles> elastic1017 went nuts again. just shutting elasticsearch off on it for now [production]
16:25 <_joe_> all mw servers updated [production]
16:10 <_joe_> mw1100 and onwards updated [production]
16:00 <_joe_> mw1060-mw1099 updated [production]
15:58 <manybubbles> restarting Elasticsearch on elastic1017 - it's thrashing the disk again. I'm still not 100% sure why [production]
15:57 <_joe_> mw1020-mw1059 updated [production]
15:53 <_joe_> mw101[0-9] updated [production]
15:47 <_joe_> starting rolling update of all appservers to apache2 2.2.22-1ubuntu1.6, half of them are on 2.2.22-1ubuntu1.5 now [production]
15:42 <manybubbles> setting the filter cache on one node in the cluster set it on all. yay, I guess. Anyway, I'm going to let it soak for a while. [production]
15:32 <manybubbles> setting filter cache size to 20% on elastic1001 to see if it takes/helps us [production]
15:19 <anomie> Synchronized wmf-config/: SWAT: Remove dead ULS variable [[gerrit:145861]] (duration: 00m 10s) [production]
15:18 <anomie> anomie actually committed a live hack someone left on tin (removing db1035) [production]
15:16 <anomie> updated /a/common to {{Gerrit|I7ca6a16d5}}: Switch jawiki back to lsearchd [production]
13:42 <manybubbles> Synchronized wmf-config/InitialiseSettings.php: jawiki back to lsearchd (duration: 00m 05s) [production]
13:38 <manybubbles> elastic1017 had a load average of 60 - was thrashing in io. bounced Elasticsearch. let's see if it recovers on its own [production]
09:09 <_joe_> restarting mailman on sodium, again, for testing [production]
08:50 <godog> restart mailman on sodium after inodes freed [production]
07:27 <_joe_> restarted mailman on sodium [production]
07:22 <_joe_> stopping mailman on sodium for repairing [production]
06:54 <_joe_> killed jenkins stale process on gallium, stuck in a futex while shutting down [production]
04:48 <springle> db1035 crash cycle. down for memtest and stuff [production]
03:34 <LocalisationUpdate> ResourceLoader cache refresh completed at Tue Jul 15 03:33:38 UTC 2014 (duration 33m 37s) [production]
03:01 <LocalisationUpdate> completed (1.24wmf13) at 2014-07-15 03:00:03+00:00 [production]
02:34 <springle> Synchronized wmf-config/db-eqiad.php: depool db1035, crashed (duration: 00m 13s) [production]
02:30 <LocalisationUpdate> completed (1.24wmf12) at 2014-07-15 02:29:02+00:00 [production]
02:27 <springle> powercycle db1035 unresponsive [production]
2014-07-14 §
23:32 <mwalker> Started scap: Updating for SWAT {{gerrit|146304}}, {{gerrit|146306}}, {{gerrit|146149}}, {{gerrit|146165}}, {{gerrit|146166}}, {{gerrit|146282}}, and {{gerrit|146281}}. Also finishing awight's deploy of FundraisingTranslateWorkflow. [production]
20:22 <cscott> updated Parsoid to version d51e64097bb1b18e356584d4f3ddcfd90a6071ba [production]
19:57 <ori> postponing jobrunner deployment to tomorrow; ran over time [production]
19:45 <_joe_> doing the same on mw1064, segfaulted for the same reason [production]
19:44 <_joe_> killed a lone apache2 child on mw1152, stuck in a futex, after a segfault of another apache process. Restarted apache, now working correctly [production]
19:04 <godog> re-enabling mailman on sodium, missing list config restored [production]
18:49 <awight> Synchronized wmf-config: Deploying FundraisingTranslateWorkflow on metawiki (t [production]
18:45 <awight> Synchronized php-1.24wmf13/extensions/FundraisingTranslateWorkflow: Update FundraisingTranslateWorkflow extension (wmf13) (duration: 00m 05s) [production]
18:44 <awight> Synchronized php-1.24wmf12/extensions/FundraisingTranslateWorkflow: Update FundraisingTranslateWorkflow extension (duration: 00m 05s) [production]
18:15 <awight> Synchronized wmf-config: Revert: Deploying FundraisingTranslateWorkflow on metawiki (duration: 00m 04s) [production]
18:03 <awight> Synchronized wmf-config: Deploying FundraisingTranslateWorkflow on metawiki (duration: 00m 05s) [production]
18:03 <awight> updated /a/common to {{Gerrit|Ie7599fb6e}}: jawiki gets Cirrus as primary search [production]
17:43 <Krinkle> npm-cache for integration slaves got corrupted again. Depooling/Repooling integration-slave100{1,2,3} one by one to clear cache and let it warm up again. [production]
17:35 <Krinkle> Jenkins slaves in labs are unable to reach zuul.eqiad.wmnet [production]
17:10 <andrewbogott> purging old local-* service group entries from labs ldap (via purgeOldServiceGroups.php) [production]
17:05 <godog> started mailman on sodium post-reboot [production]
17:04 <demon> Synchronized wmf-config/InitialiseSettings.php: nlwiki getting cirrus as primary (duration: 00m 04s) [production]