2014-07-16
|
08:52 |
<godog> |
repool ms-fe1003 and depool ms-fe1004 |
[production] |
08:46 |
<godog> |
repool ms-fe1002 and depool ms-fe1003 |
[production] |
08:39 |
<godog> |
depool ms-fe1002 for swift upgrade |
[production] |
05:54 |
<springle> |
resuming page content model schema changes, osc_host.sh processes on terbium ok to kill in emergency |
[production] |
04:23 |
<springle> |
restarted gitblit on antimony |
[production] |
03:04 |
<LocalisationUpdate> |
ResourceLoader cache refresh completed at Wed Jul 16 03:03:41 UTC 2014 (duration 3m 40s) |
[production] |
02:27 |
<LocalisationUpdate> |
completed (1.24wmf13) at 2014-07-16 02:26:12+00:00 |
[production] |
02:15 |
<LocalisationUpdate> |
completed (1.24wmf12) at 2014-07-16 02:14:32+00:00 |
[production] |
01:34 |
<manybubbles> |
moving shards off of elastic101[789] |
[production] |
2014-07-15
|
23:20 |
<maxsem> |
Synchronized wmf-config/: https://gerrit.wikimedia.org/r/#/c/146615/ (duration: 00m 04s) |
[production] |
23:16 |
<maxsem> |
Synchronized php-1.24wmf12/extensions/CirrusSearch/: https://gerrit.wikimedia.org/r/#q,146471,n,z (duration: 00m 05s) |
[production] |
23:14 |
<maxsem> |
Synchronized php-1.24wmf13/includes/specials/SpecialVersion.php: (no message) (duration: 00m 04s) |
[production] |
23:13 |
<maxsem> |
Synchronized php-1.24wmf13/extensions/CirrusSearch/: https://gerrit.wikimedia.org/r/#q,146471,n,z (duration: 00m 04s) |
[production] |
22:35 |
<K4-713> |
synchronized payments to afa12be34769000bf8 |
[production] |
21:34 |
<_joe_> |
disabling puppet on mw1001, tests |
[production] |
21:26 |
<aude> |
Synchronized php-1.24wmf13/extensions/Wikidata: Update submodule to fix entity search issue on Wikidata (duration: 00m 21s) |
[production] |
21:15 |
<ori> |
to test r146607, locally modified upstart conf for jobrunner on mw1001 to log to /var/log/mediawiki, and restarted service |
[production] |
20:24 |
<ori> |
restarted jobrunner on all jobrunners |
[production] |
20:23 |
<AaronSchulz> |
Deployed /srv/jobrunner to 31e54c564d369e89613db48977eec0a5891b6498 |
[production] |
20:21 |
<reedy> |
Synchronized docroot and w: (no message) (duration: 00m 21s) |
[production] |
20:18 |
<reedy> |
rebuilt wikiversions.cdb and synchronized wikiversions files: Non wikipedias to 1.24wmf13 |
[production] |
20:13 |
<Krinkle> |
Reloading Zuul to deploy If2312bcf18bdbe8dee |
[production] |
20:12 |
<bd808> |
log volume up after logstash restart |
[production] |
20:10 |
<bd808> |
restarted logstash on logstash1001; log volume looked to be down from "normal" |
[production] |
19:55 |
<Reedy> |
Applied extensions/UploadWizard/UploadWizard.sql to rowiki (re bug 59242) |
[production] |
18:53 |
<manybubbles> |
bouncing elastic1018 to pick up new merge policy. hopefully that'll help with io thrashing |
[production] |
17:58 |
<ori> |
_joe_ deployed jobrunner to all job runners |
[production] |
17:40 |
<manybubbles> |
my last attempt to lower the concurrent traffic for recovery was a failure - tried again and succeeded. that seems to have fixed the echo service disruption from taking elastic1017 out of service |
[production] |
17:37 |
<ori> |
updated jobrunner to bef32b9120 |
[production] |
17:29 |
<manybubbles> |
elastic1017 went nuts again. just shutting elasticsearch off on it for now |
[production] |
16:25 |
<_joe_> |
all mw servers updated |
[production] |
16:10 |
<_joe_> |
mw1100 and onwards updated |
[production] |
16:00 |
<_joe_> |
mw1060-mw1099 updated |
[production] |
15:58 |
<manybubbles> |
restarting Elasticsearch on elastic1017 - it's thrashing the disk again. I'm still not 100% sure why |
[production] |
15:57 |
<_joe_> |
mw1020-mw1059 updated |
[production] |
15:53 |
<_joe_> |
mw101[0-9] updated |
[production] |
15:47 |
<_joe_> |
starting rolling update of all appservers to apache2 2.2.22-1ubuntu1.6, half of them are on 2.2.22-1ubuntu1.5 now |
[production] |
15:42 |
<manybubbles> |
setting the filter cache on one node in the cluster set it on all. yay, I guess. Anyway, I'm going to let it soak for a while. |
[production] |
15:32 |
<manybubbles> |
setting filter cache size to 20% on elastic1001 to see if it takes/helps us |
[production] |
15:19 |
<anomie> |
Synchronized wmf-config/: SWAT: Remove dead ULS variable [[gerrit:145861]] (duration: 00m 10s) |
[production] |
15:18 |
<anomie> |
anomie actually committed a live hack someone left on tin (removing db1035) |
[production] |
15:16 |
<anomie> |
updated /a/common to {{Gerrit|I7ca6a16d5}}: Switch jawiki back to lsearchd |
[production] |
13:42 |
<manybubbles> |
Synchronized wmf-config/InitialiseSettings.php: jawiki back to lsearchd (duration: 00m 05s) |
[production] |
13:38 |
<manybubbles> |
elastic1017 had a load average of 60 - was thrashing in io. bounced Elasticsearch. let's see if it recovers on its own |
[production] |
09:09 |
<_joe_> |
restarting mailman on sodium, again, for testing |
[production] |
08:50 |
<godog> |
restart mailman on sodium after inodes freed |
[production] |
07:27 |
<_joe_> |
restarted mailman on sodium |
[production] |
07:22 |
<_joe_> |
stopping mailman on sodium for repairing |
[production] |
06:54 |
<_joe_> |
killed jenkins stale process on gallium, stuck in a futex while shutting down |
[production] |
04:48 |
<springle> |
db1035 crash cycle. down for memtest and stuff |
[production] |