2015-12-22
|
18:10 |
<jynus> |
disabling event scheduling on db1046 |
[production] |
18:03 |
<jynus> |
rolling schema change (ALTER TABLE ENGINE=TokuDB) on m4-master (db1046) log (eventlogging) |
[production] |
17:00 |
<hashar> |
If in doubt, restart Jenkins. |
[releng] |
16:44 |
<godog> |
bounce cassandra on restbase1004, restart bootstrap |
[production] |
16:42 |
<mutante> |
powercycling crashed mw1144 |
[production] |
16:41 |
<jynus> |
converting dbstore2001 (delayed slave) into an actual delayed slave, adding redundancy to dbstore1002 |
[production] |
16:40 |
<godog> |
bounce cassandra on restbase1003 |
[production] |
16:15 |
<akosiaris> |
upgrade cassandra on maps-test2001 |
[production] |
16:15 |
<akosiaris> |
upgrade cassandra on maps-test2002 |
[production] |
15:53 |
<mutante> |
kafka1001,1002 - crit - eventlogging not running (?) |
[production] |
15:52 |
<mutante> |
restbase1003 - disk space, restbase1008 - disk space, restbase1004 - cassandra cql refused |
[production] |
15:23 |
<akosiaris> |
upgrade cassandra on maps-test2003 |
[production] |
15:06 |
<jynus> |
restarting and reconfiguring mysql at dbstore2001 |
[production] |
15:06 |
<mutante> |
labtestcontrol2001 - puppet had not been running for a while, a bunch of changes have been applied incl. keys and passwords |
[production] |
15:04 |
<mutante> |
enabling puppet on labtestcontrol2001 |
[production] |
15:04 |
<akosiaris> |
upgraded cassandra on maps-test2004 |
[production] |
11:54 |
<apergos> |
salt packages with wmf packages precise running on ms-{bf}e* in esams; trusty running on analytics103* in eqiad; jessie running on restbase2* in codfw |
[production] |
11:43 |
<godog> |
restart cassandra bootstrap on restbase1004 |
[production] |
10:09 |
<jynus> |
online resizing /srv/postgres on labsdb1006 +100GB |
[production] |
10:06 |
<hashar> |
Restarting Jenkins |
[releng] |
10:06 |
<hashar> |
Restarting Jenkins |
[production] |
09:58 |
<hashar> |
Delete integration-zuul-debian-glue-* files. Leftover from an experiment |
[releng] |
09:57 |
<hashar> |
deleted cdb-* Jenkins jobs. Repo uses generic jobs |
[releng] |
09:54 |
<apergos> |
precise and trusty salt packages with wmf patches deployed manually on dataset1001 and analytics1001, seem to work fine |
[production] |
08:42 |
<jynus> |
restarting and reconfiguring mysql at db2036 |
[production] |
02:30 |
<l10nupdate@tin> |
ResourceLoader cache refresh completed at Tue Dec 22 02:30:28 UTC 2015 (duration 6m 54s) |
[production] |
02:23 |
<mwdeploy@tin> |
sync-l10n completed (1.27.0-wmf.9) (duration: 09m 47s) |
[production] |
01:42 |
<YuviPanda> |
rebooting tools-worker-08 |
[tools] |
00:29 |
<krenair@tin> |
Synchronized php-1.27.0-wmf.9/extensions/VisualEditor: https://gerrit.wikimedia.org/r/#/c/260492/ (duration: 00m 32s) |
[production] |
00:22 |
<krenair@tin> |
Synchronized php-1.27.0-wmf.9/extensions/SyntaxHighlight_GeSHi/modules/ve-syntaxhighlight/ve.ui.MWSyntaxHighlightDialogTool.js: https://gerrit.wikimedia.org/r/#/c/260429/ (duration: 00m 30s) |
[production] |
2015-12-21
|
20:49 |
<godog> |
restbase1004 bootstrap failed, restbase1007-a is down: java.lang.RuntimeException: A node required to move the data consistently is down (/10.64.0.230). |
[production] |
20:28 |
<YuviPanda> |
depool ores-web-01 from lb |
[ores] |
20:06 |
<hashar> |
Downgrading Jenkins plugin from 1.24 to 1.21 |
[releng] |
19:27 |
<legoktm> |
running checkLocalUser.php --delete=1 for real this time on terbium |
[production] |
19:22 |
<godog> |
reimage restbase1004 |
[production] |
19:14 |
<paravoid> |
powercycling mw1011 |
[production] |
19:11 |
<paravoid> |
rolling restart of hhvm on the eqiad jobrunners |
[production] |
19:01 |
<marxarelli> |
Purging TMPDIR contents on idle integration slaves |
[releng] |
18:47 |
<jynus> |
common-sync: Copying to mw1016.eqiad.wmnet from tin.eqiad.wmnet |
[production] |
18:44 |
<YuviPanda> |
reboot tools-proxy-01 |
[tools] |
18:43 |
<marxarelli> |
Updating slave scripts on all integration slaves to deploy I4edf7099acfeb0f06ea2042902bef03097137d6e |
[releng] |
18:35 |
<ori> |
correction: previous log message was for mw1015, not mw1017 |
[production] |
18:31 |
<YuviPanda> |
failover proxy to tools-proxy-02 |
[tools] |
18:31 |
<valhallasw`cloud> |
and restarted with fab start-jobs. Welcome back, wikibugs. |
[tools.wikibugs] |
18:31 |
<legoktm> |
same thing on 1015 |
[releng] |
18:30 |
<valhallasw`cloud> |
ah, there are SGE processes running. OK, killing those as well. |
[tools.wikibugs] |
18:28 |
<valhallasw`cloud> |
what's even weirder is that it starts both wikibugs.py and redis2irc.py, which are two distinct SGE jobs. Uuh? |
[tools.wikibugs] |
18:28 |
<legoktm> |
deleted some large npm directories from tmpfs on 1017 due to tmpfs being full |
[releng] |
18:27 |
<ori> |
mw1017: enabled jemalloc profiling, restarted hhvm, now running hhvm-collect-heaps |
[production] |
18:27 |
<valhallasw`cloud> |
yet it respawns! What on earth. Again from 208.80.155.186, and killed again. |
[tools.wikibugs] |