2015-12-22
|
09:58 |
<hashar> |
Delete integration-zuul-debian-glue-* files. Leftover from an experiment |
[releng] |
09:57 |
<hashar> |
deleted cdb-* Jenkins jobs. Repo uses generic jobs |
[releng] |
09:54 |
<apergos> |
precise and trusty salt packages with wmf patches deployed manually on dataset1001 and analytics1001, seem to work fine |
[production] |
08:42 |
<jynus> |
restarting and reconfiguring mysql at db2036 |
[production] |
02:30 |
<l10nupdate@tin> |
ResourceLoader cache refresh completed at Tue Dec 22 02:30:28 UTC 2015 (duration 6m 54s) |
[production] |
02:23 |
<mwdeploy@tin> |
sync-l10n completed (1.27.0-wmf.9) (duration: 09m 47s) |
[production] |
01:42 |
<YuviPanda> |
rebooting tools-worker-08 |
[tools] |
00:29 |
<krenair@tin> |
Synchronized php-1.27.0-wmf.9/extensions/VisualEditor: https://gerrit.wikimedia.org/r/#/c/260492/ (duration: 00m 32s) |
[production] |
00:22 |
<krenair@tin> |
Synchronized php-1.27.0-wmf.9/extensions/SyntaxHighlight_GeSHi/modules/ve-syntaxhighlight/ve.ui.MWSyntaxHighlightDialogTool.js: https://gerrit.wikimedia.org/r/#/c/260429/ (duration: 00m 30s) |
[production] |
2015-12-21
|
20:49 |
<godog> |
restbase1004 bootstrap failed; restbase1007-a is down: java.lang.RuntimeException: A node required to move the data consistently is down (/10.64.0.230). |
[production] |
20:28 |
<YuviPanda> |
depool ores-web-01 from lb |
[ores] |
20:06 |
<hashar> |
Downgrading Jenkins plugin from 1.24 to 1.21 |
[releng] |
19:27 |
<legoktm> |
running checkLocalUser.php --delete=1 for real this time on terbium |
[production] |
19:22 |
<godog> |
reimage restbase1004 |
[production] |
19:14 |
<paravoid> |
powercycling mw1011 |
[production] |
19:11 |
<paravoid> |
rolling restart of hhvm on the eqiad jobrunners |
[production] |
19:01 |
<marxarelli> |
Purging TMPDIR contents on idle integration slaves |
[releng] |
18:47 |
<jynus> |
common-sync: Copying to mw1016.eqiad.wmnet from tin.eqiad.wmnet |
[production] |
18:44 |
<YuviPanda> |
reboot tools-proxy-01 |
[tools] |
18:43 |
<marxarelli> |
Updating slave scripts on all integration slaves to deploy I4edf7099acfeb0f06ea2042902bef03097137d6e |
[releng] |
18:35 |
<ori> |
correction: previous log message was for mw1015, not mw1017 |
[production] |
18:31 |
<YuviPanda> |
failover proxy to tools-proxy-02 |
[tools] |
18:31 |
<valhallasw`cloud> |
and restarted with fab start-jobs. Welcome back, wikibugs. |
[tools.wikibugs] |
18:31 |
<legoktm> |
same thing on 1015 |
[releng] |
18:30 |
<valhallasw`cloud> |
ah, there are SGE processes running. OK, killing those as well. |
[tools.wikibugs] |
18:28 |
<valhallasw`cloud> |
what's even weirder is that it starts both wikibugs.py and redis2irc.py, which are two distinct SGE jobs. Uuh? |
[tools.wikibugs] |
18:28 |
<legoktm> |
deleted some large npm directories from tmpfs on 1017 due to tmpfs being full |
[releng] |
18:27 |
<ori> |
mw1017: enabled jemalloc profiling, restarted hhvm, now running hhvm-collect-heaps |
[production] |
18:27 |
<valhallasw`cloud> |
yet it respawns! What on earth. Again from 208.80.155.186, and killed again. |
[tools.wikibugs] |
18:26 |
<valhallasw`cloud> |
killed wikibugs manually, no SGE in sight. |
[tools.wikibugs] |
18:24 |
<valhallasw`cloud> |
using `listlogins` in nickserv, we find one running on 208.80.155.186 (-1409), one on 208.80.155.145 (-1405, just restarted) |
[tools.wikibugs] |
18:20 |
<valhallasw`cloud> |
duplicate wikibugs, trying qmod -rj |
[tools.wikibugs] |
17:48 |
<akosiaris> |
restarted hhvm on mw1012.eqiad.wmnet |
[production] |
16:57 |
<thcipriani> |
timeout on sync-file to mw1016.eqiad.wmnet |
[production] |
16:56 |
<thcipriani@tin> |
Synchronized php-1.27.0-wmf.9/extensions/Popups/Popups.hooks.php: SWAT: Use ExtensionRegistry to determine whether TextExtracts is installed [[gerrit:260346]] (duration: 02m 48s) |
[production] |
16:34 |
<jynus> |
sync-common to mw1085 |
[production] |
16:26 |
<jynus> |
powercycling mw1085.eqiad.wmnet |
[production] |
16:22 |
<thcipriani> |
mw1085.eqiad.wmnet times out on SSH connection |
[production] |
16:19 |
<godog> |
reboot restbase1007, load through the roof |
[production] |
16:18 |
<thcipriani@tin> |
Synchronized php-1.27.0-wmf.9/extensions/CentralNotice/resources/subscribing/ext.centralNotice.geoIP.js: SWAT: Update CentralNotice [[gerrit:260316]] (duration: 03m 03s) |
[production] |
16:08 |
<godog> |
depool restbase1007 |
[production] |
16:01 |
<apergos> |
jessie packages for salt with local patches deployed on restbase1001, looks fine but just in case. |
[production] |
15:44 |
<godog> |
adding new 1TB disk to restbase1007 |
[production] |
14:22 |
<andrewbogott> |
disabling puppet on labnet1002 for dnsmasq tests |
[production] |
14:07 |
<MaxSem> |
me and yurik are nuking old maps data and reimporting planet |
[production] |
13:46 |
<jynus> |
extending online s2-master data disk by +100GB |
[production] |
13:15 |
<akosiaris> |
disabled puppet on maps-test2001 and commented out osmupdater crontab entry until we fix the sync process |
[production] |
13:04 |
<hashar> |
restarting cxserver on deployment-cxserver03 |
[releng] |
11:02 |
<jynus> |
emergency restart of db1047's mysql |
[production] |
10:48 |
<hashar> |
Banned testing-shinken- bot (useless duplicate notifications) |
[releng] |