2016-01-10
10:13 <ori> disabled categoryMembershipChange on mw1165 too, then restarted jobrunner / jobchron / hhvm on mw1165 and mw1164 [production]
08:55 <ori> mw1166 -- disabled puppet; disabled categoryMembershipChange jobs [production]
08:48 <ori> mw1167 -- disabled puppet; disabled deleteLinks and refreshLinks* jobs [production]
08:45 <ori> mw1168 -- disabled puppet; disabled restbase jobs [production]
08:41 <ori> mw1169 -- disabled cirrus jobs [production]
08:33 <ori> Attempting to isolate cause of T122069 by toggling job types on mw1169. Disabling Puppet to prevent it from clobbering config changes. [production]
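The toggling above amounts to disabling puppet and removing a job type from the jobrunner's configured list, then restarting the services. A rough sketch, assuming the jobrunner reads its job types from /etc/jobrunner/jobrunner.conf (the path, the sed edit and the disable message are all illustrative, not the recorded procedure):
    # stop puppet from reverting the manual config change
    sudo puppet agent --disable 'T122069: isolating job types'
    # hypothetical: drop one job type from the jobrunner config, then restart the services
    sudo sed -i 's/"categoryMembershipChange",//' /etc/jobrunner/jobrunner.conf
    sudo service jobrunner restart && sudo service jobchron restart && sudo service hhvm restart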
08:29 <paravoid> restarting hhvm on jobrunners again [production]
04:58 <paravoid> powercycling mw1005, mw1008, mw1009 -- unresponsive due to OOM [production]
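Powercycling an unresponsive host is done out-of-band through its management interface; a minimal sketch with ipmitool, where the .mgmt hostname and credentials are assumptions:
    # power-cycle an unresponsive host over IPMI (hostname and credentials are illustrative)
    ipmitool -I lanplus -H mw1005.mgmt.eqiad.wmnet -U root -P "$IPMI_PASSWORD" chassis power cycle
    # confirm it came back up
    ipmitool -I lanplus -H mw1005.mgmt.eqiad.wmnet -U root -P "$IPMI_PASSWORD" chassis power status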
04:56 <paravoid> restarting HHVM on eqiad jobrunners, OOM, memleak faster than the 24h restarts [production]
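Restarting a service across a group of hosts like this was typically done with salt at the time; a minimal sketch from the salt master, where the hostname glob targeting the jobrunners is an assumption:
    # restart HHVM on the eqiad jobrunners (host glob is illustrative)
    sudo salt 'mw11[56][0-9].eqiad.wmnet' cmd.run 'service hhvm restart'
    # spot-check memory afterwards
    sudo salt 'mw11[56][0-9].eqiad.wmnet' cmd.run 'free -m | head -2'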
2016-01-07
23:24 <akosiaris> repooled scb1002 for mobileapps [production]
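Depooling and repooling a backend is done by flipping its state in the load balancer via conftool; a minimal sketch with confctl, where the selector fields for the scb hosts are assumptions:
    # take scb1002 out of the mobileapps pool (selector fields are illustrative)
    sudo confctl select 'name=scb1002.eqiad.wmnet,service=mobileapps' set/pooled=no
    # do the maintenance, then put it back
    sudo confctl select 'name=scb1002.eqiad.wmnet,service=mobileapps' set/pooled=yes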
23:24 <akosiaris> enabled puppet, salt on scb1001 [production]
23:23 <mobrovac> mobileapps deploying 58b371a on scb1001 [production]
23:09 <mobrovac> mobileapps deploying 58b371a on scb1002 [production]
23:01 <akosiaris> apt-mark hold nodejs on scb1001, etherpad1001 and maps-test200{1,2,3,4} [production]
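Holding the package pins nodejs at its current version so routine upgrades do not pull in the new release before the transition is ready; a minimal sketch of the hold, how to verify it, and how to release it later:
    # pin nodejs at its current version so apt upgrades skip it
    sudo apt-mark hold nodejs
    # verify the hold is in place
    apt-mark showhold
    # release the hold once the 4.2 transition is done
    sudo apt-mark unhold nodejs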
22:58 <akosiaris> disable puppet and salt on scb1001 for the nodejs 4.2 transition [production]
22:57 <akosiaris> depool scb1002 for mobileapps. Transition to nodejs 4.2 ongoing [production]
19:21 <YuviPanda> started tools / maps backup on labstore1001 [production]
19:13 <YuviPanda> remove snapshots others20150815030010, others20150815030010, maps20151216040005 and maps20151028040004 that were all stale and should've been removed anyway (on labstore2001) [production]
19:13 <YuviPanda> remove snapshots others20150815030010, others20150815030010, maps20151216040005 and maps20151028040004 that were all stale and should've been removed anyway [production]
19:11 <YuviPanda> run sudo lvremove backup/tools20151216020005 on labstore2001 to clean up full snapshot [production]
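The snapshots above are LVM snapshot volumes in the "backup" volume group; a minimal sketch of listing them and removing a stale one, using the names from the entries above:
    # list volumes in the backup VG and how full each snapshot is (Data% column)
    sudo lvs backup
    # remove a stale or full snapshot; -f skips the confirmation prompt
    sudo lvremove -f backup/tools20151216020005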
19:11 <jynus> setting up a watchdog process to kill long-running queries on db1051 [production]
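One common way to implement such a watchdog is pt-kill from Percona Toolkit; a minimal sketch, assuming pt-kill is available, with illustrative thresholds (the actual tool and limits used on db1051 are not recorded here):
    # kill statements busy for more than 60 seconds, polling every 10 seconds
    pt-kill --host db1051 --busy-time 60 --interval 10 --match-command Query --kill --print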
18:54 <_joe_> also resetting the drac [production]
18:53 <_joe_> powercycling ms-be1013 [production]
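On Dell hosts both actions can be run from the DRAC itself with racadm; a minimal sketch from an ssh session on the host's management controller (the management hostname is an assumption):
    # from the DRAC of ms-be1013
    racadm serveraction powercycle   # power-cycle the host
    racadm racreset                  # reset the DRAC itself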
02:32 <l10nupdate@tin> l10nupdate@tin ResourceLoader cache refresh completed at Thu Jan 7 02:32:04 UTC 2016 (duration 6m 54s) [production]
02:25 <mwdeploy@tin> mwdeploy@tin sync-l10n completed (1.27.0-wmf.9) (duration: 10m 33s) [production]
2016-01-05
22:38 <aaron@tin> aaron@tin Synchronized rpc: 830e1ed8d80295710dc02f18102b4fadae7fca86 (duration: 00m 55s) [production]
18:34 <jzerebecki@tin> jzerebecki@tin scap aborted: deploy-log (duration: 00m 04s) [production]
18:34 <jzerebecki@tin> jzerebecki@tin Started scap: deploy-log [production]
15:47 <ottomata> transitioned analytics1001 to active namenode [production]
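With HDFS HA, making a standby NameNode active is a controlled failover through the haadmin interface; a minimal sketch, where the NameNode service IDs shown are assumptions about how they are named in hdfs-site.xml:
    # check which NameNode is currently active (service IDs are assumptions)
    sudo -u hdfs hdfs haadmin -getServiceState analytics1001-eqiad-wmnet
    # fail over so analytics1001 becomes the active NameNode
    sudo -u hdfs hdfs haadmin -failover analytics1002-eqiad-wmnet analytics1001-eqiad-wmnet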
03:51 <krinkle@tin> krinkle@tin Synchronized php-1.27.0-wmf.9/includes/specials/SpecialJavaScriptTest.php: Idaacf71870 (duration: 00m 30s) [production]
03:50 <krinkle@tin> krinkle@tin Synchronized php-1.27.0-wmf.9/resources/src/mediawiki.special/: Idaacf71870 (duration: 00m 30s) [production]
03:49 <krinkle@tin> krinkle@tin Synchronized php-1.27.0-wmf.9/resources/Resources.php: Idaacf71870 (duration: 00m 36s) [production]
02:31 <l10nupdate@tin> l10nupdate@tin ResourceLoader cache refresh completed at Tue Jan 5 02:31:46 UTC 2016 (duration 6m 54s) [production]
02:24 <mwdeploy@tin> mwdeploy@tin sync-l10n completed (1.27.0-wmf.9) (duration: 10m 13s) [production]
2016-01-04
20:50 <mutante> ms-be1011 - powercycled, was frozen [production]
20:43 <mutante> ms-be2007 - System halted! Error: Integrated RAID [production]
20:42 <mutante> ms-be2007 - powercycle (was status: on but all frozen) (I assume xfs, like ms-be2006 which appeared in SAL recently) [production]
20:36 <mutante> mw2019 - puppet run (icinga claimed it had failed, but it ran fine here) [production]
20:19 <mutante> rutherfordium - attempt to restart with gnt-instance [production]
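rutherfordium is a Ganeti VM, so the restart is driven from the Ganeti master with gnt-instance; a minimal sketch, where the instance FQDN is an assumption:
    # from the ganeti master: restart the VM and watch its console (instance name is illustrative)
    sudo gnt-instance reboot rutherfordium.eqiad.wmnet
    sudo gnt-instance console rutherfordium.eqiad.wmnet
    # if it is stopped rather than hung, start it instead
    sudo gnt-instance startup rutherfordium.eqiad.wmnet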
20:12 <mutante> rutherfordium (people.wm) was down for days per icinga - then magically fixes itself when I connect to the console but before even logging in (ganeti VM) [production]
20:00 <mutante> mw1123 - start HHVM (was 503 and service stopped) [production]
19:28 <mutante> elastic1006 - out of disk - gzip eqiad_index_search_slowlog.log files [production]
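Compressing rotated slowlog files is a quick way to free disk on an Elasticsearch node; a minimal sketch, assuming the slowlogs live under /var/log/elasticsearch (the path is an assumption):
    # see how full the log partition is, then compress slowlogs older than a day that are not already gzipped
    df -h /var/log
    sudo find /var/log/elasticsearch -name 'eqiad_index_search_slowlog.log*' ! -name '*.gz' -mtime +1 -exec gzip {} +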