2018-01-09
§
|
20:13 |
<mutante> |
netmon2001 - rebooting |
[production] |
20:12 |
<twentyafterfour@tin> |
Started scap: Deploy 1.31.0-wmf.16 to test wikis and rebuild l10n. refs T180749 |
[production] |
20:04 |
<mutante> |
gerrit2001 - rebooting |
[production] |
20:00 |
<mutante> |
phab2001 - reboot for upgrade |
[production] |
19:20 |
<mepps> |
rolled back SmashPig from 0c45b1a684 to 45aa62650c |
[production] |
19:07 |
<mepps> |
updated SmashPig from 45aa62650c to 0c45b1a684 |
[production] |
18:42 |
<mutante> |
ms-fe3002,ms-fe3001 - powering down, removing from puppet and icinga, ms-be* removing from puppet/icinga (T169518) |
[production] |
18:38 |
<mutante> |
ms-fe3001 - shutting down for decom, removed from puppet |
[production] |
18:38 |
<mutante> |
mw1227 still not showing recovery, using restart-hhvm |
[production] |
18:29 |
<mutante> |
mw1227 killed it one more time and also restarted apache.. now load going down |
[production] |
18:26 |
<mutante> |
mw1227 hhvm-dump-debug > /root/hhvm-dump-debug-20170109-1024PST.log ; then killed hhvm and restarted it with systemctl |
[production] |
17:56 |
<twentyafterfour> |
MediaWiki Train: Branching 1.31.0-wmf.16 |
[production] |
17:41 |
<moritzm> |
rebooting image scalers in codfw for kernel security update (along with HHVM update) |
[production] |
17:30 |
<volans> |
re-enabled Icinga event handlers on RAID checks for lvs3001 |
[production] |
17:17 |
<ema> |
failover traffic back to lvs3001, raid rebuilt |
[production] |
17:15 |
<godog> |
depool restbase cassandra 2 nodes - T184100 |
[production] |
16:35 |
<cmjohnson1> |
disabling puppet for decom on mw1180-1200 |
[production] |
16:28 |
<volans> |
disabled Icinga event handlers on RAID checks for lvs3001, WIP on the host |
[production] |
16:18 |
<gehel> |
starting cluster reboot for elasticsearch / cirrus codfw |
[production] |
16:09 |
<bd808> |
data-services: added s8.{analytics,web}.db.svc.eqiad.wmflabs and aliases (T181643, T184179) |
[production] |
16:09 |
<elukey> |
re-started mysql on dbstore1002 (and slave replication) after hw maintenance |
[production] |
15:44 |
<godog> |
roll-restart swift frontends in codfw and eqiad |
[production] |
15:40 |
<akosiaris@tin> |
Finished deploy [servermon/servermon@10e165e]: Testing scap check (duration: 00m 02s) |
[production] |
15:40 |
<akosiaris@tin> |
Started deploy [servermon/servermon@10e165e]: Testing scap check |
[production] |
15:31 |
<gehel> |
reboot maps-test* for kernel upgrade |
[production] |
15:30 |
<elukey> |
stop mysql on dbstore1002 as prep step for shutdown (stop all slaves, mysql stop) |
[production] |
15:23 |
<herron> |
puppet master reboots complete. re-enabling puppet agents |
[production] |
15:18 |
<ema> |
lvs3001 disk swap: failover traffic to lvs3003 T166965 |
[production] |
15:10 |
<elukey> |
reboot analytics1028 (hadoop worker and hdfs journal node) for kernel updates |
[production] |
15:07 |
<anomie> |
Creating MCR tables on all wikis (T183486) |
[production] |
15:01 |
<herron> |
temporarily disabling puppet agents and rebooting puppet masters for security updates |
[production] |
15:00 |
<elukey> |
reboot kafka-jumbo1006 for kernel updates |
[production] |
14:59 |
<ema> |
lvs3001: upgrade to latest jessie point release (8.10) T182656 and linux kernel 4.9.65-3+deb9u1~bpo8+2 (KPTI) T184267, replace sdb T166965 |
[production] |
14:48 |
<moritzm> |
rolling reboot of scb in eqiad for kernel security update |
[production] |
14:41 |
<elukey> |
reboot kafka-jumbo1005 for kernel updates |
[production] |
14:36 |
<godog> |
upgrade and roll-restart thumbor in codfw/eqiad - T182656 T183907 T169144 |
[production] |
14:32 |
<elukey> |
reboot kafka1023 for kernel updates |
[production] |
14:21 |
<elukey> |
reboot kafka-jumbo1004 for kernel updates |
[production] |
14:14 |
<moritzm> |
rolling reboot of scb in codfw for kernel security update |
[production] |
14:14 |
<ema> |
lvs3003: upgrade to latest jessie point release (8.10) T182656 and linux kernel 4.9.65-3+deb9u1~bpo8+2 (KPTI) T184267 |
[production] |
14:07 |
<hashar@tin> |
Synchronized wmf-config/InitialiseSettings.php: Save -> Publish on remaining Wikinewses which haven't updated - https://gerrit.wikimedia.org/r/#/c/403077/ (duration: 00m 53s) |
[production] |
14:06 |
<ema> |
lvs3002: upgrade to latest jessie point release (8.10) T182656 and linux kernel 4.9.65-3+deb9u1~bpo8+2 (KPTI) T184267 |
[production] |
14:04 |
<elukey> |
reboot kafka1022 for kernel updates |
[production] |
14:01 |
<godog> |
copy poolcounter from jessie-wikimedia into stretch-wikimedia - T183385 |
[production] |
13:51 |
<elukey> |
reboot kafka-jumbo1003 for kernel updates |
[production] |
13:34 |
<moritzm> |
rebooting remaining video scalers in eqiad for kernel security update (along with HHVM update) |
[production] |
13:10 |
<elukey> |
reboot kafka1020 for kernel updates |
[production] |
13:07 |
<mobrovac@tin> |
Finished deploy [restbase/deploy@837f5a9]: Force deploy on all targets - T184110 (duration: 07m 23s) |
[production] |
13:00 |
<mobrovac@tin> |
Started deploy [restbase/deploy@837f5a9]: Force deploy on all targets - T184110 |
[production] |
12:58 |
<moritzm> |
rebooting labnodepool* for kernel security update |
[production] |