2018-01-10
09:50 |
<godog> |
shut cassandra 2 on restbase legacy nodes - T184100 |
[production] |
09:40 |
<moritzm> |
rebooting kubernetes workers (plus staging hosts) for kernel security update |
[production] |
09:39 |
<ema> |
eqiad LVSs: upgrade to latest jessie point release (8.10) T182656 and linux kernel 4.9.65-3+deb9u1~bpo8+2 (KPTI) T184267 |
[production] |
09:32 |
<marostegui> |
Upgrade kernel on db1067 |
[production] |
09:27 |
<godog> |
stop restbase on cassandra 2 nodes - T184100 |
[production] |
09:15 |
<marostegui> |
Deploy schema change on db1051 - T174569 |
[production] |
09:12 |
<moritzm> |
rebooting radium (tor relay) for kernel security update |
[production] |
08:42 |
<marostegui> |
Stop replication in sync on db1089 and db1067 - T162807 |
[production] |
08:41 |
<marostegui@tin> |
Synchronized wmf-config/db-eqiad.php: Depool db1067 and db1089 - T162807 (duration: 01m 05s) |
[production] |
08:38 |
<marostegui> |
Deploy schema change on s5 dbstore1001 - T174569 |
[production] |
08:33 |
<moritzm> |
rebooting mw1299-mw1306 (job runners) for kernel security update (along with update to HHVM 3.18.6) |
[production] |
08:28 |
<hashar> |
contint1001: upgraded Zuul 2.5.0-8-gcbc7f62-wmf4jessie1 .. 2.5.0-8-gcbc7f62-wmf6 | T158243 |
[production] |
08:13 |
<marostegui> |
Deploy schema change on s5 dbstore1002 - T174569 |
[production] |
07:44 |
<moritzm> |
rebooting mw1262-mw1275 for kernel security update (along with update to HHVM 3.18.6) |
[production] |
07:37 |
<marostegui> |
Drop external_user from wikidatawiki - T184247 |
[production] |
06:17 |
<marostegui> |
Deploy schema change on s5 codfw master (db2052) with replication (this will generate lag on codfw) - T174569 |
[production] |
02:24 |
<l10nupdate@tin> |
scap sync-l10n completed (1.31.0-wmf.15) (duration: 06m 02s) |
[production] |
01:39 |
<mutante> |
mw1226 - high load - hhvm-dump-debug > /root/hhvm-dump-debug-20170109-1739PST.log ; restart-hhvm |
[production] |
00:43 |
<mutante> |
rebooting gerrit server for kernel upgrade |
[production] |
00:18 |
<mutante> |
rebooting phabricator server for kernel upgrade |
[production] |
2018-01-09
22:52 |
<godog> |
ms-be1033: truncate large, unrotated server.log |
[production] |
22:22 |
<aaron@tin> |
Synchronized php-1.31.0-wmf.16/includes/Setup.php: 68b4bbfbc12c626 (duration: 01m 15s) |
[production] |
22:20 |
<mutante> |
netmon2001 - arming keyholder for rancid |
[production] |
21:10 |
<mepps> |
updated SmashPig from 45aa62650c to 778e8f87b4 |
[production] |
20:57 |
<twentyafterfour@tin> |
Finished scap: Deploy 1.31.0-wmf.16 to test wikis and rebuild l10n. refs T180749 (attempt 2) (duration: 36m 34s) |
[production] |
20:21 |
<twentyafterfour@tin> |
Started scap: Deploy 1.31.0-wmf.16 to test wikis and rebuild l10n. refs T180749 (attempt 2) |
[production] |
20:14 |
<twentyafterfour@tin> |
scap failed: CalledProcessError Command '/usr/local/bin/mwscript rebuildLocalisationCache.php --wiki="test2wiki" --outdir="/tmp/scap_l10n_3984299293" --threads=10 --lang en --quiet' returned non-zero exit status 1 (duration: 02m 44s) |
[production] |
20:13 |
<mutante> |
netmon2001 - rebooting |
[production] |
20:12 |
<twentyafterfour@tin> |
Started scap: Deploy 1.31.0-wmf.16 to test wikis and rebuild l10n. refs T180749 |
[production] |
20:04 |
<mutante> |
gerrit2001 - rebooting |
[production] |
20:00 |
<mutante> |
phab2001 - reboot for upgrade |
[production] |
19:20 |
<mepps> |
rolled back SmashPig from 0c45b1a684 to 45aa62650c |
[production] |
19:07 |
<mepps> |
updated SmashPig from 45aa62650c to 0c45b1a684 |
[production] |
18:42 |
<mutante> |
ms-fe3002, ms-fe3001 - powering down, removing from puppet and icinga; ms-be* removing from puppet/icinga (T169518) |
[production] |
18:38 |
<mutante> |
ms-fe3001 - shutting down for decom, removed from puppet |
[production] |
18:38 |
<mutante> |
mw1227 still not showing recovery, using restart-hhvm |
[production] |
18:29 |
<mutante> |
mw1227: killed it one more time and also restarted apache; load now going down |
[production] |
18:26 |
<mutante> |
mw1227 hhvm-dump-debug > /root/hhvm-dump-debug-20170109-1024PST.log ; then killed hhvm and restarted it with systemctl |
[production] |
17:56 |
<twentyafterfour> |
MediaWiki Train: Branching 1.31.0-wmf.16 |
[production] |
17:41 |
<moritzm> |
rebooting image scalers in codfw for kernel security update (along with HHVM update) |
[production] |
17:30 |
<volans> |
re-enabled Icinga event handlers on RAID checks for lvs3001 |
[production] |
17:17 |
<ema> |
failover traffic back to lvs3001, raid rebuilt |
[production] |
17:15 |
<godog> |
depool restbase cassandra 2 nodes - T184100 |
[production] |
16:35 |
<cmjohnson1> |
disabling puppet for decom on mw1180-1200 |
[production] |
16:28 |
<volans> |
disabled Icinga event handlers on RAID checks for lvs3001, WIP on the host |
[production] |
16:18 |
<gehel> |
starting cluster reboot for elasticsearch / cirrus codfw |
[production] |
16:09 |
<bd808> |
data-services: added s8.{analytics,web}.db.svc.eqiad.wmflabs and aliases (T181643, T184179) |
[production] |
16:09 |
<elukey> |
re-started mysql on dbstore1002 (and slave replication) after hw maintenance |
[production] |
15:44 |
<godog> |
roll-restart swift frontends in codfw and eqiad |
[production] |
15:40 |
<akosiaris@tin> |
Finished deploy [servermon/servermon@10e165e]: Testing scap check (duration: 00m 02s) |
[production] |