2018-05-29
|
13:19 |
<gehel> |
rolling restart of relforge for plugin upgrade - T193734 |
[production] |
13:16 |
<volans> |
running puppet on failed only hosts |
[production] |
13:12 |
<volans> |
stopped ircecho temporarily |
[production] |
12:54 |
<moritzm> |
installing xdg-utils security updates |
[production] |
11:21 |
<marostegui> |
Restart db1125 mysql - T195595 |
[production] |
11:14 |
<moritzm> |
upgrading snapshot hosts to hhvm-wikidiff 1.7.0 (HHVM is unused, just for completeness) |
[production] |
11:08 |
<marostegui@tin> |
Synchronized wmf-config/db-eqiad.php: Disable read only on s6 T194939 T187962 (duration: 01m 37s) |
[production] |
10:59 |
<marostegui@tin> |
Synchronized wmf-config/db-eqiad.php: Enable read only on s6 T194939 T187962 (duration: 01m 35s) |
[production] |
10:55 |
<XioNoX> |
Eqiad row C server move starting - T187962 |
[production] |
10:53 |
<XioNoX> |
Eqiad row C server move starting |
[production] |
10:35 |
<moritzm> |
upgrading mw1308-mw1311 to hhvm-wikidiff 1.7.0 (HHVM bytecode cache needs to be pruned during rollout) |
[production] |
10:09 |
<mobrovac@tin> |
Synchronized wmf-config/InitialiseSettings.php: Switch all jobs to EventBus file 2/2 - T190327 T195500 (duration: 01m 47s) |
[production] |
10:06 |
<mobrovac@tin> |
Synchronized wmf-config/jobqueue.php: Switch all jobs to EventBus file 1/2 - T190327 T195500 (duration: 01m 39s) |
[production] |
10:05 |
<ppchelko@tin> |
Finished deploy [cpjobqueue/deploy@c6dc83d]: Enable all jobs apart from exceptions for everything. T190327 (duration: 00m 58s) |
[production] |
10:04 |
<ppchelko@tin> |
Started deploy [cpjobqueue/deploy@c6dc83d]: Enable all jobs apart from exceptions for everything. T190327 |
[production] |
09:20 |
<XioNoX> |
redirect ns0 to baham - T187962 |
[production] |
09:16 |
<XioNoX> |
disable ping1001 redirect - T187962 |
[production] |
09:13 |
<marostegui> |
Downtime s6 replicas for 4 hours - T195595 |
[production] |
09:07 |
<marostegui@tin> |
Synchronized wmf-config/db-eqiad.php: Depool all databases in row C - T187962 (duration: 01m 35s) |
[production] |
09:05 |
<moritzm> |
upgrading labweb servers to hhvm-wikidiff 1.7.0 (HHVM bytecode cache needs to be pruned during rollout) |
[production] |
08:40 |
<jynus> |
performing topology changes on s6 ahead of a possible failover |
[production] |
08:24 |
<moritzm> |
upgrading remaining API servers in eqiad to hhvm-wikidiff 1.7.0 (HHVM bytecode cache needs to be pruned during rollout) |
[production] |
07:56 |
<moritzm> |
upgrading mw1276-mw1290 to hhvm-wikidiff 1.7.0 (HHVM bytecode cache needs to be pruned during rollout) |
[production] |
07:49 |
<elukey> |
reimage druid1002 to debian stretch |
[production] |
07:47 |
<gilles@tin> |
Synchronized wmf-config/InitialiseSettings.php: T187299 Launch performance survey on ruwiki (duration: 01m 50s) |
[production] |
07:26 |
<moritzm> |
upgrading remaining app servers in eqiad to hhvm-wikidiff 1.7.0 (HHVM bytecode cache needs to be pruned during rollout) |
[production] |
06:52 |
<elukey> |
roll restart hadoop master daemons to pick up the new zookeeper settings |
[production] |
05:20 |
<marostegui> |
Restart MySQL on db2045 (s8 codfw master) - T195598 |
[production] |
05:13 |
<marostegui> |
Stop MySQL on db2094 and db2095 for testing - T190704 |
[production] |
04:12 |
<l10nupdate@tin> |
ResourceLoader cache refresh completed at Tue May 29 04:12:10 UTC 2018 (duration 14m 32s) |
[production] |
03:57 |
<l10nupdate@tin> |
scap sync-l10n completed (1.32.0-wmf.5) (duration: 14m 29s) |
[production] |
02:59 |
<l10nupdate@tin> |
scap sync-l10n completed (1.32.0-wmf.4) (duration: 13m 18s) |
[production] |
2018-05-28
|
20:14 |
<twentyafterfour> |
Test failures on https://gerrit.wikimedia.org/r/#/c/435825/ are preventing deployment of the fix for a critical deployment blocker (see T195514) 1.32.0-wmf.5 still blocked refs T191051 |
[production] |
20:10 |
<twentyafterfour> |
train still held up by test failures: https://gerrit.wikimedia.org/r/#/c/435825/ |
[production] |
20:02 |
<elukey> |
restart kafka on kafka1003 as an attempt to solve the under-replicated partitions warning |
[production] |
19:22 |
<twentyafterfour@tin> |
Synchronized php-1.32.0-wmf.5/extensions/CentralNotice/: sync wmf.5 CentralNotice for AndyRussG (duration: 01m 25s) |
[production] |
19:12 |
<elukey> |
roll restart of kafka-mirror maker (main eqiad -> jumbo) on kafka-jumbo* for zookeeper conf updates |
[production] |
19:07 |
<twentyafterfour> |
attempting to get the wmf.5 train back on track. Deploying a fix for T195514 (https://gerrit.wikimedia.org/r/c/435292/) to unblock T191051 |
[production] |
18:16 |
<elukey> |
restart kafka mirror maker on kafka1012->14 - failed after the last round of kafka restarts |
[production] |
17:26 |
<elukey> |
roll restart of kafka on kafka-jumbo* to pick up the new zookeeper settings |
[production] |
17:20 |
<gehel@tin> |
Finished deploy [wdqs/wdqs@0e40344]: WDQS updater and GUI (duration: 08m 59s) |
[production] |
17:19 |
<elukey> |
restart kafka on kafka1012->23 to pick up the new zookeeper settings |
[production] |
17:11 |
<gehel@tin> |
Started deploy [wdqs/wdqs@0e40344]: WDQS updater and GUI |
[production] |
17:08 |
<moritzm> |
upgrading mwdebug servers in codfw to hhvm-wikidiff 1.7.0 (HHVM bytecode cache needs to be pruned during rollout) |
[production] |
16:48 |
<moritzm> |
upgrading codfw video scalers to hhvm-wikidiff 1.7.0 |
[production] |
16:31 |
<elukey> |
roll restart kafka on kafka100[1-3] to pick up new zookeeper settings |
[production] |
16:21 |
<elukey> |
zookeeper cluster restart completed (main-eqiad / conf1*) |
[production] |
16:18 |
<elukey> |
stop and mask zookeeper on conf1002 |
[production] |
16:16 |
<elukey> |
restart prometheus-burrow-exporter on kafkamon* |
[production] |
16:12 |
<moritzm> |
upgrading job runners in codfw to hhvm-wikidiff 1.7.0 (HHVM bytecode cache needs to be pruned during rollout) |
[production] |