2018-01-17
12:10 |
<moritzm> |
updating HHVM in deployment-prep to 3.18.5+wmf4 |
[production] |
11:44 |
<elukey> |
re-run pageview-druid-hourly-wf-2018-1-17-9 and pageview-druid-hourly-wf-2018-1-17-8 (failed due to druid1002's middlemanager being in a weird state after reboot) |
[analytics] |
11:44 |
<elukey> |
restart druid middlemanager on druid1002 |
[analytics] |
11:40 |
<godog> |
bootstrap cassandra-b on restbase1016 |
[production] |
11:28 |
<moritzm> |
rearmed keyholder on neodymium |
[production] |
11:24 |
<moritzm> |
rebooting neodymium for kernel security update |
[production] |
11:19 |
<_joe_> |
restarted nginx on mw1346, was in a bad state |
[production] |
10:51 |
<moritzm> |
reset RAC on chromium, serial console is inaccessible |
[production] |
10:42 |
<moritzm> |
repooling hydrogen |
[production] |
10:39 |
<moritzm> |
rebooting hydrogen for kernel security update |
[production] |
10:38 |
<elukey> |
stopped all crons on hadoop-coordinator-1 |
[analytics] |
10:37 |
<elukey> |
re-run webrequest-druid-hourly-wf-2018-1-17-8 (failed due to druid1002's reboot) |
[analytics] |
10:34 |
<moritzm> |
depooling hydrogen again |
[production] |
10:22 |
<moritzm> |
repooling hydrogen (and pdns-recursor restarted), experiment concluded |
[production] |
10:22 |
<elukey> |
reboot druid1002 for kernel upgrades |
[analytics] |
10:14 |
<moritzm> |
depooling hydrogen (and keeping pdns-recursor stopped for a few minutes to check whether problems with load-balanced recdns traffic are still an issue) |
[production] |
10:11 |
<moritzm> |
reset RAC on hydrogen, serial console was inaccessible |
[production] |
10:01 |
<godog> |
start cassandra-a on restbase1016 |
[production] |
09:53 |
<elukey> |
disable druid middlemanager on druid1002 as prep step for reboot |
[analytics] |
09:52 |
<elukey> |
reboot druid1005 for kernel upgrades |
[production] |
09:46 |
<elukey> |
rebooted analytics1003 |
[analytics] |
09:46 |
<elukey> |
removed upstart config for brrd on eventlog1001 (failing and spamming syslog, old leftover?) |
[analytics] |
09:46 |
<elukey> |
removed upstart config for brrd on eventlog1001 (failing and spamming syslog, old leftover?) |
[production] |
09:34 |
<marostegui@tin> |
Synchronized wmf-config/db-eqiad.php: Full repool db1101:3318 (duration: 01m 11s) |
[production] |
09:30 |
<moritzm> |
rebooting flerovium and furud for kernel security update |
[production] |
09:17 |
<marostegui@tin> |
Synchronized wmf-config/db-eqiad.php: Increase traffic for db1101:3318 (duration: 01m 12s) |
[production] |
09:14 |
<godog> |
reimage restbase1016 - T184100 |
[production] |
09:06 |
<elukey> |
reboot analytics1003 for kernel upgrades |
[production] |
09:00 |
<marostegui@tin> |
Synchronized wmf-config/db-eqiad.php: Depool db1065 - T162807 (duration: 01m 11s) |
[production] |
08:56 |
<marostegui@tin> |
Synchronized wmf-config/db-eqiad.php: Slowly repool db1101:3318 (duration: 15m 42s) |
[production] |
08:53 |
<elukey> |
disabled camus as prep step for analytics1003 reboot |
[analytics] |
08:44 |
<elukey> |
reboot stat100[456] for kernel upgrades |
[production] |
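The host list in the 08:44 entry, `stat100[456]`, is compact shorthand. The sketch below assumes it should be read as a shell-style glob, where `[456]` matches exactly one of the characters 4, 5 or 6. Under that reading it names three hosts. The candidate list `stat1000` through `stat1009` is purely illustrative.

```python
from fnmatch import fnmatchcase

# Host pattern copied from the 08:44 log entry.
# Assumption: it is a shell-style glob, where [456] is a character class
# matching exactly one of the characters 4, 5 or 6.
PATTERN = "stat100[456]"

# Hypothetical candidate names: stat1000 through stat1009.
candidates = [f"stat100{d}" for d in range(10)]
rebooted = [host for host in candidates if fnmatchcase(host, PATTERN)]
print(rebooted)  # ['stat1004', 'stat1005', 'stat1006']
```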
07:48 |
<elukey> |
restart varnish backend on cp4024 (ton of 503s, icinga alerting for mailbox lag) |
[production] |
07:46 |
<oblivian@neodymium> |
conftool action : set/pooled=inactive; selector: cluster=appserver,name=mw12([0-1][0-9]|20)\.eqiad\.wmnet |
[production] |
07:45 |
<_joe_> |
depooling mw1209-1220 from the appserver cluster for decommissioning, T185004 |
[production] |
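The 07:46 conftool entry selects hosts with the regex `mw12([0-1][0-9]|20)\.eqiad\.wmnet`, while the 07:45 entry names the range mw1209-1220. The sketch below shows which names that regex accepts. It assumes the selector is applied as a full match against each host name. The helper `matched_hosts` and the scanned range mw1190-mw1230 are hypothetical and chosen only for illustration.

```python
import re

# Selector regex copied from the 07:46 conftool log entry.
# Assumption: it is applied as a full match against the host name.
SELECTOR = re.compile(r"mw12([0-1][0-9]|20)\.eqiad\.wmnet")


def matched_hosts(first, last):
    """Hypothetical helper: return the mwNNNN.eqiad.wmnet names with
    NNNN in [first, last] that the selector regex fully matches."""
    names = (f"mw{n}.eqiad.wmnet" for n in range(first, last + 1))
    return [name for name in names if SELECTOR.fullmatch(name)]


hits = matched_hosts(1190, 1230)
print(hits[0], hits[-1], len(hits))  # mw1200.eqiad.wmnet mw1220.eqiad.wmnet 21
```

Read this way, the pattern covers mw1200-mw1220, a superset of the mw1209-1220 range named in the depool note. `([0-1][0-9]|20)` accepts 00-19 or 20. The log does not say whether hosts below mw1209 were still registered in the appserver cluster at the time.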
06:47 |
<marostegui> |
Remove labsdb1001 and labsdb1003 from tendril - T184832 |
[production] |
06:40 |
<marostegui> |
Stop MySQL on labsdb1001 (already dead) and labsdb1003 - T184832 |
[production] |
06:29 |
<marostegui> |
Stop replication in sync on db1089 and s1 codfw master (db2048) - T162807 |
[production] |
06:28 |
<marostegui> |
Deploy schema change on db1104 - T174569 |
[production] |
06:21 |
<marostegui> |
Upgrade mariadb and kernel on db1104 |
[production] |
06:20 |
<marostegui@tin> |
Synchronized wmf-config/db-eqiad.php: Depool db1104 - T174569 (duration: 01m 14s) |
[production] |
02:31 |
<l10nupdate@tin> |
scap sync-l10n completed (1.31.0-wmf.16) (duration: 07m 11s) |
[production] |
00:28 |
<ebernhardson@tin> |
Synchronized wmf-config/InitialiseSettings.php: SWAT: T182616 Remove cirrus AB test config for hewiki (duration: 01m 09s) |
[production] |
00:26 |
<ebernhardson@tin> |
Synchronized fc-list: SWAT: T184664 Updating fonts list and sorting it (duration: 01m 12s) |
[production] |
00:21 |
<ebernhardson@tin> |
Synchronized fc-list: SWAT: T184664 Updating fonts list and sorting it (duration: 01m 12s) |
[production] |
00:10 |
<ebernhardson@tin> |
Synchronized php-1.31.0-wmf.16/extensions/WikimediaEvents/modules/all/ext.wikimediaEvents.searchSatisfaction.js: SWAT: T182616 Turn off cirrus AB test on hewiki (duration: 01m 12s) |
[production] |
00:08 |
<ebernhardson@tin> |
Synchronized php-1.31.0-wmf.17/extensions/WikimediaEvents/modules/all/ext.wikimediaEvents.searchSatisfaction.js: SWAT: T182616 Turn off cirrus AB test on hewiki (duration: 01m 14s) |
[production] |