2018-01-17
13:45 <chasemp> labstore2002:~# sudo update-grub && /sbin/reboot [production]
13:40 <chasemp> labstore2001:~# /sbin/reboot [production]
13:39 <marostegui@tin> Synchronized wmf-config/db-eqiad.php: Slowly repool db1104 (duration: 01m 13s) [production]
13:31 <akosiaris> reboot acrab for PCID,INVPCID enabling [production]
13:22 <marostegui> Deploy schema change on db1099:3318 - https://phabricator.wikimedia.org/T174569 [production]
13:22 <marostegui@tin> Synchronized wmf-config/db-eqiad.php: Depool db1099:3318 - T174569 (duration: 01m 12s) [production]
13:17 <moritzm> upgrading app server canaries to 3.18.5+wmf4 [production]
13:12 <marostegui> Fixing drifts on db1065 - T162807 [production]
13:10 <hashar> nodepool: updating snapshot to get hhvm +wmf4 for T185024 : nodepool image-update wmflabs-eqiad snapshot-ci-jessie [releng]
12:28 <moritzm> uploading HHVM 3.18.5+wmf4 for jessie-wikimedia to apt.wikimedia.org (3.18.7 with the patch https://github.com/facebook/hhvm/commit/bd7b2bcfe70b053a3a001480653012f68599250f backed out) [production]
12:10 <moritzm> updating HHVM in deployment-prep to 3.18.5+wmf4 [production]
11:44 <elukey> re-run pageview-druid-hourly-wf-2018-1-17-9 and pageview-druid-hourly-wf-2018-1-17-8 (failed due to druid1002's middlemanager being in a weird state after reboot) [analytics]
11:44 <elukey> restart druid middlemanager on druid1002 [analytics]
11:40 <godog> bootstrap cassandra-b on restbase1016 [production]
11:28 <moritzm> rearmed keyholder on neodymium [production]
11:24 <moritzm> rebooting neodymium for kernel security update [production]
11:19 <_joe_> restarted nginx on mw1346, was in a bad state [production]
10:51 <moritzm> reset RAC on chromium, serial console is inaccessible [production]
10:42 <moritzm> repooling hydrogen [production]
10:39 <moritzm> rebooting hydrogen for kernel security update [production]
10:38 <elukey> stopped all crons on hadoop-coordinator-1 [analytics]
10:37 <elukey> re-run webrequest-druid-hourly-wf-2018-1-17-8 (failed due to druid1002's reboot) [analytics]
10:34 <moritzm> depooling hydrogen again [production]
10:22 <moritzm> repooling hydrogen (and pdns-recursor restarted), experiment concluded [production]
10:22 <elukey> reboot druid1002 for kernel upgrades [analytics]
10:14 <moritzm> depooling hydrogen (and keeping pdns-recursor stopped for a few minutes to check whether problems with load-balanced recdns traffic are still an issue) [production]
10:11 <moritzm> reset RAC on hydrogen, serial console was inaccessible [production]
10:01 <godog> start cassandra-a on restbase1016 [production]
09:53 <elukey> disable druid middlemanager on druid1002 as prep step for reboot [analytics]
09:52 <elukey> reboot druid1005 for kernel upgrades [production]
09:46 <elukey> rebooted analytics1003 [analytics]
09:46 <elukey> removed upstart config for brrd on eventlog1001 (failing and spamming syslog, old leftover?) [analytics]
09:46 <elukey> removed upstart config for brrd on eventlog1001 (failing and spamming syslog, old leftover?) [production]
09:34 <marostegui@tin> Synchronized wmf-config/db-eqiad.php: Full repool db1101:3318 (duration: 01m 11s) [production]
09:30 <moritzm> rebooting flerovium and furud for kernel security update [production]
09:17 <marostegui@tin> Synchronized wmf-config/db-eqiad.php: Increase traffic for db1101:3318 (duration: 01m 12s) [production]
09:14 <godog> reimage restbase1016 - T184100 [production]
09:06 <elukey> reboot analytics1003 for kernel upgrades [production]
09:00 <marostegui@tin> Synchronized wmf-config/db-eqiad.php: Depool db1065 - T162807 (duration: 01m 11s) [production]
08:56 <marostegui@tin> Synchronized wmf-config/db-eqiad.php: Slowly repool db1101:3318 (duration: 15m 42s) [production]
08:53 <elukey> disabled camus as prep step for analytics1003 reboot [analytics]
08:44 <elukey> reboot stat100[456] for kernel upgrades [production]
07:48 <elukey> restart varnish backend on cp4024 (ton of 503s, icinga alerting for mailbox lag) [production]
07:46 <oblivian@neodymium> conftool action : set/pooled=inactive; selector: cluster=appserver,name=mw12([0-1][0-9]|20)\.eqiad\.wmnet [production]
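The name part of the 07:46 conftool selector is a regular expression over host names. The sketch below checks which names it covers. It assumes the pattern is tested against the full FQDN, which is modeled here with Python's `re.fullmatch`; this is not conftool's own matching code. The candidate host list is hypothetical and is not taken from the real inventory. Under that assumption the pattern matches mw1200 through mw1220, a wider range than the mw1209-1220 named in the 07:45 depool entry. The log does not record whether mw1200-mw1208 were in the appserver cluster at the time.

```python
import re

# Selector regex copied verbatim from the 07:46 conftool entry.
SELECTOR = r"mw12([0-1][0-9]|20)\.eqiad\.wmnet"

# Hypothetical candidate names mw1190..mw1229 (not the real host inventory).
candidates = [f"mw{n}.eqiad.wmnet" for n in range(1190, 1230)]

# Assumption: the whole hostname must match, so use re.fullmatch.
# [0-1][0-9] covers 00-19 and the alternative "20" adds 1220.
matched = [h for h in candidates if re.fullmatch(SELECTOR, h)]

print(matched[0], matched[-1], len(matched))
```

With this candidate list, the script prints `mw1200.eqiad.wmnet mw1220.eqiad.wmnet 21`.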
07:45 <_joe_> depooling mw1209-1220 from the appserver cluster for decommissioning, T185004 [production]
06:47 <marostegui> Remove labsdb1001 and labsdb1003 from tendril - T184832 [production]
06:40 <marostegui> Stop MySQL on labsdb1001 (already dead) and labsdb1003 - T184832 [production]
06:29 <marostegui> Stop replication in sync on db1089 and s1 codfw master (db2048) - T162807 [production]
06:28 <marostegui> Deploy schema change on db1104 - T174569 [production]
06:21 <marostegui> Upgrade mariadb and kernel on db1104 [production]