2020-05-21
|
19:16 |
<andrewbogott> |
systemctl disable block_sync-tools-project.service on cloudbackup2001.codfw.wmnet to avoid stepping on current upgrade |
[admin] |
18:24 |
<twentyafterfour> |
restarting phabricator on phab1001 to deploy https://phabricator.wikimedia.org/rPHEX2687d08786a9dadcbaa96709de991f471f239830 |
[production] |
17:24 |
<elukey> |
add druid100[7,8] to the druid public cluster (not serving load balancer traffic for the moment, only joining the cluster) - T252771 |
[analytics] |
17:24 |
<bblack> |
anycast experiment done, all back to normal |
[production] |
17:20 |
<bblack> |
anycast experimentation commencing in ulsfo (test route withdrawal)... |
[production] |
17:04 |
<bstorm_> |
starting labstore1005 upgrades T224582 |
[production] |
16:44 |
<elukey> |
roll restart druid historical nodes on druid100[4-6] (public cluster) to pick up new settings - T252771 |
[analytics] |
16:42 |
<Reedy> |
Reloading Zuul to deploy https://gerrit.wikimedia.org/r/597825 |
[releng] |
16:34 |
<Reedy> |
Reloading Zuul to deploy https://gerrit.wikimedia.org/r/597820 |
[releng] |
16:19 |
<James_F> |
Zuul: [mediawiki/extensions/Bootstrap] Switch down to quibble-composer for now. |
[releng] |
16:14 |
<andrew@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) |
[production] |
16:12 |
<andrew@cumin1001> |
START - Cookbook sre.hosts.downtime |
[production] |
16:04 |
<Urbanecm> |
Restart StewardBot |
[tools.stewardbots] |
16:04 |
<sbassett@deploy1001> |
Synchronized private/PrivateSettings.php: Update mitigations for T250887 (duration: 01m 08s) |
[production] |
16:01 |
<Urbanecm> |
Investigating StewardBot's outage |
[tools.stewardbots] |
15:55 |
<Reedy> |
Reloading Zuul to deploy https://gerrit.wikimedia.org/r/597810 |
[releng] |
15:48 |
<andrewbogott> |
rebuilding cloudnet1003.eqiad.wmnet with Debian Buster for T253124 |
[production] |
15:48 |
<andrewbogott> |
re-imaging cloudnet1003 with Buster |
[admin] |
15:23 |
<ZI_Jony> |
staff restarted CVNBot21 on #cvn-mediawiki |
[cvn] |
15:22 |
<XioNoX> |
Add BGP between cr1/2-eqiad and authdns1001 - T253196 |
[production] |
15:09 |
<dzahn@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) |
[production] |
15:09 |
<dzahn@cumin1001> |
START - Cookbook sre.hosts.downtime |
[production] |
15:08 |
<dzahn@cumin1001> |
END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) |
[production] |
15:08 |
<dzahn@cumin1001> |
START - Cookbook sre.hosts.downtime |
[production] |
15:07 |
<dzahn@cumin1001> |
END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) |
[production] |
15:07 |
<dzahn@cumin1001> |
START - Cookbook sre.hosts.downtime |
[production] |
14:59 |
<dzahn@cumin1001> |
conftool action : set/pooled=inactive; selector: name=mw217[0-2].codfw.wmnet |
[production] |
14:59 |
<dzahn@cumin1001> |
conftool action : set/pooled=inactive; selector: name=mw216[0-9].codfw.wmnet |
[production] |
14:58 |
<dzahn@cumin1001> |
conftool action : set/pooled=inactive; selector: name=mw215[8-9].codfw.wmnet |
[production] |
14:53 |
<bstorm_> |
adding the hiera values to horizon for bootstrapping k8s T211096 |
[paws] |
14:50 |
<bblack@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) |
[production] |
14:47 |
<bblack@cumin1001> |
START - Cookbook sre.hosts.downtime |
[production] |
14:44 |
<akosiaris@deploy1001> |
helmfile [CODFW] Ran 'sync' command on namespace 'mathoid' for release 'canary' . |
[production] |
14:39 |
<arturo> |
point record `k8s.svc.paws.eqiad1.wikimedia.cloud` to `172.16.1.186` (which is paws-k8s-control-1, for the initial bootstrap) (T211096) |
[paws] |
14:33 |
<akosiaris> |
upload helmfile 0.109.0 to apt.wikimedia.org/buster-wikimedia and stretch-wikimedia, component main |
[production] |
14:02 |
<elukey> |
restart druid kafka supervisor for wmf_netflow after maintenance |
[analytics] |
13:53 |
<elukey> |
restart druid-historical on an-druid100[1,2] to pick up new settings |
[analytics] |
13:51 |
<ZI_Jony> |
restarted Cubbie on #cvn-commons-uploads |
[cvn] |
13:51 |
<vgutierrez> |
depool cp4032 for some ats tests |
[production] |
13:22 |
<mutante> |
cloudnet1004 - reboot to test PXE boot |
[production] |
13:17 |
<elukey> |
kill wmf_netflow druid supervisor for maintenance |
[analytics] |
13:13 |
<elukey> |
stop druid-daemons on druid100[1-3] (one at a time) to move the druid partition from /srv/druid to /srv (didn't think about it before) - T252771 |
[analytics] |
12:48 |
<arturo> |
created record `k8s.svc.paws.eqiad1.wikimedia.cloud` pointing to `172.16.0.191` (which is paws-k8s-haproxy-1) (T211096) |
[paws] |
12:44 |
<andrewbogott> |
reimaging cloudnet1004.eqiad.wmnet for T253124 |
[production] |
12:34 |
<arturo> |
created and transferred DNS zone `svc.paws.eqiad1.wikimedia.cloud` (T211096) |
[paws] |
12:29 |
<elukey> |
roll restart druid-public cluster (druid100[4-6], backend for the AQS API) to apply new settings + openjdk upgrade - T252771 |
[production] |
12:13 |
<mutante> |
depooled mw2158 through mw2172 to make room again in C3 as planned (T247018) |
[production] |
12:12 |
<marostegui> |
Repool labsdb1011 into the analytics role 🤞 - T249188 |
[production] |
12:12 |
<dzahn@cumin1001> |
conftool action : set/pooled=no; selector: name=mw217[0-2].codfw.wmnet |
[production] |
12:10 |
<dzahn@cumin1001> |
conftool action : set/pooled=no; selector: name=mw216[0-9].codfw.wmnet |
[production] |