2021-02-08
|
11:25 |
<hnowlan@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on maps1005.eqiad.wmnet with reason: Resyncing database |
[production] |
11:25 |
<hnowlan@cumin1001> |
START - Cookbook sre.hosts.downtime for 6:00:00 on maps1005.eqiad.wmnet with reason: Resyncing database |
[production] |
11:25 |
<Urbanecm> |
Deploy security patch for T71617 |
[production] |
11:25 |
<jiji@cumin1001> |
START - Cookbook sre.hosts.reboot-single for host mc2019.codfw.wmnet |
[production] |
11:23 |
<hnowlan@puppetmaster1001> |
conftool action : set/pooled=no; selector: name=maps1005.eqiad.wmnet |
[production] |
11:23 |
<hnowlan> |
resyncing postgres on maps1005 |
[production] |
11:22 |
<hnowlan> |
resyncing postgres on maps1001 |
[production] |
11:22 |
<vgutierrez@cumin1001> |
START - Cookbook sre.hosts.reboot-single for host lvs2010.codfw.wmnet |
[production] |
11:19 |
<vgutierrez@cumin1001> |
END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs4005.ulsfo.wmnet |
[production] |
11:14 |
<vgutierrez@cumin1001> |
START - Cookbook sre.hosts.reboot-single for host lvs4005.ulsfo.wmnet |
[production] |
11:11 |
<vgutierrez@cumin1001> |
END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs4006.ulsfo.wmnet |
[production] |
11:07 |
<vgutierrez@cumin1001> |
START - Cookbook sre.hosts.reboot-single for host lvs4006.ulsfo.wmnet |
[production] |
11:00 |
<vgutierrez@cumin1001> |
END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs4007.ulsfo.wmnet |
[production] |
10:55 |
<vgutierrez@cumin1001> |
START - Cookbook sre.hosts.reboot-single for host lvs4007.ulsfo.wmnet |
[production] |
10:25 |
<jiji@cumin1001> |
END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2025.codfw.wmnet |
[production] |
10:05 |
<moritzm> |
updating netboot images to Buster 10.8 T274099 |
[production] |
10:05 |
<jiji@cumin1001> |
START - Cookbook sre.hosts.reboot-single for host mc2025.codfw.wmnet |
[production] |
09:43 |
<XioNoX> |
failover pfw3-eqiad RG1 to node 0 T263833 |
[production] |
09:42 |
<marostegui> |
Stop MySQL on db1111 T273982 |
[production] |
09:36 |
<vgutierrez@cumin1001> |
END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs4007.ulsfo.wmnet |
[production] |
09:23 |
<vgutierrez> |
restart varnish-fe on cp1087 |
[production] |
09:21 |
<vgutierrez@cumin1001> |
START - Cookbook sre.hosts.reboot-single for host lvs4007.ulsfo.wmnet |
[production] |
09:20 |
<vgutierrez> |
rolling restart of LVS instances to catch up on kernel upgrades |
[production] |
09:00 |
<gehel> |
depool and restart blazegraph on wdqs1005 / wdqs1012 |
[production] |
08:56 |
<XioNoX> |
push pfw policies T273989 |
[production] |
08:33 |
<godog> |
swift codfw-prod decrease HDD weight for ms-be20[16-27] - T272837 |
[production] |
07:08 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'Depool db1111 T273982', diff saved to https://phabricator.wikimedia.org/P14229 and previous config saved to /var/cache/conftool/dbconfig/20210208-070858-marostegui.json |
[production] |
06:50 |
<effie> |
Removed mc1024 from mcrouter, some resharding is expected |
[production] |
06:13 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'Remove db1094 from dbctl T273710', diff saved to https://phabricator.wikimedia.org/P14228 and previous config saved to /var/cache/conftool/dbconfig/20210208-061319-marostegui.json |
[production] |
2021-02-06
|
08:59 |
<elukey@cumin1001> |
END (PASS) - Cookbook sre.hadoop.change-distro-from-cdh-clients (exit_code=0) for Hadoop test cluster: Change Hadoop distribution - elukey@cumin1001 |
[production] |
08:58 |
<elukey@cumin1001> |
START - Cookbook sre.hadoop.change-distro-from-cdh-clients for Hadoop test cluster: Change Hadoop distribution - elukey@cumin1001 |
[production] |
08:52 |
<elukey@cumin1001> |
END (PASS) - Cookbook sre.hadoop.change-distro-from-cdh-clients (exit_code=0) for Hadoop test cluster: Change Hadoop distribution - elukey@cumin1001 |
[production] |
08:52 |
<elukey@cumin1001> |
START - Cookbook sre.hadoop.change-distro-from-cdh-clients for Hadoop test cluster: Change Hadoop distribution - elukey@cumin1001 |
[production] |
03:40 |
<ryankemper> |
Deleted dump taking up disk space on `wdqs1009`; disk space warning will resolve now |
[production] |
01:30 |
<dzahn@cumin1001> |
conftool action : set/pooled=yes; selector: name=mw1319.eqiad.wmnet |
[production] |
01:29 |
<dzahn@cumin1001> |
conftool action : set/pooled=yes; selector: name=mw1313.eqiad.wmnet |
[production] |
01:25 |
<dzahn@cumin1001> |
conftool action : set/pooled=no; selector: name=mw1319.eqiad.wmnet |
[production] |
01:25 |
<dzahn@cumin1001> |
conftool action : set/pooled=no; selector: name=mw1313.eqiad.wmnet |
[production] |
01:00 |
<dzahn@cumin1001> |
conftool action : set/pooled=yes; selector: name=mw2265.codfw.wmnet |
[production] |
00:57 |
<dzahn@cumin1001> |
conftool action : set/pooled=yes; selector: name=mw1366.eqiad.wmnet |
[production] |
00:46 |
<dzahn@cumin1001> |
conftool action : set/pooled=no; selector: name=mw1366.eqiad.wmnet |
[production] |
00:46 |
<dzahn@cumin1001> |
conftool action : set/pooled=no; selector: name=mw2265.codfw.wmnet |
[production] |
00:30 |
<dzahn@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1313.eqiad.wmnet with reason: REIMAGE |
[production] |
00:28 |
<dzahn@cumin1001> |
START - Cookbook sre.hosts.downtime for 2:00:00 on mw1313.eqiad.wmnet with reason: REIMAGE |
[production] |
00:25 |
<dzahn@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1319.eqiad.wmnet with reason: REIMAGE |
[production] |
00:23 |
<dzahn@cumin1001> |
START - Cookbook sre.hosts.downtime for 2:00:00 on mw1319.eqiad.wmnet with reason: REIMAGE |
[production] |
00:19 |
<dzahn@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2265.codfw.wmnet with reason: REIMAGE |
[production] |
00:17 |
<dzahn@cumin1001> |
START - Cookbook sre.hosts.downtime for 2:00:00 on mw2265.codfw.wmnet with reason: REIMAGE |
[production] |
00:15 |
<dzahn@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1366.eqiad.wmnet with reason: REIMAGE |
[production] |