2022-08-04
|
18:24 |
<dzahn@cumin2002> |
START - Cookbook sre.hosts.remove-downtime for mw2271.codfw.wmnet |
[production] |
18:23 |
<milimetric@deploy1002> |
Started deploy [analytics/refinery@2553288]: Regular analytics weekly train [analytics/refinery@2553288] |
[production] |
18:23 |
<milimetric@deploy1002> |
Finished deploy [analytics/refinery@2553288]: Regular analytics weekly train [analytics/refinery@2553288] (duration: 01m 32s) |
[production] |
18:23 |
<dzahn@cumin2002> |
conftool action : set/pooled=yes; selector: dc=codfw,name=mw2276.codfw.wmnet |
[production] |
18:23 |
<dzahn@cumin2002> |
conftool action : set/pooled=yes; selector: dc=codfw,name=mw2275.codfw.wmnet |
[production] |
18:23 |
<dzahn@cumin2002> |
conftool action : set/pooled=yes; selector: dc=codfw,name=mw2274.codfw.wmnet |
[production] |
18:22 |
<dzahn@cumin2002> |
conftool action : set/pooled=yes; selector: dc=codfw,name=mw2273.codfw.wmnet |
[production] |
18:22 |
<dzahn@cumin2002> |
conftool action : set/pooled=yes; selector: dc=codfw,name=mw2272.codfw.wmnet |
[production] |
18:22 |
<Emperor> |
shutdown moss-fe2001.codfw.wmnet,ms-fe2011.codfw.wmnet,ms-be20[34,35,42,48,68].codfw.wmnet PDU work T310145 |
[production] |
18:22 |
<mvernon@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on 8 hosts with reason: PDU work |
[production] |
18:21 |
<milimetric@deploy1002> |
Started deploy [analytics/refinery@2553288]: Regular analytics weekly train [analytics/refinery@2553288] |
[production] |
18:21 |
<mvernon@cumin1001> |
START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on 8 hosts with reason: PDU work |
[production] |
18:21 |
<milimetric@deploy1002> |
Finished deploy [analytics/refinery@2553288]: Regular analytics weekly train [analytics/refinery@2553288] (duration: 00m 03s) |
[production] |
18:21 |
<milimetric@deploy1002> |
Started deploy [analytics/refinery@2553288]: Regular analytics weekly train [analytics/refinery@2553288] |
[production] |
18:21 |
<milimetric@deploy1002> |
Finished deploy [analytics/refinery@2553288]: Regular analytics weekly train [analytics/refinery@2553288] (duration: 00m 03s) |
[production] |
18:21 |
<milimetric@deploy1002> |
Started deploy [analytics/refinery@2553288]: Regular analytics weekly train [analytics/refinery@2553288] |
[production] |
18:20 |
<milimetric@deploy1002> |
Finished deploy [analytics/refinery@2553288]: Regular analytics weekly train [analytics/refinery@2553288] (duration: 01m 49s) |
[production] |
18:20 |
<mvernon@cumin1001> |
END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for 9 hosts |
[production] |
18:20 |
<mvernon@cumin1001> |
START - Cookbook sre.hosts.remove-downtime for 9 hosts |
[production] |
18:19 |
<milimetric@deploy1002> |
Started deploy [analytics/refinery@2553288]: Regular analytics weekly train [analytics/refinery@2553288] |
[production] |
18:14 |
<mutante> |
mw2272 and upwards: scap pull, checking monitoring, repooling.. one by one |
[production] |
18:13 |
<dzahn@cumin2002> |
conftool action : set/pooled=yes; selector: dc=codfw,name=mw2271.codfw.wmnet |
[production] |
18:12 |
<btullis@deploy1002> |
Finished deploy [analytics/refinery@2553288]: Regular analytics weekly train [analytics/refinery@2553288] (duration: 00m 51s) |
[production] |
18:11 |
<btullis@deploy1002> |
Started deploy [analytics/refinery@2553288]: Regular analytics weekly train [analytics/refinery@2553288] |
[production] |
18:06 |
<btullis@deploy1002> |
Finished deploy [analytics/refinery@2553288]: Regular analytics weekly train [analytics/refinery@2553288] (duration: 01m 54s) |
[production] |
18:04 |
<btullis@deploy1002> |
Started deploy [analytics/refinery@2553288]: Regular analytics weekly train [analytics/refinery@2553288] |
[production] |
17:55 |
<sukhe@cumin2002> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs2009.codfw.wmnet with reason: shutdown for PDU upgrade |
[production] |
17:55 |
<sukhe@cumin2002> |
START - Cookbook sre.hosts.downtime for 2:00:00 on lvs2009.codfw.wmnet with reason: shutdown for PDU upgrade |
[production] |
17:43 |
<mutante> |
maps2008 - downtime and shutdown for D3 maintenance |
[production] |
17:42 |
<dzahn@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on maps2008.codfw.wmnet with reason: codfw reboots |
[production] |
17:42 |
<dzahn@cumin1001> |
START - Cookbook sre.hosts.downtime for 1:00:00 on maps2008.codfw.wmnet with reason: codfw reboots |
[production] |
17:42 |
<mutante> |
thumbor2006 - downtime and shutdown for D3 maintenance |
[production] |
17:42 |
<dzahn@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on thumbor2006.codfw.wmnet with reason: codfw reboots |
[production] |
17:41 |
<dzahn@cumin1001> |
START - Cookbook sre.hosts.downtime for 1:00:00 on thumbor2006.codfw.wmnet with reason: codfw reboots |
[production] |
17:39 |
<mutante> |
mw2386 - systemctl reset-failed |
[production] |
17:31 |
<mutante> |
phab2001 - systemctl restart ssh-phab, attempting to clear Icinga pybal alerts, related to reboots |
[production] |
17:30 |
<sukhe@cumin2002> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on dns2001.wikimedia.org with reason: shutdown for PDU upgrade |
[production] |
17:30 |
<sukhe@cumin2002> |
START - Cookbook sre.hosts.downtime for 3:00:00 on dns2001.wikimedia.org with reason: shutdown for PDU upgrade |
[production] |
17:29 |
<sukhe@cumin2002> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on dns2001.wikimedia.org with reason: shutdown for PDU upgrade |
[production] |
17:29 |
<sukhe@cumin2002> |
START - Cookbook sre.hosts.downtime for 1:00:00 on dns2001.wikimedia.org with reason: shutdown for PDU upgrade |
[production] |
17:28 |
<Amir1> |
dbmaint at s4@eqiad (T312863) |
[production] |
17:26 |
<bd808@deploy1002> |
helmfile [eqiad] DONE helmfile.d/services/developer-portal: apply |
[production] |
17:26 |
<bd808@deploy1002> |
helmfile [eqiad] START helmfile.d/services/developer-portal: apply |
[production] |
17:24 |
<bd808@deploy1002> |
helmfile [codfw] DONE helmfile.d/services/developer-portal: apply |
[production] |
17:23 |
<bd808@deploy1002> |
helmfile [codfw] START helmfile.d/services/developer-portal: apply |
[production] |
17:23 |
<bd808@deploy1002> |
helmfile [staging] DONE helmfile.d/services/developer-portal: apply |
[production] |
17:23 |
<bd808@deploy1002> |
helmfile [staging] START helmfile.d/services/developer-portal: apply |
[production] |
17:20 |
<mutante> |
[an-launcher1002:~] $ sudo systemctl reset-failed |
[production] |
17:20 |
<mvernon@cumin1001> |
conftool action : set/pooled=no; selector: name=ms-fe2012.codfw.wmnet |
[production] |
17:18 |
<ladsgroup@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1150.eqiad.wmnet with reason: Maintenance |
[production] |