2020-06-11
§
|
20:31 |
<pt1979@cumin2001> |
START - Cookbook sre.hosts.downtime |
[production] |
20:15 |
<pt1979@cumin2001> |
END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) |
[production] |
20:13 |
<pt1979@cumin2001> |
START - Cookbook sre.hosts.downtime |
[production] |
20:00 |
<pt1979@cumin2001> |
END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) |
[production] |
19:59 |
<jhuneidi@deploy1001> |
rebuilt and synchronized wikiversions files: all wikis to 1.35.0-wmf.36 |
[production] |
19:58 |
<pt1979@cumin2001> |
START - Cookbook sre.hosts.downtime |
[production] |
19:33 |
<akosiaris> |
apply emergency sessionstore fixes in codfw as well |
[production] |
19:32 |
<akosiaris@deploy1001> |
helmfile [CODFW] Ran 'sync' command on namespace 'sessionstore' for release 'production' . |
[production] |
19:20 |
<gilles@deploy1001> |
Finished deploy [performance/asoranking@0a096c4]: T252424 (duration: 00m 47s) |
[production] |
19:19 |
<gilles@deploy1001> |
Started deploy [performance/asoranking@0a096c4]: T252424 |
[production] |
19:12 |
<akosiaris> |
repool eqiad for sessionstore |
[production] |
19:12 |
<akosiaris@cumin1001> |
conftool action : set/pooled=true; selector: name=eqiad,dnsdisc=sessionstore |
[production] |
19:10 |
<akosiaris> |
remove the podaffinity restrictions for sessionstore in eqiad |
[production] |
19:10 |
<akosiaris@deploy1001> |
helmfile [EQIAD] Ran 'sync' command on namespace 'sessionstore' for release 'production' . |
[production] |
19:07 |
<akosiaris> |
increase memory limits for sessionstore in eqiad to 400Mi from 300Mi |
[production] |
19:07 |
<akosiaris@deploy1001> |
helmfile [EQIAD] Ran 'sync' command on namespace 'sessionstore' for release 'production' . |
[production] |
19:00 |
<akosiaris> |
increase sessionstore capacity in codfw from 4 pods to 6 |
[production] |
19:00 |
<akosiaris@deploy1001> |
helmfile [CODFW] Ran 'sync' command on namespace 'sessionstore' for release 'production' . |
[production] |
18:59 |
<akosiaris> |
depool eqiad, switch to codfw |
[production] |
18:58 |
<akosiaris@cumin1001> |
conftool action : set/pooled=false; selector: name=eqiad,dnsdisc=sessionstore |
[production] |
18:08 |
<ppchelko@deploy1001> |
Synchronized wmf-config/reverse-proxy-staging.php: Beta: Switch from HTCP purging to kafka purging gerrit:603530, reverse-proxy-staging.php (duration: 01m 06s) |
[production] |
18:06 |
<ppchelko@deploy1001> |
Synchronized wmf-config/InitialiseSettings-labs.php: Beta: Switch from HTCP purging to kafka purging gerrit:603530, IS-labs.php (duration: 01m 06s) |
[production] |
17:29 |
<mbsantos@deploy1001> |
helmfile [CODFW] Ran 'sync' command on namespace 'proton' for release 'production' . |
[production] |
17:26 |
<mbsantos@deploy1001> |
helmfile [CODFW] Ran 'sync' command on namespace 'mobileapps' for release 'production' . |
[production] |
17:22 |
<mbsantos@deploy1001> |
helmfile [EQIAD] Ran 'sync' command on namespace 'proton' for release 'production' . |
[production] |
17:19 |
<mbsantos@deploy1001> |
helmfile [EQIAD] Ran 'sync' command on namespace 'mobileapps' for release 'production' . |
[production] |
17:12 |
<bstorm_> |
reboot for stretch upgrade on labstore1004 T224582 |
[production] |
16:49 |
<bstorm_> |
doing stretch upgrade for labstore1004 T224582 |
[production] |
16:36 |
<bstorm_> |
rebooting labstore1004 for upgrades T224582 |
[production] |
16:12 |
<bstorm_> |
downtimed labstore1005 for upgrades on 1004 since that will alert as well T224582 |
[production] |
16:10 |
<bstorm_> |
downtimed labstore1004 for upgrades T224582 |
[production] |
15:50 |
<cstone> |
SmashPig revision changed from b9de3c7aac to 2246685626 |
[production] |
15:34 |
<jmm@cumin1001> |
END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) |
[production] |
15:31 |
<jmm@cumin1001> |
START - Cookbook sre.hosts.reboot-single |
[production] |
15:25 |
<moritzm> |
installing buster kernel security updates (no reboots yet) |
[production] |
15:04 |
<jmm@cumin1001> |
END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) |
[production] |
15:04 |
<mforns@deploy1001> |
Finished deploy [analytics/refinery@c969b56]: Regular analytics weekly train [analytics/refinery@c969b56afae1b2532e07f0ff699c2ce161360966] (duration: 01m 39s) |
[production] |
15:04 |
<root@cumin1001> |
END (FAIL) - Cookbook sre.network.prepare-upgrade (exit_code=99) |
[production] |
15:04 |
<root@cumin1001> |
START - Cookbook sre.network.prepare-upgrade |
[production] |
15:02 |
<mforns@deploy1001> |
Started deploy [analytics/refinery@c969b56]: Regular analytics weekly train [analytics/refinery@c969b56afae1b2532e07f0ff699c2ce161360966] |
[production] |
15:02 |
<jmm@cumin1001> |
START - Cookbook sre.hosts.reboot-single |
[production] |
14:56 |
<herron> |
bounced elasticsearch on logstash1012 |
[production] |
14:41 |
<akosiaris@cumin1001> |
END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) |
[production] |
14:40 |
<akosiaris@cumin1001> |
START - Cookbook sre.hosts.decommission |
[production] |
14:37 |
<herron> |
enabled VO incident resolution notification in global settings |
[production] |
14:34 |
<akosiaris@cumin1001> |
END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) |
[production] |
14:31 |
<akosiaris@cumin1001> |
START - Cookbook sre.hosts.decommission |
[production] |
14:30 |
<godog> |
bounce logstash on logstash1009, apparent GC death spiral |
[production] |
14:03 |
<jmm@cumin1001> |
END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) |
[production] |
14:03 |
<jmm@cumin1001> |
START - Cookbook sre.hosts.reboot-single |
[production] |