2020-06-11
23:35 <bstorm_> rebooting tools-k8s-control-2 because it seems to be confused on NFS, interestingly enough [tools]
23:30 <MacFan4000> restarting for config/code changes [tools.zppixbot-test]
23:27 <Reedy> Reloading Zuul to deploy https://gerrit.wikimedia.org/r/604719 T255096 [releng]
22:31 <wm-bot> <zppixbot> auto-update@website: Synced website repo in 39.s [tools.zppixbot]
20:44 <Urbanecm> tools.stewardbots@tools-sgebastion-07:~$ bash ./stewardbots/StewardBot/restart_stewardbot.sh [tools.stewardbots]
20:34 <pt1979@cumin2001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) [production]
20:31 <pt1979@cumin2001> START - Cookbook sre.hosts.downtime [production]
20:15 <pt1979@cumin2001> END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) [production]
20:13 <pt1979@cumin2001> START - Cookbook sre.hosts.downtime [production]
20:00 <pt1979@cumin2001> END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) [production]
19:59 <jhuneidi@deploy1001> rebuilt and synchronized wikiversions files: all wikis to 1.35.0-wmf.36 [production]
19:58 <pt1979@cumin2001> START - Cookbook sre.hosts.downtime [production]
19:33 <akosiaris> apply emergency sessionstore fixes in codfw as well [production]
19:32 <akosiaris@deploy1001> helmfile [CODFW] Ran 'sync' command on namespace 'sessionstore' for release 'production' . [production]
19:20 <gilles@deploy1001> Finished deploy [performance/asoranking@0a096c4]: T252424 (duration: 00m 47s) [production]
19:19 <bstorm_> proceeding with failback to labstore1004 now that DRBD devices are consistent T224582 [admin]
19:19 <gilles@deploy1001> Started deploy [performance/asoranking@0a096c4]: T252424 [production]
19:17 <RhinosF1> i meant wikimedia/wikitech - my brain was down as well apparently [tools.zppixbot-test]
19:16 <RhinosF1> restarted a few times while wikidata was down to deploy various stuff [tools.zppixbot-test]
19:16 <bd808> Testing wikitech logging [tools.zppixbot-test]
19:15 <bd808> Testing wikitech logging [tools.stashbot]
19:12 <akosiaris> repool eqiad for sessionstore [production]
19:12 <akosiaris@cumin1001> conftool action : set/pooled=true; selector: name=eqiad,dnsdisc=sessionstore [production]
19:10 <akosiaris> remove the podaffinity restrictions for sessionstore in eqiad [production]
19:10 <akosiaris@deploy1001> helmfile [EQIAD] Ran 'sync' command on namespace 'sessionstore' for release 'production' . [production]
19:07 <akosiaris> increase memory limits for sessionstore in eqiad to 400Mi from 300Mi [production]
19:07 <akosiaris@deploy1001> helmfile [EQIAD] Ran 'sync' command on namespace 'sessionstore' for release 'production' . [production]
19:00 <akosiaris> increase sessionstore capacity in codfw from 4 pods to 6 [production]
19:00 <akosiaris@deploy1001> helmfile [CODFW] Ran 'sync' command on namespace 'sessionstore' for release 'production' . [production]
18:59 <akosiaris> depool eqiad, switch to codfw [production]
18:58 <akosiaris@cumin1001> conftool action : set/pooled=false; selector: name=eqiad,dnsdisc=sessionstore [production]
18:47 <Texas> kubectl delete pods --all [tools.zppixbot-test]
18:08 <ppchelko@deploy1001> Synchronized wmf-config/reverse-proxy-staging.php: Beta: Switch from HTCP purging to kafka purging gerrit:603530, reverse-proxy-staging.php (duration: 01m 06s) [production]
18:06 <ppchelko@deploy1001> Synchronized wmf-config/InitialiseSettings-labs.php: Beta: Switch from HTCP purging to kafka purging gerrit:603530, IS-labs.php (duration: 01m 06s) [production]
17:29 <mbsantos@deploy1001> helmfile [CODFW] Ran 'sync' command on namespace 'proton' for release 'production' . [production]
17:26 <mbsantos@deploy1001> helmfile [CODFW] Ran 'sync' command on namespace 'mobileapps' for release 'production' . [production]
17:22 <mbsantos@deploy1001> helmfile [EQIAD] Ran 'sync' command on namespace 'proton' for release 'production' . [production]
17:22 <bstorm_> delaying failback labstore1004 for drive syncs T224582 [admin]
17:19 <mbsantos@deploy1001> helmfile [EQIAD] Ran 'sync' command on namespace 'mobileapps' for release 'production' . [production]
17:17 <bstorm_> failing NFS back to labstore1004 to complete the upgrade process T224582 [admin]
17:12 <bstorm_> reboot for stretch upgrade on labstore1004 T224582 [production]
16:49 <bstorm_> doing stretch upgrade for labstore1004 T224582 [production]
16:36 <bstorm_> rebooting labstore1004 for upgrades T224582 [production]
16:15 <bstorm_> failing over NFS for labstore1004 to labstore1005 T224582 [admin]
16:12 <bstorm_> downtimed labstore1005 for upgrades on 1004 since that will alert as well T224582 [production]
16:10 <bstorm_> downtimed labstore1004 for upgrades T224582 [production]
15:50 <cstone> SmashPig revision changed from b9de3c7aac to 2246685626 [production]
15:34 <jmm@cumin1001> END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) [production]
15:31 <jmm@cumin1001> START - Cookbook sre.hosts.reboot-single [production]