2019-09-13
14:36 |
<jmm@cumin2001> |
START - Cookbook sre.hosts.downtime |
[production] |
14:30 |
<akosiaris@> |
helmfile [EQIAD] Ran 'sync' command on namespace 'kube-system' for release 'coredns' . |
[production] |
14:30 |
<moritzm> |
installing cups security update on buster (only client-side libs installed) |
[production] |
14:22 |
<moritzm> |
installing bzip2 update from Buster 10.1 point release |
[production] |
14:18 |
<moritzm> |
installing reportbug update from Buster 10.1 point release |
[production] |
14:14 |
<akosiaris@> |
helmfile [CODFW] Ran 'apply' command on namespace 'kube-system' for release 'coredns' . |
[production] |
14:05 |
<akosiaris@> |
helmfile [CODFW] Ran 'apply' command on namespace 'kube-system' for release 'coredns' . |
[production] |
13:57 |
<oblivian@deploy1001> |
Synchronized wmf-config/logging.php: unbreak mediawiki logging on scandium (duration: 01m 04s) |
[production] |
13:28 |
<akosiaris@> |
helmfile [CODFW] Ran 'sync' command on namespace 'kube-system' for release 'coredns' . |
[production] |
13:27 |
<akosiaris@> |
helmfile [CODFW] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller' . |
[production] |
13:21 |
<akosiaris@> |
helmfile [CODFW] Ran 'sync' command on namespace 'kube-system' for release 'coredns' . |
[production] |
13:20 |
<akosiaris@> |
helmfile [CODFW] Ran 'sync' command on namespace 'kube-system' for release 'coredns' . |
[production] |
13:19 |
<akosiaris@> |
helmfile [CODFW] Ran 'sync' command on namespace 'kube-system' for release 'coredns' . |
[production] |
12:56 |
<_joe_> |
banning more urls on maps1003 |
[production] |
12:37 |
<_joe_> |
temp ban of class of urls on maps1003 nginx |
[production] |
12:14 |
<jbond42> |
add timing information to maps1003 access logs |
[production] |
11:39 |
<jbond42> |
enable access logs on maps1003 |
[production] |
11:38 |
<_joe_> |
manually raising the worker heap limit to 600 MB on kartotherian on maps1003 |
[production] |
11:11 |
<elukey> |
reboot an-conf100* (Analytics Zookeeper nodes - not yet in production) for kernel upgrades |
[production] |
11:10 |
<elukey> |
reboot an-tool1007 (runs turnilo) for kernel upgrades |
[production] |
11:08 |
<jmm@cumin2001> |
END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) |
[production] |
11:08 |
<jmm@cumin2001> |
START - Cookbook sre.hosts.downtime |
[production] |
11:05 |
<godog> |
silence kartotherian pages for 2h, known issue |
[production] |
10:47 |
<vgutierrez> |
rebooting acmechief-test servers to catch up latest kernel upgrades |
[production] |
10:42 |
<akosiaris@> |
helmfile [STAGING] Ran 'sync' command on namespace 'kube-system' for release 'coredns' . |
[production] |
10:41 |
<moritzm> |
reimage restbase2009 to stretch T224553 |
[production] |
10:38 |
<moritzm> |
repool restbase1018 after reimage to stretch and completed Cassandra bootstrap |
[production] |
10:36 |
<akosiaris@> |
helmfile [STAGING] Ran 'sync' command on namespace 'kube-system' for release 'coredns' . |
[production] |
10:36 |
<akosiaris@> |
helmfile [STAGING] Ran 'sync' command on namespace 'kube-system' for release 'coredns' . |
[production] |
10:13 |
<vgutierrez> |
disable ATS-TLS debug options on cp5001 - T232298 |
[production] |
10:09 |
<akosiaris@> |
helmfile [STAGING] Ran 'sync' command on namespace 'kube-system' for release 'coredns' . |
[production] |
09:46 |
<gehel> |
re-enabling /geoline on maps1004 - T232817 |
[production] |
09:45 |
<@> |
helmfile [STAGING] Ran 'apply' command on namespace 'blubberoid' for release 'staging' . |
[production] |
09:44 |
<@> |
helmfile [EQIAD] Ran 'apply' command on namespace 'blubberoid' for release 'production' . |
[production] |
09:42 |
<@> |
helmfile [CODFW] Ran 'apply' command on namespace 'blubberoid' for release 'production' . |
[production] |
09:40 |
<godog> |
install linux-perf-4.9 on maps1002 and attempt to capture a stack sample |
[production] |
09:38 |
<gehel> |
drop /geoshape and restart kartotherian on maps1004 - T232817 |
[production] |
09:27 |
<gehel> |
restart kartotherian on maps1004 - T232817 |
[production] |
09:24 |
<gehel> |
deny access to /geoline on maps1004 - T232817 |
[production] |
09:11 |
<oblivian@puppetmaster1001> |
conftool action : set/pooled=true; selector: dnsdisc=kartotherian,name=eqiad |
[production] |
09:08 |
<godog> |
downtime kartotherian pages for 1h in codfw |
[production] |
09:01 |
<oblivian@puppetmaster1001> |
conftool action : set/pooled=inactive; selector: name=elastic1046.eqiad.wmnet |
[production] |
09:00 |
<oblivian@puppetmaster1001> |
conftool action : set/pooled=inactive; selector: name=elastic1017.eqiad.wmnet |
[production] |
08:57 |
<oblivian@puppetmaster1001> |
conftool action : set/pooled=false; selector: dnsdisc=kartotherian,name=eqiad |
[production] |
08:52 |
<godog> |
downtime kartotherian pages for 1h |
[production] |
08:48 |
<jmm@cumin2001> |
END (PASS) - Cookbook sre.hosts.ipmi-password-reset (exit_code=0) |
[production] |
08:48 |
<jmm@cumin2001> |
Updating IPMI password on 1 hosts - jmm@cumin2001 |
[production] |
08:47 |
<jmm@cumin2001> |
START - Cookbook sre.hosts.ipmi-password-reset |
[production] |
08:47 |
<jmm@cumin2001> |
END (FAIL) - Cookbook sre.hosts.ipmi-password-reset (exit_code=99) |
[production] |
08:47 |
<jmm@cumin2001> |
START - Cookbook sre.hosts.ipmi-password-reset |
[production] |