2019-09-13
13:57 <oblivian@deploy1001> Synchronized wmf-config/logging.php: unbreak mediawiki logging on scandium (duration: 01m 04s) [production]
13:28 <akosiaris@> helmfile [CODFW] Ran 'sync' command on namespace 'kube-system' for release 'coredns' . [production]
13:27 <akosiaris@> helmfile [CODFW] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller' . [production]
13:21 <akosiaris@> helmfile [CODFW] Ran 'sync' command on namespace 'kube-system' for release 'coredns' . [production]
13:20 <akosiaris@> helmfile [CODFW] Ran 'sync' command on namespace 'kube-system' for release 'coredns' . [production]
13:19 <akosiaris@> helmfile [CODFW] Ran 'sync' command on namespace 'kube-system' for release 'coredns' . [production]
12:56 <_joe_> banning more urls on maps1003 [production]
12:37 <_joe_> temp ban of class of urls on maps1003 nginx [production]
12:14 <jbond42> add timing information to maps1003 access logs [production]
11:39 <jbond42> enable access logs on maps1003 [production]
11:38 <_joe_> manually raising the worker heap limit to 600 MB on kartotherian on maps1003 [production]
11:11 <elukey> reboot an-conf100* (Analytics Zookeeper nodes - not yet in production) for kernel upgrades [production]
11:10 <elukey> reboot an-tool1007 (runs turnilo) for kernel upgrades [production]
11:08 <jmm@cumin2001> END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) [production]
11:08 <jmm@cumin2001> START - Cookbook sre.hosts.downtime [production]
11:05 <godog> silence kartotherian pages for 2h, known issue [production]
10:47 <vgutierrez> rebooting acmechief-test servers to catch up latest kernel upgrades [production]
10:42 <akosiaris@> helmfile [STAGING] Ran 'sync' command on namespace 'kube-system' for release 'coredns' . [production]
10:41 <moritzm> reimage restbase2009 to stretch T224553 [production]
10:38 <moritzm> repool restbase1018 after reimage to stretch and completed Cassandra bootstrap [production]
10:36 <akosiaris@> helmfile [STAGING] Ran 'sync' command on namespace 'kube-system' for release 'coredns' . [production]
10:36 <akosiaris@> helmfile [STAGING] Ran 'sync' command on namespace 'kube-system' for release 'coredns' . [production]
10:13 <vgutierrez> disable ATS-TLS debug options on cp5001 - T232298 [production]
10:09 <akosiaris@> helmfile [STAGING] Ran 'sync' command on namespace 'kube-system' for release 'coredns' . [production]
09:46 <gehel> re-enabling /geoline on maps1004 - T232817 [production]
09:45 <@> helmfile [STAGING] Ran 'apply' command on namespace 'blubberoid' for release 'staging' . [production]
09:44 <@> helmfile [EQIAD] Ran 'apply' command on namespace 'blubberoid' for release 'production' . [production]
09:42 <@> helmfile [CODFW] Ran 'apply' command on namespace 'blubberoid' for release 'production' . [production]
09:40 <godog> install linux-perf-4.9 on maps1002 and attempt to capture a stack sample [production]
09:38 <gehel> drop /geoshape and restart kartotherian on maps1004 - T232817 [production]
09:27 <gehel> restart kartotherian on maps1004 - T232817 [production]
09:24 <gehel> deny access to /geoline on maps1004 - T232817 [production]
09:11 <oblivian@puppetmaster1001> conftool action : set/pooled=true; selector: dnsdisc=kartotherian,name=eqiad [production]
09:08 <godog> downtime kartotherian pages for 1h in codfw [production]
09:01 <oblivian@puppetmaster1001> conftool action : set/pooled=inactive; selector: name=elastic1046.eqiad.wmnet [production]
09:00 <oblivian@puppetmaster1001> conftool action : set/pooled=inactive; selector: name=elastic1017.eqiad.wmnet [production]
08:57 <oblivian@puppetmaster1001> conftool action : set/pooled=false; selector: dnsdisc=kartotherian,name=eqiad [production]
08:52 <godog> downtime kartotherian pages for 1h [production]
08:48 <jmm@cumin2001> END (PASS) - Cookbook sre.hosts.ipmi-password-reset (exit_code=0) [production]
08:48 <jmm@cumin2001> Updating IPMI password on 1 hosts - jmm@cumin2001 [production]
08:47 <jmm@cumin2001> START - Cookbook sre.hosts.ipmi-password-reset [production]
08:47 <jmm@cumin2001> END (FAIL) - Cookbook sre.hosts.ipmi-password-reset (exit_code=99) [production]
08:47 <jmm@cumin2001> START - Cookbook sre.hosts.ipmi-password-reset [production]
08:45 <gehel> stop tilerator on maps to help reduce load [production]
08:37 <_joe_> rolling restart of kartotherian [production]
08:33 <_joe_> restarting kartotherian on maps1003, all workers seem stuck [production]
05:58 <oblivian@deploy1001> Synchronized w/fatal-error.php: Adding core dump function to fatal-error (duration: 01m 04s) [production]
05:40 <_joe_> live-hacking mw1348, setting rlimit_core = unlimited to allow core dumps to be taken [production]
05:17 <effie> Rolling restart php-fpm across the fleet for 536400 [production]
04:53 <vgutierrez> restarting ats-tls on cp4021 and cp2002 to pick up the new SSL session cache timeout - T231849 [production]