2019-09-13
|
12:56 |
<_joe_> |
banning more urls on maps1003 |
[production] |
12:37 |
<_joe_> |
temp ban of class of urls on maps1003 nginx |
[production] |
12:14 |
<jbond42> |
add timing information to maps1003 access logs |
[production] |
11:39 |
<jbond42> |
enable access logs on maps1003 |
[production] |
11:38 |
<_joe_> |
manually raising the worker heap limit to 600 MB on kartotherian on maps1003 |
[production] |
11:11 |
<elukey> |
reboot an-conf100* (Analytics Zookeeper nodes - not yet in production) for kernel upgrades |
[production] |
11:10 |
<elukey> |
reboot an-tool1007 (runs turnilo) for kernel upgrades |
[production] |
11:08 |
<jmm@cumin2001> |
END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) |
[production] |
11:08 |
<jmm@cumin2001> |
START - Cookbook sre.hosts.downtime |
[production] |
11:05 |
<godog> |
silence kartotherian pages for 2h, known issue |
[production] |
10:47 |
<vgutierrez> |
rebooting acmechief-test servers to catch up latest kernel upgrades |
[production] |
10:42 |
<akosiaris@> |
helmfile [STAGING] Ran 'sync' command on namespace 'kube-system' for release 'coredns' . |
[production] |
10:41 |
<moritzm> |
reimage restbase2009 to stretch T224553 |
[production] |
10:38 |
<moritzm> |
repool restbase1018 after reimage to stretch and completed Cassandra bootstrap |
[production] |
10:36 |
<akosiaris@> |
helmfile [STAGING] Ran 'sync' command on namespace 'kube-system' for release 'coredns' . |
[production] |
10:36 |
<akosiaris@> |
helmfile [STAGING] Ran 'sync' command on namespace 'kube-system' for release 'coredns' . |
[production] |
10:13 |
<vgutierrez> |
disable ATS-TLS debug options on cp5001 - T232298 |
[production] |
10:09 |
<akosiaris@> |
helmfile [STAGING] Ran 'sync' command on namespace 'kube-system' for release 'coredns' . |
[production] |
09:46 |
<gehel> |
re-enabling /geoline on maps1004 - T232817 |
[production] |
09:45 |
<@> |
helmfile [STAGING] Ran 'apply' command on namespace 'blubberoid' for release 'staging' . |
[production] |
09:44 |
<@> |
helmfile [EQIAD] Ran 'apply' command on namespace 'blubberoid' for release 'production' . |
[production] |
09:42 |
<@> |
helmfile [CODFW] Ran 'apply' command on namespace 'blubberoid' for release 'production' . |
[production] |
09:40 |
<godog> |
install linux-perf-4.9 on maps1002 and attempt to capture a stack sample |
[production] |
09:38 |
<gehel> |
drop /geoshape and restart kartotherian on maps1004 - T232817 |
[production] |
09:27 |
<gehel> |
restart kartotherian on maps1004 - T232817 |
[production] |
09:24 |
<gehel> |
deny access to /geoline on maps1004 - T232817 |
[production] |
09:11 |
<oblivian@puppetmaster1001> |
conftool action : set/pooled=true; selector: dnsdisc=kartotherian,name=eqiad |
[production] |
09:08 |
<godog> |
downtime kartotherian pages for 1h in codfw |
[production] |
09:01 |
<oblivian@puppetmaster1001> |
conftool action : set/pooled=inactive; selector: name=elastic1046.eqiad.wmnet |
[production] |
09:00 |
<oblivian@puppetmaster1001> |
conftool action : set/pooled=inactive; selector: name=elastic1017.eqiad.wmnet |
[production] |
08:57 |
<oblivian@puppetmaster1001> |
conftool action : set/pooled=false; selector: dnsdisc=kartotherian,name=eqiad |
[production] |
08:52 |
<godog> |
downtime kartotherian pages for 1h |
[production] |
08:48 |
<jmm@cumin2001> |
END (PASS) - Cookbook sre.hosts.ipmi-password-reset (exit_code=0) |
[production] |
08:48 |
<jmm@cumin2001> |
Updating IPMI password on 1 hosts - jmm@cumin2001 |
[production] |
08:47 |
<jmm@cumin2001> |
START - Cookbook sre.hosts.ipmi-password-reset |
[production] |
08:47 |
<jmm@cumin2001> |
END (FAIL) - Cookbook sre.hosts.ipmi-password-reset (exit_code=99) |
[production] |
08:47 |
<jmm@cumin2001> |
START - Cookbook sre.hosts.ipmi-password-reset |
[production] |
08:45 |
<gehel> |
stop tilerator on maps to help reduce load |
[production] |
08:37 |
<_joe_> |
rolling restart of kartotherian |
[production] |
08:33 |
<_joe_> |
restarting kartotherian on maps1003, all workers seem stuck |
[production] |
05:58 |
<oblivian@deploy1001> |
Synchronized w/fatal-error.php: Adding core dump function to fatal-error (duration: 01m 04s) |
[production] |
05:40 |
<_joe_> |
live-hacking mw1348, setting rlimit_core = unlimited to allow core dumps to be taken |
[production] |
05:17 |
<effie> |
Rolling restart php-fpm across the fleet for 536400 |
[production] |
04:53 |
<vgutierrez> |
restarting ats-tls on cp4021 and cp2002 to pick up the new SSL session cache timeout - T231849 |
[production] |
04:50 |
<eileen> |
process-control config revision is 43a2677bcf - turned off gender import |
[production] |
02:23 |
<eileen> |
civicrm revision changed from c5ab5aea9e to 45dbfdb96f, config revision is 1da8391a9a |
[production] |
01:09 |
<XioNoX> |
add IPv6 sampling to cr1-eqiad |
[production] |
01:07 |
<XioNoX> |
enable netflow sampling on cr2-codfw |
[production] |