2021-07-14
15:34 <ryankemper> [Elastic] Manually triggering readahead mitigation across whole fleet to prevent any further issues today: `ryankemper@cumin1001:~$ sudo cumin -b 12 'P{elastic*}' 'sudo systemctl restart elasticsearch-disable-readahead.service'` (still need to investigate why `elasticsearch-disable-readahead.timer` isn't re-firing every 30 mins as desired) [production]
15:34 <moritzm> installing apache security updates on otrs1001 (ticket.wikimedia.org) [production]
15:34 <otto@deploy1002> helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' . [production]
15:28 <urbanecm> Start server-side upload of 3 large image files (T285708) [production]
15:16 <moritzm> installing apache security updates on lists1001 (lists.wikimedia.org) [production]
14:51 <moritzm> installing apache security updates on puppet masters [production]
14:47 <jiji@cumin1001> conftool action : set/pooled=inactive; selector: name=mw2384.codfw.wmnet [production]
14:47 <effie> set mw2384 as inactive to investigate mw2383 issue - T286463 [production]
14:44 <jiji@deploy1002> helmfile [codfw] START helmfile.d/admin 'apply'. [production]
14:44 <moritzm> installing apache security updates on grafana* [production]
14:43 <jiji@deploy1002> helmfile [eqiad] DONE helmfile.d/admin 'apply'. [production]
14:43 <jiji@deploy1002> helmfile [eqiad] START helmfile.d/admin 'apply'. [production]
14:40 <jiji@deploy1002> helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'. [production]
14:40 <jiji@deploy1002> helmfile [staging-eqiad] START helmfile.d/admin 'apply'. [production]
14:38 <dzahn@cumin1001> conftool action : set/pooled=yes; selector: name=mw1422.eqiad.wmnet [production]
14:33 <dcausse> running elasticsearch-madvise-random ES_PID on elastic2045 [production]
14:31 <dcausse> running elasticsearch-madvise-random 1022 on elastic2054 [production]
14:23 <jiji@deploy1002> helmfile [staging-codfw] DONE helmfile.d/admin 'apply'. [production]
14:19 <jiji@deploy1002> helmfile [staging-codfw] START helmfile.d/admin 'apply'. [production]
14:19 <jiji@deploy1002> helmfile [staging-codfw] DONE helmfile.d/admin 'apply'. [production]
14:19 <jiji@deploy1002> helmfile [staging-codfw] START helmfile.d/admin 'apply'. [production]
14:13 <elukey> restart php-fpm on mw2370 [production]
13:43 <jmm@cumin2002> END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Muehlenhoff out of all services on: 1733 hosts [production]
13:43 <jmm@cumin2002> START - Cookbook sre.idm.logout Logging Muehlenhoff out of all services on: 1733 hosts [production]
13:09 <kormat@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on 18 hosts with reason: Deploying schema change to s1 T277118 [production]
13:09 <kormat@cumin1001> START - Cookbook sre.hosts.downtime for 4:00:00 on 18 hosts with reason: Deploying schema change to s1 T277118 [production]
12:47 <jmm@cumin2002> END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host rdb1005.eqiad.wmnet [production]
12:43 <urbanecm> Start server-side upload of 3 large image files (T285708) [production]
12:37 <jmm@cumin2002> START - Cookbook sre.hosts.reboot-single for host rdb1005.eqiad.wmnet [production]
12:24 <jmm@cumin2002> END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Muehlenhoff out of all services on: 1733 hosts [production]
12:23 <jmm@cumin2002> START - Cookbook sre.idm.logout Logging Muehlenhoff out of all services on: 1733 hosts [production]
12:15 <mutante> mw1422 - scap pull [production]
12:09 <dzahn@cumin1001> conftool action : set/pooled=no; selector: name=mw1422.eqiad.wmnet [production]
12:02 <moritzm> upgrading python3-wmflib fleetwide to 0.0.8 (needed for new logout.d wrapper) [production]
12:01 <hnowlan@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on maps2008.codfw.wmnet with reason: Bootstrapping cassandra in new cluster [production]
12:01 <hnowlan@cumin1001> START - Cookbook sre.hosts.downtime for 3:00:00 on maps2008.codfw.wmnet with reason: Bootstrapping cassandra in new cluster [production]
11:52 <mutante> mw1422 - new setup, not in prod yet [production]
11:52 <jmm@cumin2002> END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Muehlenhoff out of all services on: 1733 hosts [production]
11:52 <dzahn@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on mw1422.eqiad.wmnet with reason: new host [production]
11:52 <dzahn@cumin1001> START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on mw1422.eqiad.wmnet with reason: new host [production]
11:51 <jmm@cumin2002> START - Cookbook sre.idm.logout Logging Muehlenhoff out of all services on: 1733 hosts [production]
11:49 <ladsgroup@deploy1002> Synchronized wmf-config/flaggedrevs.php: Config: [[gerrit:704525|Remove reviewer user group in ruwiki (T284589)]] (duration: 01m 05s) [production]
11:40 <hnowlan@cumin2002> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on maps2008.codfw.wmnet with reason: REIMAGE [production]
11:39 <ladsgroup@deploy1002> Synchronized wmf-config/flaggedrevs.php: Config: [[gerrit:700854|flaggedrevs: Reduce levels for ruwiki to 1 (T284589)]] (duration: 01m 05s) [production]
11:37 <hnowlan@cumin2002> START - Cookbook sre.hosts.downtime for 2:00:00 on maps2008.codfw.wmnet with reason: REIMAGE [production]
11:23 <ariel@puppetmaster1001> conftool action : set/pooled=inactive; selector: name=mw2383.codfw.wmnet [production]
11:10 <urbanecm@deploy1002> Synchronized wmf-config/InitialiseSettings.php: 72027e136f10867f5db02043b7505390e49130d1: Disable indexing in NS_USER and NS_USER_TALK on bnwiki (T286152) (duration: 02m 07s) [production]
11:06 <urbanecm@deploy1002> Synchronized wmf-config/InitialiseSettings.php: 4dc11d2333cbf70a4eb20f3fb94a9e363b41d2df: Change category name of Babel extension on Javanese Wikipedia (T286165) (duration: 02m 10s) [production]
10:40 <hnowlan@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on maps2008.codfw.wmnet with reason: reimaging as buster replica [production]
10:40 <hnowlan@cumin1001> START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on maps2008.codfw.wmnet with reason: reimaging as buster replica [production]