2022-03-01
11:21 <_joe_> restarted pybal, removed ipvsadm entry on lvs1019. Now all of MediaWiki has no http LVS endpoint available. T244843 [production]
11:18 <_joe_> also removed the ipvsadm entry for apaches:80 T244843 [production]
11:17 <jayme> rolled back linkrecommendation staging helm release to revision 12 - T302744 [production]
11:17 <_joe_> restarting pybal on lvs1020 T244843 [production]
11:11 <vgutierrez@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp3062.esams.wmnet with reason: host reimage [production]
11:11 <_joe_> restarted pybal on lvs2009, T244843 [production]
11:09 <vgutierrez@cumin1001> START - Cookbook sre.hosts.downtime for 2:00:00 on cp3062.esams.wmnet with reason: host reimage [production]
11:07 <_joe_> restarted pybal on lvs2010, T244843 [production]
11:02 <mmandere> restart purged on cp60[09,10,11] [production]
11:00 <cmooney@cumin1001> START - Cookbook sre.hosts.provision for host an-worker1148.mgmt.eqiad.wmnet with reboot policy FORCED [production]
10:47 <cmooney@cumin1001> END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-worker1147.mgmt.eqiad.wmnet with reboot policy FORCED [production]
10:40 <vgutierrez@cumin1001> START - Cookbook sre.hosts.reimage for host cp3062.esams.wmnet with OS buster [production]
10:40 <jmm@cumin2002> END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Ema out of all services on: 259 hosts [production]
10:40 <jmm@cumin2002> START - Cookbook sre.idm.logout Logging Ema out of all services on: 259 hosts [production]
10:40 <jmm@cumin2002> END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Ema out of all services on: 1353 hosts [production]
10:39 <jmm@cumin2002> START - Cookbook sre.idm.logout Logging Ema out of all services on: 1353 hosts [production]
10:31 <mmandere> restart purged on cp600[6-8] [production]
10:28 <cmooney@cumin1001> END (PASS) - Cookbook sre.dns.netbox (exit_code=0) [production]
10:24 <cmooney@cumin1001> START - Cookbook sre.dns.netbox [production]
10:05 <vgutierrez> pool cp2039 running HAProxy as TLS termination layer - T290005 T271421 [production]
09:48 <elukey> elukey@stat1004:~$ sudo kill `pgrep -u zpapierski` (offboarded user, puppet broken on the host) [production]
09:45 <vgutierrez@cumin1001> END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp2039.codfw.wmnet with OS buster [production]
09:33 <_joe_> restarted pybal on lvs1019, removed the mw api from ipvsadm, the mw api is internally fully encrypted [production]
09:31 <_joe_> restart pybal on lvs1020 [production]
09:25 <jmm@cumin2002> END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Amuigai out of all services on: 1881 hosts [production]
09:25 <elukey> restart varnishkafka-webrequest on cp6009 as an attempt to clear a weird status of librdkafka (delivery errors to kafka) [production]
09:25 <_joe_> manually removed ipvs entries on lvs2*, so only now is the http api actually unavailable in codfw [production]
09:24 <jmm@cumin2002> START - Cookbook sre.idm.logout Logging Amuigai out of all services on: 1881 hosts [production]
09:24 <jmm@cumin2002> END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging ZPapierski out of all services on: 1881 hosts [production]
09:22 <jmm@cumin2002> START - Cookbook sre.idm.logout Logging ZPapierski out of all services on: 1881 hosts [production]
09:22 <_joe_> restarted pybal on lvs2009, the mw api is now effectively https-only in codfw T287820 [production]
09:20 <_joe_> restarted pybal on lvs2010 [production]
09:14 <vgutierrez@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp2039.codfw.wmnet with reason: host reimage [production]
09:12 <vgutierrez@cumin1001> START - Cookbook sre.hosts.downtime for 2:00:00 on cp2039.codfw.wmnet with reason: host reimage [production]
09:06 <elukey> restart purged on cp6005 [production]
08:57 <elukey> restart purged on cp6004 [production]
08:54 <vgutierrez@cumin1001> START - Cookbook sre.hosts.reimage for host cp2039.codfw.wmnet with OS buster [production]
08:27 <urbanecm> UTC morning B&C window done [production]
08:25 <elukey> restart purged on cp6003 [production]
08:16 <moritzm> drain instances off ganeti2008 for eventual decom [production]
08:08 <urbanecm@deploy1002> Synchronized wmf-config/ProductionServices.php: d149208dfd7e5fbf51f44dd0bf7dae3b2e2f5159: Use service-proxy to connect to linkrecommendation (T302719) (duration: 00m 49s) [production]
07:59 <elukey> restart purged on cp6002 [production]
06:58 <oblivian@deploy1002> Finished deploy [restbase/deploy@0848b15] (dev-cluster): T302464 test (duration: 00m 17s) [production]
06:57 <oblivian@deploy1002> Started deploy [restbase/deploy@0848b15] (dev-cluster): T302464 test [production]
06:56 <elukey> restart purged on cp6001 to clear stale kafka TLS consumer state (or attempting to) [production]
06:46 <_joe_> uploaded scap 4.4.1 to {stretch,buster,bullseye} T302464 [production]
06:46 <_joe_> uploaded scap 4.4.1 to {stretch,buster,bullseye} [production]
02:59 <ladsgroup@cumin1001> dbctl commit (dc=all): 'Repooling after maintenance db1104 (T302185)', diff saved to https://phabricator.wikimedia.org/P21618 and previous config saved to /var/cache/conftool/dbconfig/20220301-025938-ladsgroup.json [production]
02:44 <ladsgroup@cumin1001> dbctl commit (dc=all): 'Repooling after maintenance db1104', diff saved to https://phabricator.wikimedia.org/P21617 and previous config saved to /var/cache/conftool/dbconfig/20220301-024433-ladsgroup.json [production]
02:29 <ladsgroup@cumin1001> dbctl commit (dc=all): 'Repooling after maintenance db1104', diff saved to https://phabricator.wikimedia.org/P21616 and previous config saved to /var/cache/conftool/dbconfig/20220301-022928-ladsgroup.json [production]