2022-03-01
11:32 |
<kharlan@deploy1002> |
helmfile [eqiad] DONE helmfile.d/services/linkrecommendation: apply |
[production] |
11:30 |
<kharlan@deploy1002> |
helmfile [eqiad] START helmfile.d/services/linkrecommendation: apply |
[production] |
11:28 |
<kharlan@deploy1002> |
helmfile [staging] DONE helmfile.d/services/linkrecommendation: apply |
[production] |
11:27 |
<kharlan@deploy1002> |
helmfile [staging] START helmfile.d/services/linkrecommendation: apply |
[production] |
11:27 |
<cmooney@cumin1001> |
END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-worker1148.mgmt.eqiad.wmnet with reboot policy FORCED |
[production] |
11:21 |
<_joe_> |
restarted pybal, removed ipvsadm entry on lvs1019. Now all of MediaWiki has no http LVS endpoint available. T244843 |
[production] |
11:18 |
<_joe_> |
also removed the ipvsadm entry for apaches:80 T244843 |
[production] |
11:17 |
<jayme> |
rolled back linkrecommendation staging helm release to revision 12 - T302744 |
[production] |
11:17 |
<_joe_> |
restarting pybal on lvs1020 T244843 |
[production] |
11:11 |
<vgutierrez@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp3062.esams.wmnet with reason: host reimage |
[production] |
11:11 |
<_joe_> |
restarted pybal on lvs2009, T244843 |
[production] |
11:09 |
<vgutierrez@cumin1001> |
START - Cookbook sre.hosts.downtime for 2:00:00 on cp3062.esams.wmnet with reason: host reimage |
[production] |
11:07 |
<_joe_> |
restarted pybal on lvs2010, T244843 |
[production] |
11:02 |
<mmandere> |
restart purged on cp60[09,10,11] |
[production] |
11:00 |
<cmooney@cumin1001> |
START - Cookbook sre.hosts.provision for host an-worker1148.mgmt.eqiad.wmnet with reboot policy FORCED |
[production] |
10:47 |
<cmooney@cumin1001> |
END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-worker1147.mgmt.eqiad.wmnet with reboot policy FORCED |
[production] |
10:40 |
<vgutierrez@cumin1001> |
START - Cookbook sre.hosts.reimage for host cp3062.esams.wmnet with OS buster |
[production] |
10:40 |
<jmm@cumin2002> |
END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Ema out of all services on: 259 hosts |
[production] |
10:40 |
<jmm@cumin2002> |
START - Cookbook sre.idm.logout Logging Ema out of all services on: 259 hosts |
[production] |
10:40 |
<jmm@cumin2002> |
END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Ema out of all services on: 1353 hosts |
[production] |
10:39 |
<jmm@cumin2002> |
START - Cookbook sre.idm.logout Logging Ema out of all services on: 1353 hosts |
[production] |
10:31 |
<mmandere> |
restart purged on cp600[6-8] |
[production] |
10:28 |
<cmooney@cumin1001> |
END (PASS) - Cookbook sre.dns.netbox (exit_code=0) |
[production] |
10:24 |
<cmooney@cumin1001> |
START - Cookbook sre.dns.netbox |
[production] |
10:05 |
<vgutierrez> |
pool cp2039 running HAProxy as TLS termination layer - T290005 T271421 |
[production] |
09:48 |
<elukey> |
elukey@stat1004:~$ sudo kill `pgrep -u zpapierski` (offboarded user, puppet broken on the host) |
[production] |
09:45 |
<vgutierrez@cumin1001> |
END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp2039.codfw.wmnet with OS buster |
[production] |
09:33 |
<_joe_> |
restarted pybal on lvs1019, removed the mw api from ipvsadm, the mw api is internally fully encrypted |
[production] |
09:31 |
<_joe_> |
restart pybal on lvs1020 |
[production] |
09:25 |
<jmm@cumin2002> |
END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Amuigai out of all services on: 1881 hosts |
[production] |
09:25 |
<elukey> |
restart varnishkafka-webrequest on cp6009 as attempt to clear a weird status of librdkafka (delivery errors to kafka) |
[production] |
09:25 |
<_joe_> |
manually removed ipvs entries on lvs2*, so it is actually now that the http api is not available in codfw anymore |
[production] |
09:24 |
<jmm@cumin2002> |
START - Cookbook sre.idm.logout Logging Amuigai out of all services on: 1881 hosts |
[production] |
09:24 |
<jmm@cumin2002> |
END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging ZPapierski out of all services on: 1881 hosts |
[production] |
09:22 |
<jmm@cumin2002> |
START - Cookbook sre.idm.logout Logging ZPapierski out of all services on: 1881 hosts |
[production] |
09:22 |
<_joe_> |
restarted pybal on lvs2009, the mw api is now effectively https-only in codfw T287820 |
[production] |
09:20 |
<_joe_> |
restarted pybal on lvs2010 |
[production] |
09:14 |
<vgutierrez@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp2039.codfw.wmnet with reason: host reimage |
[production] |
09:12 |
<vgutierrez@cumin1001> |
START - Cookbook sre.hosts.downtime for 2:00:00 on cp2039.codfw.wmnet with reason: host reimage |
[production] |
09:06 |
<elukey> |
restart purged on cp6005 |
[production] |
08:57 |
<elukey> |
restart purged on cp6004 |
[production] |
08:54 |
<vgutierrez@cumin1001> |
START - Cookbook sre.hosts.reimage for host cp2039.codfw.wmnet with OS buster |
[production] |
08:27 |
<urbanecm> |
UTC morning B&C window done |
[production] |
08:25 |
<elukey> |
restart purged on cp6003 |
[production] |
08:16 |
<moritzm> |
drain instances off ganeti2008 for eventual decom |
[production] |
08:08 |
<urbanecm@deploy1002> |
Synchronized wmf-config/ProductionServices.php: d149208dfd7e5fbf51f44dd0bf7dae3b2e2f5159: Use service-proxy to connect to linkrecommendation (T302719) (duration: 00m 49s) |
[production] |
07:59 |
<elukey> |
restart purged on cp6002 |
[production] |
06:58 |
<oblivian@deploy1002> |
Finished deploy [restbase/deploy@0848b15] (dev-cluster): T302464 test (duration: 00m 17s) |
[production] |
06:57 |
<oblivian@deploy1002> |
Started deploy [restbase/deploy@0848b15] (dev-cluster): T302464 test |
[production] |
06:56 |
<elukey> |
restart purged on cp6001 to clear stale kafka TLS consumer state (or attempting to) |
[production] |