2022-03-01
§
|
12:31 |
<jgiannelos@deploy1002> |
Started deploy [kartotherian/deploy@41d2498] (eqiad): Reduce pool size to 1 connection per node worker |
[production] |
12:30 |
<jgiannelos@deploy1002> |
Finished deploy [kartotherian/deploy@41d2498] (codfw): Reduce pool size to 1 connection per node worker (duration: 01m 30s) |
[production] |
12:28 |
<jgiannelos@deploy1002> |
Started deploy [kartotherian/deploy@41d2498] (codfw): Reduce pool size to 1 connection per node worker |
[production] |
12:15 |
<jgiannelos@deploy1002> |
Finished deploy [kartotherian/deploy@51d5a07] (codfw): Fix pool size configuration (duration: 01m 41s) |
[production] |
12:13 |
<jgiannelos@deploy1002> |
Started deploy [kartotherian/deploy@51d5a07] (codfw): Fix pool size configuration |
[production] |
12:11 |
<jgiannelos@deploy1002> |
Finished deploy [kartotherian/deploy@51d5a07] (eqiad): Fix pool size configuration (duration: 02m 01s) |
[production] |
12:09 |
<jgiannelos@deploy1002> |
Started deploy [kartotherian/deploy@51d5a07] (eqiad): Fix pool size configuration |
[production] |
11:43 |
<klausman@cumin2002> |
END (PASS) - Cookbook sre.dns.netbox (exit_code=0) |
[production] |
11:36 |
<kharlan@deploy1002> |
helmfile [codfw] DONE helmfile.d/services/linkrecommendation: apply |
[production] |
11:35 |
<klausman@cumin2002> |
START - Cookbook sre.dns.netbox |
[production] |
11:35 |
<klausman@cumin2002> |
START - Cookbook sre.ganeti.makevm for new host ml-staging-etcd2001.codfw.wmnet |
[production] |
11:33 |
<kharlan@deploy1002> |
helmfile [codfw] START helmfile.d/services/linkrecommendation: apply |
[production] |
11:32 |
<kharlan@deploy1002> |
helmfile [eqiad] DONE helmfile.d/services/linkrecommendation: apply |
[production] |
11:30 |
<kharlan@deploy1002> |
helmfile [eqiad] START helmfile.d/services/linkrecommendation: apply |
[production] |
11:28 |
<kharlan@deploy1002> |
helmfile [staging] DONE helmfile.d/services/linkrecommendation: apply |
[production] |
11:27 |
<kharlan@deploy1002> |
helmfile [staging] START helmfile.d/services/linkrecommendation: apply |
[production] |
11:27 |
<cmooney@cumin1001> |
END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-worker1148.mgmt.eqiad.wmnet with reboot policy FORCED |
[production] |
11:21 |
<_joe_> |
restarted pybal, removed ipvsadm entry on lvs1019. Now all of MediaWiki has no http LVS endpoint available. T244843 |
[production] |
11:18 |
<_joe_> |
also removed the ipvsadm entry for apaches:80 T244843 |
[production] |
11:17 |
<jayme> |
rolled back linkrecommendation staging helm release to revision 12 - T302744 |
[production] |
11:17 |
<_joe_> |
restarting pybal on lvs1020 T244843 |
[production] |
11:11 |
<vgutierrez@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp3062.esams.wmnet with reason: host reimage |
[production] |
11:11 |
<_joe_> |
restarted pybal on lvs2009, T244843 |
[production] |
11:09 |
<vgutierrez@cumin1001> |
START - Cookbook sre.hosts.downtime for 2:00:00 on cp3062.esams.wmnet with reason: host reimage |
[production] |
11:07 |
<_joe_> |
restarted pybal on lvs2010, T244843 |
[production] |
11:02 |
<mmandere> |
restart purged on cp60[09,10,11] |
[production] |
11:00 |
<cmooney@cumin1001> |
START - Cookbook sre.hosts.provision for host an-worker1148.mgmt.eqiad.wmnet with reboot policy FORCED |
[production] |
10:47 |
<cmooney@cumin1001> |
END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-worker1147.mgmt.eqiad.wmnet with reboot policy FORCED |
[production] |
10:40 |
<vgutierrez@cumin1001> |
START - Cookbook sre.hosts.reimage for host cp3062.esams.wmnet with OS buster |
[production] |
10:40 |
<jmm@cumin2002> |
END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Ema out of all services on: 259 hosts |
[production] |
10:40 |
<jmm@cumin2002> |
START - Cookbook sre.idm.logout Logging Ema out of all services on: 259 hosts |
[production] |
10:40 |
<jmm@cumin2002> |
END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Ema out of all services on: 1353 hosts |
[production] |
10:39 |
<jmm@cumin2002> |
START - Cookbook sre.idm.logout Logging Ema out of all services on: 1353 hosts |
[production] |
10:31 |
<mmandere> |
restart purged on cp600[6-8] |
[production] |
10:28 |
<cmooney@cumin1001> |
END (PASS) - Cookbook sre.dns.netbox (exit_code=0) |
[production] |
10:24 |
<cmooney@cumin1001> |
START - Cookbook sre.dns.netbox |
[production] |
10:05 |
<vgutierrez> |
pool cp2039 running HAProxy as TLS termination layer - T290005 T271421 |
[production] |
09:48 |
<elukey> |
elukey@stat1004:~$ sudo kill `pgrep -u zpapierski` (offboarded user, puppet broken on the host) |
[production] |
09:45 |
<vgutierrez@cumin1001> |
END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp2039.codfw.wmnet with OS buster |
[production] |
09:33 |
<_joe_> |
restarted pybal on lvs1019, removed the mw api from ipvsadm, the mw api is internally fully encrypted |
[production] |
09:31 |
<_joe_> |
restart pybal on lvs1020 |
[production] |
09:25 |
<jmm@cumin2002> |
END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Amuigai out of all services on: 1881 hosts |
[production] |
09:25 |
<elukey> |
restart varnishkafka-webrequest on cp6009 as attempt to clear a weird status of librdkafka (delivery errors to kafka) |
[production] |
09:25 |
<_joe_> |
manually removed ipvs entries on lvs2*, so it is actually now that the http api is not available in codfw anymore |
[production] |
09:24 |
<jmm@cumin2002> |
START - Cookbook sre.idm.logout Logging Amuigai out of all services on: 1881 hosts |
[production] |
09:24 |
<jmm@cumin2002> |
END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging ZPapierski out of all services on: 1881 hosts |
[production] |
09:22 |
<jmm@cumin2002> |
START - Cookbook sre.idm.logout Logging ZPapierski out of all services on: 1881 hosts |
[production] |
09:22 |
<_joe_> |
restarted pybal on lvs2009, the mw api is now effectively https-only in codfw T287820 |
[production] |
09:20 |
<_joe_> |
restarted pybal on lvs2010 |
[production] |
09:14 |
<vgutierrez@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp2039.codfw.wmnet with reason: host reimage |
[production] |