2022-03-01
13:49 <vgutierrez@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp1087.eqiad.wmnet with reason: host reimage [production]
13:48 <klausman@cumin2002> START - Cookbook sre.dns.netbox [production]
13:48 <klausman@cumin2002> START - Cookbook sre.ganeti.makevm for new host ml-staging-etcd2002.codfw.wmnet [production]
13:48 <klausman@cumin2002> END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host ml-staging-etcd2002.codfw.wmnet [production]
13:48 <klausman@cumin2002> END (PASS) - Cookbook sre.dns.netbox (exit_code=0) [production]
13:47 <vgutierrez@cumin1001> START - Cookbook sre.hosts.downtime for 2:00:00 on cp1087.eqiad.wmnet with reason: host reimage [production]
13:44 <klausman@cumin2002> END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host ml-staging-etcd2003.codfw.wmnet [production]
13:43 <klausman@cumin2002> END (PASS) - Cookbook sre.dns.netbox (exit_code=0) [production]
13:43 <klausman@cumin2002> START - Cookbook sre.dns.netbox [production]
13:43 <klausman@cumin2002> END (FAIL) - Cookbook sre.dns.netbox (exit_code=99) [production]
13:40 <kormat> Deploying wmfmariadbpy 0.9 T302796 [production]
13:40 <kormat> uploaded wmfmariadbpy 0.9 to apt.wm.o T302796 [production]
13:39 <klausman@cumin2002> START - Cookbook sre.dns.netbox [production]
13:39 <klausman@cumin2002> END (ERROR) - Cookbook sre.dns.netbox (exit_code=97) [production]
13:39 <klausman@cumin2002> START - Cookbook sre.dns.netbox [production]
13:39 <klausman@cumin2002> START - Cookbook sre.ganeti.makevm for new host ml-staging-etcd2003.codfw.wmnet [production]
13:39 <klausman@cumin2002> START - Cookbook sre.dns.netbox [production]
13:39 <klausman@cumin2002> START - Cookbook sre.ganeti.makevm for new host ml-staging-etcd2002.codfw.wmnet [production]
13:32 <moritzm> restarting nginx on registry* nodes to pick up expat update [production]
13:31 <vgutierrez@cumin1001> START - Cookbook sre.hosts.reimage for host cp1087.eqiad.wmnet with OS buster [production]
13:15 <XioNoX> restart cr1-drmrs for software upgrade [production]
13:03 <moritzm> restarting FPM/Apache on parsoid hosts to pick up expat update [production]
12:49 <vgutierrez> pool cp3062 running HAProxy as TLS termination layer - T290005 T271421 [production]
12:47 <vgutierrez@cumin1001> END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp3062.esams.wmnet with OS buster [production]
12:39 <moritzm> installing expat security updates [production]
12:34 <mmandere> restart purged on cp60[12-14] [production]
12:32 <jgiannelos@deploy1002> Finished deploy [kartotherian/deploy@41d2498] (eqiad): Reduce pool size to 1 connection per node worker (duration: 01m 06s) [production]
12:31 <jgiannelos@deploy1002> Started deploy [kartotherian/deploy@41d2498] (eqiad): Reduce pool size to 1 connection per node worker [production]
12:30 <jgiannelos@deploy1002> Finished deploy [kartotherian/deploy@41d2498] (codfw): Reduce pool size to 1 connection per node worker (duration: 01m 30s) [production]
12:28 <jgiannelos@deploy1002> Started deploy [kartotherian/deploy@41d2498] (codfw): Reduce pool size to 1 connection per node worker [production]
12:15 <jgiannelos@deploy1002> Finished deploy [kartotherian/deploy@51d5a07] (codfw): Fix pool size configuration (duration: 01m 41s) [production]
12:13 <jgiannelos@deploy1002> Started deploy [kartotherian/deploy@51d5a07] (codfw): Fix pool size configuration [production]
12:11 <jgiannelos@deploy1002> Finished deploy [kartotherian/deploy@51d5a07] (eqiad): Fix pool size configuration (duration: 02m 01s) [production]
12:09 <jgiannelos@deploy1002> Started deploy [kartotherian/deploy@51d5a07] (eqiad): Fix pool size configuration [production]
11:43 <klausman@cumin2002> END (PASS) - Cookbook sre.dns.netbox (exit_code=0) [production]
11:36 <kharlan@deploy1002> helmfile [codfw] DONE helmfile.d/services/linkrecommendation: apply [production]
11:35 <klausman@cumin2002> START - Cookbook sre.dns.netbox [production]
11:35 <klausman@cumin2002> START - Cookbook sre.ganeti.makevm for new host ml-staging-etcd2001.codfw.wmnet [production]
11:33 <kharlan@deploy1002> helmfile [codfw] START helmfile.d/services/linkrecommendation: apply [production]
11:32 <kharlan@deploy1002> helmfile [eqiad] DONE helmfile.d/services/linkrecommendation: apply [production]
11:30 <kharlan@deploy1002> helmfile [eqiad] START helmfile.d/services/linkrecommendation: apply [production]
11:28 <kharlan@deploy1002> helmfile [staging] DONE helmfile.d/services/linkrecommendation: apply [production]
11:27 <kharlan@deploy1002> helmfile [staging] START helmfile.d/services/linkrecommendation: apply [production]
11:27 <cmooney@cumin1001> END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-worker1148.mgmt.eqiad.wmnet with reboot policy FORCED [production]
11:21 <_joe_> restarted pybal, removed ipvsadm entry on lvs1019. Now all of MediaWiki has no http LVS endpoint available. T244843 [production]
11:18 <_joe_> also removed the ipvsadm entry for apaches:80 T244843 [production]
11:17 <jayme> rolled back linkrecommendation staging helm release to revision 12 - T302744 [production]
11:17 <_joe_> restarting pybal on lvs1020 T244843 [production]
11:11 <vgutierrez@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp3062.esams.wmnet with reason: host reimage [production]
11:11 <_joe_> restarted pybal on lvs2009, T244843 [production]