production SAL

2021-08-03
16:14	<pt1979@cumin2002>	START - Cookbook sre.dns.netbox	[production]
16:00	<dcausse@deploy1002>	helmfile [staging] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .	[production]
15:55	<mwdebug-deploy@deploy1002>	helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .	[production]
15:50	<mwdebug-deploy@deploy1002>	helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .	[production]
15:49	<jmm@cumin2002>	END (PASS) - Cookbook sre.dns.netbox (exit_code=0)	[production]
15:34	<mwdebug-deploy@deploy1002>	helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .	[production]
15:30	<mwdebug-deploy@deploy1002>	helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .	[production]
15:26	<jmm@cumin2002>	START - Cookbook sre.dns.netbox	[production]
15:25	<moritzm>	prune testvm2001 from Ganeti and clean up from Netbox (stuck in some inconsistent state the decom cookbook can't handle) T286206	[production]
15:14	<jmm@cumin2002>	END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host testvm2002.codfw.wmnet	[production]
15:01	<jmm@cumin2002>	START - Cookbook sre.ganeti.makevm for new host testvm2002.codfw.wmnet	[production]
14:56	<jmm@cumin2002>	END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts testvm2001.codfw.wmnet	[production]
14:49	<jmm@cumin2002>	START - Cookbook sre.hosts.decommission for hosts testvm2001.codfw.wmnet	[production]
14:32	<oblivian@deploy1002>	helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .	[production]
14:27	<ottomata>	chown dumpsgen and chmod 644 /data/xmldatadumps/public/*/20210801/dumpstatus.json on labstore1006 and labstore1007 (it was only readable by root causing an analytics import job to fail), ping apergos	[production]
14:23	<ottomata>	chown dumpsgen and chmod 644 /data/xmldatadumps/public/lezwiki/20210801/dumpstatus.json on labstore1006 and labstore1007 (it was only readable by root causing an analytics import job to fail), ping apergos	[production]
14:13	<ottomata>	chown dumpsgen and chmod 644 dumpsdata1003:/data/xmldatadumps/public/lezwiki/20210801/dumpstatus.json (it was only readable by root causing an analytics import job to fail), ping apergos	[production]
12:47	<moritzm>	restarting Tomcat on idp1001	[production]
12:05	<moritzm>	installing libgcrypt20 security updates	[production]
11:48	<jmm@cumin2002>	END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host testvm2001.codfw.wmnet	[production]
11:36	<moritzm>	updated bullseye d-i images to rc3 T275873	[production]
11:28	<godog>	upgrade prometheus3001 to 2.24.1+ds-1+wmf1 - T222113	[production]
11:25	<jmm@cumin2002>	START - Cookbook sre.ganeti.makevm for new host testvm2001.codfw.wmnet	[production]
11:19	<jmm@cumin2002>	END (PASS) - Cookbook sre.dns.netbox (exit_code=0)	[production]
11:18	<godog>	upgrade prometheus5001 to 2.24.1+ds-1+wmf1 - T222113	[production]
11:15	<jmm@cumin2002>	START - Cookbook sre.dns.netbox	[production]
11:13	<moritzm>	rename Ganeti group for test cluster to row_D T286206	[production]
11:01	<jmm@cumin2002>	END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host testvm2001.codfw.wmnet	[production]
10:58	<jmm@cumin2002>	START - Cookbook sre.ganeti.makevm for new host testvm2001.codfw.wmnet	[production]
09:18	<marostegui>	Failover m1, m2 and m3-master T287574	[production]
09:12	<moritzm>	installing php 7.0 security updates on stretch	[production]
09:11	<jayme>	importing dragonfly 1.0.6-2 to buster-wikimedia and stretch-wikimedia - T286054	[production]
08:57	<moritzm>	installing pillow security updates on stretch	[production]
08:53	<jynus@cumin1001>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1145.eqiad.wmnet with reason: REIMAGE	[production]
08:50	<jynus@cumin1001>	START - Cookbook sre.hosts.downtime for 2:00:00 on db1145.eqiad.wmnet with reason: REIMAGE	[production]
08:17	<legoktm>	pausing refreshLinks run against wikiversities while other issues are figured out	[production]
08:13	<jynus@cumin1001>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1145.eqiad.wmnet with reason: REIMAGE	[production]
08:10	<jynus@cumin1001>	START - Cookbook sre.hosts.downtime for 2:00:00 on db1145.eqiad.wmnet with reason: REIMAGE	[production]
08:03	<dzahn@cumin1001>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 8:00:00 on planet1002.eqiad.wmnet with reason: known issue	[production]
08:03	<dzahn@cumin1001>	START - Cookbook sre.hosts.downtime for 5 days, 8:00:00 on planet1002.eqiad.wmnet with reason: known issue	[production]
07:42	<moritzm>	upgrading spicerack on cumin2002 to 0.0.57	[production]
06:31	<kart__>	Updated cxserver to 2021-08-02-164000-production (T286473)	[production]
06:26	<kartik@deploy1002>	helmfile [codfw] Ran 'sync' command on namespace 'cxserver' for release 'production' .	[production]
06:20	<kartik@deploy1002>	helmfile [eqiad] Ran 'sync' command on namespace 'cxserver' for release 'production' .	[production]
06:15	<kartik@deploy1002>	helmfile [staging] Ran 'sync' command on namespace 'cxserver' for release 'staging' .	[production]
04:37	<marostegui>	Disable puppet on dbproxy1014 dbproxy1013 dbproxy1020	[production]
00:43	<reedy@deploy1002>	Finished deploy [integration/docroot@f9d225d]: with less gref (duration: 00m 05s)	[production]
00:43	<reedy@deploy1002>	Started deploy [integration/docroot@f9d225d]: with less gref	[production]
00:29	<reedy@deploy1002>	Finished deploy [integration/docroot@f7df1c7]: (no justification provided) (duration: 00m 05s)	[production]
00:29	<reedy@deploy1002>	Started deploy [integration/docroot@f7df1c7]: (no justification provided)	[production]