__all__ SAL

2021-03-04
13:06	<elukey@cumin1001>	START - Cookbook sre.hosts.downtime for 2:00:00 on analytics1061.eqiad.wmnet with reason: REIMAGE	[production]
12:48	<elukey>	drain + reimage analytics10[61,62] to Debian Buster	[analytics]
12:48	<elukey>	drain + reimage analytics10[61,62] to Debian Buster	[production]
12:45	<jakob@deploy1002>	helmfile [staging] Ran 'sync' command on namespace 'termbox' for release 'staging' .	[production]
12:40	<mbsantos@deploy1002>	Finished deploy [tilerator/deploy@6fcbb9f]: (no justification provided) (duration: 00m 14s)	[production]
12:40	<wmde-fisch@deploy1002>	Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:668108|Remove conflicting gadget configuration for hewiki (T276330)]] (duration: 01m 12s)	[production]
12:40	<mbsantos@deploy1002>	Started deploy [tilerator/deploy@6fcbb9f]: (no justification provided)	[production]
12:38	<Majavah>	`git rebase origin/production` on deployment-puppetmaster04 to update few settings for T276419	[releng]
12:34	<jakob@deploy1002>	helmfile [staging] Ran 'sync' command on namespace 'termbox' for release 'staging' .	[production]
12:19	<Majavah>	Beta cluster is now using deployment-mwlog01 instead of deployment-fluorine02 for MediaWiki logs. fluorine02 is still used for some other misc services, these will be migrated soon	[releng]
12:11	<kormat@cumin1001>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on db1115.eqiad.wmnet,dbmonitor1001.wikimedia.org with reason: Restart db1115 to fix memory leak	[production]
12:11	<kormat@cumin1001>	START - Cookbook sre.hosts.downtime for 0:30:00 on db1115.eqiad.wmnet,dbmonitor1001.wikimedia.org with reason: Restart db1115 to fix memory leak	[production]
12:10	<jakob@deploy1002>	helmfile [staging] Ran 'sync' command on namespace 'termbox' for release 'staging' .	[production]
12:06	<Majavah>	deployment-prep Delete lists.beta.wmflabs.org DNS record, points to an unassigned floating IP and not used according to Amir	[releng]
12:00	<marostegui>	Stop mysql on db1117:3321 to clone db1159	[production]
11:42	<jakob@deploy1002>	helmfile [staging] Ran 'sync' command on namespace 'termbox' for release 'staging' .	[production]
11:40	<marostegui@cumin1001>	dbctl commit (dc=all): 'Add db2145 to s1 (and repool db2116) - T275633', diff saved to https://phabricator.wikimedia.org/P14625 and previous config saved to /var/cache/conftool/dbconfig/20210304-114052-marostegui.json	[production]
11:29	<arturo>	draining cloudvirt1024 for T275753	[admin]
11:28	<marostegui@cumin1001>	dbctl commit (dc=all): 'Add db2145 into dbctl depooled - T275633', diff saved to https://phabricator.wikimedia.org/P14624 and previous config saved to /var/cache/conftool/dbconfig/20210304-112848-marostegui.json	[production]
11:27	<_joe_>	restarted redis on mc2027 to pick up the replication change	[production]
11:25	<arturo>	rebooted tools-sgewebgrid-generic-0901, repool it again	[tools]
11:24	<dcaro>	rebooted cloudvirt1022, re-adding to ceph and removing from maintenance host aggregate for T275753	[admin]
11:14	<elukey@cumin1001>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on analytics1059.eqiad.wmnet with reason: REIMAGE	[production]
11:11	<elukey@cumin1001>	START - Cookbook sre.hosts.downtime for 2:00:00 on analytics1059.eqiad.wmnet with reason: REIMAGE	[production]
11:10	<kormat@cumin1001>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 6 hosts with reason: Needs fixing after T274472	[production]
11:10	<kormat@cumin1001>	START - Cookbook sre.hosts.downtime for 1:00:00 on 6 hosts with reason: Needs fixing after T274472	[production]
11:08	<dcaro@cumin1001>	END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudvirt1022.eqiad.wmnet	[production]
11:04	<elukey@cumin1001>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on analytics1060.eqiad.wmnet with reason: REIMAGE	[production]
11:02	<Majavah>	live hacking https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/668338/ on deployment-deploy01 to test new deployment-mwlog01 ref T276419	[releng]
11:02	<dcaro@cumin1001>	START - Cookbook sre.hosts.reboot-single for host cloudvirt1022.eqiad.wmnet	[production]
11:02	<elukey@cumin1001>	START - Cookbook sre.hosts.downtime for 2:00:00 on analytics1060.eqiad.wmnet with reason: REIMAGE	[production]
11:01	<dcaro>	rebooting cloudvirt1022 for T275753	[admin]
10:51	<Majavah>	stop bogus service udp2log on deployment-mwlog01, no idea what it is but it was using the same port as udp2log-mw.service is	[releng]
10:40	<elukey>	drain + reimage analytics1059/1060 to Debian Buster	[analytics]
10:40	<elukey>	drain + reimage analytics1059/1060 to Debian Buster	[production]
10:32	<moritzm>	uploaded screen 4.2.1-3+deb8u1+wmf1 to jessie-wikimedia	[production]
09:57	<arturo>	depool tools-sgewebgrid-generic-0901 to reboot VM. It was stuck in MIGRATING state when draining cloudvirt1022	[tools]
09:32	<elukey>	reboot an-worker[1097-1101] (GPU workers) to pick up the new kernel (5.10)	[analytics]
09:32	<elukey>	install linux 5.10 on an-worker[1097-1101] (GPU workers) and reboot them	[production]
09:30	<kormat>	disabling puppet on all db hosts while deploying a puppet monitoring change T275497	[production]
09:20	<hashar>	Restored analytics/udp2log cause it got to be packaged for Buster # T276422 T180301	[releng]
09:19	<moritzm>	uploaded udplog 1.8.5+deb10u1 to buster-wikimedia	[production]
09:12	<dcaro>	draining cloudvirt1022 for T275753	[admin]
09:02	<elukey>	kill/start mediawiki-geoeditors-monthly to apply backtick change (hive script)	[analytics]
08:48	<elukey>	deploy refinery to hdfs	[analytics]
08:45	<elukey@deploy1002>	Finished deploy [analytics/refinery@605f8b8]: Fix for geoeditors monthly job (duration: 11m 03s)	[production]
08:34	<elukey>	deploy refinery to fix https://gerrit.wikimedia.org/r/c/analytics/refinery/+/668111	[analytics]
08:33	<elukey@deploy1002>	Started deploy [analytics/refinery@605f8b8]: Fix for geoeditors monthly job	[production]
07:47	<legoktm>	rebuilding php*-compile images https://gerrit.wikimedia.org/r/668259	[releng]
07:38	<elukey>	reboot an-worker1096 to pick up 5.10 kernel	[analytics]