production SAL

2022-12-15
08:51	<akosiaris>	nothing noticed with rdb1007 reboot for mw, jobqueue, api-gateway. changeprop had a minor backlog increase, but everything appears fine now.	[production]
08:28	<akosiaris>	reboot rdb1009 for kernel upgrades. possibly (but probably not) affected applications: changeprop, cpjobqueue, api-gateway, redisLockManager	[production]
08:13	<kartik@deploy1002>	Finished scap: Backport for [[gerrit:868215|Enable Section Translation on 6 WPs (T319177)]] (duration: 10m 55s)	[production]
08:08	<jmm@cumin2002>	END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host puppetdb2003.codfw.wmnet	[production]
08:04	<kartik@deploy1002>	kartik and kartik: Backport for [[gerrit:868215|Enable Section Translation on 6 WPs (T319177)]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet	[production]
08:03	<kartik@deploy1002>	Started scap: Backport for [[gerrit:868215|Enable Section Translation on 6 WPs (T319177)]]	[production]
07:57	<jmm@cumin2002>	START - Cookbook sre.hosts.reboot-single for host puppetdb2003.codfw.wmnet	[production]
01:46	<cwhite@cumin2002>	END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host logstash2026.codfw.wmnet with OS bullseye	[production]
00:58	<cwhite@cumin2002>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on logstash2026.codfw.wmnet with reason: host reimage	[production]
00:55	<cwhite@cumin2002>	START - Cookbook sre.hosts.downtime for 2:00:00 on logstash2026.codfw.wmnet with reason: host reimage	[production]
00:32	<mutante>	releases1002 - rebooting	[production]
00:30	<mutante>	releases2002 - rebooting	[production]
00:19	<cwhite@cumin2002>	START - Cookbook sre.hosts.reimage for host logstash2026.codfw.wmnet with OS bullseye	[production]
00:19	<cwhite@cumin2002>	END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host logstash2026.codfw.wmnet with OS bullseye	[production]
00:15	<cwhite@cumin2002>	START - Cookbook sre.hosts.reimage for host logstash2026.codfw.wmnet with OS bullseye	[production]
00:14	<cwhite@cumin2002>	END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host logstash2026.codfw.wmnet with OS bullseye	[production]
00:05	<tgr>	EU late backports done	[production]
00:05	<tgr@deploy1002>	Synchronized php-1.40.0-wmf.14/extensions/GrowthExperiments/: Backport: [[gerrit:868052|User impact: read edit count from primary db in save complete hook (T324930)]] (duration: 07m 03s)	[production]
2022-12-14
23:50	<cwhite@cumin2002>	START - Cookbook sre.hosts.reimage for host logstash2026.codfw.wmnet with OS bullseye	[production]
23:48	<cwhite@cumin2002>	END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['logstash2026']	[production]
23:44	<ejegg>	civicrm upgraded from a1c2630a to 98b48b9a	[production]
23:41	<cwhite@cumin2002>	START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['logstash2026']	[production]
23:40	<cwhite@cumin2002>	END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['logstash2026']	[production]
23:33	<cwhite@cumin2002>	START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['logstash2026']	[production]
23:29	<ryankemper>	[WDQS] Downtimed wdqs20[09-12] for the next 7 days	[production]
23:28	<ryankemper>	T301167 wdqs2011/2012 were not visible in pybal (oversight from when I added the other hosts with conftool last week). Fixed that, so now all of the new hosts are showing up properly.	[production]
23:27	<ryankemper@puppetmaster1001>	conftool action : set/weight=10:pooled=no; selector: name=wdqs2012.*	[production]
23:27	<ryankemper@puppetmaster1001>	conftool action : set/weight=10:pooled=no; selector: name=wdqs2011.*	[production]
23:14	<bd808>	Toolhub: rebuilding search indices following app update	[production]
23:12	<bd808@deploy1002>	helmfile [eqiad] DONE helmfile.d/services/toolhub: apply	[production]
23:10	<denisse@cumin1001>	END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 0:10:00 on alert2001.wikimedia.org with reason: kernel update	[production]
23:10	<bd808@deploy1002>	helmfile [eqiad] START helmfile.d/services/toolhub: apply	[production]
23:10	<denisse@cumin1001>	START - Cookbook sre.hosts.downtime for 0:10:00 on alert2001.wikimedia.org with reason: kernel update	[production]
23:04	<bd808@deploy1002>	helmfile [codfw] DONE helmfile.d/services/toolhub: apply	[production]
23:03	<denisse@cumin1001>	END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host alert2001.wikimedia.org	[production]
23:03	<denisse@cumin1001>	START - Cookbook sre.hosts.reboot-single for host alert2001.wikimedia.org	[production]
23:03	<bd808@deploy1002>	helmfile [codfw] START helmfile.d/services/toolhub: apply	[production]
23:01	<bd808@deploy1002>	helmfile [staging] DONE helmfile.d/services/toolhub: apply	[production]
22:59	<bd808@deploy1002>	helmfile [staging] START helmfile.d/services/toolhub: apply	[production]
22:56	<tgr>	doing the last backport by hand due to T325252	[production]
22:49	<tgr@deploy1002>	Finished scap: Backport for [[gerrit:868047|NewImpact: Add log event for clicking suggested edits button (T325041)]], [[gerrit:868051|UserEditTracker: Allow querying primary DB for edit timestamp]] (duration: 11m 37s)	[production]
22:46	<denisse@cumin1001>	END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host alert1001.wikimedia.org	[production]
22:39	<tgr@deploy1002>	tgr and kharlan and tgr: Backport for [[gerrit:868047|NewImpact: Add log event for clicking suggested edits button (T325041)]], [[gerrit:868051|UserEditTracker: Allow querying primary DB for edit timestamp]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet	[production]
22:37	<tgr@deploy1002>	Started scap: Backport for [[gerrit:868047|NewImpact: Add log event for clicking suggested edits button (T325041)]], [[gerrit:868051|UserEditTracker: Allow querying primary DB for edit timestamp]]	[production]
22:36	<bking@cumin2002>	END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 1 day, 0:00:00 on wdqs2009.codfw.wmnet with reason: NFS troubleshooting	[production]
22:36	<bking@cumin2002>	START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on wdqs2009.codfw.wmnet with reason: NFS troubleshooting	[production]
22:32	<denisse@cumin1001>	START - Cookbook sre.hosts.reboot-single for host alert1001.wikimedia.org	[production]
22:12	<samtar@deploy1002>	Finished scap: Backport for [[gerrit:867311|Deployment of DiscussionTools reply visual enhancements for more wikis (T323537)]] (duration: 08m 12s)	[production]
22:06	<samtar@deploy1002>	samtar and kemayo: Backport for [[gerrit:867311|Deployment of DiscussionTools reply visual enhancements for more wikis (T323537)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet	[production]
22:04	<samtar@deploy1002>	Started scap: Backport for [[gerrit:867311|Deployment of DiscussionTools reply visual enhancements for more wikis (T323537)]]	[production]