production SAL

2023-07-11 §
08:01	<jelto@cumin1001>	START - Cookbook sre.hosts.downtime for 1:00:00 on contint2002.wikimedia.org with reason: Switch contint hosts for hardware replacement	[production]
08:01	<jelto@cumin1001>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on contint2001.wikimedia.org with reason: Switch contint hosts for hardware replacement	[production]
08:01	<jelto@cumin1001>	START - Cookbook sre.hosts.downtime for 1:00:00 on contint2001.wikimedia.org with reason: Switch contint hosts for hardware replacement	[production]
07:55	<kart_>	Updated MinT to 2023-07-10-051738-production (T341335, T333969)	[production]
07:54	<kartik@deploy1002>	helmfile [codfw] DONE helmfile.d/services/machinetranslation: apply	[production]
07:49	<kartik@deploy1002>	helmfile [codfw] START helmfile.d/services/machinetranslation: apply	[production]
07:47	<kartik@deploy1002>	helmfile [eqiad] DONE helmfile.d/services/machinetranslation: apply	[production]
07:42	<kartik@deploy1002>	helmfile [eqiad] START helmfile.d/services/machinetranslation: apply	[production]
07:38	<kartik@deploy1002>	helmfile [staging] DONE helmfile.d/services/machinetranslation: apply	[production]
07:36	<kartik@deploy1002>	helmfile [staging] START helmfile.d/services/machinetranslation: apply	[production]
07:36	<moritzm>	failover broken ganeti2014 node	[production]
07:28	<moritzm>	powercycle ganeti2014	[production]
07:22	<moritzm>	installing libxpm security updates	[production]
07:08	<moritzm>	rebalance ganeti in drmrs after reboots	[production]
06:59	<elukey>	restart kube-apiserver on ml-serve-ctrl1* as attempt to resolve spikes in latencies	[production]
06:36	<moritzm>	rebalance ganeti group eqiad/B after reboots	[production]
05:24	<rzl>	imported otelcol-contrib 0.81.0 to buster-wikimedia and bullseye-wikimedia in component thirdparty/otelcol-contrib	[production]
04:34	<rzl@deploy1002>	helmfile [staging] START helmfile.d/services/opentelemetry-collector: apply	[production]
02:05	<mutante>	LDAP - added urbanecm to wmf group, removed from nda group (conversion volunteer to staff) T341443	[production]
2023-07-10 §
23:11	<Krinkle>	krinkle@xhgui1001$ Define new `xhgui.watches` table via xhguiadmin@m2-master.eqiad.wmnet database, ref T341499	[production]
22:12	<maryum>	Deployed security patch for T340200	[production]
21:42	<btullis@deploy1002>	helmfile [staging] DONE helmfile.d/services/datahub: sync on main	[production]
21:39	<btullis@deploy1002>	helmfile [staging] START helmfile.d/services/datahub: apply on main	[production]
21:37	<bking@deploy1002>	Finished deploy [wdqs/wdqs@dff41b7]: 0.3.124 (duration: 00m 52s)	[production]
21:36	<bking@deploy1002>	Started deploy [wdqs/wdqs@dff41b7]: 0.3.124	[production]
20:46	<btullis@deploy1002>	helmfile [staging] DONE helmfile.d/services/datahub: sync on main	[production]
20:43	<TheresNoTime>	close UTC late backport window	[production]
20:42	<samtar@deploy1002>	Finished scap: Backport for [[gerrit:936735|Revert "log additional events on Special:Diff|MobileDiff"]] (duration: 07m 27s)	[production]
20:42	<btullis@deploy1002>	helmfile [staging] START helmfile.d/services/datahub: apply on main	[production]
20:36	<samtar@deploy1002>	samtar: Backport for [[gerrit:936735|Revert "log additional events on Special:Diff|MobileDiff"]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet	[production]
20:34	<samtar@deploy1002>	Started scap: Backport for [[gerrit:936735|Revert "log additional events on Special:Diff|MobileDiff"]]	[production]
20:25	<ladsgroup@cumin1001>	dbctl commit (dc=all): 'db2112 (re)pooling @ 100%: Maint over', diff saved to https://phabricator.wikimedia.org/P49544 and previous config saved to /var/cache/conftool/dbconfig/20230710-202536-ladsgroup.json	[production]
20:23	<samtar@deploy1002>	Finished scap: Backport for [[gerrit:936748|log additional events on Special:Diff|MobileDiff (T326212)]] (duration: 21m 42s)	[production]
20:23	<inflatador>	bking@wdqs1006 Restart wdqs-blazegraph to hopefully clear the free allocators alerts	[production]
20:19	<TheresNoTime>	syncing https://gerrit.wikimedia.org/r/c/936748 untested (T326212) for test after sync	[production]
20:14	<mutante>	miscweb1003/miscweb2003 - rm -rf /srv/org/wikimedia/static-tendril	[production]
20:10	<ladsgroup@cumin1001>	dbctl commit (dc=all): 'db2112 (re)pooling @ 75%: Maint over', diff saved to https://phabricator.wikimedia.org/P49541 and previous config saved to /var/cache/conftool/dbconfig/20230710-201031-ladsgroup.json	[production]
20:07	<eileen>	civicrm upgraded from 0ddd1a51 to 7caf5274	[production]
20:03	<samtar@deploy1002>	samtar and jsn: Backport for [[gerrit:936748|log additional events on Special:Diff|MobileDiff (T326212)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet	[production]
20:02	<samtar@deploy1002>	Started scap: Backport for [[gerrit:936748|log additional events on Special:Diff|MobileDiff (T326212)]]	[production]
20:00	<ladsgroup@cumin1001>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1124.eqiad.wmnet with reason: Reboot	[production]
19:59	<ladsgroup@cumin1001>	START - Cookbook sre.hosts.downtime for 1:00:00 on db1124.eqiad.wmnet with reason: Reboot	[production]
19:55	<ladsgroup@cumin1001>	dbctl commit (dc=all): 'db2112 (re)pooling @ 25%: Maint over', diff saved to https://phabricator.wikimedia.org/P49540 and previous config saved to /var/cache/conftool/dbconfig/20230710-195527-ladsgroup.json	[production]
19:52	<ladsgroup@cumin1001>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1119.eqiad.wmnet with reason: Maintenance	[production]
19:52	<ladsgroup@cumin1001>	START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1119.eqiad.wmnet with reason: Maintenance	[production]
19:40	<ladsgroup@cumin1001>	dbctl commit (dc=all): 'db2112 (re)pooling @ 10%: Maint over', diff saved to https://phabricator.wikimedia.org/P49538 and previous config saved to /var/cache/conftool/dbconfig/20230710-194022-ladsgroup.json	[production]
19:23	<ladsgroup@cumin1001>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2112.codfw.wmnet with reason: Maintenance	[production]
19:23	<ladsgroup@cumin1001>	START - Cookbook sre.hosts.downtime for 12:00:00 on db2112.codfw.wmnet with reason: Maintenance	[production]
19:17	<ladsgroup@cumin1001>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2112.codfw.wmnet with reason: Maintenance	[production]
19:17	<ladsgroup@cumin1001>	START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2112.codfw.wmnet with reason: Maintenance	[production]