production SAL

2023-07-11
07:36	<moritzm>	failover broken ganeti2014 node	[production]
07:28	<moritzm>	powercycle ganeti2014	[production]
07:22	<moritzm>	installing libxpm security updates	[production]
07:08	<moritzm>	rebalance ganeti in drmrs after reboots	[production]
06:59	<elukey>	restart kube-apiserver on ml-serve-ctrl1* as attempt to resolve spikes in latencies	[production]
06:36	<moritzm>	rebalance ganeti group eqiad/B after reboots	[production]
05:24	<rzl>	imported otelcol-contrib 0.81.0 to buster-wikimedia and bullseye-wikimedia in component thirdparty/otelcol-contrib	[production]
04:34	<rzl@deploy1002>	helmfile [staging] START helmfile.d/services/opentelemetry-collector: apply	[production]
02:05	<mutante>	LDAP - added urbanecm to wmf group, removed from nda group (conversion volunteer to staff) T341443	[production]
2023-07-10
23:11	<Krinkle>	krinkle@xhgui1001$ Define new `xhgui.watches` table via xhguiadmin@m2-master.eqiad.wmnet database, ref T341499	[production]
22:12	<maryum>	Deployed security patch for T340200	[production]
21:42	<btullis@deploy1002>	helmfile [staging] DONE helmfile.d/services/datahub: sync on main	[production]
21:39	<btullis@deploy1002>	helmfile [staging] START helmfile.d/services/datahub: apply on main	[production]
21:37	<bking@deploy1002>	Finished deploy [wdqs/wdqs@dff41b7]: 0.3.124 (duration: 00m 52s)	[production]
21:36	<bking@deploy1002>	Started deploy [wdqs/wdqs@dff41b7]: 0.3.124	[production]
20:46	<btullis@deploy1002>	helmfile [staging] DONE helmfile.d/services/datahub: sync on main	[production]
20:43	<TheresNoTime>	close UTC late backport window	[production]
20:42	<samtar@deploy1002>	Finished scap: Backport for [[gerrit:936735|Revert "log additional events on Special:Diff|MobileDiff"]] (duration: 07m 27s)	[production]
20:42	<btullis@deploy1002>	helmfile [staging] START helmfile.d/services/datahub: apply on main	[production]
20:36	<samtar@deploy1002>	samtar: Backport for [[gerrit:936735|Revert "log additional events on Special:Diff|MobileDiff"]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet	[production]
20:34	<samtar@deploy1002>	Started scap: Backport for [[gerrit:936735|Revert "log additional events on Special:Diff|MobileDiff"]]	[production]
20:25	<ladsgroup@cumin1001>	dbctl commit (dc=all): 'db2112 (re)pooling @ 100%: Maint over', diff saved to https://phabricator.wikimedia.org/P49544 and previous config saved to /var/cache/conftool/dbconfig/20230710-202536-ladsgroup.json	[production]
20:23	<samtar@deploy1002>	Finished scap: Backport for [[gerrit:936748|log additional events on Special:Diff|MobileDiff (T326212)]] (duration: 21m 42s)	[production]
20:23	<inflatador>	bking@wdqs1006 Restart wdqs-blazegraph to hopefully clear the free allocators alerts	[production]
20:19	<TheresNoTime>	syncing https://gerrit.wikimedia.org/r/c/936748 untested (T326212) for test after sync	[production]
20:14	<mutante>	miscweb1003/miscweb2003 - rm -rf /srv/org/wikimedia/static-tendril	[production]
20:10	<ladsgroup@cumin1001>	dbctl commit (dc=all): 'db2112 (re)pooling @ 75%: Maint over', diff saved to https://phabricator.wikimedia.org/P49541 and previous config saved to /var/cache/conftool/dbconfig/20230710-201031-ladsgroup.json	[production]
20:07	<eileen>	civicrm upgraded from 0ddd1a51 to 7caf5274	[production]
20:03	<samtar@deploy1002>	samtar and jsn: Backport for [[gerrit:936748|log additional events on Special:Diff|MobileDiff (T326212)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet	[production]
20:02	<samtar@deploy1002>	Started scap: Backport for [[gerrit:936748|log additional events on Special:Diff|MobileDiff (T326212)]]	[production]
20:00	<ladsgroup@cumin1001>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1124.eqiad.wmnet with reason: Reboot	[production]
19:59	<ladsgroup@cumin1001>	START - Cookbook sre.hosts.downtime for 1:00:00 on db1124.eqiad.wmnet with reason: Reboot	[production]
19:55	<ladsgroup@cumin1001>	dbctl commit (dc=all): 'db2112 (re)pooling @ 25%: Maint over', diff saved to https://phabricator.wikimedia.org/P49540 and previous config saved to /var/cache/conftool/dbconfig/20230710-195527-ladsgroup.json	[production]
19:52	<ladsgroup@cumin1001>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1119.eqiad.wmnet with reason: Maintenance	[production]
19:52	<ladsgroup@cumin1001>	START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1119.eqiad.wmnet with reason: Maintenance	[production]
19:40	<ladsgroup@cumin1001>	dbctl commit (dc=all): 'db2112 (re)pooling @ 10%: Maint over', diff saved to https://phabricator.wikimedia.org/P49538 and previous config saved to /var/cache/conftool/dbconfig/20230710-194022-ladsgroup.json	[production]
19:23	<ladsgroup@cumin1001>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2112.codfw.wmnet with reason: Maintenance	[production]
19:23	<ladsgroup@cumin1001>	START - Cookbook sre.hosts.downtime for 12:00:00 on db2112.codfw.wmnet with reason: Maintenance	[production]
19:17	<ladsgroup@cumin1001>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2112.codfw.wmnet with reason: Maintenance	[production]
19:17	<ladsgroup@cumin1001>	START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2112.codfw.wmnet with reason: Maintenance	[production]
19:15	<ladsgroup@cumin1001>	dbctl commit (dc=all): 'Depool db2112 T341511', diff saved to https://phabricator.wikimedia.org/P49537 and previous config saved to /var/cache/conftool/dbconfig/20230710-191511-ladsgroup.json	[production]
19:12	<ladsgroup@cumin1001>	dbctl commit (dc=all): 'Promote db2103 to s1 primary T341511', diff saved to https://phabricator.wikimedia.org/P49536 and previous config saved to /var/cache/conftool/dbconfig/20230710-191259-ladsgroup.json	[production]
19:12	<Amir1>	Starting s1 codfw failover from db2112 to db2103 - T341511	[production]
18:59	<sukhe>	running authdns-update	[production]
18:57	<ladsgroup@cumin1001>	END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts dbproxy1012.eqiad.wmnet	[production]
18:57	<ladsgroup@cumin1001>	END (PASS) - Cookbook sre.dns.netbox (exit_code=0)	[production]
18:56	<sukhe>	finished commissioning new DNS hosts in eqiad: dns100[4-6]. decommissioned dns100[1-3].	[production]
18:55	<ladsgroup@cumin1001>	START - Cookbook sre.dns.netbox	[production]
18:51	<ladsgroup@cumin1001>	START - Cookbook sre.hosts.decommission for hosts dbproxy1012.eqiad.wmnet	[production]
18:50	<sukhe@cumin2002>	END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts dns[1002-1003].wikimedia.org	[production]