2021-09-06
10:39 <volans@cumin1001> END (ERROR) - Cookbook sre.hosts.decommission (exit_code=97) for hosts mc1027.eqiad.wmnet [production]
10:38 <volans@cumin1001> START - Cookbook sre.hosts.decommission for hosts mc1027.eqiad.wmnet [production]
10:22 <volans@cumin1001> END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts mc1027.eqiad.wmnet [production]
10:17 <volans@cumin1001> START - Cookbook sre.hosts.decommission for hosts mc1027.eqiad.wmnet [production]
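Both attempts above went through the Spicerack cookbook runner on cumin1001. A minimal sketch of the invocation (the -t task-ID flag and its value are hypothetical; the real runs did not log one):
  sudo cookbook sre.hosts.decommission -t T123456 'mc1027.eqiad.wmnet'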
09:22 <gehel> depooling wdqs1007, catching up on lag [production]
09:06 <gehel> restart blazegraph and updater on wdqs1007 [production]
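Restart-then-depool lets wdqs1007 catch up on lag out of rotation before serving traffic again. A sketch of the cycle, assuming the host-local conftool pool/depool wrapper scripts used in production:
  sudo depool    # on wdqs1007: mark the host unpooled in conftool
  # wait for the updater to clear the replication lag
  sudo pool      # return the host to rotation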
08:46 <jbond> update networking fact - gerrit:715943 [production]
07:57 <godog> failing disk sdw on ms-be1062, which reported errors [production]
07:51 <moritzm> installing libssh security updates [production]
07:45 <elukey@deploy1002> helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'. [production]
07:45 <elukey@deploy1002> helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'. [production]
07:44 <moritzm> installing squashfs-tools security updates [production]
06:56 <elukey@deploy1002> helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'. [production]
06:56 <elukey@deploy1002> helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'. [production]
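The START/DONE pairs above are helmfile syncs of the admin layer for the ml-serve-eqiad cluster. A hedged sketch of the underlying command on deploy1002 (the -f path is taken from the log line; -e and -f are standard helmfile flags):
  helmfile -e ml-serve-eqiad -f helmfile.d/admin sync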
06:28 <marostegui> Optimize table mkwiki.flaggedtemplates in eqiad T290057 [production]
06:26 <marostegui> Optimize table bewiki.flaggedtemplates in eqiad T290057 [production]
06:23 <marostegui> Optimize table dewiki.flaggedtemplates in eqiad T290057 [production]
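The three Optimize entries rebuild each wiki's flaggedtemplates table to reclaim space; in MySQL this is a single statement. A sketch, assuming a plain mysql client session on the target replica:
  mysql dewiki -e 'OPTIMIZE TABLE flaggedtemplates;'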
05:34 <marostegui@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2090.codfw.wmnet with reason: REIMAGE [production]
05:32 <marostegui@cumin1001> START - Cookbook sre.hosts.downtime for 2:00:00 on db2090.codfw.wmnet with reason: REIMAGE [production]
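The downtime cookbook brackets disruptive work (here a reimage of db2090) with an Icinga downtime. A sketch of the invocation, assuming the cookbook's usual duration and reason flags:
  sudo cookbook sre.hosts.downtime --hours 2 -r 'REIMAGE' 'db2090.codfw.wmnet'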
05:07 <marostegui> Stop replication on db2090 (old s4 master) T289650 T288803 [production]
05:05 <marostegui@cumin1001> dbctl commit (dc=all): 'Depool db2110 (current master) from API T289650', diff saved to https://phabricator.wikimedia.org/P17223 and previous config saved to /var/cache/conftool/dbconfig/20210906-050502-marostegui.json [production]
05:04 <marostegui@cumin1001> dbctl commit (dc=all): 'Depool db2090 T289650', diff saved to https://phabricator.wikimedia.org/P17222 and previous config saved to /var/cache/conftool/dbconfig/20210906-050419-marostegui.json [production]
05:01 <marostegui@cumin1001> dbctl commit (dc=all): 'Promote db2110 to s4 primary and set section read-write T289650', diff saved to https://phabricator.wikimedia.org/P17221 and previous config saved to /var/cache/conftool/dbconfig/20210906-050140-root.json [production]
05:00 <marostegui@cumin1001> dbctl commit (dc=all): 'Set s4 codfw as read-only for maintenance - T289650', diff saved to https://phabricator.wikimedia.org/P17220 and previous config saved to /var/cache/conftool/dbconfig/20210906-050048-root.json [production]
05:00 <marostegui> Starting s4 codfw failover from db2090 to db2110 - T289650 [production]
04:07 <marostegui@cumin1001> dbctl commit (dc=all): 'Set db2110 with weight 0 T289650', diff saved to https://phabricator.wikimedia.org/P17219 and previous config saved to /var/cache/conftool/dbconfig/20210906-040740-root.json [production]
04:07 <marostegui@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 33 hosts with reason: Primary switchover s4 T289650 [production]
04:06 <marostegui@cumin1001> START - Cookbook sre.hosts.downtime for 2:00:00 on 33 hosts with reason: Primary switchover s4 T289650 [production]
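Read bottom-up, the block above is the s4 codfw switchover recipe: downtime the section's 33 hosts, zero the new master's weight, set the section read-only, promote db2110 and restore read-write, then depool the old master db2090. A hedged dbctl sketch of the core steps (subcommands follow dbctl's documented interface; treat exact flags as assumptions):
  dbctl --scope codfw section s4 ro 'Maintenance - T289650'
  dbctl config commit -m 'Set s4 codfw as read-only for maintenance - T289650'
  dbctl --scope codfw section s4 set-master db2110
  dbctl --scope codfw section s4 rw
  dbctl config commit -m 'Promote db2110 to s4 primary and set section read-write T289650'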
2021-09-05
18:54 <urbanecm> wikiadmin@10.192.0.119(ptwiki)> update protected_titles set pt_create_perm='editautoreviewprotected' where pt_create_perm='autoreviewer'; # T290396 [production]
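A quick sanity check after an in-place update like the one above is to re-aggregate the column (a sketch; the client invocation and access path are assumptions, not what urbanecm ran):
  mysql ptwiki -e "SELECT pt_create_perm, COUNT(*) FROM protected_titles GROUP BY pt_create_perm;"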
2021-09-04
13:35 <marostegui@cumin1001> dbctl commit (dc=all): 'db2137:3314 (re)pooling @ 100%: Slowly repool T290374', diff saved to https://phabricator.wikimedia.org/P17217 and previous config saved to /var/cache/conftool/dbconfig/20210904-133532-root.json [production]
13:20 <marostegui@cumin1001> dbctl commit (dc=all): 'db2137:3314 (re)pooling @ 75%: Slowly repool T290374', diff saved to https://phabricator.wikimedia.org/P17216 and previous config saved to /var/cache/conftool/dbconfig/20210904-132029-root.json [production]
13:05 <marostegui@cumin1001> dbctl commit (dc=all): 'db2137:3314 (re)pooling @ 50%: Slowly repool T290374', diff saved to https://phabricator.wikimedia.org/P17215 and previous config saved to /var/cache/conftool/dbconfig/20210904-130525-root.json [production]
12:50 <marostegui@cumin1001> dbctl commit (dc=all): 'db2137:3314 (re)pooling @ 25%: Slowly repool T290374', diff saved to https://phabricator.wikimedia.org/P17214 and previous config saved to /var/cache/conftool/dbconfig/20210904-125021-root.json [production]
12:35 <marostegui@cumin1001> dbctl commit (dc=all): 'db2137:3314 (re)pooling @ 10%: Slowly repool T290374', diff saved to https://phabricator.wikimedia.org/P17213 and previous config saved to /var/cache/conftool/dbconfig/20210904-123518-root.json [production]
12:20 <marostegui@cumin1001> dbctl commit (dc=all): 'db2137:3314 (re)pooling @ 5%: Slowly repool T290374', diff saved to https://phabricator.wikimedia.org/P17212 and previous config saved to /var/cache/conftool/dbconfig/20210904-122014-root.json [production]
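The six commits above walk db2137:3314 from 5% back to 100% in 15-minute steps, limiting the load spike on a cache-cold replica. The same ramp as a loop (a sketch; assumes dbctl's pool subcommand takes a -p percentage, as the log messages imply):
  for pct in 5 10 25 50 75 100; do
    dbctl instance db2137:3314 pool -p "$pct"
    dbctl config commit -m "db2137:3314 (re)pooling @ ${pct}%: Slowly repool T290374"
    sleep 900   # 15 minutes between steps, matching the log timestamps
  done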
09:03 <elukey> restart wmf_auto_restart_rsyslog.service on puppetdb1002 [production]
09:00 <elukey> `systemctl reset-failed ifup@ens6.service` on puppetdb2002 - T273026 [production]
03:02 <rzl@cumin2001> dbctl commit (dc=all): 'Depool db2137:3314', diff saved to https://phabricator.wikimedia.org/P17210 and previous config saved to /var/cache/conftool/dbconfig/20210904-030231-rzl.json [production]
2021-09-03
21:49 <bd808@deploy1002> helmfile [staging] Ran 'sync' command on namespace 'toolhub' for release 'main'. [production]
20:30 <bd808@deploy1002> helmfile [staging] Ran 'sync' command on namespace 'toolhub' for release 'main'. [production]
19:33 <krinkle@deploy1002> Finished deploy [integration/docroot@6492b3d]: I48480e89e5f6 (duration: 00m 10s) [production]
19:33 <krinkle@deploy1002> Started deploy [integration/docroot@6492b3d]: I48480e89e5f6 [production]
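The Started/Finished pair is a scap deploy of integration/docroot pinned to commit 6492b3d. A sketch from the deploy host (the checkout path is an assumption):
  cd /srv/deployment/integration/docroot && scap deploy 'I48480e89e5f6'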
19:26 <bd808@deploy1002> helmfile [staging] Ran 'sync' command on namespace 'toolhub' for release 'main'. [production]
19:04 <ryankemper> T290330 `ryankemper@cumin1001:~$ sudo -E cumin 'P{wdqs2*}' 'sudo rm -fv /etc/cron.hourly/restart-blazegraph'` (Cleaned up manually created crons now that we have [somewhat hacky] systemd timers doing the same job) [production]
17:42 <dduvall@deploy1002> helmfile [codfw] Ran 'sync' command on namespace 'blubberoid' for release 'production'. [production]
17:40 <dduvall@deploy1002> helmfile [eqiad] Ran 'sync' command on namespace 'blubberoid' for release 'production'. [production]
17:35 <dduvall@deploy1002> helmfile [staging] Ran 'sync' command on namespace 'blubberoid' for release 'staging'. [production]
17:17 <ryankemper> T290330 Deployed https://gerrit.wikimedia.org/r/c/operations/puppet/+/717508 across `wdqs` fleet; codfw wdqs hosts will restart on average once per hour now to address ongoing availability issues for wdqs codfw [production]
16:32 <bd808@deploy1002> helmfile [staging] Ran 'sync' command on namespace 'toolhub' for release 'main'. [production]
16:10 <gehel> blazegraph (public codfw cluster) will now restart every hour - T290330 [production]
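Together with the 17:17 and 19:04 entries above, this closes the loop on T290330: the hand-made hourly crons were replaced by puppet-managed systemd timers. A minimal sketch of such a timer (unit name and values are assumptions, not the deployed puppet config):
  # restart-blazegraph.timer -- pairs with a restart-blazegraph.service oneshot
  [Timer]
  OnCalendar=hourly
  RandomizedDelaySec=15m

  [Install]
  WantedBy=timers.target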