__all__ SAL


2021-04-26 §
08:01	<elukey>	restart hadoop-mapreduce-historyserver on an-master1001 after changes to the yarn ui user	[analytics]
07:54	<jayme@cumin1001>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on conf2005.codfw.wmnet with reason: for initial etcd replication	[production]
07:54	<jayme@cumin1001>	START - Cookbook sre.hosts.downtime for 1:00:00 on conf2005.codfw.wmnet with reason: for initial etcd replication	[production]
07:53	<filippo@cumin1001>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on thanos-fe1001.eqiad.wmnet with reason: REIMAGE	[production]
07:51	<filippo@cumin1001>	START - Cookbook sre.hosts.downtime for 2:00:00 on thanos-fe1001.eqiad.wmnet with reason: REIMAGE	[production]
07:36	<elukey>	re-enable timers after setting the capacity scheduler	[analytics]
07:32	<godog>	swift eqiad-prod: less weight for ms-be[1019-1026] / more weight to ms-be106[0-3] - T272836	[production]
07:31	<elukey>	restart hadoop RM on an-master* to pick up capacity scheduler changes	[analytics]
07:24	<moritzm>	installing pear security updates	[production]
07:09	<moritzm>	removed rawdog from bullseye-wikimedia, needs Py2 T280989	[production]
06:44	<elukey>	stop timers on an-launcher1002 again as prep step for capacity scheduler changes	[analytics]
06:32	<elukey>	roll restart of hadoop-yarn-nodemanagers to pick up new log4j settings - T276906	[analytics]
06:25	<elukey>	re-enable timers	[analytics]
06:24	<elukey>	reboot an-coord1001 to pick up kernel security settings (after reimage)	[production]
06:20	<elukey>	reboot an-coord1001 to pick up kernel security settings	[analytics]
05:57	<elukey>	stop timers on an-launcher1002 to allow a reboot of an-coord1001	[analytics]
05:47	<marostegui@cumin1001>	dbctl commit (dc=all): 'Add db1158 to dbctl, depooled, T258361', diff saved to https://phabricator.wikimedia.org/P15521 and previous config saved to /var/cache/conftool/dbconfig/20210426-054700-marostegui.json	[production]
05:32	<marostegui@cumin1001>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1124.eqiad.wmnet with reason: REIMAGE	[production]
05:30	<marostegui@cumin1001>	START - Cookbook sre.hosts.downtime for 2:00:00 on db1124.eqiad.wmnet with reason: REIMAGE	[production]
03:43	<kart_>	Updated cxserver to 2021-04-21-044024-production (T279045)	[production]
03:41	<kartik@deploy1002>	helmfile [eqiad] Ran 'sync' command on namespace 'cxserver' for release 'production' .	[production]
03:37	<kartik@deploy1002>	helmfile [codfw] Ran 'sync' command on namespace 'cxserver' for release 'production' .	[production]
03:32	<kartik@deploy1002>	helmfile [staging] Ran 'sync' command on namespace 'cxserver' for release 'staging' .	[production]
2021-04-25 §
15:23	<Amir1>	sudo -u list /var/lib/mailman/bin/change_pw -l wikica-l -p $(pwgen -c1 -s 12) (T281066)	[production]
2021-04-24 §
22:24	<bstorm>	Rebooting labstore1007 from ilo after crash	[production]
17:47	<James_F>	Zuul: [mediawiki/extensions/MultimediaViewer] Drop Ruby selenium test job	[releng]
16:19	<arturo>	deleting 2 leaked VMs by hand: 6aefef6f-0723-499d-895f-314f4804c377 | fullstackd-20210424153344 and af8bc9bd-ea0a-4789-b8dd-cf5cf96c31cc | fullstackd-20210424074938 (puppet check step timed out)	[admin-monitoring]
08:03	<joal>	Rerun failed webrequest-druid-hourly-wf-2021-4-23-13	[analytics]
2021-04-23 §
22:14	<Krinkle>	Reloading Zuul to deploy https://gerrit.wikimedia.org/r/682029	[releng]
21:36	<foks>	removing 1 file for legal compliance	[production]
21:02	<wm-bot>	<root> Hard restart in an attempt to reset state information at the Toolforge front proxy	[tools.simple]
20:59	<wm-bot>	<root> Restarting webservice which seems to have died due to grid engine instability	[tools.simple]
20:15	<mutante>	[apt1001:~] $ sudo -i reprepro -C main includedeb bullseye-wikimedia /home/dzahn/rawdog_2.23-2_all.deb (T280989)	[production]
19:41	<mutante>	[apt1001:~] $ sudo -i reprepro copy bullseye-wikimedia buster-wikimedia envoyproxy - copy envoy package from buster to bullseye T280989	[production]
19:09	<ebernhardson>	closing duplicate/wrong cluster indices in cloudelastic	[production]
18:51	<Framawiki>	ran apt updates without issues on all 4 servers. T266386 looks fixed.	[quarry]
17:24	<bstorm>	rebooting toolsbeta-test-k8s-control-6 because it was "notready" for some reason	[toolsbeta]
17:02	<elukey@puppetmaster1001>	conftool action : set/pooled=yes; selector: name=cp1087.eqiad.wmnet	[production]
16:35	<cmjohnson@cumin1001>	END (PASS) - Cookbook sre.dns.netbox (exit_code=0)	[production]
16:32	<cmjohnson@cumin1001>	START - Cookbook sre.dns.netbox	[production]
16:30	<Majavah>	remove deployment-prep hiera settings for phabricator, given there is no phabricator instance on that project	[releng]
16:24	<cmjohnson@cumin1001>	END (PASS) - Cookbook sre.dns.netbox (exit_code=0)	[production]
16:19	<cmjohnson@cumin1001>	START - Cookbook sre.dns.netbox	[production]
14:59	<jbond@cumin1001>	END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on theemin.codfw.wmnet with reason: REIMAGE	[production]
14:59	<jbond@cumin1001>	START - Cookbook sre.hosts.downtime for 2:00:00 on theemin.codfw.wmnet with reason: REIMAGE	[production]
14:25	<moritzm>	revert back bullseye image to daily build from last week (to rule out potential reimage issue)	[production]
14:23	<elukey>	roll restart an-master100[1,2] daemons to pick up new log4j settings - T276906	[analytics]
13:49	<dcaro>	testing the drain_cloudvirt cookbook on codfw1 openstack cluster, draining cloudvirt2001 (T280641)	[admin]
13:33	<elukey>	roll restart of all thanos-swift proxies to pick up new ML account - T280773	[production]
12:50	<jbond42>	upload new debmonitor-client packages	[production]