__all__ SAL

2021-04-28
22:15	<robh@cumin1001>	START - Cookbook sre.hosts.downtime for 2:00:00 on cp5013.eqsin.wmnet with reason: REIMAGE	[production]
21:49	<legoktm@deploy1002>	helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.	[production]
21:49	<legoktm@deploy1002>	helmfile [staging-eqiad] START helmfile.d/admin 'apply'.	[production]
21:47	<legoktm@deploy1002>	helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.	[production]
21:46	<robh@cumin1001>	END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on cp5013.eqsin.wmnet with reason: REIMAGE	[production]
21:44	<legoktm@deploy1002>	helmfile [staging-codfw] START helmfile.d/admin 'apply'.	[production]
21:44	<robh@cumin1001>	START - Cookbook sre.hosts.downtime for 2:00:00 on cp5013.eqsin.wmnet with reason: REIMAGE	[production]
21:41	<ryankemper@cumin1001>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs1013.eqiad.wmnet with reason: REIMAGE	[production]
21:39	<ryankemper@cumin1001>	START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs1013.eqiad.wmnet with reason: REIMAGE	[production]
21:39	<ryankemper@cumin1001>	START - Cookbook sre.wdqs.data-transfer	[production]
21:39	<ryankemper@cumin1001>	END (ERROR) - Cookbook sre.wdqs.data-transfer (exit_code=97)	[production]
21:38	<ryankemper>	T280382 `sudo -i cookbook sre.wdqs.data-transfer --source wdqs2001.codfw.wmnet --dest wdqs2007.codfw.wmnet --reason "transferring fresh wikidata journal following reimage" --blazegraph_instance blazegraph` on `ryankemper@cumin1001` tmux session `reimage`	[production]
21:37	<ryankemper>	T280382 `wdqs2007` is reachable again; glancing at `/srv/wdqs` its `wikidata.jnl` is `839G` when it should be `975G` so I'll re-do the wikidata journal transfer	[production]
21:32	<ryankemper>	T280382 [WDQS] `wdqs2007` ssh is unreachable; power cycling via `racadm>>racadm serveraction powercycle`	[production]
21:24	<ryankemper>	T280382 `sudo -i wmf-auto-reimage-host -p T280382 --new wdqs1013.eqiad.wmnet` on `ryankemper@cumin1001` tmux session `reimage` (previous reimage timed out, instance appears to have rebooted)	[production]
21:11	<andrewbogott>	cleaning up more references to deleted hypervisors with delete from services where topic='compute' and version != 53;	[admin]
21:07	<robh@cumin1001>	END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on cp5016.eqsin.wmnet with reason: REIMAGE	[production]
21:05	<robh@cumin1001>	END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on cp5015.eqsin.wmnet with reason: REIMAGE	[production]
21:04	<robh@cumin1001>	START - Cookbook sre.hosts.downtime for 2:00:00 on cp5016.eqsin.wmnet with reason: REIMAGE	[production]
21:03	<robh@cumin1001>	END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on cp5013.eqsin.wmnet with reason: REIMAGE	[production]
21:03	<robh@cumin1001>	END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on cp5014.eqsin.wmnet with reason: REIMAGE	[production]
21:01	<robh@cumin1001>	START - Cookbook sre.hosts.downtime for 2:00:00 on cp5013.eqsin.wmnet with reason: REIMAGE	[production]
21:01	<robh@cumin1001>	START - Cookbook sre.hosts.downtime for 2:00:00 on cp5015.eqsin.wmnet with reason: REIMAGE	[production]
21:01	<robh@cumin1001>	START - Cookbook sre.hosts.downtime for 2:00:00 on cp5014.eqsin.wmnet with reason: REIMAGE	[production]
20:48	<andrewbogott>	cleaning up references to deleted hypervisors with mysql:root@localhost [nova_eqiad1]> delete from compute_nodes where hypervisor_version != '5002000';	[admin]
20:00	<robh@cumin1001>	END (PASS) - Cookbook sre.dns.netbox (exit_code=0)	[production]
19:57	<jhuneidi@deploy1002>	rebuilt and synchronized wikiversions files: Revert "group1 wikis to 1.37.0-wmf.1"	[production]
19:56	<robh@cumin1001>	START - Cookbook sre.dns.netbox	[production]
19:40	<andrewbogott>	putting cloudvirt1040 into the maintenance aggregate pending more info about T281399	[admin]
19:13	<jhuneidi@deploy1002>	Synchronized php: group1 wikis to 1.37.0-wmf.3 refs T278347 (duration: 01m 07s)	[production]
19:12	<jhuneidi@deploy1002>	rebuilt and synchronized wikiversions files: group1 wikis to 1.37.0-wmf.3 refs T278347	[production]
18:21	<legoktm>	added mvolz as listadmin for services@ and reset admin pw (T278516)	[production]
18:11	<andrewbogott>	adding cloudvirt1040, 1041 and 1042 to the 'ceph' host aggregate -- T275081	[admin]
17:46	<hnowlan>	eventlog1003 joined to groups successfully	[analytics]
17:36	<razzi>	sudo mkdir /srv/log/eventlogging and sudo chown eventlogging:eventlogging /srv/log/eventlogging to workaround missing directory puppet error (to be puppetized later)	[analytics]
17:31	<razzi>	remove deployment cache on eventlogging1003: sudo rm -fr /srv/deployment/eventlogging/analytics-cache/	[analytics]
17:26	<razzi>	manually change /srv/deployment/eventlogging/analytics/.git/DEPLOY_HEAD to deployment1002 on deployment1002 to fix puppet scap error	[analytics]
17:11	<urbanecm@deploy1002>	Synchronized php-1.37.0-wmf.3/extensions/Wikibase/client/includes/DataAccess/Scribunto/WikibaseLanguageIndependentLuaBindings.php: b392dba0d77904d7de819043e51d8c3fbf003873: Fix incorrect ItemId typehint in Lua bindings (T281361) (duration: 01m 09s)	[production]
16:53	<hnowlan>	stopping deployment-eventlog05 in deployment-prep	[analytics]
16:52	<papaul>	powerdown logstash2034 for relocation	[production]
16:32	<andrew@cumin1001>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1046.eqiad.wmnet with reason: REIMAGE	[production]
16:30	<andrew@cumin1001>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1045.eqiad.wmnet with reason: REIMAGE	[production]
16:29	<pt1979@cumin2001>	END (PASS) - Cookbook sre.dns.netbox (exit_code=0)	[production]
16:29	<andrew@cumin1001>	START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1046.eqiad.wmnet with reason: REIMAGE	[production]
16:28	<andrew@cumin1001>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1044.eqiad.wmnet with reason: REIMAGE	[production]
16:27	<andrew@cumin1001>	START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1045.eqiad.wmnet with reason: REIMAGE	[production]
16:27	<pt1979@cumin2001>	START - Cookbook sre.dns.netbox	[production]
16:26	<andrew@cumin1001>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1043.eqiad.wmnet with reason: REIMAGE	[production]
16:25	<andrew@cumin1001>	START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1044.eqiad.wmnet with reason: REIMAGE	[production]
16:24	<andrew@cumin1001>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1042.eqiad.wmnet with reason: REIMAGE	[production]