production SAL

2021-11-19
02:26	<dzahn@cumin1001>	conftool action : set/pooled=yes; selector: name=phab2001-vcs.codfw.wmnet	[production]
02:02	<mutante>	[puppetmaster1001:/var/run/confd-template] $ sudo rm .git-ssh*.err	[production]
02:01	<mutante>	[puppetmaster2001:/var/run/confd-template] $ sudo rm .git-ssh*.err	[production]
01:57	<legoktm@cumin1001>	END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts thumbor2001.codfw.wmnet	[production]
01:52	<dzahn@cumin1001>	conftool action : set/pooled=no; selector: name=phab2001-vcs.codfw.wmnet	[production]
01:48	<dzahn@cumin1001>	conftool action : set/pooled=inactive; selector: name=phab2001-vcs.codfw.wmnet	[production]
01:45	<mutante>	I think git-ssh6_22 is down (see alerts lvs2008/2009) due to the v6 issue from ongoing lvs maintenance. depooled in conftool	[production]
01:42	<dzahn@cumin1001>	conftool action : set/pooled=no; selector: name=phab2001-vcs.codfw.wmnet	[production]
01:40	<legoktm@cumin1001>	START - Cookbook sre.hosts.decommission for hosts thumbor2001.codfw.wmnet	[production]
01:37	<dzahn@deploy1002>	helmfile [eqiad] Ran 'sync' command on namespace 'miscweb' for release 'main' .	[production]
01:35	<jhuneidi@deploy1002>	Synchronized php-1.38.0-wmf.9/extensions/Cite/modules/ve-cite/ve.dm.MWReferenceNode.js: Backport for T296044 (duration: 00m 55s)	[production]
01:34	<mwdebug-deploy@deploy1002>	helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .	[production]
01:31	<dzahn@deploy1002>	helmfile [codfw] Ran 'sync' command on namespace 'miscweb' for release 'main' .	[production]
01:31	<mwdebug-deploy@deploy1002>	helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .	[production]
01:30	<legoktm@cumin1001>	conftool action : set/pooled=inactive; selector: name=thumbor2002.codfw.wmnet	[production]
01:30	<legoktm@cumin1001>	conftool action : set/pooled=inactive; selector: name=thumbor2001.codfw.wmnet	[production]
01:19	<dzahn@deploy1002>	helmfile [staging] Ran 'sync' command on namespace 'miscweb' for release 'main' .	[production]
01:18	<dzahn@deploy1002>	helmfile [staging] Ran 'sync' command on namespace 'miscweb' for release 'main' .	[production]
01:09	<legoktm@cumin1001>	conftool action : set/pooled=no; selector: name=thumbor2002.codfw.wmnet	[production]
01:09	<legoktm@cumin1001>	conftool action : set/pooled=no; selector: name=thumbor2001.codfw.wmnet	[production]
01:05	<legoktm@cumin1001>	conftool action : set/weight=10; selector: name=thumbor2006.codfw.wmnet	[production]
01:05	<legoktm@cumin1001>	conftool action : set/weight=10; selector: name=thumbor2005.codfw.wmnet	[production]
00:56	<legoktm@cumin1001>	conftool action : set/weight=5; selector: name=thumbor2006.codfw.wmnet	[production]
00:56	<legoktm@cumin1001>	conftool action : set/weight=5; selector: name=thumbor2005.codfw.wmnet	[production]
00:55	<legoktm@cumin1001>	conftool action : set/pooled=yes; selector: name=thumbor2006.codfw.wmnet	[production]
00:55	<legoktm@cumin1001>	conftool action : set/pooled=yes; selector: name=thumbor2005.codfw.wmnet	[production]
00:33	<dzahn@deploy1002>	helmfile [staging] Ran 'sync' command on namespace 'miscweb' for release 'main' .	[production]
00:08	<brennen>	end of UTC late deployment training window	[production]
2021-11-18
23:47	<ryankemper@cumin1001>	START - Cookbook sre.elasticsearch.rolling-operation restart with plugin upgrade (3 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad plugin upgrade + restart - ryankemper@cumin1001 - T295705	[production]
23:44	<dzahn@cumin1001>	conftool action : set/pooled=no; selector: name=kubernetes1001.eqiad.wmnet,service=miscweb	[production]
23:28	<dzahn@deploy1002>	helmfile [eqiad] Ran 'sync' command on namespace 'miscweb' for release 'main' .	[production]
23:27	<dzahn@deploy1002>	helmfile [codfw] Ran 'sync' command on namespace 'miscweb' for release 'main' .	[production]
22:52	<ryankemper@cumin1001>	END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) restart with plugin upgrade (3 nodes at a time) for ElasticSearch cluster search_codfw: codfw plugin upgrade + restart - ryankemper@cumin1001 - T295705	[production]
22:48	<XioNoX>	asw-b-codfw> request system power-off member 7	[production]
22:44	<ryankemper@cumin1001>	START - Cookbook sre.elasticsearch.rolling-operation restart with plugin upgrade (3 nodes at a time) for ElasticSearch cluster search_codfw: codfw plugin upgrade + restart - ryankemper@cumin1001 - T295705	[production]
22:28	<mutante>	icinga (alert1001) - manually fix IP of mw1488.mgmt (was 0.0.0.0 is: 10.65.1.26) in /etc/icinga/objects/puppet_hosts.cfg , running puppet	[production]
22:06	<legoktm@cumin1001>	END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts thumbor1003.eqiad.wmnet	[production]
21:53	<legoktm@cumin1001>	START - Cookbook sre.hosts.decommission for hosts thumbor1003.eqiad.wmnet	[production]
21:50	<legoktm@cumin1001>	END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts thumbor1004.eqiad.wmnet	[production]
21:36	<legoktm@cumin1001>	START - Cookbook sre.hosts.decommission for hosts thumbor1004.eqiad.wmnet	[production]
21:31	<XioNoX>	asw-b-codfw> request system power-off member 7	[production]
21:30	<legoktm@cumin1001>	conftool action : set/pooled=inactive; selector: name=thumbor1004.eqiad.wmnet	[production]
21:30	<legoktm@cumin1001>	conftool action : set/pooled=inactive; selector: name=thumbor1003.eqiad.wmnet	[production]
21:01	<ejegg>	updated payments-wiki from abb2bd9d -> d1d6f024	[production]
21:00	<mutante>	[puppetmaster1001:/var/run/confd-template] $ sudo rm .git-ssh*.err	[production]
21:00	<mutante>	[puppetmaster2001:/var/run/confd-template] $ sudo rm .git-ssh*.err	[production]
20:57	<mwdebug-deploy@deploy1002>	helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .	[production]
20:53	<mwdebug-deploy@deploy1002>	helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .	[production]
20:52	<dzahn@cumin1001>	conftool action : set/pooled=yes; selector: name=phab2001-vcs.codfw.wmnet	[production]
20:51	<dcausse>	restart blazegraph on wdqs1006 (jvm stuck)	[production]