production SAL

2021-11-22 §
06:39	<marostegui@cumin1001>	dbctl commit (dc=all): 'db1131 (re)pooling @ 5%: Repool after HW maintenance', diff saved to https://phabricator.wikimedia.org/P17786 and previous config saved to /var/cache/conftool/dbconfig/20211122-063959-root.json	[production]
06:24	<marostegui@cumin1001>	dbctl commit (dc=all): 'db1131 (re)pooling @ 1%: Repool after HW maintenance', diff saved to https://phabricator.wikimedia.org/P17785 and previous config saved to /var/cache/conftool/dbconfig/20211122-062455-root.json	[production]
03:30	<Amir1>	run optimize table on db2140 for image table (T296143)	[production]
2021-11-21 §
13:17	<dcausse>	restarting blazegraph on wdqs1007 (jvm stuck for 10h)	[production]
07:26	<XioNoX>	cr1-eqiad# deactivate protocols bgp group Confed_eqord	[production]
05:22	<Amir1>	running clean up of djvu files in all wikis (T275268)	[production]
05:13	<Amir1>	end of djvu metadata maint script run (T275268)	[production]
2021-11-20 §
01:02	<mutante>	lists1001 - restarted apache, icinga alerts for the web UI, but recovered	[production]
00:27	<cdanis@cumin1001>	END (PASS) - Cookbook sre.network.cf (exit_code=0)	[production]
00:26	<cdanis@cumin1001>	START - Cookbook sre.network.cf	[production]
00:25	<bblack>	lvs3005 - re-enabling puppet + pybal	[production]
00:25	<legoktm@cumin1001>	END (PASS) - Cookbook sre.network.cf (exit_code=0)	[production]
00:25	<legoktm@cumin1001>	START - Cookbook sre.network.cf	[production]
00:24	<cdanis@cumin1001>	END (PASS) - Cookbook sre.network.cf (exit_code=0)	[production]
00:23	<cdanis@cumin1001>	START - Cookbook sre.network.cf	[production]
00:06	<bblack>	lvs3005 - disabling puppet and stopping pybal (traffic will go to lvs3007)	[production]
2021-11-19 §
23:52	<pt1979@cumin2002>	END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host prometheus2005.codfw.wmnet with OS bullseye	[production]
23:25	<pt1979@cumin2002>	START - Cookbook sre.hosts.reimage for host prometheus2005.codfw.wmnet with OS bullseye	[production]
23:24	<pt1979@cumin2002>	END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host prometheus2005.codfw.wmnet with OS bullseye	[production]
23:15	<mutante>	LDAP - added mmartorana to wmf (91354e9e-5706-4289-9a60-98e8a7632853) T295789	[production]
22:59	<pt1979@cumin2002>	START - Cookbook sre.hosts.reimage for host prometheus2005.codfw.wmnet with OS bullseye	[production]
20:24	<pt1979@cumin2002>	END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes2018.codfw.wmnet with OS stretch	[production]
20:21	<mutante>	phabricator - adding eigyan to WMF-NDA (phab project 61 - https://phabricator.wikimedia.org/project/members/61/ ) - since that is now standard when adding people to the wmf LDAP group (T295928)	[production]
20:20	<legoktm@cumin1001>	END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts thumbor2002.codfw.wmnet	[production]
20:05	<legoktm@cumin1001>	START - Cookbook sre.hosts.decommission for hosts thumbor2002.codfw.wmnet	[production]
20:00	<dzahn@cumin1001>	END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts mw2280.codfw.wmnet	[production]
19:55	<pt1979@cumin2002>	START - Cookbook sre.hosts.reimage for host kubernetes2018.codfw.wmnet with OS stretch	[production]
19:51	<mutante>	shutting down undead server mw2280 - not in icinga and puppetdb but in debmonitor and still has IP and puppet cert	[production]
19:45	<dzahn@cumin1001>	START - Cookbook sre.hosts.decommission for hosts mw2280.codfw.wmnet	[production]
18:54	<hnowlan@cumin1001>	END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:restbase-codfw: Restarting to pick up Java security updates - hnowlan@cumin1001	[production]
18:10	<andrew@deploy1002>	Finished deploy [horizon/deploy@ba16257]: moving the proxy endpoint behind keystone (duration: 04m 19s)	[production]
18:06	<andrew@deploy1002>	Started deploy [horizon/deploy@ba16257]: moving the proxy endpoint behind keystone	[production]
17:45	<pt1979@cumin2002>	END (PASS) - Cookbook sre.dns.netbox (exit_code=0)	[production]
17:41	<pt1979@cumin2002>	START - Cookbook sre.dns.netbox	[production]
17:25	<andrew@deploy1002>	Finished deploy [horizon/deploy@ee83e27]: fixing sudo rule editing (duration: 04m 10s)	[production]
17:21	<andrew@deploy1002>	Started deploy [horizon/deploy@ee83e27]: fixing sudo rule editing	[production]
17:19	<mwdebug-deploy@deploy1002>	helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .	[production]
17:10	<mwdebug-deploy@deploy1002>	helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .	[production]
16:54	<mwdebug-deploy@deploy1002>	helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .	[production]
16:50	<mwdebug-deploy@deploy1002>	helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .	[production]
16:42	<thcipriani@deploy1002>	rebuilt and synchronized wikiversions files: Revert "group1 wikis to 1.38.0-wmf.9 refs T293950 T296098"	[production]
16:35	<thcipriani>	rolling back to group0 for T296098	[production]
16:20	<hnowlan@cumin1001>	START - Cookbook sre.cassandra.roll-restart for nodes matching A:restbase-codfw: Restarting to pick up Java security updates - hnowlan@cumin1001	[production]
15:31	<akosiaris>	roll restart wtp10* php7.2-fpm excluding wtp1025, wtp1041	[production]
15:29	<akosiaris>	depooling wtp1041, wtp1025 from traffic. The entire parsoid cluster is under memory pressure; it looks like a rolling restart of php-fpm will alleviate the pressure and give us some time to dig into the problem before the pressure builds up again.	[production]
15:28	<akosiaris@cumin1001>	conftool action : set/pooled=no; selector: cluster=parsoid,name=wtp1025.eqiad.wmnet	[production]
15:28	<akosiaris@cumin1001>	conftool action : set/pooled=no; selector: cluster=parsoid,name=wtp1041.eqiad.wmnet	[production]
14:52	<jmm@cumin2002>	END (PASS) - Cookbook sre.ganeti.addnode (exit_code=0) for new host ganeti-test2001.codfw.wmnet to ganeti-test01.svc.codfw.wmnet	[production]
14:49	<jmm@cumin2002>	START - Cookbook sre.ganeti.addnode for new host ganeti-test2001.codfw.wmnet to ganeti-test01.svc.codfw.wmnet	[production]
14:44	<jmm@cumin2002>	END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti-test2001.codfw.wmnet	[production]