production SAL

2020-04-27
11:13	<jdrewniak@deploy1001>	Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: [[gerrit:592634|Bumping portals to master (563985)]] (duration: 00m 58s)	[production]
11:09	<cparle@deploy1001>	Synchronized wmf-config/InitialiseSettings.php: [SDC] Enable constraints on production commons (duration: 00m 57s)	[production]
11:08	<cparle@deploy1001>	Synchronized wmf-config/InitialiseSettings.php: [SDC] Enable constraints on production commons (duration: 00m 58s)	[production]
10:52	<hoo>	Running the pruneItemsPerSite on mwmaint1002 maintenance script for Wikidata (T249613)	[production]
10:52	<hoo@deploy1001>	Synchronized php-1.35.0-wmf.28/extensions/Wikibase: pruneItemsPerSite: Fix join_condition call signature (T249613) (duration: 01m 02s)	[production]
10:49	<hoo@deploy1001>	Synchronized php-1.35.0-wmf.28/extensions/Wikibase: pruneItemsPerSite: Fix join_condition call signature (T249613) (duration: 01m 01s)	[production]
10:32	<mutante>	contint2001 - systemd status was degraded. icinga alerted. failed unit was jenkins. starting it failed with "address already in use". manually started without using systemctl? killed jenkins and started again with systemctl. T224591	[production]
10:29	<mutante>	contint2001 - jenkins failed and can't start because address is already in use	[production]
10:22	<addshore>	depool and restart wdqs1007 (deadlocks) T242453	[production]
09:54	<hoo@deploy1001>	Synchronized php-1.35.0-wmf.28/extensions/Wikibase: Add pruneItemsPerSite maintenance script (T249613) (duration: 01m 06s)	[production]
09:34	<jynus@cumin2001>	END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)	[production]
09:34	<jynus@cumin2001>	START - Cookbook sre.hosts.decommission	[production]
09:34	<jynus@cumin2001>	END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)	[production]
09:33	<jynus@cumin2001>	START - Cookbook sre.hosts.decommission	[production]
09:33	<jynus@cumin2001>	END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)	[production]
09:32	<jynus@cumin2001>	START - Cookbook sre.hosts.decommission	[production]
09:32	<jynus@cumin2001>	END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)	[production]
09:31	<jynus@cumin2001>	START - Cookbook sre.hosts.decommission	[production]
09:25	<marostegui>	Stop MySQL on labsdb1012 to reclone labsdb1011 - T249188	[production]
09:11	<marostegui>	Deploy schema change on s1 codfw, lag will show up - T250055	[production]
08:52	<moritzm>	restarting cas on idp1001 to pick up Java 11 security update (will void active SSO sessions)	[production]
08:26	<marostegui>	Deploy schema change on s5 codfw, lag will show up - T250055	[production]
08:24	<kormat>	Truncating and optimizing parsercache for pc1010 and pc2010 T247787	[production]
08:18	<mutante>	running puppet on all cp-ats	[production]
08:15	<godog>	add 80G to prometheus global LV	[production]
07:25	<elukey>	roll restart elastic-chi on cloudelastic100[1-4] to pick up the last JVM GC settings - T231517	[production]
07:15	<marostegui>	Kill updateSpecialPages.php wikidatawiki --override --only=Fewestrevisions as it is causing lag - T238199	[production]
07:14	<elukey>	powercycle an-worker1089 - unreachable via ssh, mgmt serial available, soft cpu lock events registered in dmesg	[production]
06:59	<elukey>	force ifdown/ifup eno1 on analytics1052 - interface negotiated speed flapping	[production]
06:42	<moritzm>	installing Java security updates on IDP hosts, will void current SSO sessions	[production]
06:30	<elukey@puppetmaster1001>	conftool action : set/pooled=inactive; selector: name=mw1280.eqiad.wmnet	[production]
06:22	<marostegui@cumin1001>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)	[production]
06:19	<marostegui@cumin1001>	START - Cookbook sre.hosts.downtime	[production]
06:00	<marostegui>	Stop MySQL on labsdb1011 for reimage - T249188	[production]
05:58	<moritzm>	installing git security updates on jessie	[production]
05:56	<marostegui>	Compress tables on db1104 - T232446	[production]
05:53	<marostegui@cumin1001>	dbctl commit (dc=all): 'Depool db1104 for defragmentation - T232446', diff saved to https://phabricator.wikimedia.org/P11039 and previous config saved to /var/cache/conftool/dbconfig/20200427-055320-marostegui.json	[production]
05:47	<vgutierrez>	rolling restart ats-tls in cp[1085,1089] and text@esams - T249335	[production]
05:33	<marostegui>	Depool labsdb1011 T249188	[production]
2020-04-26
18:08	<elukey>	powercycle puppetmaster1001 - mgmt serial console not usable, no ssh, racadm getsel doesn't show anything	[production]
2020-04-25
10:23	<addshore>	going to restart and probably depool for a short time wdqs1005 as it is in a deadlock T242453	[production]
05:52	<_joe_>	depooling mw1407 again, should not be serving traffic	[production]
05:27	<shdubsh>	restart elasticsearch on logstash2022	[production]
2020-04-24
21:25	<cdanis@cumin1001>	conftool action : set/pooled=true; selector: dnsdisc=wdqs,name=eqiad	[production]
19:41	<Amir1>	applying T114117 on labswiki (wikitech)	[production]
18:58	<shdubsh>	restart elasticsearch on logstash2021	[production]
18:50	<shdubsh>	restart elasticsearch on logstash2020	[production]
15:12	<cdanis@cumin1001>	conftool action : set/pooled=false; selector: dnsdisc=wdqs,name=eqiad	[production]
15:08	<addshore>	depool and restart wdqs1006 to catch up with lag after deadlock T242453	[production]
11:13	<Amir1>	apply T250071 on s10 (labswiki)	[production]