production SAL


2019-10-04
08:41	<marostegui>	Deploy schema change on db2076 (sanitarium master) with replication T233135 T234066	[production]
08:32	<_joe_>	reuploading the old confd package to stretch-wikimedia, some incompatibility detected	[production]
07:26	<elukey>	execute gnt-instance remove kerberos1001 on ganeti1001 - T234600	[production]
07:24	<elukey@cumin1001>	END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)	[production]
07:24	<elukey@cumin1001>	START - Cookbook sre.hosts.decommission	[production]
06:40	<marostegui>	Deploy schema change on db2114 T233135 T234066	[production]
06:22	<_joe_>	downgrading confd back to 0.9.0 while some templates get fixed.	[production]
06:19	<marostegui@cumin1001>	END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)	[production]
06:18	<marostegui@cumin1001>	START - Cookbook sre.hosts.decommission	[production]
06:16	<marostegui>	Deploy schema change on dbstore1005:3316 T233135 T234066	[production]
05:59	<marostegui@deploy1001>	Synchronized wmf-config/db-eqiad.php: Fully repool es1019 after on-site maintenance T233698 (duration: 00m 51s)	[production]
05:53	<_joe_>	upgrading confd on puppetmaster1001 T147204	[production]
05:50	<_joe_>	uploading confd 0.16.0 on stretch T147204	[production]
05:49	<marostegui@deploy1001>	Synchronized wmf-config/db-eqiad.php: More traffic to es1019 after on-site maintenance T233698 (duration: 00m 51s)	[production]
05:11	<marostegui@cumin1001>	dbctl commit (dc=all): 'Repool db1096:3316 after schema change', diff saved to https://phabricator.wikimedia.org/P9240 and previous config saved to /var/cache/conftool/dbconfig/20191004-051112-marostegui.json	[production]
05:08	<marostegui@deploy1001>	Synchronized wmf-config/db-eqiad.php: Slowly repool es1019 after on-site maintenance T233698 (duration: 00m 53s)	[production]
2019-10-03
23:50	<mutante>	gerrit - restarting for replication config tweaks	[production]
20:05	<@>	helmfile [EQIAD] Ran 'apply' command on namespace 'sessionstore' for release 'production'.	[production]
20:01	<@>	helmfile [CODFW] Ran 'apply' command on namespace 'sessionstore' for release 'production'.	[production]
19:52	<XenoRyet>	updated payments-wiki from 80dead6444 to b94da68f7e	[production]
19:40	<mutante>	mw1290 - depooled and scheduled downtime in Icinga for hardware maintenance T234153	[production]
19:38	<dzahn@cumin1001>	conftool action : set/pooled=no; selector: name=mw1290.eqiad.wmnet	[production]
19:30	<marxarelli>	1.34.0-wmf.25 promoted to all wikis, cc: T220750. no rise in relevant error rates. no new errors	[production]
19:21	<dduvall@deploy1001>	rebuilt and synchronized wikiversions files: all wikis to 1.34.0-wmf.25	[production]
19:19	<mutante>	puppetmaster1001 - revoke cert for parsoid.discovery.wmnet - creating new ones for each DC and a unified one with both (T233654)	[production]
19:11	<@>	helmfile [STAGING] Ran 'apply' command on namespace 'sessionstore' for release 'staging'.	[production]
18:52	<krinkle@deploy1001>	Synchronized wmf-config/InitialiseSettings.php: no-op / config cached? (duration: 00m 59s)	[production]
18:43	<krinkle@deploy1001>	Synchronized wmf-config/InitialiseSettings.php: c2b3d7ce57e9c422 (duration: 00m 59s)	[production]
18:14	<krinkle@deploy1001>	Synchronized wmf-config/InitialiseSettings.php: no-op / config cache issue? (duration: 01m 00s)	[production]
18:03	<krinkle@deploy1001>	Synchronized wmf-config/InitialiseSettings.php: 5389d0243ee9c (duration: 01m 01s)	[production]
17:13	<mholloway-shell@deploy1001>	Finished deploy [mobileapps/deploy@31b2703]: Update mobileapps to 1db84a7 (duration: 06m 06s)	[production]
17:07	<mholloway-shell@deploy1001>	Started deploy [mobileapps/deploy@31b2703]: Update mobileapps to 1db84a7	[production]
13:49	<elukey>	roll restart hadoop yarn resource managers for openssl updates on Hadoop workers	[production]
13:44	<marostegui>	Stop MySQL and shutdown es1019 for on-site maintenance - T233698	[production]
13:40	<marostegui@deploy1001>	Synchronized wmf-config/db-eqiad.php: Depool es1019 for on-site maintenance T233698 (duration: 01m 01s)	[production]
13:29	<hashar>	Gerrit should be back	[production]
13:26	<hashar>	restarting Gerrit due to a deadlock in SendEmail task and AccountCacheImpl	[production]
13:22	<hashar>	Gerrit might be dead again; taking traces	[production]
13:04	<_joe_>	restarting php7 on mw1275	[production]
12:54	<onimisionipe>	force shard allocation on eqiad chi cluster	[production]
10:27	<elukey>	killed rsync processes in "D" state on stat1007, force umount/mount of /mnt/hdfs	[production]
10:25	<jbond42>	rolling upgrade of openssl packages	[production]
10:21	<Urbanecm>	Manually cleared signup throttle for IP 80.188.128.54 at cswiki, issue with introduced throttle rule	[production]
10:20	<Urbanecm>	Manually cleared signup throttle for IP 88.100.221.84 at cswiki, issue with introduced throttle rule	[production]
10:18	<Urbanecm>	Manually cleared signup throttle for IP 90.176.155.12 at cswiki, issue with introduced throttle rule	[production]
09:32	<elukey>	run apt-get autoremove incrementally on all the hadoop prod workers to remove python2 deps (and verify that they are not used anymore by Hadoop)	[production]
08:33	<marostegui>	Deploy schema change on db2087:3316 T233135 T234066	[production]
08:28	<marostegui>	Deploy schema change on db1096:3316 - T233625	[production]
08:26	<marostegui@cumin1001>	dbctl commit (dc=all): 'Depool db1096:3316 for schema change T233135 T234066', diff saved to https://phabricator.wikimedia.org/P9236 and previous config saved to /var/cache/conftool/dbconfig/20191003-082651-marostegui.json	[production]
08:15	<akosiaris>	slowly rolling restart all pods in eqiad, codfw, staging for log rollover before merging https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/539912	[production]