production SAL

2019-10-05 §
06:48	<elukey>	force umount/remount of /mnt/hdfs on an-coord1001 - processes stuck in D state, fuser proc consuming a ton of memory	[production]
2019-10-04 §
22:06	<mutante>	ms-be1020 - power cycle via mgmt - host down	[production]
20:43	<krinkle@deploy1001>	Synchronized w/static.php: 9648e03, 97d9384 (duration: 00m 53s)	[production]
20:41	<mutante>	deploy1001 / deploy2001 - remove python-pygerrit2 (version for python3 is needed instead)	[production]
20:32	<mutante>	gerrit1001 - scp /usr/share/java/mysql-connector-java.jar from cobalt into /usr/share/java/ on gerrit1001 and then symlink into /var/lib/gerrit2/review_site/lib/ (T222391)	[production]
19:27	<mutante>	wtp1025 - mediawiki appserver classes are being applied, install in progress will trigger some new icinga alerts	[production]
14:03	<marostegui>	Deploy schema change on db2117 T233135 T234066	[production]
13:50	<@>	helmfile [EQIAD] Ran 'apply' command on namespace 'restrouter' for release 'production' .	[production]
13:47	<@>	helmfile [CODFW] Ran 'apply' command on namespace 'restrouter' for release 'production' .	[production]
13:36	<@>	helmfile [STAGING] Ran 'apply' command on namespace 'restrouter' for release 'staging' .	[production]
12:28	<marostegui>	Deploy schema change on db2097:3316 T233135 T234066	[production]
12:23	<elukey>	cleaned up old files and apt-cache from an-coord1001	[production]
08:41	<marostegui>	Deploy schema change on db2076 (sanitarium master) with replication T233135 T234066	[production]
08:32	<_joe_>	reuploading the old confd package to stretch-wikimedia, some incompatibility detected	[production]
07:26	<elukey>	execute gnt-instance remove kerberos1001 on ganeti1001 - T234600	[production]
07:24	<elukey@cumin1001>	END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)	[production]
07:24	<elukey@cumin1001>	START - Cookbook sre.hosts.decommission	[production]
06:40	<marostegui>	Deploy schema change on db2114 T233135 T234066	[production]
06:22	<_joe_>	downgrading confd back to 0.9.0 while some templates get fixed.	[production]
06:19	<marostegui@cumin1001>	END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)	[production]
06:18	<marostegui@cumin1001>	START - Cookbook sre.hosts.decommission	[production]
06:16	<marostegui>	Deploy schema change on dbstore1005:3316 T233135 T234066	[production]
05:59	<marostegui@deploy1001>	Synchronized wmf-config/db-eqiad.php: Fully repool es1019 after on-site maintenance T233698 (duration: 00m 51s)	[production]
05:53	<_joe_>	upgrading confd on puppetmaster1001 T147204	[production]
05:50	<_joe_>	uploading confd 0.16.0 on stretch T147204	[production]
05:49	<marostegui@deploy1001>	Synchronized wmf-config/db-eqiad.php: More traffic to es1019 after on-site maintenance T233698 (duration: 00m 51s)	[production]
05:11	<marostegui@cumin1001>	dbctl commit (dc=all): 'Repool db1096:3316 after schema change', diff saved to https://phabricator.wikimedia.org/P9240 and previous config saved to /var/cache/conftool/dbconfig/20191004-051112-marostegui.json	[production]
05:08	<marostegui@deploy1001>	Synchronized wmf-config/db-eqiad.php: Slowly repool es1019 after on-site maintenance T233698 (duration: 00m 53s)	[production]
2019-10-03 §
23:50	<mutante>	gerrit - restarting for replication config tweaks	[production]
20:05	<@>	helmfile [EQIAD] Ran 'apply' command on namespace 'sessionstore' for release 'production' .	[production]
20:01	<@>	helmfile [CODFW] Ran 'apply' command on namespace 'sessionstore' for release 'production' .	[production]
19:52	<XenoRyet>	updated payments-wiki from 80dead6444 to b94da68f7e	[production]
19:40	<mutante>	mw1290 - depooled and scheduled downtime in Icinga for hardware maintenance T234153	[production]
19:38	<dzahn@cumin1001>	conftool action : set/pooled=no; selector: name=mw1290.eqiad.wmnet	[production]
19:30	<marxarelli>	1.34.0-wmf.25 promoted to all wikis, cc: T220750. no rise in relevant error rates. no new errors	[production]
19:21	<dduvall@deploy1001>	rebuilt and synchronized wikiversions files: all wikis to 1.34.0-wmf.25	[production]
19:19	<mutante>	puppetmaster1001 - revoke cert for parsoid.discovery.wmnet - creating new ones for each DC and a unified one with both (T233654)	[production]
19:11	<@>	helmfile [STAGING] Ran 'apply' command on namespace 'sessionstore' for release 'staging' .	[production]
18:52	<krinkle@deploy1001>	Synchronized wmf-config/InitialiseSettings.php: no-op / config cached? (duration: 00m 59s)	[production]
18:43	<krinkle@deploy1001>	Synchronized wmf-config/InitialiseSettings.php: c2b3d7ce57e9c422 (duration: 00m 59s)	[production]
18:14	<krinkle@deploy1001>	Synchronized wmf-config/InitialiseSettings.php: no-op / config cache issue? (duration: 01m 00s)	[production]
18:03	<krinkle@deploy1001>	Synchronized wmf-config/InitialiseSettings.php: 5389d0243ee9c (duration: 01m 01s)	[production]
17:13	<mholloway-shell@deploy1001>	Finished deploy [mobileapps/deploy@31b2703]: Update mobileapps to 1db84a7 (duration: 06m 06s)	[production]
17:07	<mholloway-shell@deploy1001>	Started deploy [mobileapps/deploy@31b2703]: Update mobileapps to 1db84a7	[production]
13:49	<elukey>	roll restart hadoop yarn resource managers for openssl updates on Hadoop workers	[production]
13:44	<marostegui>	Stop MySQL and shutdown es1019 for on-site maintenance - T233698	[production]
13:40	<marostegui@deploy1001>	Synchronized wmf-config/db-eqiad.php: Depool es1019 for on-site maintenance T233698 (duration: 01m 01s)	[production]
13:29	<hashar>	Gerrit should be back	[production]
13:26	<hashar>	restarting Gerrit due to a deadlock in SendEmail task and AccountCacheImpl	[production]
13:22	<hashar>	Gerrit might be dead again; taking traces	[production]