production SAL

2017-06-12
08:31	<godog>	reboot ms-be1002, load avg slowly creeping up	[production]
08:22	<elukey>	powercycle scb2005 (console frozen, host unresponsive)	[production]
07:40	<elukey>	restarted citoid on scb1001 (kept failing health checks for Error: write EPIPE)	[production]
07:38	<marostegui>	Reboot https://icinga.wikimedia.org/cgi-bin/icinga/extinfo.cgi?type=1&host=ms-be1008 as xfs is failing	[production]
07:31	<marostegui>	Deploy alter table s2 - db1060 - T166205	[production]
07:31	<marostegui@tin>	Synchronized wmf-config/db-eqiad.php: Depool db1060 - T166205 (duration: 00m 41s)	[production]
07:26	<elukey>	ran restart-pdfrender on scb1001 (OOM errors in the dmesg from hours ago)	[production]
07:22	<elukey>	ran restart-pdfrender on scb1002 (OOM errors in the dmesg from hours ago)	[production]
07:21	<marostegui>	Deploy alter table s4 - db1064 - https://phabricator.wikimedia.org/T166206	[production]
07:19	<marostegui@tin>	Synchronized wmf-config/db-eqiad.php: Depool db1064 - T166206 (duration: 00m 41s)	[production]
06:53	<moritzm>	upgrade remaining app servers running HHVM 3.18 to 3.18.2+wmf5	[production]
05:38	<marostegui>	Deploy alter table s4 - labsdb1003 - T166206	[production]
02:14	<l10nupdate@tin>	scap failed: average error rate on 1/11 canaries increased by 10x (rerun with --force to override this check, see https://logstash.wikimedia.org/goto/3888cca979647b9381a7739b0bdbc88e for details)	[production]
2017-06-11
14:14	<elukey>	executed cumin 'mw22[51-60].codfw.wmnet' 'find /var/log/hhvm/* -user root -exec chown www-data:www-data {} \;' to reduce cron-spam (new hosts added in March) - T146464	[production]
02:25	<l10nupdate@tin>	ResourceLoader cache refresh completed at Sun Jun 11 02:25:53 UTC 2017 (duration 6m 6s)	[production]
02:19	<l10nupdate@tin>	scap sync-l10n completed (1.30.0-wmf.4) (duration: 07m 37s)	[production]
2017-06-10
11:54	<andrewbogott>	cleared leaked instances out of the nova fullstack test. Six were up and running and reachable, one had a network failure.	[production]
10:19	<TimStarling>	on terbium: running purgeParserCache.php prior to cron job due to observed disk space usage increase	[production]
10:00	<marostegui>	Purge binary logs on pc1006-pc2006	[production]
09:58	<marostegui>	Purge binary logs on pc1004-pc2004 and pc1005-pc2005	[production]
02:22	<l10nupdate@tin>	ResourceLoader cache refresh completed at Sat Jun 10 02:22:22 UTC 2017 (duration 6m 13s)	[production]
02:16	<l10nupdate@tin>	scap sync-l10n completed (1.30.0-wmf.4) (duration: 05m 33s)	[production]
2017-06-09
21:18	<mobrovac@tin>	Finished deploy [restbase/deploy@4e5cb35]: (no justification provided) (duration: 01m 40s)	[production]
21:17	<mobrovac@tin>	Started deploy [restbase/deploy@4e5cb35]: (no justification provided)	[production]
21:07	<mobrovac@tin>	Finished deploy [restbase/deploy@4e5cb35]: Ensure the extract field is always present in the summary response - T167045 (take #2) (duration: 05m 23s)	[production]
21:02	<mobrovac@tin>	Started deploy [restbase/deploy@4e5cb35]: Ensure the extract field is always present in the summary response - T167045 (take #2)	[production]
21:01	<mobrovac@tin>	Finished deploy [restbase/deploy@4e5cb35]: Ensure the extract field is always present in the summary response - T167045 (duration: 04m 57s)	[production]
20:56	<mobrovac@tin>	Started deploy [restbase/deploy@4e5cb35]: Ensure the extract field is always present in the summary response - T167045	[production]
20:54	<mobrovac@tin>	Finished deploy [restbase/deploy@4e5cb35] (staging): Ensure the extract field is always present in the summary response (duration: 03m 39s)	[production]
20:50	<mobrovac@tin>	Started deploy [restbase/deploy@4e5cb35] (staging): Ensure the extract field is always present in the summary response	[production]
20:12	<demon@tin>	Synchronized php-1.30.0-wmf.4/extensions/CirrusSearch/includes/Job/DeleteArchive.php: Really fix it this time (duration: 00m 43s)	[production]
19:49	<mutante>	fermium: $ sudo /usr/local/sbin/disable_list wikino-bureaucrats (T166848)	[production]
19:46	<RainbowSprinkles>	mw1299: running scap pull, maybe out of date?	[production]
18:12	<gehel>	retry allocation of failed shards on elasticsearch eqiad	[production]
15:47	<_joe_>	installed python-service-checker 0.1.3 on einsteinium,tegmen T167048	[production]
15:44	<_joe_>	uploaded service-checker 0.1.3	[production]
15:11	<_joe_>	upgraded python-service-checker to 0.1.2 on tegmen,einsteinium	[production]
13:18	<godog>	upgrade thumbor to 0.1.40 - T167462	[production]
12:36	<gehel>	reducing high watermark on elasticsearch eqiad to rebalance shards	[production]
07:51	<elukey>	run megacli -LDSetProp -Direct -LALL -aALL on analytics[1058-1068] - T166140	[production]
07:40	<moritzm>	upgrade app servers in codfw running HHVM 3.18 to +wmf5	[production]
07:26	<elukey>	run megacli -LDSetProp ADRA -LALL -aALL on analytics[1058-1068] - T166140	[production]
07:15	<elukey>	deleted /etc/logrotate.d/nova-manage from labtestvirt2003 to reduce cronspam (same solution used in T132422#2679434)	[production]
06:58	<moritzm>	updating mw117* to HHVM 3.18+wmf5	[production]
06:41	<moritzm>	updating mw1161 to HHVM 3.18	[production]
05:57	<marostegui@tin>	Synchronized wmf-config/db-eqiad.php: Repool db1056 - T166206 (duration: 00m 41s)	[production]
05:51	<marostegui@tin>	Synchronized wmf-config/db-eqiad.php: Repool db1074 - T166205 (duration: 00m 42s)	[production]
02:25	<l10nupdate@tin>	ResourceLoader cache refresh completed at Fri Jun 9 02:25:29 UTC 2017 (duration 6m 27s)	[production]
02:19	<l10nupdate@tin>	scap sync-l10n completed (1.30.0-wmf.4) (duration: 06m 04s)	[production]
00:36	<ejegg>	disabled banner impressions loader	[production]