2017-06-12
08:31 <godog> reboot ms-be1002, load avg slowly creeping up [production]
08:22 <elukey> powercycle scb2005 (console frozen, host unresponsive) [production]
07:40 <elukey> restarted citoid on scb1001 (kept failing health checks for Error: write EPIPE) [production]
07:38 <marostegui> Reboot https://icinga.wikimedia.org/cgi-bin/icinga/extinfo.cgi?type=1&host=ms-be1008 as xfs is failing [production]
07:31 <marostegui> Deploy alter table s2 - db1060 - T166205 [production]
07:31 <marostegui@tin> Synchronized wmf-config/db-eqiad.php: Depool db1060 - T166205 (duration: 00m 41s) [production]
07:26 <elukey> ran restart-pdfrender on scb1001 (OOM errors in the dmesg from hours ago) [production]
07:22 <elukey> ran restart-pdfrender on scb1002 (OOM errors in the dmesg from hours ago) [production]
07:21 <marostegui> Deploy alter table s4 - db1064 - https://phabricator.wikimedia.org/T166206 [production]
07:19 <marostegui@tin> Synchronized wmf-config/db-eqiad.php: Depool db1064 - T166206 (duration: 00m 41s) [production]
06:53 <moritzm> upgrade remaining app servers running HHVM 3.18 to 3.18.2+wmf5 [production]
05:38 <marostegui> Deploy alter table s4 - labsdb1003 - T166206 [production]
02:14 <l10nupdate@tin> scap failed: average error rate on 1/11 canaries increased by 10x (rerun with --force to override this check, see https://logstash.wikimedia.org/goto/3888cca979647b9381a7739b0bdbc88e for details) [production]
2017-06-11
14:14 <elukey> executed cumin 'mw22[51-60].codfw.wmnet' 'find /var/log/hhvm/* -user root -exec chown www-data:www-data {} \;' to reduce cron-spam (new hosts added in March) - T146464 [production]
02:25 <l10nupdate@tin> ResourceLoader cache refresh completed at Sun Jun 11 02:25:53 UTC 2017 (duration 6m 6s) [production]
02:19 <l10nupdate@tin> scap sync-l10n completed (1.30.0-wmf.4) (duration: 07m 37s) [production]
2017-06-10
11:54 <andrewbogott> cleared leaked instances out of the nova fullstack test. Six were up and running and reachable, one had a network failure. [production]
10:19 <TimStarling> on terbium: running purgeParserCache.php prior to cron job due to observed disk space usage increase [production]
10:00 <marostegui> Purge binary logs on pc1006-pc2006 [production]
09:58 <marostegui> Purge binary logs on pc1004-pc2004 and pc1005-pc2005 [production]
02:22 <l10nupdate@tin> ResourceLoader cache refresh completed at Sat Jun 10 02:22:22 UTC 2017 (duration 6m 13s) [production]
02:16 <l10nupdate@tin> scap sync-l10n completed (1.30.0-wmf.4) (duration: 05m 33s) [production]
2017-06-09
21:18 <mobrovac@tin> Finished deploy [restbase/deploy@4e5cb35]: (no justification provided) (duration: 01m 40s) [production]
21:17 <mobrovac@tin> Started deploy [restbase/deploy@4e5cb35]: (no justification provided) [production]
21:07 <mobrovac@tin> Finished deploy [restbase/deploy@4e5cb35]: Ensure the extract field is always present in the summary response - T167045 (take #2) (duration: 05m 23s) [production]
21:02 <mobrovac@tin> Started deploy [restbase/deploy@4e5cb35]: Ensure the extract field is always present in the summary response - T167045 (take #2) [production]
21:01 <mobrovac@tin> Finished deploy [restbase/deploy@4e5cb35]: Ensure the extract field is always present in the summary response - T167045 (duration: 04m 57s) [production]
20:56 <mobrovac@tin> Started deploy [restbase/deploy@4e5cb35]: Ensure the extract field is always present in the summary response - T167045 [production]
20:54 <mobrovac@tin> Finished deploy [restbase/deploy@4e5cb35] (staging): Ensure the extract field is always present in the summary response (duration: 03m 39s) [production]
20:50 <mobrovac@tin> Started deploy [restbase/deploy@4e5cb35] (staging): Ensure the extract field is always present in the summary response [production]
20:12 <demon@tin> Synchronized php-1.30.0-wmf.4/extensions/CirrusSearch/includes/Job/DeleteArchive.php: Really fix it this time (duration: 00m 43s) [production]
19:49 <mutante> fermium: $ sudo /usr/local/sbin/disable_list wikino-bureaucrats (T166848) [production]
19:46 <RainbowSprinkles> mw1299: running scap pull, maybe out of date? [production]
18:12 <gehel> retry allocation of failed shards on elasticsearch eqiad [production]
15:47 <_joe_> installed python-service-checker 0.1.3 on einsteinium,tegmen T167048 [production]
15:44 <_joe_> uploaded service-checker 0.1.3 [production]
15:11 <_joe_> upgraded python-service-checker to 0.1.2 on tegmen,einsteinium [production]
13:18 <godog> upgrade thumbor to 0.1.40 - T167462 [production]
12:36 <gehel> reducing high watermark on elasticsearch eqiad to rebalance shards [production]
07:51 <elukey> run megacli -LDSetProp -Direct -LALL -aALL on analytics[1058-1068] - T166140 [production]
07:40 <moritzm> upgrade app servers in codfw running HHVM 3.18 to +wmf5 [production]
07:26 <elukey> run megacli -LDSetProp ADRA -LALL -aALL on analytics[1058-1068] - T166140 [production]
07:15 <elukey> deleted /etc/logrotate.d/nova-manage from labtestvirt2003 to reduce cronspam (same solution used in T132422#2679434) [production]
06:58 <moritzm> updating mw117* to HHVM 3.18+wmf5 [production]
06:41 <moritzm> updating mw1161 to HHVM 3.18 [production]
05:57 <marostegui@tin> Synchronized wmf-config/db-eqiad.php: Repool db1056 - T166206 (duration: 00m 41s) [production]
05:51 <marostegui@tin> Synchronized wmf-config/db-eqiad.php: Repool db1074 - T166205 (duration: 00m 42s) [production]
02:25 <l10nupdate@tin> ResourceLoader cache refresh completed at Fri Jun 9 02:25:29 UTC 2017 (duration 6m 27s) [production]
02:19 <l10nupdate@tin> scap sync-l10n completed (1.30.0-wmf.4) (duration: 06m 04s) [production]
00:36 <ejegg> disabled banner impressions loader [production]