2014-08-18
15:00 <bd808> Rebooting deployment-eventlogging02 via wikitech; console filling with OOM killer messages and puppet runs failing with "Cannot allocate memory - fork(2)" [releng]
14:50 <hashar> Jenkins: reverting the PHP CodeSniffer upgrade; we are back to 1.4.7. {{gerrit|154825}} had some issues. [production]
14:42 <hashar> Jenkins: upgrading PHP CodeSniffer from 1.4.7 to 1.4.8 (thanks to addshore, {{gerrit|154053}}) [production]
14:39 <bd808> No apache2.log in fluorine:/a/mw-log; last file in /a/mw-log/archive is apache2.log-20140816.gz [production]
14:31 <bd808> Restarted logstash on logstash1001; event volume was lower than expected [production]
14:29 <bd808> Forced puppet run on deployment-cache-upload02 [releng]
14:27 <bd808> Forced puppet run on deployment-cache-text02 [releng]
14:24 <bd808> Forced puppet run on deployment-cache-mobile03 [releng]
14:20 <bd808> Forced puppet run on deployment-cache-bits01 [releng]
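A forced puppet run on these instances is normally just the agent invoked by hand instead of waiting for the next scheduled run; a minimal sketch, assuming shell access to each host:

    ssh deployment-cache-bits01.eqiad.wmflabs   # likewise for the other caches above
    sudo puppet agent --test                    # one-off foreground run with verbose output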
13:49 <hashar> Restarting Zuul; it got stuck again. [production]
13:29 <hashar_> Restarted Zuul; some items were stuck in the queue. Retrigger your jobs (re-vote +2 / new patchset / 'recheck' comment) [production]
13:23 <reedy> Synchronized php-1.24wmf17/extensions/ExtensionDistributor: Unbreak ExtensionDistributor (duration: 00m 13s) [production]
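The "Synchronized ..." entries are logged automatically by the sync tooling on the deployment host; a sketch of what the command behind this one presumably looked like (the staging path is an assumption):

    cd /srv/mediawiki-staging    # assumed staging checkout on the deployment host
    sync-dir php-1.24wmf17/extensions/ExtensionDistributor 'Unbreak ExtensionDistributor'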
13:18 <hashar> Zuul stuck, looking. [production]
13:06 <Reedy> Large amount of incoming traffic to bast1001 is me uploading files [production]
12:11 <godog> rebalanced swift object ring in eqiad [production]
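A Swift ring rebalance is done with swift-ring-builder against the ring's builder file, after which the regenerated ring has to be pushed to every node; a minimal sketch, assuming the builder files live in /etc/swift:

    cd /etc/swift
    swift-ring-builder object.builder rebalance   # recompute partition assignments
    # then distribute the new object.ring.gz to all swift hosts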
09:34 <godog> reenabled puppet on neon and started ircecho [production]
09:24 <godog> stopped ircecho again on neon, disabled puppet on neon [production]
09:11 <godog> restarted apache2 on strontium [production]
08:58 <godog> stopped ircecho on neon while diagnosing puppet failure [production]
03:13 <LocalisationUpdate> ResourceLoader cache refresh completed at Mon Aug 18 03:12:27 UTC 2014 (duration 12m 26s) [production]
03:06 <hoo> Ran sync-common on mw1053 to stop "Unrecognized job type 'ChangeNotification'." exceptions [production]
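sync-common pulls the currently deployed MediaWiki tree from the sync master onto the local app server, so a host that missed a sync picks up the job class it did not know about; a minimal sketch, assuming shell access to the host:

    ssh mw1053
    sudo sync-common    # rsync the deployed MediaWiki code onto this app server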
02:31 <LocalisationUpdate> completed (1.24wmf17) at 2014-08-18 02:30:17+00:00 [production]
02:20 <LocalisationUpdate> completed (1.24wmf16) at 2014-08-18 02:18:52+00:00 [production]
2014-08-17
22:58 <bd808> Attempting to reboot deployment-cache-bits01.eqiad.wmflabs via wikitech [releng]
22:56 <bd808> deployment-cache-bits01.eqiad.wmflabs not allowing ssh access and wikitech console full of OOM killer messages [releng]
21:08 <legoktm> running migrateAccount.php without --safe or --auto on terbium for bug 69291 [production]
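migrateAccount.php is a CentralAuth maintenance script, and maintenance scripts on the cluster are run through the mwscript wrapper; a hypothetical sketch of the shape of the run (the --wiki value and the user-selection argument are assumptions; only the omitted --safe/--auto flags come from the entry above):

    mwscript extensions/CentralAuth/maintenance/migrateAccount.php --wiki=metawiki --userlist=users.txt
    # deliberately run without --safe or --auto, per the entry above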
18:45 <hashar> Zuul upgraded [production]
18:41 <hashar> Upgrading Zuul to the latest version (it is not a Friday, after all) [production]
09:22 <springle> Ongoing schema change on wikidatawiki & testwiki: wb_entity_per_page.epp_redirect_target. The osc_host.sh processes on terbium are OK to kill in an emergency. [production]
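If the schema change does need to be aborted, the wrapper processes named above can be located and killed on terbium; a minimal sketch using standard process tools:

    ssh terbium
    pgrep -fl osc_host.sh    # list the running schema-change wrappers
    pkill -f osc_host.sh     # emergency stop, as sanctioned in the entry above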
04:34 <ottomata> restarted udp2log on oxygen [production]
03:05 <LocalisationUpdate> ResourceLoader cache refresh completed at Sun Aug 17 03:04:22 UTC 2014 (duration 4m 21s) [production]
02:49 <springle> killed processes on labsdb1003 that were using all the disk for temp tables; investigating [production]
02:24 <LocalisationUpdate> completed (1.24wmf17) at 2014-08-17 02:23:08+00:00 [production]
02:14 <LocalisationUpdate> completed (1.24wmf16) at 2014-08-17 02:13:35+00:00 [production]
2014-08-16
18:12 <bblack> (amssq33: and yes, removing from fe/be cache pools) [production]
18:11 <bblack> powering off amssq33; it's clipping network traffic at peak times due to a bad ethernet connection negotiated down to 100Mbps (see existing RT 7933 in the esams queue) [production]
18:02 <bblack> ms-be1006: syslog indicates it started generating repeated "BUG: soft lockup" 10 minutes before dying, in XFS kernel code again... [production]
17:55 <bblack> rebooting ms-be1006; ping-dead in icinga for 23m, console was unresponsive [production]
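With the console unresponsive, a reboot like this is usually forced out-of-band via the management interface; a minimal sketch, assuming an IPMI-reachable mgmt address for the host (the hostname below is an assumption):

    ipmitool -I lanplus -H ms-be1006.mgmt.eqiad.wmnet -U root power cycle
    # out-of-band power cycle; prompts for the management password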
17:37 <bblack> restarted apache2 on palladium... looks like something went horribly wrong with its puppet run on itself, which somehow killed off the puppetmaster service [production]
03:07 <LocalisationUpdate> ResourceLoader cache refresh completed at Sat Aug 16 03:06:29 UTC 2014 (duration 6m 28s) [production]
02:27 <LocalisationUpdate> completed (1.24wmf17) at 2014-08-16 02:26:02+00:00 [production]
02:17 <LocalisationUpdate> completed (1.24wmf16) at 2014-08-16 02:16:00+00:00 [production]
2014-08-15
21:57 <legoktm> set $wgVERPsecret in PrivateSettings.php [releng]
21:42 <hashSpeleology> Beta cluster database updates are broken due to CentralNotice. The fix is {{gerrit|154231}} [releng]
20:59 <kaldari> Synchronized php-1.24wmf16/extensions/MobileFrontend/less: fixing iOS search bug (duration: 00m 05s) [production]
20:57 <hashSpeleology> deployment-rsync01: deleting /usr/local/apache/common-local content, then ln -s /srv/common-local /usr/local/apache/common-local as set by beta::common, which is not applied on that host for some reason. {{bug|69590}} [releng]
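The repair above amounts to replacing the stale directory with the symlink the beta::common class would normally manage; a minimal sketch of those two steps on deployment-rsync01:

    rm -rf /usr/local/apache/common-local                     # clear the stale copy
    ln -s /srv/common-local /usr/local/apache/common-local    # restore the expected symlink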
20:55 <hashSpeleology> puppet administratively disabled on mediawiki02; assuming some work in progress on that host, leaving it untouched [releng]
20:54 <hashSpeleology> puppet is proceeding on mediawiki01 [releng]
20:52 <hashSpeleology> attempting to unbreak the MediaWiki code update {{bug|69590}} by cherry-picking {{gerrit|154329}} [releng]
20:39 <hashSpeleology> In case it is not in the SAL: MediaWiki is no longer being synced to the app servers {{bug|69590}} [releng]