2014-07-30
|
19:46 |
<bd808> |
Restored prior /etc/hhvm/php.ini from puppet filebucket archive on deployment-mediawiki0[12] |
[releng] |
19:32 |
<bd808> |
Disabled puppet on deployment-mediawiki02 for the same reason |
[releng] |
19:31 |
<bd808> |
Disabled puppet on deployment-mediawiki01; Ori will look into hhvm config changes that were being applied |
[releng] |
19:20 |
<bblack> |
icinga is now substantially back online. email/sms still disabled for now, and downtimes/acks need to be re-added for known issues |
[production] |
19:06 |
<csteipp> |
Synchronized php-1.24wmf14/includes/: (no message) (duration: 00m 05s) |
[production] |
19:04 |
<csteipp> |
Synchronized php-1.24wmf15/includes/: (no message) (duration: 00m 07s) |
[production] |
18:59 |
<bblack> |
icinga coming back up again for the first time, expect random strangeness to be ignored |
[production] |
18:46 |
<bblack> |
temporarily hard-disabling email/sms from icinga via 'mv /usr/bin/mail /usr/bin/mail-disabled' on neon to prevent icinga spam on next startup attempt |
[production] |
17:55 |
<bblack> |
stopping icinga service for now while working out other details |
[production] |
17:25 |
<tacotuesday> |
repooled elastic1018 and elastic1019 as well |
[production] |
17:21 |
<Coren> |
labmon1001 rebooting (final check for proper raid+lvm autodetection) |
[production] |
17:08 |
<bblack> |
working on bringing up new neon install (first puppet run, etc) |
[production] |
17:01 |
<Coren> |
labmon1001 rebooting (partitioning changes on primary disks) |
[production] |
16:53 |
<tacotuesday> |
elastic1017 repooled, shards allocating |
[production] |
16:52 |
<bd808> |
Fixed beta-scap-eqiad Jenkins job by correcting ssh problems in beta project |
[releng] |
16:43 |
<bd808> |
Fixed ssh to jobrunner01 and videoscaler01 by correcting unrelated puppet manifest problem and forcing run via salt. |
[releng] |
16:13 |
<bd808> |
scap and dologmsg from tin won't work until neon is back up and running tcpircbot |
[production] |
16:07 |
<bd808|deploy> |
Synchronized touch: no-op sync to test scap update (duration: 00m 05s) |
[production] |
16:06 |
<bd808|deploy> |
scap announce failed -- timeout connecting to tcpircbot on neon.wikimedia.org |
[production] |
16:04 |
<bd808|deploy> |
Updated scap to 4871208 (rely on $PATH for scap scripts) |
[production] |
16:00 |
<bd808> |
Puppet runs on videoscaler01 and jobrunner01 failing for "Could not find dependency Ferm::Rule[bastion-ssh] for Ferm::Rule[deployment-bastion-scap-ssh]" |
[releng] |
16:00 |
<bd808> |
Puppet seems manually disabled on apache0[12]. |
[releng] |
15:59 |
<bd808> |
Can't ssh to apache0[12], videoscaler01 and jobrunner01. Puppet not running on any of them. libnss-ldapd unattended update has broken /etc/nslcd.conf |
[releng] |
15:23 |
<bd808> |
Removed cherry-pick for Iac547efa83cf059a1276b6e279c3ebd4c7224b2c and updated cherry-pick for I5afba2c6b0fbf90ff8495cc4a82f5c7851893b52 to latest patch set. |
[releng] |
15:21 |
<hoo> |
Synchronized php-1.24wmf15/extensions/Wikidata/extensions/Wikibase/lib/resources/wikibase.js: touch (duration: 00m 20s) |
[production] |
15:17 |
<hashar> |
upgrading php5 on jenkins slaves |
[production] |
15:07 |
<cmjohnson1> |
shutting down neon |
[production] |
15:05 |
<bd808> |
Two cherry-picks in puppet conflicting with merged production changes: I5afba2c6b0fbf90ff8495cc4a82f5c7851893b52 and Iac547efa83cf059a1276b6e279c3ebd4c7224b2c (ori, twentyafterfour) |
[releng] |
14:49 |
<bd808> |
Started apache2 service on deployment-mediawiki01 |
[releng] |
14:46 |
<demon> |
Synchronized wmf-config/CirrusSearch-production.php: (no message) (duration: 00m 04s) |
[production] |
14:35 |
<demon> |
Synchronized wmf-config/PrivateSettings.php: Swift config for Cirrus (duration: 00m 08s) |
[production] |
14:30 |
<godog> |
rolling restart of ms-fe* to pick up search backup user |
[production] |
14:17 |
<bblack> |
rebooting neon again, trying to fix the disk situation |
[production] |
14:16 |
<hashar> |
rebooting hhvm |
[releng] |
14:11 |
<Coren> |
reinstalling labmon1001 -> change disk partitioning scheme |
[production] |
13:50 |
<springle> |
neon read-only fs. fsck + reboot |
[production] |
13:17 |
<manybubbles> |
rebuilding Cirrus index for commons to pick up weighted all field |
[production] |
11:17 |
<_joe_> |
enabling puppet on all mw* servers |
[production] |
11:15 |
<_joe_> |
re-enabling puppet on mw1019, last bunch of tests, then re-enabling globally |
[production] |
10:58 |
<_joe_> |
re-enabling puppet on mw1018, testwiki upgraded to the new config and looks fine |
[production] |
09:42 |
<hashar> |
bastion had broken puppet because deployment_server and zuul both declare the same python packages (Gerrit change 150501) |
[releng] |
09:40 |
<hashar> |
restoring on puppetmaster modules/mediawiki/templates/apache/apache2.conf.erb which got deleted somehow |
[releng] |
09:29 |
<hashar> |
Rebooting apache01/02 to see whether it fixes the ssh connection issue |
[releng] |
09:27 |
<hashar> |
manually started hhvm on mediawiki01 |
[releng] |
09:25 |
<godog> |
set weight for ms-be1014 and ms-be1015 to 2300 |
[production] |
09:25 |
<hashar> |
rebooting deployment-mediawiki01; hhvm process went zombie |
[releng] |
09:23 |
<hashar> |
restarting hhvm on mediawiki 01/02 |
[releng] |
09:05 |
<hashar_> |
Beta scap script broken since 6:30am UTC https://integration.wikimedia.org/ci/job/beta-scap-eqiad/ |
[releng] |
08:58 |
<_joe_> |
stopping puppet on the appservers, in preparation for releasing change 148099 |
[production] |
08:30 |
<_joe_> |
powercycling neon, doesn't respond to requests, ssh hangs, console dark |
[production] |