2016-03-18
19:52 | <ottomata> | temporarily disabling puppet on krypton | [production]
19:21 | <ori> | rebooting bohrium | [production]
19:20 | <ori> | upgraded bohrium VM: vcpus 2 => 8, ram 4 => 8g | [production]
19:06 | <ori@tin> | Synchronized wmf-config/logging.php: Iabca8858e: Allow finer-grained control over debug logging via XWD (duration: 00m 32s) | [production]
18:56 | <demon@tin> | Synchronized .arclint: no op really, co master sync (duration: 00m 39s) | [production]
18:08 | <gehel> | restarting elasticsearch server elastic1031.eqiad.wmnet | [production]
17:59 | <mutante> | netmon1001: failed torrus service - recovery steps as outlined on wikitech [[Torrus]] | [production]
17:55 | <ori> | on bohrium: /etc/apache2/sites-enabled/.links2 ; was causing puppet to refresh apache2 on each run | [production]
17:30 | <gehel> | restarting elasticsearch server elastic1030.eqiad.wmnet | [production]
17:05 | <gehel> | restarting elasticsearch server elastic1029.eqiad.wmnet | [production]
16:53 | <jynus> | starting enwiki import to labs from dbstore1002 (expect lag and consistency problems during the hot import) | [production]
16:37 | <moritzm> | restarted hhvm on mw1205 | [production]
16:30 | <moritzm> | bumped connection tracking table size on mw1161-mw1169 to 524288 to cope with currently elevated connections on those (T130364) | [production]
16:19 | <godog> | reboot ms-be2010 to pick up new disk ordering | [production]
15:23 | <elukey@tin> | Synchronized wmf-config/jobqueue-eqiad.php: REVERT - Re-enabled persistence between Job Queues and Job Runners. (duration: 00m 19s) | [production]
15:03 | <elukey@tin> | Synchronized wmf-config/jobqueue-eqiad.php: Re-enabled persistence between Job Queues and Job Runners. (duration: 00m 30s) | [production]
15:02 | <godog> | bootstrap restbase1013-a | [production]
14:36 | <gehel> | restarting elasticsearch server elastic1028.eqiad.wmnet | [production]
14:02 | <elukey> | restarted eventlog1001.eqiad.wmnet and eventlog2001.codfw.wmnet for kernel upgrade | [production]
13:43 | <gehel> | restarting elasticsearch server elastic1027.eqiad.wmnet | [production]
13:24 | <gehel> | restarting pybal on lvs2003.codfw.wmnet | [production]
13:22 | <gehel> | enabling all nodes for service search.svc.codfw.wmnet:9243 (elastic-https) on codfw | [production]
13:22 | <gehel> | restarting pybal on lvs2006.codfw.wmnet | [production]
13:06 | <gehel> | restarting elasticsearch server elastic1026.eqiad.wmnet | [production]
12:43 | <gehel> | restarting elasticsearch server elastic1025.eqiad.wmnet | [production]
12:35 | <godog> | finished ms-fe1* rolling reboot | [production]
12:15 | <godog> | finished ms-be1* rolling reboot | [production]
12:00 | <elukey> | Forcing puppet agent run on all the Jobrunners and videoscalers since rdb1005 is now back in service. Will also restart jobchron as well. | [production]
11:58 | <elukey> | Added rdb1005 back to the jobrunners puppet config after maintenance. | [production]
11:57 | <gehel> | restarting elasticsearch server elastic1024.eqiad.wmnet | [production]
11:46 | <gehel> | restarting pybal on lvs1003 | [production]
11:43 | <elukey@tin> | Synchronized wmf-config/jobqueue-eqiad.php: Add rdb1005 back to the Redis Job Queues after maintenance (duration: 01m 22s) | [production]
11:23 | <moritzm> | powercycled mw1163, hung on reboot and serial console stuck | [production]
11:05 | <moritzm> | rolling reboot of mw1161 to mw1169 for kernel upgrade | [production]
11:04 | <gehel> | restarting pybal on lvs1012 | [production]
11:04 | <gehel> | restarting pybal on lvs1009 | [production]
10:58 | <gehel> | activating elasticsearch-ssl service on LVS / eqiad | [production]
10:51 | <gehel> | restarting pybal on lvs1006 | [production]
10:48 | <jynus> | dbstore2002 just crashed | [production]
10:34 | <godog> | reboot ms-fe1003 for kernel upgrade | [production]
10:33 | <akosiaris> | gehel: restarting pybal on lvs1006 | [production]
10:27 | <gehel> | activating elasticsearch HTTPS on LVS for eqiad - https://gerrit.wikimedia.org/r/#/c/277956/ | [production]
10:06 | <moritzm> | rolling reboot of swift backend servers in codfw for kernel upgrade | [production]
09:46 | <godog> | rolling-reboot ms-be1* for kernel updates | [production]
09:37 | <elukey> | forcing puppet agent and restarting jobchron on all the Job Runners and VideoScalers as rdb1005 has been removed from the configs. | [production]
09:32 | <elukey> | removed rdb1005 from the Job Runners config for maintenance | [production]
09:24 | <elukey@tin> | Synchronized wmf-config/jobqueue-eqiad.php: Remove rdb1005 from the Redis Job Queues for maintenance (duration: 01m 07s) | [production]
09:19 | <moritzm> | rolling reboot of swift frontend servers in codfw for kernel upgrade | [production]
09:08 | <godog> | Issuing nodetool scrub -s -- local_group_wikipedia_T_parsoid_html data on restbase2004.eqiad.wmnet : T130254 | [production]
09:01 | <moritzm> | rolling reboot of mw1001 to mw1016 for kernel upgrade | [production]