2016-10-22
|
15:37 |
<oblivian@puppetmaster1001> |
conftool action : set/pooled=no; selector: name=cp1052.eqiad.wmnet |
[production] |
15:02 |
<bblack@puppetmaster1001> |
conftool action : set/pooled=yes; selector: name=cp1052.eqiad.wmnet |
[production] |
15:02 |
<bblack> |
repool cp1052 - T148891 |
[production] |
14:52 |
<bblack> |
rebooted cp1052 - T148891 |
[production] |
14:26 |
<bblack> |
depooled cp1052 (cache_text@eqiad, ethernet linkdown for unknown reasons) |
[production] |
13:45 |
<mforns> |
created 0002687-161020124223818-oozie-oozi-C to re-run webrequest-load-check_sequence_statistics-wf-upload-2016-10-21-17 (oozie errors) |
[analytics] |
12:52 |
<paladox> |
deploying https://gerrit.wikimedia.org/r/317315 T147582 |
[tools.lolrrit-wm] |
12:44 |
<elukey> |
created 0002631-161020124223818-oozie-oozi-C to re-run webrequest-load-check_sequence_statistics-wf-upload-2016-10-21-17 (oozie errors) |
[analytics] |
12:34 |
<marostegui> |
Stopping replication in db2055 to use it to clone another host - T146261 |
[production] |
2016-10-21
|
23:45 |
<mutante> |
depooling maps1002 (by running "depool" on the server itself) |
[production] |
23:35 |
<yurik> |
maps1002.eqiad is running older/incorrect/misbehaving software for some reason, restart didn't help. Need to depool |
[production] |
22:17 |
<mutante> |
cp4006, cp4014: gzipped some logs in home to free disk space |
[production] |
22:08 |
<mutante> |
cp4006, cp4014 were running out of disk, apt-get clean |
[production] |
21:40 |
<mutante> |
phab2001: that IP was also on iridium/phab1001; it should not be hardcoded in puppet, as it is causing issues in T143363 |
[production] |
21:37 |
<mutante> |
phab2001 - ip addr del 10.64.32.186/21 dev eth0 |
[production] |
21:06 |
<bblack> |
restarting varnish backends (depooled, etc) for eqiad cache_upload: cp1049, cp1072, cp1074 |
[production] |
19:50 |
<cmjohnson1> |
dataset1001 array 1 swap failed disk slot 4 |
[production] |
19:40 |
<cmjohnson1> |
labvirt1005 swapping disk 0 |
[production] |
19:40 |
<gehel> |
routing traffic for cache-maps in codfw -> eqiad |
[production] |
19:29 |
<gehel> |
running puppet on eqiad cache nodes to activate maps traffic redirection |
[production] |
19:06 |
<gehel> |
shutting down cassandra on maps2004, seems to have lost data |
[production] |
18:22 |
<ejegg> |
updated SmashPig from d1ca0632d00dfb608f70ca4b70251a5ba49f4411 to e28b2cd9f0c1429acdd2a08c68f95884dbffb594 |
[production] |
17:03 |
<madhuvishy> |
Applied new changes to previously unreachable instance maps-warper.maps (T147657) |
[maps] |
16:45 |
<ejegg> |
updated fundraising tools from 09ae6e24d8ca8350dc099d63a6ca0d9ec9fdef2b to f83e39291adc55677fc4b49307dc4807eba18019 |
[production] |
16:33 |
<mutante> |
rebooting planet1001 - *.planet.wm.org will be right back |
[production] |
16:30 |
<mutante> |
rebooting planet2001 |
[production] |
16:05 |
<elukey> |
reimaging mc1021 with wmf-auto-reimage (T137345) |
[production] |
15:33 |
<elukey> |
created 0001564-161020124223818-oozie-oozi-C to re-run webrequest-load-check_sequence_statistics-wf-upload-2016-10-21-14 (oozie errors) |
[analytics] |
15:28 |
<elukey> |
reimaging mc1019 with wmf-auto-reimage (T137345) |
[production] |
14:50 |
<elukey> |
reimaging mc1020 with wmf-auto-reimage (T137345) |
[production] |
14:31 |
<_joe_> |
rebooting all kubernetes worker nodes in production |
[production] |
14:31 |
<moritzm> |
rolling reboot of thumbor* for kernel update |
[production] |
14:29 |
<marostegui> |
Stopping replication on db2055 to use it to clone another host - T146261 |
[production] |
13:55 |
<bblack> |
restart isc-dhcp-server on carbon |
[production] |
13:55 |
<moritzm> |
rolling reboot of thumbor* for kernel update |
[production] |
13:40 |
<moritzm> |
completed rolling reboot of restbase in codfw |
[production] |
13:14 |
<marostegui> |
Deploying schema change S6 ruwiki for table ores_model - T147734 |
[production] |
12:29 |
<elukey> |
created 0001387-161020124223818-oozie-oozi-C for webrequest-load-check_sequence_statistics-wf-upload-2016-10-21-11 (oozie errors) |
[analytics] |
12:24 |
<moritzm> |
rebooting ruthenium for kernel update |
[production] |
12:02 |
<moritzm> |
rebooting bromine for kernel update |
[production] |
11:28 |
<gehel> |
starting rolling restart of elasticsearch eqiad cluster |
[production] |
11:04 |
<moritzm> |
rebooting hafnium for kernel update |
[production] |
10:49 |
<jynus@mira> |
Synchronized wmf-config/db-eqiad.php: mariadb: pool db1053 as the new rc special slave after maintenance (duration: 01m 00s) |
[production] |
10:36 |
<marostegui> |
Deploying schema change S2 several wikis for table ores_model - T147734 |
[production] |
10:33 |
<elukey> |
re-run webrequest-load-wf-text-2016-10-21-00 and webrequest-load-wf-maps-2016-10-21-00 |
[analytics] |
10:28 |
<bblack> |
rebooting radon (ns0) |
[production] |
10:22 |
<moritzm> |
rolling reboot of restbase in codfw for kernel update |
[production] |
10:09 |
<marostegui> |
Deploying schema change S7 fawiki.ores_model - T147734 |
[production] |
10:04 |
<moritzm> |
rebooting seaborgium (labs LDAP server) for kernel update |
[production] |