2019-05-16
§
|
16:23 |
<elukey> |
chown -R analytics /wmf/data/raw/webrequest - step missed earlier in the migration |
[analytics] |
16:22 |
<XioNoX> |
add BGP session to Hetzner in AMS-IX |
[production] |
16:19 |
<akosiaris> |
switch all etcd* kubestagetcd* servers from "drbd" ganeti disk template to "plain" ganeti disk template |
[production] |
16:17 |
<jbond42> |
reboot ores2001-2002 |
[production] |
16:16 |
<jbond@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) |
[production] |
16:16 |
<jbond@cumin1001> |
START - Cookbook sre.hosts.downtime |
[production] |
15:59 |
<akosiaris> |
build service-checker OCI container 0.0.2 with 0.1.5 service-checker version T220401 |
[production] |
15:49 |
<jforrester@deploy1001> |
Synchronized php-1.34.0-wmf.5/extensions/CirrusSearch/includes/InterwikiSearcher.php: Hot-deploy CirrusSearch interwiki no result UBN T223449 (duration: 00m 49s) |
[production] |
15:45 |
<marostegui> |
Drop the following databases from tendril to recreate them with the right user: db1127, db1129, db1130, db1131, db1137, db1138 |
[production] |
15:35 |
<jforrester@deploy1001> |
Synchronized php-1.34.0-wmf.5/includes/specials/pagers/ContribsPager.php: Hot-deploy Contribs getNamespaceInfo UBN fix T223440 (duration: 00m 53s) |
[production] |
15:25 |
<aborrero@puppetmaster1001> |
conftool action : set/pooled=yes; selector: name=labweb1001.wikimedia.org,service=labweb |
[production] |
15:02 |
<jbond@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) |
[production] |
15:02 |
<jbond@cumin1001> |
START - Cookbook sre.hosts.downtime |
[production] |
15:02 |
<jbond42> |
rebooting aqs1009 |
[production] |
14:54 |
<jbond@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) |
[production] |
14:54 |
<jbond@cumin1001> |
START - Cookbook sre.hosts.downtime |
[production] |
14:54 |
<jbond42> |
rebooting aqs1008 |
[production] |
14:45 |
<jbond@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) |
[production] |
14:45 |
<jbond@cumin1001> |
START - Cookbook sre.hosts.downtime |
[production] |
14:45 |
<jbond42> |
rebooting aqs1007 |
[production] |
14:34 |
<jbond@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) |
[production] |
14:34 |
<jbond@cumin1001> |
START - Cookbook sre.hosts.downtime |
[production] |
14:34 |
<jbond42> |
rebooting aqs1006 |
[production] |
14:28 |
<jbond42> |
rebooting aqs1005 |
[production] |
14:21 |
<jbond@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) |
[production] |
14:21 |
<jbond@cumin1001> |
START - Cookbook sre.hosts.downtime |
[production] |
14:18 |
<moritzm> |
powercycling mw2199, stuck during reboot |
[production] |
14:09 |
<elukey> |
restart the webrequest-druid-hourly-coord coordinator with the analytics user |
[analytics] |
14:08 |
<jbond@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) |
[production] |
14:08 |
<jbond@cumin1001> |
START - Cookbook sre.hosts.downtime |
[production] |
14:08 |
<elukey> |
restart the webrequest-druid-daily-coord coordinator with the analytics user |
[analytics] |
14:07 |
<jbond@cumin1001> |
END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) |
[production] |
14:07 |
<jbond@cumin1001> |
START - Cookbook sre.hosts.downtime |
[production] |
14:07 |
<jbond@cumin1001> |
END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) |
[production] |
14:07 |
<jbond@cumin1001> |
START - Cookbook sre.hosts.downtime |
[production] |
13:57 |
<marostegui> |
and recreate the following hosts in tendril: db2103,db2104,db2105,db2106,db2107,db2108,db2109,db2110,db2111,db2112,db2113,db2115,db2116,db2117,db2119 T222772 |
[production] |
13:57 |
<elukey> |
start webrequest-load-bundle from hour 12:00 |
[analytics] |
13:50 |
<jmm@cumin2001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) |
[production] |
13:50 |
<jmm@cumin2001> |
START - Cookbook sre.hosts.downtime |
[production] |
13:39 |
<cmjohnson1> |
replacing pdu in rack B5 eqiad |
[production] |
13:27 |
<elukey> |
chown -R analytics:analytics /user/hive/warehouse/wmf_raw.db on HDFS |
[analytics] |
13:23 |
<elukey> |
chown -R analytics:analytics /wmf/data/raw/webrequests_faulty_hosts on HDFS |
[analytics] |
13:07 |
<elukey> |
chown -R analytics:analytics /wmf/data/raw/webrequests_data_loss on HDFS |
[analytics] |
13:04 |
<hashar@deploy1001> |
rebuilt and synchronized wikiversions files: all wikis to 1.34.0-wmf.5 |
[production] |
13:00 |
<arturo> |
labweb1001 depooled |
[production] |
12:59 |
<mobrovac> |
bootstrap restbase1020-c - T219404 |
[production] |
12:58 |
<aborrero@puppetmaster1001> |
conftool action : set/pooled=no; selector: name=labweb1001.wikimedia.org,service=labweb |
[production] |
12:57 |
<elukey> |
chown -R analytics:analytics-privatedata-users /wmf/data/wmf/webrequest on HDFS |
[analytics] |
12:53 |
<elukey> |
kill the webrequest-load-bundle in hue - prep step to migrate the webrequest bundle to the analytics user |
[analytics] |
12:49 |
<elukey> |
kill webrequest-load-coord-upload from hue - prep step to migrate the webrequest bundle to the analytics user |
[analytics] |