2021-01-08
§
|
13:37 |
<klausman@cumin2001> |
START - Cookbook sre.hosts.downtime for 2:00:00 on ml-serve2003.codfw.wmnet with reason: REIMAGE |
[production] |
12:52 |
<klausman@cumin2001> |
END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on ml-serve2001.codfw.wmnet with reason: REIMAGE |
[production] |
12:49 |
<klausman@cumin2001> |
START - Cookbook sre.hosts.downtime for 2:00:00 on ml-serve2001.codfw.wmnet with reason: REIMAGE |
[production] |
12:04 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'db1141 (re)pooling @ 100%: After schema change', diff saved to https://phabricator.wikimedia.org/P13694 and previous config saved to /var/cache/conftool/dbconfig/20210108-120415-root.json |
[production] |
11:49 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'db1141 (re)pooling @ 75%: After schema change', diff saved to https://phabricator.wikimedia.org/P13693 and previous config saved to /var/cache/conftool/dbconfig/20210108-114912-root.json |
[production] |
11:34 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'db1141 (re)pooling @ 50%: After schema change', diff saved to https://phabricator.wikimedia.org/P13692 and previous config saved to /var/cache/conftool/dbconfig/20210108-113408-root.json |
[production] |
11:25 |
<arturo> |
rebooting both cloudnet2002-dev/cloudnet2003-dev to make sure interfaces are set up correctly (T271517) |
[admin] |
11:22 |
<arturo> |
connecting cloudnet2002-dev cloudnet2003-dev back to vlan 2120 (T271517) |
[admin] |
11:19 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'db1141 (re)pooling @ 25%: After schema change', diff saved to https://phabricator.wikimedia.org/P13691 and previous config saved to /var/cache/conftool/dbconfig/20210108-111905-root.json |
[production] |
11:17 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'Depool db1141', diff saved to https://phabricator.wikimedia.org/P13690 and previous config saved to /var/cache/conftool/dbconfig/20210108-111733-marostegui.json |
[production] |
11:13 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'db1138 (re)pooling @ 100%: After schema change', diff saved to https://phabricator.wikimedia.org/P13689 and previous config saved to /var/cache/conftool/dbconfig/20210108-111345-root.json |
[production] |
11:06 |
<arturo> |
root@cloudcontrol2001-dev:~# openstack router set --external-gateway wan-transport-codfw --fixed-ip subnet=cloud-instances-transport1-b-codfw,ip-address=208.80.153.190 cloudinstances2b-gw (T271517) |
[admin] |
11:02 |
<arturo> |
root@cloudcontrol2001-dev:~# openstack router set --enable-snat cloudinstances2b-gw --external-gateway wan-transport-codfw (T271517) |
[admin] |
11:01 |
<arturo> |
enabling neutron hacks in codfw1dev (cloudnet2002-dev, cloudnet2003-dev) (T271517) |
[admin] |
10:58 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'db1138 (re)pooling @ 75%: After schema change', diff saved to https://phabricator.wikimedia.org/P13688 and previous config saved to /var/cache/conftool/dbconfig/20210108-105842-root.json |
[production] |
10:55 |
<arturo> |
aborrero@labtestvirt2003:~ $ sudo ifdown eno2.2107 (T271517) |
[admin] |
10:55 |
<arturo> |
aborrero@labtestvirt2003:~ $ sudo ifdown eno2.2120 (T271517) |
[admin] |
10:53 |
<arturo> |
root@cloudcontrol2001-dev:~# openstack subnet create --network wan-transport-codfw --gateway 208.80.153.185 --ip-version 4 --network wan-transport-codfw --no-dhcp --subnet-range 208.80.153.184/29 cloud-instances-transport1-b-codfw (T271517) |
[admin] |
10:43 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'db1138 (re)pooling @ 50%: After schema change', diff saved to https://phabricator.wikimedia.org/P13676 and previous config saved to /var/cache/conftool/dbconfig/20210108-104338-root.json |
[production] |
10:40 |
<dcaro> |
Finished tests, bringing osd online (osd.48) for eqiad ceph cluster (T271417) |
[admin] |
10:38 |
<urbanecm@deploy1001> |
Synchronized private/PrivateSettings.php: Update T250887 mitigations (duration: 01m 10s) |
[production] |
10:28 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'db1138 (re)pooling @ 25%: After schema change', diff saved to https://phabricator.wikimedia.org/P13675 and previous config saved to /var/cache/conftool/dbconfig/20210108-102835-root.json |
[production] |
10:26 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'Depool db1138', diff saved to https://phabricator.wikimedia.org/P13674 and previous config saved to /var/cache/conftool/dbconfig/20210108-102606-marostegui.json |
[production] |
10:01 |
<elukey> |
restart varnishkafka-webrequest on cp5001 - timeouts to kafka-jumbo1001, librdkafka does not seem to be recovering very well |
[analytics] |
10:01 |
<elukey> |
restart varnishkafka-webrequest on cp5001 - timeouts to kafka-jumbo1001, librdkafka does not seem to be recovering very well |
[production] |
10:00 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'db1085 (re)pooling @ 100%: After cloning db1155:3316', diff saved to https://phabricator.wikimedia.org/P13673 and previous config saved to /var/cache/conftool/dbconfig/20210108-100040-root.json |
[production] |
09:59 |
<dcaro> |
Started performance tests on sdc (osd.48) for eqiad ceph cluster (T271417) |
[admin] |
09:45 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'db1085 (re)pooling @ 75%: After cloning db1155:3316', diff saved to https://phabricator.wikimedia.org/P13672 and previous config saved to /var/cache/conftool/dbconfig/20210108-094535-root.json |
[production] |
09:41 |
<dcaro> |
Taking osd.48 from eqiad ceph cluster out to do performance tests (T271417) |
[admin] |
09:30 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'db1085 (re)pooling @ 50%: After cloning db1155:3316', diff saved to https://phabricator.wikimedia.org/P13671 and previous config saved to /var/cache/conftool/dbconfig/20210108-093032-root.json |
[production] |
09:30 |
<marostegui> |
Restart mysql on db1115 (tendril/dbtree) |
[production] |
09:15 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'db1085 (re)pooling @ 25%: After cloning db1155:3316', diff saved to https://phabricator.wikimedia.org/P13670 and previous config saved to /var/cache/conftool/dbconfig/20210108-091528-root.json |
[production] |
09:08 |
<moritzm> |
installing libxstream-java security updates on Buster |
[production] |
09:04 |
<dcaro> |
manually applying patch https://gerrit.wikimedia.org/r/c/operations/puppet/+/655019 to the puppetmaster to test it (T271509) |
[cloudinfra] |
09:01 |
<godog> |
swift codfw-prod: more weight to ms-be20[58-61] - T269337 |
[production] |
08:46 |
<elukey> |
force restart of check_webrequest_partitions.service on an-launcher1002 |
[analytics] |
08:44 |
<elukey> |
force restart of monitor_refine_eventlogging_legacy_failure_flags.service |
[analytics] |
08:18 |
<elukey> |
raise default max executor heap size for Spark refine to 4G |
[analytics] |
08:12 |
<marostegui> |
Deploy schema change on s4 codfw master - T270187 |
[production] |
07:57 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'Depool db1082', diff saved to https://phabricator.wikimedia.org/P13669 and previous config saved to /var/cache/conftool/dbconfig/20210108-075714-marostegui.json |
[production] |
07:23 |
<marostegui> |
Deploy schema change on s5 codfw master - T270187 |
[production] |
06:33 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'Depool db1085 to clone db1155:3316 T268742 ', diff saved to https://phabricator.wikimedia.org/P13666 and previous config saved to /var/cache/conftool/dbconfig/20210108-063301-marostegui.json |
[production] |
06:18 |
<marostegui> |
Deploy schema change on s2 codfw master - T270187 |
[production] |
04:59 |
<mutante> |
mw1266 - restart-php7.2-fpm |
[production] |
03:04 |
<ryankemper> |
[wdqs deploy] Deploy complete, service is healthy. This is done. |
[production] |
02:35 |
<ryankemper> |
[wdqs deploy] Restarting `wdqs-categories` across load-balanced instances, one host at a time: `sudo -E cumin -b 1 'A:wdqs-all and not A:wdqs-test' 'depool && sleep 45 && systemctl restart wdqs-categories && sleep 45 && pool'` |
[production] |
02:35 |
<ryankemper> |
[wdqs deploy] Restarted `wdqs-categories` across test instances: `sudo -E cumin 'A:wdqs-test' 'systemctl restart wdqs-categories'` |
[production] |
02:34 |
<ryankemper> |
[wdqs deploy] Restarted `wdqs-updater` across all instances: `sudo -E cumin -b 4 'A:wdqs-all' 'systemctl restart wdqs-updater'` |
[production] |
02:27 |
<ryankemper@deploy1001> |
Finished deploy [wdqs/wdqs@b15fc5c]: 0.3.58 (duration: 18m 04s) |
[production] |
02:17 |
<Reedy> |
Reloading Zuul to deploy https://gerrit.wikimedia.org/r/654964 |
[releng] |