2020-11-25
10:52 <jbond42> failover idp primary to idp2001 [production]
10:51 <kormat> deployed wmfmariadbpy 0.6.1 to `C:wmfmariadbpy` [production]
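For context: deploying a new wmfmariadbpy release to the hosts matched by the `C:wmfmariadbpy` Puppet-class selector is typically done with cumin from a cluster management host. A rough sketch, where the exact package name and apt options are assumptions:
    # run from a cumin host; 'C:wmfmariadbpy' selects all hosts carrying that Puppet class
    sudo cumin 'C:wmfmariadbpy' 'apt-get update'
    sudo cumin 'C:wmfmariadbpy' 'apt-get install -y wmfmariadbpy-common'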
10:43 <kormat> uploaded wmfmariadbpy 0.6.1 to stretch+buster apt repos [production]
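The apt-repo uploads above and below generally mean importing the built packages into the reprepro-managed repository on the apt server. A minimal sketch, assuming the distribution names and the .changes filename:
    # on the apt server, import the package for each target distribution
    sudo reprepro -C main include stretch-wikimedia wmfmariadbpy_0.6.1_amd64.changes
    sudo reprepro -C main include buster-wikimedia wmfmariadbpy_0.6.1_amd64.changes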
10:21 <jynus> upgrade wmfbackup-check package on alert* hosts [production]
10:11 <kormat> uploaded wmfmariadbpy 0.6 to stretch+buster apt repos [production]
09:54 <moritzm> uploaded krb5 1.12.1+dfsg-19+deb8u5+wmf1 to apt.wikimedia.org [production]
09:52 <marostegui@cumin1001> dbctl commit (dc=all): 'db1076 (re)pooling @ 100%: After schema change', diff saved to https://phabricator.wikimedia.org/P13405 and previous config saved to /var/cache/conftool/dbconfig/20201125-095239-root.json [production]
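The 'dbctl commit' entries in this log record gradual (de)pooling of database instances. A rough sketch of the underlying workflow, assuming standard dbctl usage:
    # adjust the pooling weight of the instance, review the diff, then commit
    dbctl instance db1076 pool -p 100
    dbctl config diff
    dbctl config commit -m 'db1076 (re)pooling @ 100%: After schema change'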
09:45 <marostegui> Manually install bsd-mailx (apt-get install bsd-mailx) on clouddb1015, labsdb1012 and labsdb1011 - T268725 [production]
09:37 <marostegui@cumin1001> dbctl commit (dc=all): 'db1076 (re)pooling @ 75%: After schema change', diff saved to https://phabricator.wikimedia.org/P13404 and previous config saved to /var/cache/conftool/dbconfig/20201125-093736-root.json [production]
09:31 <_dcaro> The OSD actually seems to be up and running, though there's that misleading log; will leave it and see if the cluster becomes fully healthy (T268722) [admin]
09:22 <marostegui@cumin1001> dbctl commit (dc=all): 'db1076 (re)pooling @ 50%: After schema change', diff saved to https://phabricator.wikimedia.org/P13403 and previous config saved to /var/cache/conftool/dbconfig/20201125-092232-root.json [production]
09:07 <marostegui@cumin1001> dbctl commit (dc=all): 'db1076 (re)pooling @ 25%: After schema change', diff saved to https://phabricator.wikimedia.org/P13402 and previous config saved to /var/cache/conftool/dbconfig/20201125-090729-root.json [production]
08:54 <_dcaro> Unsetting noup/nodown to allow re-shuffling of the pgs that osd.44 had, will try to rebuild it (T268722) [admin]
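Clearing those flags uses the stock ceph CLI; a minimal sketch:
    # clear the cluster-wide flags so the affected PGs can be remapped and backfilled
    ceph osd unset noup
    ceph osd unset nodown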
08:52 <marostegui@cumin1001> dbctl commit (dc=all): 'Depool db1076 for schema change', diff saved to https://phabricator.wikimedia.org/P13401 and previous config saved to /var/cache/conftool/dbconfig/20201125-085216-marostegui.json [production]
08:46 <marostegui@cumin1001> dbctl commit (dc=all): 'db1074 (re)pooling @ 100%: After cloning the new clouddb hosts', diff saved to https://phabricator.wikimedia.org/P13400 and previous config saved to /var/cache/conftool/dbconfig/20201125-084603-root.json [production]
08:45 <_dcaro> Tried resetting the class for osd.44 to ssd, no luck, the cluster is in noout/norebalance to avoid data shuffling (opened T268722) [admin]
08:45 <_dcaro> Tried resetting the class for osd.44 to ssd, no luck, the cluster is in noout/norebalance to avoid data shuffling (command used: root@cloudcephosd1005:/var/lib/ceph/osd/ceph-44# ceph osd crush set-device-class ssd osd.44) [admin]
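Changing an OSD's CRUSH device class requires removing the existing class first, which is likely why the set command quoted above failed on its own. A minimal sketch of the usual sequence:
    # the old class has to be removed before a new one can be set
    ceph osd crush rm-device-class osd.44
    ceph osd crush set-device-class ssd osd.44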
08:43 <kormat@deploy1001> Synchronized wmf-config/db-eqiad.php: Re-enable writes to es5 T268469 (duration: 00m 59s) [production]
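The 'Synchronized wmf-config/db-eqiad.php' entries are produced by syncing a single mediawiki-config file from the deployment host. A rough sketch, assuming scap's single-file sync was used:
    # run from the deployment host, inside the mediawiki-config staging directory
    scap sync-file wmf-config/db-eqiad.php 'Re-enable writes to es5 T268469'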
08:34 <kormat@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) [production]
08:34 <kormat@cumin1001> START - Cookbook sre.hosts.downtime [production]
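The START/END pairs for sre.hosts.downtime come from a Spicerack cookbook run on a cumin host that sets an Icinga downtime before disruptive work. A rough sketch, where the duration, reason and target host are assumptions:
    # downtime the host before stopping mariadb and rebooting it
    sudo cookbook sre.hosts.downtime --hours 2 -r 'es1024 maintenance T268469' 'es1024.eqiad.wmnet'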
08:31 <marostegui@cumin1001> dbctl commit (dc=all): 'db1074 (re)pooling @ 75%: After cloning the new clouddb hosts', diff saved to https://phabricator.wikimedia.org/P13399 and previous config saved to /var/cache/conftool/dbconfig/20201125-083059-root.json [production]
08:19 <_dcaro> Restarting the osd.44 service resulted in osd.44 being unable to start due to some config inconsistency (cannot reset class to hdd) [admin]
08:16 <_dcaro> After enabling auto pg scaling on ceph eqiad cluster, osd.44 (cloudcephosd1005) got stuck, trying to restart the osd service [admin]
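PG autoscaling is enabled per pool, and a stuck OSD daemon is restarted through its systemd unit on the host. A minimal sketch, with the pool name as an assumption:
    # enable the PG autoscaler on a pool (pool name is an example)
    ceph osd pool set eqiad1-compute pg_autoscale_mode on
    # restart the stuck OSD daemon on cloudcephosd1005
    sudo systemctl restart ceph-osd@44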
08:16 <_dcaro> After enabling auto pg scaling on ceph eqiad cluster, osd.44 (cloudcephosd1005) got stuck, trying to restart [admin]
08:15 <marostegui@cumin1001> dbctl commit (dc=all): 'db1074 (re)pooling @ 50%: After cloning the new clouddb hosts', diff saved to https://phabricator.wikimedia.org/P13398 and previous config saved to /var/cache/conftool/dbconfig/20201125-081556-root.json [production]
08:14 <kormat> rebooting es1024 T268469 [production]
08:08 <godog> swift eqiad-prod: add weight to ms-be106[0-3] - T268435 [production]
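Adding weight to new swift backends means raising the device weights in the ring builder files and rebalancing. A rough sketch using the stock swift tooling, with the builder file, device search value and weight all as assumptions:
    # bump the weight of devices on a new backend, then rebalance the ring
    swift-ring-builder object.builder set_weight ms-be1060.eqiad.wmnet 1000
    swift-ring-builder object.builder rebalance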
08:07 <kormat> stopping mariadb on es1024 T268469 [production]
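Stopping the database ahead of the reboot logged above is a plain systemd operation on the host. A minimal sketch:
    # on es1024, stop MariaDB cleanly before the reboot
    sudo systemctl stop mariadb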
08:04 <kormat@deploy1001> Synchronized wmf-config/db-eqiad.php: Disable writes to es5 T268469 (duration: 00m 58s) [production]
08:02 <marostegui> Upgrade db2108 [production]
08:02 <kormat@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) [production]
08:02 <kormat@cumin1001> START - Cookbook sre.hosts.downtime [production]
08:00 <marostegui@cumin1001> dbctl commit (dc=all): 'db1074 (re)pooling @ 25%: After cloning the new clouddb hosts', diff saved to https://phabricator.wikimedia.org/P13397 and previous config saved to /var/cache/conftool/dbconfig/20201125-080053-root.json [production]
07:19 <marostegui@cumin1001> dbctl commit (dc=all): 'Repool db1130', diff saved to https://phabricator.wikimedia.org/P13396 and previous config saved to /var/cache/conftool/dbconfig/20201125-071951-marostegui.json [production]
07:14 <marostegui@cumin1001> dbctl commit (dc=all): 'Depool db1130 for schema change', diff saved to https://phabricator.wikimedia.org/P13395 and previous config saved to /var/cache/conftool/dbconfig/20201125-071450-marostegui.json [production]
06:38 <marostegui> Stop mysql on db1125:3317 to clone clouddb1014:3317 and clouddb1018:3317 T267090 [production]
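db1125 is a multi-instance host, so only the instance listening on port 3317 is stopped. A rough sketch, where the section name mapped to that port is an assumption:
    # stop only the instance serving port 3317 (section name assumed)
    sudo systemctl stop mariadb@s7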
06:33 <marostegui> Restart clouddb1019:3314, clouddb1019:3316 [production]
06:32 <marostegui> Restart clouddb1015:3314, clouddb1015:3316 [production]
06:28 <marostegui> Check private data on clouddb1014:3312 and clouddb1018:3312 T267090 [production]
05:48 <marostegui> Sanitize clouddb1014:3312 and clouddb1018:3312 T267090 [production]
01:10 <tgr_> Evening deploys done [production]
01:07 <tgr@deploy1001> Finished scap: Backport: [[gerrit:643156|GrowthExperiments: Add Russian aliases (T268519)]] (duration: 32m 09s) [production]
00:35 <tgr@deploy1001> Started scap: Backport: [[gerrit:643156|GrowthExperiments: Add Russian aliases (T268519)]] [production]