2020-11-25
10:11 <kormat> uploaded wmfmariadbpy 0.6 to stretch+buster apt repos [production]
09:54 <moritzm> uploaded krb5 1.12.1+dfsg-19+deb8u5+wmf1 to apt.wikimedia.org [production]
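
(For context: packages on apt.wikimedia.org are typically imported with reprepro. A minimal sketch of such an upload, assuming the package is already built and its .changes file copied to the repo host; the path and filenames here are illustrative:

    # import the built package into the buster-wikimedia suite
    sudo -i reprepro include buster-wikimedia /srv/incoming/wmfmariadbpy_0.6_amd64.changes
    # repeat for stretch-wikimedia to cover both distributions mentioned above
    sudo -i reprepro include stretch-wikimedia /srv/incoming/wmfmariadbpy_0.6_amd64.changes
)
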
09:52 <marostegui@cumin1001> dbctl commit (dc=all): 'db1076 (re)pooling @ 100%: After schema change', diff saved to https://phabricator.wikimedia.org/P13405 and previous config saved to /var/cache/conftool/dbconfig/20201125-095239-root.json [production]
09:45 <marostegui> Manually installed bsd-mailx (apt-get install bsd-mailx) on clouddb1015, labsdb1012 and labsdb1011 - T268725 [production]
09:37 <marostegui@cumin1001> dbctl commit (dc=all): 'db1076 (re)pooling @ 75%: After schema change', diff saved to https://phabricator.wikimedia.org/P13404 and previous config saved to /var/cache/conftool/dbconfig/20201125-093736-root.json [production]
09:31 <_dcaro> The OSD actually seems to be up and running, though there's that misleading log; will leave it and see if the cluster becomes fully healthy (T268722) [admin]
09:22 <marostegui@cumin1001> dbctl commit (dc=all): 'db1076 (re)pooling @ 50%: After schema change', diff saved to https://phabricator.wikimedia.org/P13403 and previous config saved to /var/cache/conftool/dbconfig/20201125-092232-root.json [production]
09:07 <marostegui@cumin1001> dbctl commit (dc=all): 'db1076 (re)pooling @ 25%: After schema change', diff saved to https://phabricator.wikimedia.org/P13402 and previous config saved to /var/cache/conftool/dbconfig/20201125-090729-root.json [production]
08:54 <_dcaro> Unsetting noup/nodown to allow re-shuffling of the pgs that osd.44 had, will try to rebuild it (T268722) [admin]
08:52 <marostegui@cumin1001> dbctl commit (dc=all): 'Depool db1076 for schema change', diff saved to https://phabricator.wikimedia.org/P13401 and previous config saved to /var/cache/conftool/dbconfig/20201125-085216-marostegui.json [production]
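
(The db1076 depool/repool cycle above is driven by dbctl. A rough sketch of the underlying command sequence, assuming the standard workflow; percentages and messages mirror the log but the exact invocation is an assumption:

    # take the replica out of rotation before the schema change
    dbctl instance db1076 depool
    dbctl config commit -m 'Depool db1076 for schema change'
    # afterwards, ramp traffic back up in steps
    dbctl instance db1076 pool -p 25
    dbctl config commit -m 'db1076 (re)pooling @ 25%: After schema change'
    # ...repeat at 50%, 75% and 100%
)
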
08:46 <marostegui@cumin1001> dbctl commit (dc=all): 'db1074 (re)pooling @ 100%: After cloning the new clouddb hosts', diff saved to https://phabricator.wikimedia.org/P13400 and previous config saved to /var/cache/conftool/dbconfig/20201125-084603-root.json [production]
08:45 <_dcaro> Tried resetting the class for osd.44 to ssd, no luck, the cluster is in noout/norebalance to avoid data shuffling (opened T268722) [admin]
08:45 <_dcaro> Tried resetting the class for osd.44 to ssd, no luck, the cluster is in noout/norebalance to avoid data shuffling (opened root@cloudcephosd1005:/var/lib/ceph/osd/ceph-44# ceph osd crush set-device-class ssd osd.44) [admin]
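
(The osd.44 recovery above combines cluster-wide flags with a device-class reset. A sketch of the Ceph commands involved; the flags mirror the log, but the exact order is an assumption:

    # keep the cluster from reacting while the OSD is worked on
    ceph osd set noout
    ceph osd set norebalance
    # a device class can only be set after the old one is cleared
    ceph osd crush rm-device-class osd.44
    ceph osd crush set-device-class ssd osd.44
    # once the OSD is healthy again, let placement groups re-shuffle
    ceph osd unset noup
    ceph osd unset nodown
    ceph osd unset noout
    ceph osd unset norebalance
)
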
08:43 <kormat@deploy1001> Synchronized wmf-config/db-eqiad.php: Re-enable writes to es5 T268469 (duration: 00m 59s) [production]
08:34 <kormat@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) [production]
08:34 <kormat@cumin1001> START - Cookbook sre.hosts.downtime [production]
08:31 <marostegui@cumin1001> dbctl commit (dc=all): 'db1074 (re)pooling @ 75%: After cloning the new clouddb hosts', diff saved to https://phabricator.wikimedia.org/P13399 and previous config saved to /var/cache/conftool/dbconfig/20201125-083059-root.json [production]
08:19 <_dcaro> Restarting the osd.44 service resulted in osd.44 being unable to start due to a config inconsistency (cannot reset class to hdd) [admin]
08:16 <_dcaro> After enabling auto pg scaling on the ceph eqiad cluster, osd.44 (cloudcephosd1005) got stuck; trying to restart the osd service [admin]
08:16 <_dcaro> After enabling auto pg scaling on the ceph eqiad cluster, osd.44 (cloudcephosd1005) got stuck, trying to restart [admin]
08:15 <marostegui@cumin1001> dbctl commit (dc=all): 'db1074 (re)pooling @ 50%: After cloning the new clouddb hosts', diff saved to https://phabricator.wikimedia.org/P13398 and previous config saved to /var/cache/conftool/dbconfig/20201125-081556-root.json [production]
08:14 <kormat> rebooting es1024 T268469 [production]
08:08 <godog> swift eqiad-prod: add weight to ms-be106[0-3] - T268435 [production]
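
(Adding weight to the new ms-be backends is a ring change. WMF wraps ring management in its own tooling, but the underlying upstream operation is roughly the following; the builder file, device id and weight are illustrative:

    # raise the weight of a newly added device, then rebalance the ring
    swift-ring-builder object.builder set_weight d123 1000
    swift-ring-builder object.builder rebalance
)
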
08:07 <kormat> stopping mariadb on es1024 T268469 [production]
08:04 <kormat@deploy1001> Synchronized wmf-config/db-eqiad.php: Disable writes to es5 T268469 (duration: 00m 58s) [production]
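
(The es5 window above — disable writes, stop MariaDB, reboot es1024, re-enable writes — is bracketed by two mediawiki-config syncs. A minimal sketch of the deploy step, assuming the change to db-eqiad.php was already merged and pulled on the deployment host; the message mirrors the log:

    # push the edited db config from the deployment host to all appservers
    scap sync-file wmf-config/db-eqiad.php 'Disable writes to es5 T268469'
)
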
08:02 <marostegui> Upgrade db2108 [production]
08:02 <kormat@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) [production]
08:02 <kormat@cumin1001> START - Cookbook sre.hosts.downtime [production]
08:00 <marostegui@cumin1001> dbctl commit (dc=all): 'db1074 (re)pooling @ 25%: After cloning the new clouddb hosts', diff saved to https://phabricator.wikimedia.org/P13397 and previous config saved to /var/cache/conftool/dbconfig/20201125-080053-root.json [production]
07:19 <marostegui@cumin1001> dbctl commit (dc=all): 'Repool db1130', diff saved to https://phabricator.wikimedia.org/P13396 and previous config saved to /var/cache/conftool/dbconfig/20201125-071951-marostegui.json [production]
07:14 <marostegui@cumin1001> dbctl commit (dc=all): 'Depool db1130 for schema change', diff saved to https://phabricator.wikimedia.org/P13395 and previous config saved to /var/cache/conftool/dbconfig/20201125-071450-marostegui.json [production]
06:38 <marostegui> Stop mysql on db1125:3317 to clone clouddb1014:3317 clouddb1018:3317 T267090 [production]
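
(Cloning a new replica from a stopped source instance is essentially a copy of the data directory. One way this is commonly done with wmfmariadbpy's transfer script, assuming that tool is used here; the datadir path and the exact invocation are illustrative, only the host names come from the log:

    # with MariaDB stopped on the db1125:3317 instance, stream its datadir to the new hosts
    transfer.py db1125.eqiad.wmnet:/srv/sqldata.s7 clouddb1014.eqiad.wmnet:/srv/sqldata.s7
    transfer.py db1125.eqiad.wmnet:/srv/sqldata.s7 clouddb1018.eqiad.wmnet:/srv/sqldata.s7
)
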
06:33 <marostegui> Restart clouddb1019:3314, clouddb1019:3316 [production]
06:32 <marostegui> Restart clouddb1015:3314, clouddb1015:3316 [production]
06:28 <marostegui> Check private data on clouddb1014:3312 and clouddb1018:3312 T267090 [production]
05:48 <marostegui> Sanitize clouddb1014:3312 and clouddb1018:3312 T267090 [production]
01:10 <tgr_> Evening deploys done [production]
01:07 <tgr@deploy1001> Finished scap: Backport: [[gerrit:643156|GrowthExperiments: Add Russian aliases (T268519)]] (duration: 32m 09s) [production]
00:35 <tgr@deploy1001> Started scap: Backport: [[gerrit:643156|GrowthExperiments: Add Russian aliases (T268519)]] [production]

2020-11-24
23:50 <crusnov@deploy1001> Finished deploy [netbox/deploy@0362a12]: Test deploy of 2.9.10 to netbox-next T266488 p2 (duration: 00m 05s) [production]
23:50 <crusnov@deploy1001> Started deploy [netbox/deploy@0362a12]: Test deploy of 2.9.10 to netbox-next T266488 p2 [production]
23:50 <crusnov@deploy1001> Finished deploy [netbox/deploy@0362a12]: Test deploy of 2.9.10 to netbox-next T266488 (duration: 01m 51s) [production]
23:48 <crusnov@deploy1001> Started deploy [netbox/deploy@0362a12]: Test deploy of 2.9.10 to netbox-next T266488 [production]
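
(The netbox deploys above use scap's deploy mode rather than a MediaWiki sync. A minimal sketch, assuming the usual repo layout on the deployment host; the path is an assumption, the message mirrors the log:

    cd /srv/deployment/netbox/deploy
    scap deploy 'Test deploy of 2.9.10 to netbox-next T266488'
)
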
21:58 <wm-bot> <lucaswerkmeister> undeployed debug code, I don’t remember what it was for anymore [tools.lexeme-forms]
21:56 <wm-bot> <lucaswerkmeister> deployed 59f2c38fed (the previously-uncommitted JS fix, now committed; some uncommitted debug code is still there) [tools.lexeme-forms]
21:27 <andrewbogott> restarting slapd on serpens [production]
21:20 <cdanis> ✔️ cdanis@seaborgium.wikimedia.org ~ 🕟🍵 sudo systemctl restart prometheus-openldap-exporter.service [production]
21:17 <andrewbogott> restarting slapd on seaborgium [production]
20:49 <cmjohnson@cumin1001> END (PASS) - Cookbook sre.dns.netbox (exit_code=0) [production]
20:42 <pt1979@cumin2001> END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) [production]