2020-11-25 §
09:31 <_dcaro> The OSD seems to be up and running actually, though there's that misleading log, will leave it and see if the cluster becomes fully healthy (T268722) [admin]
09:22 <marostegui@cumin1001> dbctl commit (dc=all): 'db1076 (re)pooling @ 50%: After schema change', diff saved to https://phabricator.wikimedia.org/P13403 and previous config saved to /var/cache/conftool/dbconfig/20201125-092232-root.json [production]
09:07 <marostegui@cumin1001> dbctl commit (dc=all): 'db1076 (re)pooling @ 25%: After schema change', diff saved to https://phabricator.wikimedia.org/P13402 and previous config saved to /var/cache/conftool/dbconfig/20201125-090729-root.json [production]
08:54 <_dcaro> Unsetting noup/nodown to allow re-shuffling of the pgs that osd.44 had, will try to rebuild it (T268722) [admin]
08:52 <marostegui@cumin1001> dbctl commit (dc=all): 'Depool db1076 for schema change', diff saved to https://phabricator.wikimedia.org/P13401 and previous config saved to /var/cache/conftool/dbconfig/20201125-085216-marostegui.json [production]
08:46 <marostegui@cumin1001> dbctl commit (dc=all): 'db1074 (re)pooling @ 100%: After cloning the new clouddb hosts', diff saved to https://phabricator.wikimedia.org/P13400 and previous config saved to /var/cache/conftool/dbconfig/20201125-084603-root.json [production]
08:45 <_dcaro> Tried resetting the class for osd.44 to ssd, no luck, the cluster is in noout/norebalance to avoid data shuffling (opened T268722) [admin]
08:45 <_dcaro> Tried resetting the class for osd.44 to ssd, no luck, the cluster is in noout/norebalance to avoid data shuffling (opened root@cloudcephosd1005:/var/lib/ceph/osd/ceph-44# ceph osd crush set-device-class ssd osd.44) [admin]
08:43 <kormat@deploy1001> Synchronized wmf-config/db-eqiad.php: Re-enable writes to es5 T268469 (duration: 00m 59s) [production]
08:34 <kormat@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) [production]
08:34 <kormat@cumin1001> START - Cookbook sre.hosts.downtime [production]
08:31 <marostegui@cumin1001> dbctl commit (dc=all): 'db1074 (re)pooling @ 75%: After cloning the new clouddb hosts', diff saved to https://phabricator.wikimedia.org/P13399 and previous config saved to /var/cache/conftool/dbconfig/20201125-083059-root.json [production]
08:19 <_dcaro> Restarting service osd.44 resulted in osd.44 being unable to start due to some config inconsistency (can not reset class to hdd) [admin]
08:16 <_dcaro> After enabling auto pg scaling on ceph eqiad cluster, osd.44 (cloudcephosd1005) got stuck, trying to restart the osd service [admin]
08:15 <marostegui@cumin1001> dbctl commit (dc=all): 'db1074 (re)pooling @ 50%: After cloning the new clouddb hosts', diff saved to https://phabricator.wikimedia.org/P13398 and previous config saved to /var/cache/conftool/dbconfig/20201125-081556-root.json [production]
08:14 <kormat> rebooting es1024 T268469 [production]
08:08 <godog> swift eqiad-prod: add weight to ms-be106[0-3] - T268435 [production]
08:07 <kormat> stopping mariadb on es1024 T268469 [production]
08:04 <kormat@deploy1001> Synchronized wmf-config/db-eqiad.php: Disable writes to es5 T268469 (duration: 00m 58s) [production]
08:02 <marostegui> Upgrade db2108 [production]
08:02 <kormat@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) [production]
08:02 <kormat@cumin1001> START - Cookbook sre.hosts.downtime [production]
08:00 <marostegui@cumin1001> dbctl commit (dc=all): 'db1074 (re)pooling @ 25%: After cloning the new clouddb hosts', diff saved to https://phabricator.wikimedia.org/P13397 and previous config saved to /var/cache/conftool/dbconfig/20201125-080053-root.json [production]
07:19 <marostegui@cumin1001> dbctl commit (dc=all): 'Repool db1130', diff saved to https://phabricator.wikimedia.org/P13396 and previous config saved to /var/cache/conftool/dbconfig/20201125-071951-marostegui.json [production]
07:14 <marostegui@cumin1001> dbctl commit (dc=all): 'Depool db1130 for schema change', diff saved to https://phabricator.wikimedia.org/P13395 and previous config saved to /var/cache/conftool/dbconfig/20201125-071450-marostegui.json [production]
06:38 <marostegui> Stop mysql on db1125:3317 to clone clouddb1014:3317 clouddb1018:3317 T267090 [production]
06:33 <marostegui> Restart clouddb1019:3314, clouddb1019:3316 [production]
06:32 <marostegui> Restart clouddb1015:3314, clouddb1015:3316 [production]
06:28 <marostegui> Check private data on clouddb1014:3312 and clouddb1018:3312 T267090 [production]
05:48 <marostegui> Sanitize clouddb1014:3312 and clouddb1018:3312 T267090 [production]
01:10 <tgr_> Evening deploys done [production]
01:07 <tgr@deploy1001> Finished scap: Backport: [[gerrit:643156|GrowthExperiments: Add Russian aliases (T268519)]] (duration: 32m 09s) [production]
00:35 <tgr@deploy1001> Started scap: Backport: [[gerrit:643156|GrowthExperiments: Add Russian aliases (T268519)]] [production]
2020-11-24 §
23:50 <crusnov@deploy1001> Finished deploy [netbox/deploy@0362a12]: Test deploy of 2.9.10 to netbox-next T266488 p2 (duration: 00m 05s) [production]
23:50 <crusnov@deploy1001> Started deploy [netbox/deploy@0362a12]: Test deploy of 2.9.10 to netbox-next T266488 p2 [production]
23:50 <crusnov@deploy1001> Finished deploy [netbox/deploy@0362a12]: Test deploy of 2.9.10 to netbox-next T266488 (duration: 01m 51s) [production]
23:48 <crusnov@deploy1001> Started deploy [netbox/deploy@0362a12]: Test deploy of 2.9.10 to netbox-next T266488 [production]
21:58 <wm-bot> <lucaswerkmeister> undeployed debug code, I don’t remember what it was for anymore [tools.lexeme-forms]
21:56 <wm-bot> <lucaswerkmeister> deployed 59f2c38fed (the previously-uncommitted JS fix, now committed; some uncommitted debug code is still there) [tools.lexeme-forms]
21:27 <andrewbogott> restarting slapd on serpens [production]
21:20 <cdanis> ✔️ cdanis@seaborgium.wikimedia.org ~ 🕟🍵 sudo systemctl restart prometheus-openldap-exporter.service [production]
21:17 <andrewbogott> restarting slapd on seaborgium [production]
20:49 <cmjohnson@cumin1001> END (PASS) - Cookbook sre.dns.netbox (exit_code=0) [production]
20:42 <pt1979@cumin2001> END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) [production]
20:41 <cmjohnson@cumin1001> START - Cookbook sre.dns.netbox [production]
20:40 <pt1979@cumin2001> START - Cookbook sre.hosts.downtime [production]
19:53 <otto@deploy1001> Synchronized wmf-config/InitialiseSettings-labs.php: Remove no longer needed EventLoggingSchemas override for NavigationTiming and ResourceTiming - T254606 (duration: 01m 01s) [production]
19:49 <ryankemper> [elasticsearch] Restarted all elasticsearch systemd-managed services on `relforge100[1,2]`: `elasticsearch_6@relforge-eqiad.service` and `elasticsearch_6@relforge-eqiad-small-alpha.service` [production]
19:33 <elukey> kill and restart webrequest_load bundle to pick up analytics-hive.eqiad.wmnet settings [analytics]