2021-01-13
08:04 |
<ryankemper> |
[WDQS Deploy] Deploy is complete, and the WDQS service is healthy |
[production] |
07:59 |
<moritzm> |
draining ganeti4001 for eventual reboot |
[production] |
07:29 |
<ryankemper> |
[WDQS Deploy] Restarting `wdqs-categories` across lvs-managed hosts, one node at a time: `sudo -E cumin -b 1 'A:wdqs-all and not A:wdqs-test' 'depool && sleep 45 && systemctl restart wdqs-categories && sleep 45 && pool'` |
[production] |
07:29 |
<ryankemper> |
[WDQS Deploy] Restarted `wdqs-categories` across all test hosts simultaneously: `sudo -E cumin 'A:wdqs-test' 'systemctl restart wdqs-categories'` |
[production] |
07:28 |
<ryankemper> |
[WDQS Deploy] Restarted `wdqs-updater` across all hosts simultaneously: `sudo -E cumin -b 4 'A:wdqs-all' 'systemctl restart wdqs-updater'` |
[production] |
07:28 |
<ryankemper@deploy1001> |
Finished deploy [wdqs/wdqs@fdd2c2f]: 0.3.59 (duration: 14m 23s) |
[production] |
07:15 |
<ryankemper> |
[WDQS Deploy] All tests passing on canary instance `wdqs1003` following canary deploy. Proceeding to rest of fleet... |
[production] |
07:13 |
<ryankemper@deploy1001> |
Started deploy [wdqs/wdqs@fdd2c2f]: 0.3.59 |
[production] |
07:13 |
<ryankemper> |
[WDQS Deploy] All tests passing on canary instance `wdqs1003` prior to start of deploy. Proceeding with canary deploy of version `0.3.59`... |
[production] |
07:04 |
<ryankemper> |
T266492 T268779 T265699 Restarting cloudelastic to apply new readahead changes, this will also verify cloudelastic support works in our elasticsearch spicerack code. Only going one node at a time because cloudelastic elasticsearch indices only have 1 replica shard per index. |
[production] |
07:03 |
<ryankemper@cumin1001> |
START - Cookbook sre.elasticsearch.rolling-restart |
[production] |
06:55 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'db1079 (re)pooling @ 100%: After cloning db1155:3317', diff saved to https://phabricator.wikimedia.org/P13745 and previous config saved to /var/cache/conftool/dbconfig/20210113-065535-root.json |
[production] |
06:40 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'db1079 (re)pooling @ 75%: After cloning db1155:3317', diff saved to https://phabricator.wikimedia.org/P13744 and previous config saved to /var/cache/conftool/dbconfig/20210113-064031-root.json |
[production] |
06:25 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'db1079 (re)pooling @ 50%: After cloning db1155:3317', diff saved to https://phabricator.wikimedia.org/P13743 and previous config saved to /var/cache/conftool/dbconfig/20210113-062528-root.json |
[production] |
06:10 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'db1079 (re)pooling @ 25%: After cloning db1155:3317', diff saved to https://phabricator.wikimedia.org/P13742 and previous config saved to /var/cache/conftool/dbconfig/20210113-061024-root.json |
[production] |
2021-01-12
22:55 |
<dzahn@cumin1001> |
conftool action : set/pooled=yes; selector: name=mw2225.codfw.wmnet |
[production] |
22:55 |
<dzahn@cumin1001> |
conftool action : set/pooled=yes; selector: name=mw2224.codfw.wmnet |
[production] |
22:46 |
<crusnov@deploy1001> |
Finished deploy [netbox/deploy@b17db99]: Rerun production deploy of Netbox 2.9 just in case T266487 (duration: 00m 05s) |
[production] |
22:46 |
<crusnov@deploy1001> |
Started deploy [netbox/deploy@b17db99]: Rerun production deploy of Netbox 2.9 just in case T266487 |
[production] |
22:37 |
<chaomodus> |
Upgrade of Netbox to 2.9 complete, checking support software. T266487 |
[production] |
22:32 |
<crusnov@deploy1001> |
Finished deploy [netbox/deploy@b17db99]: Deploy Netbox 2.9.10 to production T266487 (duration: 02m 33s) |
[production] |
22:30 |
<crusnov@deploy1001> |
Started deploy [netbox/deploy@b17db99]: Deploy Netbox 2.9.10 to production T266487 |
[production] |
22:12 |
<chaomodus> |
Merged Netbox 2.9 related changes in puppet and -extras; testing on -next T266487 |
[production] |
22:07 |
<bblack> |
reboot authdns1001 - T266746#6741647 |
[production] |
22:04 |
<chaomodus> |
proceeding with Netbox 2.9 upgrade T266487 |
[production] |
22:02 |
<dzahn@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2225.codfw.wmnet with reason: REIMAGE |
[production] |
22:00 |
<dzahn@cumin1001> |
START - Cookbook sre.hosts.downtime for 2:00:00 on mw2225.codfw.wmnet with reason: REIMAGE |
[production] |
21:57 |
<dzahn@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2224.codfw.wmnet with reason: REIMAGE |
[production] |
21:55 |
<dzahn@cumin1001> |
START - Cookbook sre.hosts.downtime for 2:00:00 on mw2224.codfw.wmnet with reason: REIMAGE |
[production] |
21:50 |
<jforrester@deploy1001> |
Synchronized php-1.36.0-wmf.25/extensions/AbuseFilter/modules/mode-abusefilter.js: T271487 Don't pass protocol-relative URLs to the Ace worker (duration: 01m 06s) |
[production] |
21:41 |
<ottomata> |
rolling restart of eventgate-analytics-external pods |
[production] |
20:40 |
<tgr_> |
running 'mwscript extensions/ORES/maintenance/PopulateDatabase.php --wiki=ukwiki' on terbium |
[production] |
19:57 |
<tgr_> |
backports done |
[production] |
19:52 |
<bblack> |
dns1001,authdns1001 - upgrade gdnsd to 3.5.0 |
[production] |
19:49 |
<tgr_> |
synced Config: [[gerrit:654520|Disable DiscussionTools' upcoming newtopictool (T270119)]] |
[production] |
19:48 |
<tgr_> |
synced Config: [[gerrit:655723|Migrate HomepageVisit and ServerSideAccountCreation to Event Platform on testwiki (T267333)]] |
[production] |
19:48 |
<tgr_> |
synced Config: [[gerrit:655706|Migrate SuggestedTagsAction to Event Platform on testwiki (T267351)]] |
[production] |
19:48 |
<tgr_> |
synced Config: [[gerrit:655301|Alphabetize ORES settings (T256887)]] |
[production] |
19:46 |
<tgr@deploy1001> |
Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:655302|Enable ORES filters on ukwiki (T256887)]] (duration: 01m 05s) |
[production] |
19:32 |
<tgr@deploy1001> |
Synchronized wmf-config/InitialiseSettings.php: Bunch of no-op/testwiki changes: [[gerrit:654520]], [[gerrit:655301]], [[gerrit:655706]], [[gerrit:655723]] (duration: 01m 05s) |
[production] |
19:27 |
<bblack> |
dns3001,dns4001 - upgrade gdnsd to 3.5.0 |
[production] |
19:25 |
<ottomata> |
rolling restart of eventgate-analytics-external pods to clear schema caches - T267333 |
[production] |
19:01 |
<ariel@deploy1001> |
Synchronized php-1.36.0-wmf.26/includes/api/ApiQueryInfo.php: Backport: (gerrit 655671) Fix undefined index error in ApiQueryInfo (T271815) (duration: 01m 06s) |
[production] |
18:06 |
<bblack> |
dns2001,dns5001 - upgrade gdnsd to 3.5.0 |
[production] |
17:40 |
<bblack> |
dnsX002 - upgrade gdnsd to 3.5.0 |
[production] |
17:20 |
<herron> |
roll restarting eqiad/codfw low-traffic pybals for kibana-next -> kibana7 rename |
[production] |
17:11 |
<jmm@cumin2001> |
END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) |
[production] |
17:09 |
<jynus> |
shutting down db2132, db2078:m1 for m1 codfw replica reprovisioning T270877 |
[production] |
17:09 |
<jmm@cumin2001> |
START - Cookbook sre.hosts.reboot-single |
[production] |
17:09 |
<moritzm> |
rebooting people1002 (people.wikimedia.org) |
[production] |