2021-01-08
03:04 <ryankemper> [wdqs deploy] Deploy complete, service is healthy. This is done. [production]
02:35 <ryankemper> [wdqs deploy] Restarting `wdqs-categories` across load-balanced instances, one host at a time: `sudo -E cumin -b 1 'A:wdqs-all and not A:wdqs-test' 'depool && sleep 45 && systemctl restart wdqs-categories && sleep 45 && pool'` [production]
02:35 <ryankemper> [wdqs deploy] Restarted `wdqs-categories` across test instances: `sudo -E cumin 'A:wdqs-test' 'systemctl restart wdqs-categories'` [production]
02:34 <ryankemper> [wdqs deploy] Restarted `wdqs-updater` across all instances: `sudo -E cumin -b 4 'A:wdqs-all' 'systemctl restart wdqs-updater'` [production]
02:27 <ryankemper@deploy1001> Finished deploy [wdqs/wdqs@b15fc5c]: 0.3.58 (duration: 18m 04s) [production]
02:15 <ryankemper> [wdqs deploy] Nevermind - the UI failure I mentioned above is transient. Restarting my ssh tunnel seemed to make the problem go away. Proceeding with deploy [production]
02:12 <ryankemper> [wdqs deploy] While queries run fine, it looks like there might be a UI glitch in this version. Digging in to see if it's transient, but I'll likely be aborting this deploy [production]
02:09 <ryankemper@deploy1001> Started deploy [wdqs/wdqs@b15fc5c]: 0.3.58 [production]
02:09 <ryankemper> [wdqs deploy] Tests passing on canary before beginning wdqs deploy, proceeding [production]
01:29 <dzahn@cumin1001> conftool action : set/pooled=yes; selector: name=mw1267.eqiad.wmnet [production]
01:28 <mutante> mw1276, mw1277 - first API appservers on buster, now serving traffic, free to depool if any issues [production]
01:28 <dzahn@cumin1001> conftool action : set/pooled=yes; selector: name=mw1277.eqiad.wmnet [production]
01:28 <dzahn@cumin1001> conftool action : set/pooled=yes; selector: name=mw1276.eqiad.wmnet [production]
01:24 <mutante> mw1266 - another buster appserver now serving traffic [production]
01:24 <mutante> mw1265 - raised weight to 25 like regular appservers (buster) [production]
01:23 <dzahn@cumin1001> conftool action : set/weight=25; selector: name=mw1265.eqiad.wmnet [production]
01:18 <dzahn@cumin1001> conftool action : set/pooled=yes; selector: name=mw1266.eqiad.wmnet [production]
01:17 <dzahn@cumin1001> conftool action : set/pooled=no; selector: name=mw1277.eqiad.wmnet [production]
01:17 <dzahn@cumin1001> conftool action : set/pooled=no; selector: name=mw1276.eqiad.wmnet [production]
01:16 <dzahn@cumin1001> conftool action : set/pooled=no; selector: name=mw1267.eqiad.wmnet [production]
01:12 <dzahn@cumin1001> conftool action : set/pooled=no; selector: name=mw1266.eqiad.wmnet [production]
00:27 <dzahn@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1277.eqiad.wmnet with reason: REIMAGE [production]
00:25 <dzahn@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1267.eqiad.wmnet with reason: REIMAGE [production]
00:23 <dzahn@cumin1001> START - Cookbook sre.hosts.downtime for 2:00:00 on mw1277.eqiad.wmnet with reason: REIMAGE [production]
00:23 <dzahn@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1276.eqiad.wmnet with reason: REIMAGE [production]
00:22 <dzahn@cumin1001> START - Cookbook sre.hosts.downtime for 2:00:00 on mw1267.eqiad.wmnet with reason: REIMAGE [production]
00:21 <dzahn@cumin1001> START - Cookbook sre.hosts.downtime for 2:00:00 on mw1276.eqiad.wmnet with reason: REIMAGE [production]
00:17 <dzahn@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1266.eqiad.wmnet with reason: REIMAGE [production]
00:15 <dzahn@cumin1001> START - Cookbook sre.hosts.downtime for 2:00:00 on mw1266.eqiad.wmnet with reason: REIMAGE [production]
00:06 <jforrester@deploy1001> Synchronized wmf-config/InitialiseSettings.php: Undeploy graphoid on enwiki T271495 (duration: 00m 57s) [production]
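Note on the 02:35 rolling restart above: the logged command depools a host, waits 45 seconds, restarts `wdqs-categories`, waits another 45 seconds, and repools, with `-b 1` limiting the run to one host at a time. The Python sketch below only illustrates that ordering. It is not how cumin is implemented: `run` is a hypothetical executor supplied by the caller, the host names in the usage example are made up, hosts are handled sequentially even within a batch, and stopping the whole rollout on the first failure is an assumption of this sketch rather than documented cumin behaviour.

```python
from typing import Callable, Iterator, List, Sequence


def batches(hosts: Sequence[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive groups of at most `size` hosts."""
    if size < 1:
        raise ValueError("batch size must be at least 1")
    for start in range(0, len(hosts), size):
        yield list(hosts[start:start + size])


def rolling_restart(
    hosts: Sequence[str],
    run: Callable[[str, str], bool],
    service: str = "wdqs-categories",
    batch_size: int = 1,
    settle_seconds: int = 45,
) -> List[str]:
    """Restart `service` batch by batch and return the hosts that completed.

    `run(host, command)` is a hypothetical executor that returns True on
    success. Commands are chained like `&&` in the logged one-liner: the
    first failure stops that host, and (by assumption) the whole rollout.
    """
    commands = [
        "depool",
        f"sleep {settle_seconds}",
        f"systemctl restart {service}",
        f"sleep {settle_seconds}",
        "pool",
    ]
    done: List[str] = []
    for group in batches(hosts, batch_size):
        for host in group:
            for command in commands:
                if not run(host, command):
                    return done  # abort; remaining hosts are left untouched
            done.append(host)
    return done


if __name__ == "__main__":
    # Dry run: print each step instead of executing anything.
    def dry_run(host: str, command: str) -> bool:
        print(f"{host}: {command}")
        return True

    rolling_restart(["wdqs-example-1", "wdqs-example-2"], dry_run)
```

Because each host is repooled before the next one is depooled, at most `batch_size` hosts are out of rotation at any time, which is the point of using `-b 1` on the load-balanced instances and no batching on the test instances.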
2021-01-07
23:55 <mutante> reimaging mw1267,mw1276,mw1277 [production]
23:28 <mutante> reimaging mw1266 [production]
23:14 <andrew@deploy1001> Finished deploy [horizon/deploy@25ffdee]: trying to debug a compression error that doesn't happen on the test host (duration: 02m 00s) [production]
23:12 <andrew@deploy1001> Started deploy [horizon/deploy@25ffdee]: trying to debug a compression error that doesn't happen on the test host [production]
22:54 <andrew@deploy1001> Finished deploy [horizon/deploy@ce4c515]: trying to debug a compression error that doesn't happen on the test host (duration: 00m 04s) [production]
22:54 <andrew@deploy1001> Started deploy [horizon/deploy@ce4c515]: trying to debug a compression error that doesn't happen on the test host [production]
22:52 <andrew@deploy1001> Finished deploy [horizon/deploy@ce4c515]: trying to debug a compression error that doesn't happen on the test host (duration: 07m 44s) [production]
22:44 <andrew@deploy1001> Started deploy [horizon/deploy@ce4c515]: trying to debug a compression error that doesn't happen on the test host [production]
22:41 <andrew@deploy1001> Finished deploy [striker/deploy@e4db843]: striker -> labweb1002 (duration: 00m 04s) [production]
22:41 <andrew@deploy1001> Started deploy [striker/deploy@e4db843]: striker -> labweb1002 [production]
22:39 <andrew@deploy1001> Finished deploy [horizon/deploy@ce4c515]: trying to debug a compression error that doesn't happen on the test host (duration: 00m 06s) [production]
22:39 <andrew@deploy1001> Started deploy [horizon/deploy@ce4c515]: trying to debug a compression error that doesn't happen on the test host [production]
22:31 <robh@cumin1001> END (PASS) - Cookbook sre.dns.netbox (exit_code=0) [production]
22:24 <robh@cumin1001> START - Cookbook sre.dns.netbox [production]
22:19 <andrew@cumin1001> conftool action : set/pooled=inactive; selector: name=labweb1002.wikimedia.org [production]
22:12 <jhuneidi@deploy1001> rebuilt and synchronized wikiversions files: all wikis to 1.36.0-wmf.25 refs T267418 [production]
21:43 <jforrester@deploy1001> Synchronized php-1.36.0-wmf.25/extensions/CodeMirror/resources/ext.CodeMirror.js: T271457 Guard against WikiEditor being removed by the time the hook runs (duration: 01m 05s) [production]
21:16 <andrew@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on labweb1002.wikimedia.org with reason: REIMAGE [production]
21:14 <andrew@cumin1001> START - Cookbook sre.hosts.downtime for 2:00:00 on labweb1002.wikimedia.org with reason: REIMAGE [production]
21:10 <jhuneidi@deploy1001> rebuilt and synchronized wikiversions files: Revert "group[2] wikis to 1.36.0-wmf.22" [production]