2021-05-03
|
21:27 |
<ryankemper@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs1011.eqiad.wmnet with reason: REIMAGE |
[production] |
21:25 |
<ryankemper@cumin1001> |
START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs1011.eqiad.wmnet with reason: REIMAGE |
[production] |
21:22 |
<ryankemper> |
[WDQS] `ryankemper@wdqs1003:~$ sudo pool` |
[production] |
21:20 |
<ryankemper> |
T280382 [WDQS] `ryankemper@puppetmaster1001:~$ sudo confctl select 'name=wdqs1011.eqiad.wmnet' set/pooled=no` |
[production] |
21:19 |
<ryankemper@puppetmaster1001> |
conftool action : set/pooled=no; selector: name=wdqs1011.eqiad.wmnet |
[production] |
21:09 |
<ryankemper> |
T280382 `sudo -i wmf-auto-reimage-host -p T280382 wdqs1011.eqiad.wmnet` on `ryankemper@cumin1001` tmux session `reimage` |
[production] |
21:06 |
<ryankemper> |
T280382 `sudo -i cookbook sre.wdqs.data-transfer --source wdqs2001.codfw.wmnet --dest wdqs2007.codfw.wmnet --reason "transferring fresh wikidata journal following reimage" --blazegraph_instance blazegraph` on `ryankemper@cumin1001` tmux session `reimage` |
[production] |
21:05 |
<ryankemper@cumin1001> |
START - Cookbook sre.wdqs.data-transfer |
[production] |
21:02 |
<ryankemper> |
T280382 `wdqs1010.eqiad.wmnet` has been re-imaged and had the appropriate wikidata/categories journal files transferred. `df -h` shows disk space is no longer an issue following the switch to `raid0`: `/dev/md2 2.6T 975G 1.5T 39% /srv` |
[production] |
20:56 |
<ryankemper> |
T280382 [WDQS] `ryankemper@wdqs2001:~$ sudo run-puppet-agent --force` |
[production] |
20:44 |
<ryankemper@cumin1001> |
END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) |
[production] |
20:42 |
<ryankemper@cumin1001> |
END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99) |
[production] |
20:37 |
<ryankemper> |
T280382 `sudo -i cookbook sre.wdqs.data-transfer --source wdqs2001.codfw.wmnet --dest wdqs2007.codfw.wmnet --reason "transferring fresh categories journal following reimage" --blazegraph_instance categories` on `ryankemper@cumin1001` tmux session `reimage` |
[production] |
20:37 |
<ryankemper@cumin1001> |
START - Cookbook sre.wdqs.data-transfer |
[production] |
19:42 |
<James_F> |
Zuul: [mediawiki/services/image-suggestion-api] Publish images post-merge T281256 |
[releng] |
19:40 |
<ryankemper@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs2007.codfw.wmnet with reason: REIMAGE |
[production] |
19:39 |
<ryankemper@cumin1001> |
START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs2007.codfw.wmnet with reason: REIMAGE |
[production] |
19:35 |
<wm-bot> |
<lucaswerkmeister> deployed b159dd1060 (l10n updates) |
[tools.lexeme-forms] |
19:24 |
<ryankemper> |
T280382 `sudo -i cookbook sre.wdqs.data-transfer --without-lvs --source wdqs1003.eqiad.wmnet --dest wdqs1010.eqiad.wmnet --reason "transferring fresh wikidata journal following reimage" --blazegraph_instance blazegraph` on `ryankemper@cumin1001` tmux session `reimage` |
[production] |
19:24 |
<ryankemper@cumin1001> |
START - Cookbook sre.wdqs.data-transfer |
[production] |
19:21 |
<ryankemper@puppetmaster1001> |
conftool action : set/pooled=no; selector: name=wdqs1004.eqiad.wmnet |
[production] |
19:21 |
<ryankemper> |
T280382 [WDQS] `sudo confctl select 'name=wdqs1004.eqiad.wmnet' set/pooled=no` (`wdqs1004` failed re-image [not sure why yet] and won't let me ssh in to depool so using conftool instead) |
[production] |
18:20 |
<Urbanecm> |
Morning B&C window done |
[production] |
18:19 |
<urbanecm@deploy1002> |
Synchronized php-1.37.0-wmf.3/extensions/RelatedArticles/resources/ext.relatedArticles.readMore.bootstrap/index.js: cf9d9da3bf272d33c2d9b29d9172b1c81bfd8beb: Hotfix: loadRelatedArticles should consider existence of container element (T281547) (duration: 00m 57s) |
[production] |
18:15 |
<urbanecm@deploy1002> |
Synchronized wmf-config/filebackend.php: bc1bc903169e4982c0c5a930094bed9f22616293: NOOP: beta: Use upload.wikimedia.beta.wmflabs.o for uploads (T281650; 2/2) (duration: 00m 57s) |
[production] |
18:14 |
<urbanecm@deploy1002> |
Synchronized wmf-config/CommonSettings.php: bc1bc903169e4982c0c5a930094bed9f22616293: NOOP: beta: Use upload.wikimedia.beta.wmflabs.o for uploads (T281650; 1/2) (duration: 00m 58s) |
[production] |
17:44 |
<ryankemper@cumin1001> |
END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) reboot without plugin upgrade (3 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad reboot to apply sec updates - ryankemper@cumin1001 - T280563 |
[production] |
17:20 |
<hashar> |
Restarting CI Jenkins due to "Gearman worker contint2001.wikimedia.org_manager" thread dying unexpectedly # T281737 |
[production] |
17:05 |
<James_F> |
Docker: Publishing quibble-buster images with python3-distutils so quibble can build |
[releng] |
16:34 |
<wm-bot> |
Safe reboot of 'cloudvirt1023.eqiad.wmnet' finished successfully. (T280641) - cookbook ran by dcaro@vulcanus |
[admin] |
16:30 |
<ryankemper@cumin1001> |
START - Cookbook sre.elasticsearch.rolling-operation reboot without plugin upgrade (3 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad reboot to apply sec updates - ryankemper@cumin1001 - T280563 |
[production] |
16:29 |
<ryankemper> |
T281498 `sudo confctl select 'name=wdqs2004.codfw.wmnet' set/pooled=yes:weight=10` after merge of https://gerrit.wikimedia.org/r/c/operations/puppet/+/684435 |
[production] |
16:29 |
<wm-bot> |
Safe rebooting 'cloudvirt1023.eqiad.wmnet'. (T280641) - cookbook ran by dcaro@vulcanus |
[admin] |
16:27 |
<ryankemper@puppetmaster1001> |
conftool action : set/pooled=yes:weight=10; selector: name=wdqs2004.codfw.wmnet |
[production] |
16:23 |
<dcaro> |
started tools-sgeexec-0907, was stuck on initramfs due to an unclean fs (/dev/vda3, root), ran fsck manually fixing all the errors and booted up correctly after (T280641) |
[tools] |
16:19 |
<legoktm> |
legoktm@lists1001:~$ sudo apt install default-mysql-client # for temporary debugging |
[production] |
16:07 |
<James_F> |
Zuul: Add Luca Mauri to the CI allow list |
[releng] |
15:48 |
<pt1979@cumin2001> |
END (PASS) - Cookbook sre.dns.netbox (exit_code=0) |
[production] |
15:44 |
<pt1979@cumin2001> |
START - Cookbook sre.dns.netbox |
[production] |
15:41 |
<wm-bot> |
Safe rebooting 'cloudvirt1023.eqiad.wmnet'. (T280641) - cookbook ran by dcaro@vulcanus |
[admin] |
15:41 |
<wm-bot> |
Safe reboot of 'cloudvirt1022.eqiad.wmnet' finished successfully. (T280641) - cookbook ran by dcaro@vulcanus |
[admin] |
15:27 |
<Amir1> |
upgrade group A to mailman3 (T280322) |
[production] |
15:13 |
<wm-bot> |
Safe rebooting 'cloudvirt1022.eqiad.wmnet'. (T280641) - cookbook ran by dcaro@vulcanus |
[admin] |
14:27 |
<volans> |
uploaded conftool_1.3.1 to apt.wikimedia.org bullseye-wikimedia |
[production] |
14:23 |
<ottomata> |
stopping all venv based jupyter singleuser servers - T262847 |
[analytics] |
14:07 |
<dcaro> |
depooling tools-sgeexec-0908/7 to be able to restart the VMs as they got stuck during migration (T280641) |
[tools] |
13:59 |
<ottomata> |
dropped all obsolete (upper cased location) event_sanitized.*_T280813 tables created for T280813 |
[analytics] |
13:55 |
<CFisch_WMDE> |
enable new search features for the template dialog (T271802) |
[deployment-prep] |
13:55 |
<CFisch_WMDE> |
enable new search features for the template dialog (T271802) |
[releng] |
13:43 |
<volans> |
uploaded cumin_4.1.0 to apt.wikimedia.org buster-wikimedia,bullseye-wikimedia |
[production] |