2021-05-03
21:54 <ryankemper> T280563 eqiad reboot failed with: `curator.exceptions.FailedExecution: Exception encountered. Rerun with loglevel DEBUG and/or check Elasticsearch logs for more information. Exception: ConnectionTimeout caused by - ReadTimeoutError(HTTPSConnectionPool(host='search.svc.eqiad.wmnet', port=9243): Read timed out. (read timeout=10))` [production]
21:52 <ryankemper@cumin1001> END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) reboot without plugin upgrade (3 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad reboot to apply sec updates - ryankemper@cumin1001 - T280563 [production]
21:47 <ryankemper> T280563 `sudo -i cookbook sre.elasticsearch.rolling-operation search_eqiad "eqiad reboot to apply sec updates" --reboot --nodes-per-run 3 --start-datetime 2021-04-29T23:04:29 --task-id T280563` [production]
21:46 <ryankemper@cumin1001> START - Cookbook sre.elasticsearch.rolling-operation reboot without plugin upgrade (3 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad reboot to apply sec updates - ryankemper@cumin1001 - T280563 [production]
21:32 <krinkle@deploy1002> Synchronized wmf-config/InitialiseSettings.php: d95b91648 (duration: 00m 58s) [production]
21:27 <ryankemper@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs1011.eqiad.wmnet with reason: REIMAGE [production]
21:25 <ryankemper@cumin1001> START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs1011.eqiad.wmnet with reason: REIMAGE [production]
21:22 <ryankemper> [WDQS] `ryankemper@wdqs1003:~$ sudo pool` [production]
21:20 <ryankemper> T280382 [WDQS] `ryankemper@puppetmaster1001:~$ sudo confctl select 'name=wdqs1011.eqiad.wmnet' set/pooled=no` [production]
21:19 <ryankemper@puppetmaster1001> conftool action : set/pooled=no; selector: name=wdqs1011.eqiad.wmnet [production]
21:09 <ryankemper> T280382 `sudo -i wmf-auto-reimage-host -p T280382 wdqs1011.eqiad.wmnet` on `ryankemper@cumin1001` tmux session `reimage` [production]
21:06 <ryankemper> T280382 `sudo -i cookbook sre.wdqs.data-transfer --source wdqs2001.codfw.wmnet --dest wdqs2007.codfw.wmnet --reason "transferring fresh wikidata journal following reimage" --blazegraph_instance blazegraph` on `ryankemper@cumin1001` tmux session `reimage` [production]
21:05 <ryankemper@cumin1001> START - Cookbook sre.wdqs.data-transfer [production]
21:02 <ryankemper> T280382 `wdqs1010.eqiad.wmnet` has been re-imaged and had the appropriate wikidata/categories journal files transferred. `df -h` shows disk space is no longer an issue following the switch to `raid0`: `/dev/md2 2.6T 975G 1.5T 39% /srv` [production]
20:56 <ryankemper> T280382 [WDQS] `ryankemper@wdqs2001:~$ sudo run-puppet-agent --force` [production]
20:44 <ryankemper@cumin1001> END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) [production]
20:42 <ryankemper@cumin1001> END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99) [production]
20:37 <ryankemper> T280382 `sudo -i cookbook sre.wdqs.data-transfer --source wdqs2001.codfw.wmnet --dest wdqs2007.codfw.wmnet --reason "transferring fresh categories journal following reimage" --blazegraph_instance categories` on `ryankemper@cumin1001` tmux session `reimage` [production]
20:37 <ryankemper@cumin1001> START - Cookbook sre.wdqs.data-transfer [production]
19:40 <ryankemper@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs2007.codfw.wmnet with reason: REIMAGE [production]
19:39 <ryankemper@cumin1001> START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs2007.codfw.wmnet with reason: REIMAGE [production]
19:24 <ryankemper> T280382 `sudo -i cookbook sre.wdqs.data-transfer --without-lvs --source wdqs1003.eqiad.wmnet --dest wdqs1010.eqiad.wmnet --reason "transferring fresh wikidata journal following reimage" --blazegraph_instance blazegraph` on `ryankemper@cumin1001` tmux session `reimage` [production]
19:24 <ryankemper@cumin1001> START - Cookbook sre.wdqs.data-transfer [production]
19:21 <ryankemper@puppetmaster1001> conftool action : set/pooled=no; selector: name=wdqs1004.eqiad.wmnet [production]
19:21 <ryankemper> T280382 [WDQS] `sudo confctl select 'name=wdqs1004.eqiad.wmnet' set/pooled=no` (`wdqs1004` failed re-image [not sure why yet] and won't let me ssh in to depool so using conftool instead) [production]
18:20 <Urbanecm> Morning B&C window done [production]
18:19 <urbanecm@deploy1002> Synchronized php-1.37.0-wmf.3/extensions/RelatedArticles/resources/ext.relatedArticles.readMore.bootstrap/index.js: cf9d9da3bf272d33c2d9b29d9172b1c81bfd8beb: Hotfix: loadRelatedArticles should consider existence of container element (T281547) (duration: 00m 57s) [production]
18:15 <urbanecm@deploy1002> Synchronized wmf-config/filebackend.php: bc1bc903169e4982c0c5a930094bed9f22616293: NOOP: beta: Use upload.wikimedia.beta.wmflabs.o for uploads (T281650; 2/2) (duration: 00m 57s) [production]
18:14 <urbanecm@deploy1002> Synchronized wmf-config/CommonSettings.php: bc1bc903169e4982c0c5a930094bed9f22616293: NOOP: beta: Use upload.wikimedia.beta.wmflabs.o for uploads (T281650; 1/2) (duration: 00m 58s) [production]
17:44 <ryankemper@cumin1001> END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) reboot without plugin upgrade (3 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad reboot to apply sec updates - ryankemper@cumin1001 - T280563 [production]
17:20 <hashar> Restarting CI Jenkins due to "Gearman worker contint2001.wikimedia.org_manager" thread dying unexpectedly # T281737 [production]
16:30 <ryankemper@cumin1001> START - Cookbook sre.elasticsearch.rolling-operation reboot without plugin upgrade (3 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad reboot to apply sec updates - ryankemper@cumin1001 - T280563 [production]
16:29 <ryankemper> T281498 `sudo confctl select 'name=wdqs2004.codfw.wmnet' set/pooled=yes:weight=10` after merge of https://gerrit.wikimedia.org/r/c/operations/puppet/+/684435 [production]
16:27 <ryankemper@puppetmaster1001> conftool action : set/pooled=yes:weight=10; selector: name=wdqs2004.codfw.wmnet [production]
16:19 <legoktm> legoktm@lists1001:~$ sudo apt install default-mysql-client # for temporary debugging [production]
15:48 <pt1979@cumin2001> END (PASS) - Cookbook sre.dns.netbox (exit_code=0) [production]
15:44 <pt1979@cumin2001> START - Cookbook sre.dns.netbox [production]
15:27 <Amir1> upgrade group A to mailman3 (T280322) [production]
14:27 <volans> uploaded conftool_1.3.1 to apt.wikimedia.org bullseye-wikimedia [production]
13:43 <volans> uploaded cumin_4.1.0 to apt.wikimedia.org buster-wikimedia,bullseye-wikimedia [production]
13:10 <Urbanecm> Run `User::newSystemUser( 'Maintenance script', [ 'steal' => true ] )` on cswiki to make the user a proper system user (T281703) [production]
12:36 <kostajh> Backport window done [production]
12:33 <kharlan@deploy1002> Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:684378|GrowthExperiments: Set default variant (T278123)]] [[gerrit:684331|GrowthExperiments: enable link recommendations frontend on cswiki (T278710)]] (duration: 00m 57s) [production]
12:07 <kharlan@deploy1002> Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:684327|GrowthExperiments: enable link recommendations backend on cswiki (T278710)]] (duration: 00m 57s) [production]
11:56 <kharlan@deploy1002> Synchronized php-1.37.0-wmf.3/extensions/GrowthExperiments: Backport: [[gerrit:684080|refreshLinkRecommendations.php: Use per-wiki locks]] [[gerrit:684078|Handle DB readonly errors (T281382)]] (duration: 00m 58s) [production]
11:15 <urbanecm@deploy1002> Synchronized php-1.37.0-wmf.3/extensions/Popups/: a438b641c81fa16faba287407012beaff8b1f3ba: Fix settings dialog offering ReferencePreviews when unavailable (T281352) (duration: 00m 58s) [production]
11:11 <urbanecm@deploy1002> Synchronized wmf-config/InitialiseSettings.php: c5a7c67b4daf33e0f9aaabec3f35ab6d4184894b: Set wgGEMentorshipMigrationStage to SCHEMA_COMPAT_NEW everywhere (T279853) (duration: 00m 57s) [production]
11:04 <urbanecm@deploy1002> Synchronized wmf-config/InitialiseSettings.php: f1a5ef0116c77b86b1abfb7bfa7d4ed363c69f61: wikidata: post edit constraint jobs on 70% of edits (T204031) (duration: 00m 57s) [production]
10:59 <moritzm> installing avahi security updates on buster [production]
10:47 <jdrewniak@deploy1002> Synchronized portals: Wikimedia Portals Update: [[gerrit:684302| Bumping portals to master (T128546)]] (duration: 00m 57s) [production]