2022-12-15 §
09:08 <hashar@deploy1002> rebuilt and synchronized wikiversions files: all wikis to 1.40.0-wmf.14 refs T320519 [production]
08:53 <akosiaris> reboot rdb2009 for kernel upgrades [production]
08:52 <akosiaris> correction: reboot rdb1011 for kernel upgrades [production]
08:51 <akosiaris> reboot rdb1007 for kernel upgrades [production]
08:51 <akosiaris> nothing noticed with rdb1007 reboot for mw, jobqueue, api-gateway. changeprop had a minor backlog increase, but everything appears fine now. [production]
08:28 <akosiaris> reboot rdb1009 for kernel upgrades. possibly (but probably not) affected applications: changeprop, cpjobqueue, api-gateway, redisLockManager [production]
08:13 <kartik@deploy1002> Finished scap: Backport for [[gerrit:868215|Enable Section Translation on 6 WPs (T319177)]] (duration: 10m 55s) [production]
08:08 <jmm@cumin2002> END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host puppetdb2003.codfw.wmnet [production]
08:04 <kartik@deploy1002> kartik and kartik: Backport for [[gerrit:868215|Enable Section Translation on 6 WPs (T319177)]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet [production]
08:03 <kartik@deploy1002> Started scap: Backport for [[gerrit:868215|Enable Section Translation on 6 WPs (T319177)]] [production]
07:57 <jmm@cumin2002> START - Cookbook sre.hosts.reboot-single for host puppetdb2003.codfw.wmnet [production]
01:46 <cwhite@cumin2002> END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host logstash2026.codfw.wmnet with OS bullseye [production]
00:58 <cwhite@cumin2002> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on logstash2026.codfw.wmnet with reason: host reimage [production]
00:55 <cwhite@cumin2002> START - Cookbook sre.hosts.downtime for 2:00:00 on logstash2026.codfw.wmnet with reason: host reimage [production]
00:32 <mutante> releases1002 - rebooting [production]
00:30 <mutante> releases2002 - rebooting [production]
00:19 <cwhite@cumin2002> START - Cookbook sre.hosts.reimage for host logstash2026.codfw.wmnet with OS bullseye [production]
00:19 <cwhite@cumin2002> END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host logstash2026.codfw.wmnet with OS bullseye [production]
00:15 <cwhite@cumin2002> START - Cookbook sre.hosts.reimage for host logstash2026.codfw.wmnet with OS bullseye [production]
00:14 <cwhite@cumin2002> END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host logstash2026.codfw.wmnet with OS bullseye [production]
00:05 <tgr> EU late backports done [production]
00:05 <tgr@deploy1002> Synchronized php-1.40.0-wmf.14/extensions/GrowthExperiments/: Backport: [[gerrit:868052|User impact: read edit count from primary db in save complete hook (T324930)]] (duration: 07m 03s) [production]
2022-12-14 §
23:50 <cwhite@cumin2002> START - Cookbook sre.hosts.reimage for host logstash2026.codfw.wmnet with OS bullseye [production]
23:48 <cwhite@cumin2002> END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['logstash2026'] [production]
23:44 <ejegg> civicrm upgraded from a1c2630a to 98b48b9a [production]
23:41 <cwhite@cumin2002> START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['logstash2026'] [production]
23:40 <cwhite@cumin2002> END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['logstash2026'] [production]
23:33 <cwhite@cumin2002> START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['logstash2026'] [production]
23:29 <ryankemper> [WDQS] Downtimed wdqs20[09-12] for the next 7 days [production]
23:28 <ryankemper> T301167 wdqs2011/2012 were not visible in pybal (oversight from when I added the other hosts with conftool last week). Fixed that, so now all of the new hosts are showing up properly. [production]
23:27 <ryankemper@puppetmaster1001> conftool action : set/weight=10:pooled=no; selector: name=wdqs2012.* [production]
23:27 <ryankemper@puppetmaster1001> conftool action : set/weight=10:pooled=no; selector: name=wdqs2011.* [production]
23:14 <bd808> Toolhub: rebuilding search indices following app update [production]
23:12 <bd808@deploy1002> helmfile [eqiad] DONE helmfile.d/services/toolhub: apply [production]
23:10 <denisse@cumin1001> END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 0:10:00 on alert2001.wikimedia.org with reason: kernel update [production]
23:10 <bd808@deploy1002> helmfile [eqiad] START helmfile.d/services/toolhub: apply [production]
23:10 <denisse@cumin1001> START - Cookbook sre.hosts.downtime for 0:10:00 on alert2001.wikimedia.org with reason: kernel update [production]
23:04 <bd808@deploy1002> helmfile [codfw] DONE helmfile.d/services/toolhub: apply [production]
23:03 <denisse@cumin1001> END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host alert2001.wikimedia.org [production]
23:03 <denisse@cumin1001> START - Cookbook sre.hosts.reboot-single for host alert2001.wikimedia.org [production]
23:03 <bd808@deploy1002> helmfile [codfw] START helmfile.d/services/toolhub: apply [production]
23:01 <bd808@deploy1002> helmfile [staging] DONE helmfile.d/services/toolhub: apply [production]
22:59 <bd808@deploy1002> helmfile [staging] START helmfile.d/services/toolhub: apply [production]
22:56 <tgr> doing the last backport by hand due to T325252 [production]
22:49 <tgr@deploy1002> Finished scap: Backport for [[gerrit:868047|NewImpact: Add log event for clicking suggested edits button (T325041)]], [[gerrit:868051|UserEditTracker: Allow querying primary DB for edit timestamp]] (duration: 11m 37s) [production]
22:46 <denisse@cumin1001> END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host alert1001.wikimedia.org [production]
22:39 <tgr@deploy1002> tgr and kharlan and tgr: Backport for [[gerrit:868047|NewImpact: Add log event for clicking suggested edits button (T325041)]], [[gerrit:868051|UserEditTracker: Allow querying primary DB for edit timestamp]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet [production]
22:37 <tgr@deploy1002> Started scap: Backport for [[gerrit:868047|NewImpact: Add log event for clicking suggested edits button (T325041)]], [[gerrit:868051|UserEditTracker: Allow querying primary DB for edit timestamp]] [production]
22:36 <bking@cumin2002> END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 1 day, 0:00:00 on wdqs2009.codfw.wmnet with reason: NFS troubleshooting [production]
22:36 <bking@cumin2002> START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on wdqs2009.codfw.wmnet with reason: NFS troubleshooting [production]