2024-06-05
17:32 <marostegui@cumin1002> dbctl commit (dc=all): 'Repooling after maintenance db2179', diff saved to https://phabricator.wikimedia.org/P64125 and previous config saved to /var/cache/conftool/dbconfig/20240605-173216-marostegui.json [production]
17:31 <ryankemper@cumin2002> START - Cookbook sre.wdqs.data-reload reloading wikidata_full on wdqs2023.codfw.wmnet from DumpsSource.HDFS (hdfs:///wmf/data/discovery/wikidata/munged_n3_dump/wikidata/full/20240527/ using stat1009.eqiad.wmnet) [production]
17:29 <ladsgroup@deploy1002> Started scap: Backport for [[gerrit:1039256|Stop writing to pagelinks old columns in enwiki (T352010)]] [production]
17:27 <kamila@cumin1002> END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-ctrl1001'] [production]
17:24 <ryankemper@cumin2002> END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99) reloading wikidata_full on wdqs2023.codfw.wmnet from DumpsSource.HDFS (hdfs:///wmf/data/discovery/wikidata/munged_n3_dump/wikidata/full/20240527/ using stat1009.eqiad.wmnet) [production]
17:24 <ladsgroup@cumin1002> dbctl commit (dc=all): 'Repooling after maintenance db2207', diff saved to https://phabricator.wikimedia.org/P64124 and previous config saved to /var/cache/conftool/dbconfig/20240605-172446-ladsgroup.json [production]
17:17 <marostegui@cumin1002> dbctl commit (dc=all): 'Repooling after maintenance db2179', diff saved to https://phabricator.wikimedia.org/P64123 and previous config saved to /var/cache/conftool/dbconfig/20240605-171708-marostegui.json [production]
17:13 <kamila@cumin1002> START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-ctrl1001'] [production]
17:12 <kamila@cumin1002> END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host wikikube-ctrl1001.eqiad.wmnet with OS bullseye [production]
17:10 <jhathaway> phabricator email now egressing via mx-out{1001,2001}.wikimedia.org, which should solve the SPF warnings in your inbox [production]
17:10 <dcaro@cumin1002> END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudcephosd1033.eqiad.wmnet [production]
17:09 <ladsgroup@cumin1002> dbctl commit (dc=all): 'Repooling after maintenance db2207 (T352010)', diff saved to https://phabricator.wikimedia.org/P64122 and previous config saved to /var/cache/conftool/dbconfig/20240605-170938-ladsgroup.json [production]
17:06 <dzahn@cumin1002> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10 days, 0:00:00 on stat1007.eqiad.wmnet with reason: decom T353785 [production]
17:06 <dcaro@cumin1002> START - Cookbook sre.hosts.reboot-single for host cloudcephosd1033.eqiad.wmnet [production]
17:06 <dzahn@cumin1002> START - Cookbook sre.hosts.downtime for 10 days, 0:00:00 on stat1007.eqiad.wmnet with reason: decom T353785 [production]
17:05 <dzahn@cumin1002> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10 days, 0:00:00 on stat1006.eqiad.wmnet with reason: decom T353785 [production]
17:05 <dzahn@cumin1002> START - Cookbook sre.hosts.downtime for 10 days, 0:00:00 on stat1006.eqiad.wmnet with reason: decom T353785 [production]
17:04 <kamila@cumin1002> START - Cookbook sre.hosts.reimage for host wikikube-ctrl1001.eqiad.wmnet with OS bullseye [production]
17:02 <marostegui@cumin1002> dbctl commit (dc=all): 'Repooling after maintenance db2179 (T364299)', diff saved to https://phabricator.wikimedia.org/P64121 and previous config saved to /var/cache/conftool/dbconfig/20240605-170200-marostegui.json [production]
16:56 <kamila@cumin1002> END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-ctrl1001'] [production]
16:56 <dzahn@cumin1002> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10 days, 0:00:00 on stat1005.eqiad.wmnet with reason: decom T353785 [production]
16:56 <dzahn@cumin1002> START - Cookbook sre.hosts.downtime for 10 days, 0:00:00 on stat1005.eqiad.wmnet with reason: decom T353785 [production]
16:54 <mutante> downtimed stat1004 for 10 days to avoid alerting spam during decom process - T353785 [production]
16:53 <dzahn@cumin1002> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10 days, 0:00:00 on stat1004.eqiad.wmnet with reason: decom T353785 [production]
16:53 <dzahn@cumin1002> START - Cookbook sre.hosts.downtime for 10 days, 0:00:00 on stat1004.eqiad.wmnet with reason: decom T353785 [production]
16:52 <ladsgroup@deploy1002> Finished scap: Backport for [[gerrit:1038392|Bump XML dump schema to version 0.11 (T365155)]] (duration: 18m 23s) [production]
16:48 <ryankemper@cumin2002> START - Cookbook sre.wdqs.data-reload reloading wikidata_full on wdqs2023.codfw.wmnet from DumpsSource.HDFS (hdfs:///wmf/data/discovery/wikidata/munged_n3_dump/wikidata/full/20240527/ using stat1009.eqiad.wmnet) [production]
16:46 <ladsgroup@cumin1002> dbctl commit (dc=all): 'db1177 (re)pooling @ 100%: Maint over', diff saved to https://phabricator.wikimedia.org/P64120 and previous config saved to /var/cache/conftool/dbconfig/20240605-164635-ladsgroup.json [production]
16:46 <kamila@cumin1002> START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-ctrl1001'] [production]
16:45 <kamila@cumin1002> END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host wikikube-ctrl1001.eqiad.wmnet with OS bullseye [production]
16:43 <ladsgroup@deploy1002> ladsgroup and dr0ptp4kt: Continuing with sync [production]
16:40 <jayme@cumin1002> END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kubestage1003.eqiad.wmnet [production]
16:38 <kamila@cumin1002> START - Cookbook sre.hosts.reimage for host wikikube-ctrl1001.eqiad.wmnet with OS bullseye [production]
16:36 <ladsgroup@deploy1002> ladsgroup and dr0ptp4kt: Backport for [[gerrit:1038392|Bump XML dump schema to version 0.11 (T365155)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) [production]
16:34 <kamila@cumin1002> END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host wikikube-ctrl1001.eqiad.wmnet with OS bullseye [production]
16:34 <ladsgroup@deploy1002> Started scap: Backport for [[gerrit:1038392|Bump XML dump schema to version 0.11 (T365155)]] [production]
16:32 <jayme@cumin1002> START - Cookbook sre.hosts.reboot-single for host kubestage1003.eqiad.wmnet [production]
16:31 <ladsgroup@cumin1002> dbctl commit (dc=all): 'db1177 (re)pooling @ 75%: Maint over', diff saved to https://phabricator.wikimedia.org/P64119 and previous config saved to /var/cache/conftool/dbconfig/20240605-163129-ladsgroup.json [production]
16:20 <jforrester@deploy1002> helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply [production]
16:18 <jforrester@deploy1002> helmfile [eqiad] START helmfile.d/services/wikifunctions: apply [production]
16:18 <dcaro@cumin1002> END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudcephosd1032.eqiad.wmnet [production]
16:18 <jforrester@deploy1002> helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply [production]
16:16 <ladsgroup@cumin1002> dbctl commit (dc=all): 'db1177 (re)pooling @ 50%: Maint over', diff saved to https://phabricator.wikimedia.org/P64118 and previous config saved to /var/cache/conftool/dbconfig/20240605-161622-ladsgroup.json [production]
16:16 <jforrester@deploy1002> helmfile [codfw] START helmfile.d/services/wikifunctions: apply [production]
16:15 <jforrester@deploy1002> helmfile [staging] DONE helmfile.d/services/wikifunctions: apply [production]
16:14 <jforrester@deploy1002> helmfile [staging] START helmfile.d/services/wikifunctions: apply [production]
16:12 <dcaro@cumin1002> START - Cookbook sre.hosts.reboot-single for host cloudcephosd1032.eqiad.wmnet [production]
16:11 <jforrester@deploy1002> helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply [production]
16:10 <kamila@cumin1002> START - Cookbook sre.hosts.reimage for host wikikube-ctrl1001.eqiad.wmnet with OS bullseye [production]
16:10 <jforrester@deploy1002> helmfile [eqiad] START helmfile.d/services/wikifunctions: apply [production]