2024-06-05
19:36 <jhathaway@cumin1002> START - Cookbook sre.ganeti.makevm for new host mx-in1001.wikimedia.org [production]
19:27 <ryankemper@cumin2002> END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99) reloading wikidata_full on wdqs2023.codfw.wmnet from DumpsSource.HDFS (hdfs:///wmf/data/discovery/wikidata/munged_n3_dump/wikidata/full/20240527/ using stat1009.eqiad.wmnet) [production]
19:09 <swfrench@deploy1002> helmfile [codfw] DONE helmfile.d/services/data-gateway: apply [production]
18:58 <swfrench@deploy1002> helmfile [codfw] START helmfile.d/services/data-gateway: apply [production]
18:53 <dduvall@deploy1002> rebuilt and synchronized wikiversions files: group1 wikis to 1.43.0-wmf.8 refs T361402 [production]
18:53 <ryankemper@cumin2002> START - Cookbook sre.wdqs.data-reload reloading wikidata_full on wdqs2023.codfw.wmnet from DumpsSource.HDFS (hdfs:///wmf/data/discovery/wikidata/munged_n3_dump/wikidata/full/20240527/ using stat1009.eqiad.wmnet) [production]
18:42 <ladsgroup@cumin1002> dbctl commit (dc=all): 'Repooling after maintenance db2203 (T352010)', diff saved to https://phabricator.wikimedia.org/P64132 and previous config saved to /var/cache/conftool/dbconfig/20240605-184250-ladsgroup.json [production]
18:27 <ladsgroup@cumin1002> dbctl commit (dc=all): 'Repooling after maintenance db2203', diff saved to https://phabricator.wikimedia.org/P64131 and previous config saved to /var/cache/conftool/dbconfig/20240605-182742-ladsgroup.json [production]
18:13 <swfrench@deploy1002> helmfile [staging] DONE helmfile.d/services/data-gateway: apply [production]
18:12 <ladsgroup@cumin1002> dbctl commit (dc=all): 'Repooling after maintenance db2203', diff saved to https://phabricator.wikimedia.org/P64130 and previous config saved to /var/cache/conftool/dbconfig/20240605-181234-ladsgroup.json [production]
18:12 <swfrench@deploy1002> helmfile [staging] START helmfile.d/services/data-gateway: apply [production]
18:11 <aokoth@cumin1002> END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host vrts1001.eqiad.wmnet [production]
18:07 <aokoth@cumin1002> START - Cookbook sre.hosts.reboot-single for host vrts1001.eqiad.wmnet [production]
18:06 <ryankemper@cumin2002> END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99) reloading wikidata_full on wdqs2023.codfw.wmnet from DumpsSource.HDFS (hdfs:///wmf/data/discovery/wikidata/munged_n3_dump/wikidata/full/20240527/ using stat1009.eqiad.wmnet) [production]
17:57 <ladsgroup@cumin1002> dbctl commit (dc=all): 'Repooling after maintenance db2203 (T352010)', diff saved to https://phabricator.wikimedia.org/P64129 and previous config saved to /var/cache/conftool/dbconfig/20240605-175725-ladsgroup.json [production]
17:55 <ladsgroup@cumin1002> dbctl commit (dc=all): 'Repooling after maintenance db2207 (T352010)', diff saved to https://phabricator.wikimedia.org/P64128 and previous config saved to /var/cache/conftool/dbconfig/20240605-175503-ladsgroup.json [production]
17:50 <kamila@cumin1002> START - Cookbook sre.hosts.dhcp for host wikikube-ctrl1001.eqiad.wmnet [production]
17:47 <marostegui@cumin1002> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2199.codfw.wmnet with reason: Maintenance [production]
17:47 <marostegui@cumin1002> START - Cookbook sre.hosts.downtime for 6:00:00 on db2199.codfw.wmnet with reason: Maintenance [production]
17:47 <marostegui@cumin1002> dbctl commit (dc=all): 'Repooling after maintenance db2179 (T364299)', diff saved to https://phabricator.wikimedia.org/P64127 and previous config saved to /var/cache/conftool/dbconfig/20240605-174724-marostegui.json [production]
17:42 <ladsgroup@deploy1002> Finished scap: Backport for [[gerrit:1039256|Stop writing to pagelinks old columns in enwiki (T352010)]] (duration: 12m 19s) [production]
17:39 <ladsgroup@cumin1002> dbctl commit (dc=all): 'Repooling after maintenance db2207', diff saved to https://phabricator.wikimedia.org/P64126 and previous config saved to /var/cache/conftool/dbconfig/20240605-173954-ladsgroup.json [production]
17:33 <ladsgroup@deploy1002> ladsgroup: Continuing with sync [production]
17:32 <ladsgroup@deploy1002> ladsgroup: Backport for [[gerrit:1039256|Stop writing to pagelinks old columns in enwiki (T352010)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) [production]
17:32 <marostegui@cumin1002> dbctl commit (dc=all): 'Repooling after maintenance db2179', diff saved to https://phabricator.wikimedia.org/P64125 and previous config saved to /var/cache/conftool/dbconfig/20240605-173216-marostegui.json [production]
17:31 <ryankemper@cumin2002> START - Cookbook sre.wdqs.data-reload reloading wikidata_full on wdqs2023.codfw.wmnet from DumpsSource.HDFS (hdfs:///wmf/data/discovery/wikidata/munged_n3_dump/wikidata/full/20240527/ using stat1009.eqiad.wmnet) [production]
17:29 <ladsgroup@deploy1002> Started scap: Backport for [[gerrit:1039256|Stop writing to pagelinks old columns in enwiki (T352010)]] [production]
17:27 <kamila@cumin1002> END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-ctrl1001'] [production]
17:24 <ryankemper@cumin2002> END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99) reloading wikidata_full on wdqs2023.codfw.wmnet from DumpsSource.HDFS (hdfs:///wmf/data/discovery/wikidata/munged_n3_dump/wikidata/full/20240527/ using stat1009.eqiad.wmnet) [production]
17:24 <ladsgroup@cumin1002> dbctl commit (dc=all): 'Repooling after maintenance db2207', diff saved to https://phabricator.wikimedia.org/P64124 and previous config saved to /var/cache/conftool/dbconfig/20240605-172446-ladsgroup.json [production]
17:17 <marostegui@cumin1002> dbctl commit (dc=all): 'Repooling after maintenance db2179', diff saved to https://phabricator.wikimedia.org/P64123 and previous config saved to /var/cache/conftool/dbconfig/20240605-171708-marostegui.json [production]
17:13 <kamila@cumin1002> START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-ctrl1001'] [production]
17:12 <kamila@cumin1002> END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host wikikube-ctrl1001.eqiad.wmnet with OS bullseye [production]
17:10 <jhathaway> phabricator email now egressing via mx-out{1001,2001}.wikimedia.org, which should solve the SPF warnings in your inbox [production]
17:10 <dcaro@cumin1002> END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudcephosd1033.eqiad.wmnet [production]
17:09 <ladsgroup@cumin1002> dbctl commit (dc=all): 'Repooling after maintenance db2207 (T352010)', diff saved to https://phabricator.wikimedia.org/P64122 and previous config saved to /var/cache/conftool/dbconfig/20240605-170938-ladsgroup.json [production]
17:06 <dzahn@cumin1002> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10 days, 0:00:00 on stat1007.eqiad.wmnet with reason: decom T353785 [production]
17:06 <dcaro@cumin1002> START - Cookbook sre.hosts.reboot-single for host cloudcephosd1033.eqiad.wmnet [production]
17:06 <dzahn@cumin1002> START - Cookbook sre.hosts.downtime for 10 days, 0:00:00 on stat1007.eqiad.wmnet with reason: decom T353785 [production]
17:05 <dzahn@cumin1002> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10 days, 0:00:00 on stat1006.eqiad.wmnet with reason: decom T353785 [production]
17:05 <dzahn@cumin1002> START - Cookbook sre.hosts.downtime for 10 days, 0:00:00 on stat1006.eqiad.wmnet with reason: decom T353785 [production]
17:04 <kamila@cumin1002> START - Cookbook sre.hosts.reimage for host wikikube-ctrl1001.eqiad.wmnet with OS bullseye [production]
17:02 <marostegui@cumin1002> dbctl commit (dc=all): 'Repooling after maintenance db2179 (T364299)', diff saved to https://phabricator.wikimedia.org/P64121 and previous config saved to /var/cache/conftool/dbconfig/20240605-170200-marostegui.json [production]
16:56 <kamila@cumin1002> END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-ctrl1001'] [production]
16:56 <dzahn@cumin1002> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10 days, 0:00:00 on stat1005.eqiad.wmnet with reason: decom T353785 [production]
16:56 <dzahn@cumin1002> START - Cookbook sre.hosts.downtime for 10 days, 0:00:00 on stat1005.eqiad.wmnet with reason: decom T353785 [production]
16:54 <mutante> downtimed stat1004 for 10 days to avoid alerting spam during decom process - T353785 [production]
16:53 <dzahn@cumin1002> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10 days, 0:00:00 on stat1004.eqiad.wmnet with reason: decom T353785 [production]
16:53 <dzahn@cumin1002> START - Cookbook sre.hosts.downtime for 10 days, 0:00:00 on stat1004.eqiad.wmnet with reason: decom T353785 [production]
16:52 <ladsgroup@deploy1002> Finished scap: Backport for [[gerrit:1038392|Bump XML dump schema to version 0.11 (T365155)]] (duration: 18m 23s) [production]