2024-08-16
10:14 <jayme@cumin1002> END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-main1008.eqiad.wmnet with OS bullseye [production]
10:10 <jayme@cumin1002> END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-main1006.eqiad.wmnet with OS bullseye [production]
10:05 <jayme@cumin1002> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-main1007.eqiad.wmnet with reason: host reimage [production]
10:02 <jayme@cumin1002> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-main1009.eqiad.wmnet with reason: host reimage [production]
09:58 <jayme@cumin1002> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-main1010.eqiad.wmnet with reason: host reimage [production]
09:58 <klausman@deploy1003> helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'. [production]
09:57 <klausman@deploy1003> helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'. [production]
09:56 <jayme@cumin1002> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-main1008.eqiad.wmnet with reason: host reimage [production]
09:53 <jayme@cumin1002> START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-main1010.eqiad.wmnet with reason: host reimage [production]
09:53 <jayme@cumin1002> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-main1006.eqiad.wmnet with reason: host reimage [production]
09:51 <jayme@cumin1002> START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-main1009.eqiad.wmnet with reason: host reimage [production]
09:51 <jayme@cumin1002> START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-main1008.eqiad.wmnet with reason: host reimage [production]
09:51 <jayme@cumin1002> START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-main1007.eqiad.wmnet with reason: host reimage [production]
09:50 <jayme@cumin1002> START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-main1006.eqiad.wmnet with reason: host reimage [production]
09:50 <hnowlan@deploy1003> helmfile [codfw] DONE helmfile.d/services/thumbor: sync [production]
09:46 <hnowlan@deploy1003> helmfile [codfw] START helmfile.d/services/thumbor: sync [production]
09:44 <klausman@deploy1003> helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'. [production]
09:43 <klausman@deploy1003> helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'. [production]
09:35 <jayme@cumin1002> START - Cookbook sre.hosts.reimage for host kafka-main1010.eqiad.wmnet with OS bullseye [production]
09:35 <jayme@cumin1002> START - Cookbook sre.hosts.reimage for host kafka-main1009.eqiad.wmnet with OS bullseye [production]
09:34 <jayme@cumin1002> START - Cookbook sre.hosts.reimage for host kafka-main1008.eqiad.wmnet with OS bullseye [production]
09:34 <jayme@cumin1002> START - Cookbook sre.hosts.reimage for host kafka-main1007.eqiad.wmnet with OS bullseye [production]
09:33 <jayme@cumin1002> START - Cookbook sre.hosts.reimage for host kafka-main1006.eqiad.wmnet with OS bullseye [production]
09:30 <klausman@deploy1003> helmfile [ml-staging-codfw] DONE helmfile.d/admin 'apply'. [production]
09:29 <klausman@deploy1003> helmfile [ml-staging-codfw] START helmfile.d/admin 'apply'. [production]
09:23 <pfischer@deploy1003> helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply [production]
09:23 <pfischer@deploy1003> helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply [production]
08:52 <jayme@cumin1002> END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kafka-main1010.eqiad.wmnet with OS bullseye [production]
08:50 <jayme@cumin1002> END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kafka-main1009.eqiad.wmnet with OS bullseye [production]
08:49 <jayme@cumin1002> END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kafka-main1008.eqiad.wmnet with OS bullseye [production]
08:48 <jayme@cumin1002> END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kafka-main1007.eqiad.wmnet with OS bullseye [production]
08:47 <jayme@cumin1002> END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kafka-main1006.eqiad.wmnet with OS bullseye [production]
08:20 <pfischer@deploy1003> helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply [production]
08:20 <pfischer@deploy1003> helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply [production]
08:05 <jayme@cumin1002> START - Cookbook sre.hosts.reimage for host kafka-main1010.eqiad.wmnet with OS bullseye [production]
08:03 <jayme@cumin1002> START - Cookbook sre.hosts.reimage for host kafka-main1009.eqiad.wmnet with OS bullseye [production]
08:02 <jayme@cumin1002> START - Cookbook sre.hosts.reimage for host kafka-main1008.eqiad.wmnet with OS bullseye [production]
08:01 <jayme@cumin1002> START - Cookbook sre.hosts.reimage for host kafka-main1007.eqiad.wmnet with OS bullseye [production]
08:00 <jayme@cumin1002> START - Cookbook sre.hosts.reimage for host kafka-main1006.eqiad.wmnet with OS bullseye [production]
07:43 <XioNoX> deploy pfw policy update 1723675086 - T372520 [production]
07:40 <jayme@cumin1002> END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-main2007.codfw.wmnet with OS bullseye [production]
07:23 <jayme@cumin1002> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-main2007.codfw.wmnet with reason: host reimage [production]
07:20 <jayme@cumin1002> START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-main2007.codfw.wmnet with reason: host reimage [production]
07:01 <jayme@cumin1002> START - Cookbook sre.hosts.reimage for host kafka-main2007.codfw.wmnet with OS bullseye [production]
06:56 <marostegui@cumin1002> dbctl commit (dc=all): 'Repool db2136 - running 10.11', diff saved to https://phabricator.wikimedia.org/P67345 and previous config saved to /var/cache/conftool/dbconfig/20240816-065606-marostegui.json [production]
06:42 <marostegui@cumin1002> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2152.codfw.wmnet with reason: Schema change [production]
06:42 <marostegui@cumin1002> START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2152.codfw.wmnet with reason: Schema change [production]
2024-08-15
23:30 <jhancock@cumin2002> END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudlb2004-dev.codfw.wmnet with OS bookworm [production]
23:10 <xSavitar> T372449 mwscript extensions/CentralAuth/maintenance/fixStuckGlobalRename.php --wiki=commonswiki --logwiki=metawiki 'Philip Federici' 'FilippoFederici' --ignorestatus [production]
22:42 <ryankemper@cumin2002> END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) (T370754, transfer fresh wdqs-main journal to codfw host) xfer wikidata_main from wdqs1021.eqiad.wmnet -> wdqs2021.codfw.wmnet w/ force delete existing files, repooling neither afterwards [production]