2023-10-17
09:42 <btullis@cumin1001> END (ERROR) - Cookbook sre.hosts.reboot-single (exit_code=97) for host an-airflow1007.eqiad.wmnet [production]
09:42 <btullis@cumin1001> START - Cookbook sre.hosts.reboot-single for host an-db1001.eqiad.wmnet [production]
09:36 <mfossati@deploy2002> Finished deploy [airflow-dags/platform_eng@b010dae]: (no justification provided) (duration: 00m 46s) [production]
09:35 <mfossati@deploy2002> Started deploy [airflow-dags/platform_eng@b010dae]: (no justification provided) [production]
09:33 <btullis@cumin1001> START - Cookbook sre.hosts.reboot-single for host an-airflow1007.eqiad.wmnet [production]
09:33 <btullis@cumin1001> END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-airflow1002.eqiad.wmnet [production]
09:28 <btullis@cumin1001> START - Cookbook sre.hosts.reboot-single for host an-airflow1002.eqiad.wmnet [production]
09:28 <btullis@cumin1001> END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-airflow1005.eqiad.wmnet [production]
09:26 <arnaudb@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2161.codfw.wmnet with reason: Maintenance [production]
09:26 <arnaudb@cumin1001> START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2161.codfw.wmnet with reason: Maintenance [production]
09:24 <btullis@cumin1001> START - Cookbook sre.hosts.reboot-single for host an-airflow1005.eqiad.wmnet [production]
09:24 <btullis@cumin1001> END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-airflow1004.eqiad.wmnet [production]
09:21 <arnaudb@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1173.eqiad.wmnet with reason: Maintenance [production]
09:20 <arnaudb@cumin1001> START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1173.eqiad.wmnet with reason: Maintenance [production]
09:20 <btullis@cumin1001> START - Cookbook sre.hosts.reboot-single for host an-airflow1004.eqiad.wmnet [production]
09:17 <btullis@cumin1001> END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-airflow1006.eqiad.wmnet [production]
09:15 <ladsgroup@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2161.codfw.wmnet with reason: Maintenance [production]
09:14 <ladsgroup@cumin1001> START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2161.codfw.wmnet with reason: Maintenance [production]
09:13 <btullis@cumin1001> START - Cookbook sre.hosts.reboot-single for host an-airflow1006.eqiad.wmnet [production]
09:12 <btullis@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:20:00 on an-airflow[1002,1004-1006].eqiad.wmnet,an-launcher1002.eqiad.wmnet with reason: Rebooting Airflow instances for T344671 [production]
09:12 <btullis@cumin1001> START - Cookbook sre.hosts.downtime for 0:20:00 on an-airflow[1002,1004-1006].eqiad.wmnet,an-launcher1002.eqiad.wmnet with reason: Rebooting Airflow instances for T344671 [production]
08:38 <kartik@deploy2002> helmfile [codfw] DONE helmfile.d/services/machinetranslation: apply [production]
08:35 <kartik@deploy2002> helmfile [codfw] START helmfile.d/services/machinetranslation: apply [production]
08:32 <XioNoX> push pfw policies - T348576 [production]
07:26 <hashar@deploy2002> Finished deploy [gerrit/gerrit@578be93]: wm-checks-api: filter out Zuul start messages | T348920 (duration: 00m 07s) [production]
07:26 <hashar@deploy2002> Started deploy [gerrit/gerrit@578be93]: wm-checks-api: filter out Zuul start messages | T348920 [production]
07:23 <hashar@deploy2002> Finished deploy [gerrit/gerrit@1153a16]: wm-checks-api: filter out Zuul start messages | T348920 (duration: 00m 05s) [production]
07:22 <hashar@deploy2002> Started deploy [gerrit/gerrit@1153a16]: wm-checks-api: filter out Zuul start messages | T348920 [production]
06:06 <isaranto@deploy2002> helmfile [eqiad] START helmfile.d/services/api-gateway: sync [production]
06:06 <marostegui@cumin1001> dbctl commit (dc=all): 'Depool db2161 T349053', diff saved to https://phabricator.wikimedia.org/P52986 and previous config saved to /var/cache/conftool/dbconfig/20231017-060214-root.json [production]
06:06 <isaranto@deploy2002> helmfile [staging] DONE helmfile.d/services/api-gateway: sync [production]
06:02 <isaranto@deploy2002> helmfile [staging] START helmfile.d/services/api-gateway: sync [production]
06:00 <marostegui@cumin1001> dbctl commit (dc=all): 'Promote db2165 to s8 primary and set section read-write T349053', diff saved to https://phabricator.wikimedia.org/P52985 and previous config saved to /var/cache/conftool/dbconfig/20231017-060047-root.json [production]
06:00 <marostegui@cumin1001> dbctl commit (dc=all): 'Set s8 codfw as read-only for maintenance - T349053', diff saved to https://phabricator.wikimedia.org/P52984 and previous config saved to /var/cache/conftool/dbconfig/20231017-060021-root.json [production]
06:00 <marostegui> Starting s8 codfw failover from db2161 to db2165 - T349053 [production]
05:59 <kart_> Update MinT to 2023-10-16-101614-production (T333969, T336683, T348097) [production]
05:36 <kartik@deploy2002> helmfile [codfw] DONE helmfile.d/services/machinetranslation: apply [production]
05:36 <kartik@deploy2002> helmfile [codfw] START helmfile.d/services/machinetranslation: apply [production]
05:31 <kartik@deploy2002> helmfile [codfw] START helmfile.d/services/machinetranslation: apply [production]
05:29 <kartik@deploy2002> helmfile [eqiad] DONE helmfile.d/services/machinetranslation: apply [production]
05:19 <kartik@deploy2002> helmfile [staging] START helmfile.d/services/machinetranslation: apply [production]
05:17 <marostegui@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 32 hosts with reason: Primary switchover s8 T349053 [production]
05:17 <marostegui@cumin1001> dbctl commit (dc=all): 'Set db2165 with weight 0 T349053', diff saved to https://phabricator.wikimedia.org/P52983 and previous config saved to /var/cache/conftool/dbconfig/20231017-051723-root.json [production]
05:17 <marostegui@cumin1001> START - Cookbook sre.hosts.downtime for 1:00:00 on 32 hosts with reason: Primary switchover s8 T349053 [production]
03:55 <mwpresync@deploy2002> Pruned MediaWiki: 1.41.0-wmf.29 (duration: 02m 15s) [production]
03:53 <mwpresync@deploy2002> Finished scap: testwikis wikis to 1.42.0-wmf.1 refs T348354 (duration: 50m 15s) [production]
03:02 <mwpresync@deploy2002> Started scap: testwikis wikis to 1.42.0-wmf.1 refs T348354 [production]
02:10 <arnaudb@cumin1001> dbctl commit (dc=all): 'Depooling db2176 (T343198)', diff saved to https://phabricator.wikimedia.org/P52982 and previous config saved to /var/cache/conftool/dbconfig/20231017-021040-arnaudb.json [production]
02:10 <arnaudb@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2176.codfw.wmnet with reason: Maintenance [production]
02:10 <arnaudb@cumin1001> START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2176.codfw.wmnet with reason: Maintenance [production]