2022-08-30
10:15 <jmm@cumin2002> START - Cookbook sre.hosts.downtime for 1:00:00 on kubestagetcd2001.codfw.wmnet with reason: Switch instance to plain disks, T311686 [production]
10:11 <filippo@cumin1001> START - Cookbook sre.hosts.reboot-single for host prometheus1005.eqiad.wmnet [production]
10:08 <filippo@cumin1001> END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host graphite2003.codfw.wmnet [production]
10:08 <mwdebug-deploy@deploy1002> helmfile [codfw] DONE helmfile.d/services/mwdebug: apply [production]
10:08 <filippo@cumin1001> END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netmon2002.wikimedia.org [production]
10:07 <mwdebug-deploy@deploy1002> helmfile [codfw] START helmfile.d/services/mwdebug: apply [production]
10:07 <mwdebug-deploy@deploy1002> helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply [production]
10:06 <mwdebug-deploy@deploy1002> helmfile [eqiad] START helmfile.d/services/mwdebug: apply [production]
10:03 <marostegui@deploy1002> Synchronized wmf-config/ProductionServices.php: Promote pc1011 to pc1 master (duration: 03m 44s) [production]
10:03 <filippo@cumin1001> START - Cookbook sre.hosts.reboot-single for host netmon2002.wikimedia.org [production]
10:02 <filippo@cumin1001> START - Cookbook sre.hosts.reboot-single for host graphite2003.codfw.wmnet [production]
10:01 <filippo@cumin1001> END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host centrallog2002.codfw.wmnet [production]
09:55 <filippo@cumin1001> START - Cookbook sre.hosts.reboot-single for host centrallog2002.codfw.wmnet [production]
09:53 <dcausse@deploy1002> Finished deploy [wikimedia/discovery/analytics@ff76338]: Add sd-alerts notifications to image_suggestions_weekly (duration: 02m 05s) [production]
09:53 <filippo@cumin1001> END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host centrallog2002.codfw.wmnet [production]
09:53 <filippo@cumin1001> START - Cookbook sre.hosts.reboot-single for host centrallog2002.codfw.wmnet [production]
09:51 <dcausse@deploy1002> Started deploy [wikimedia/discovery/analytics@ff76338]: Add sd-alerts notifications to image_suggestions_weekly [production]
09:51 <jmm@cumin2002> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on kubestagetcd2001.codfw.wmnet with reason: Switch instance to DRBD, T311686 [production]
09:51 <jmm@cumin2002> START - Cookbook sre.hosts.downtime for 1:00:00 on kubestagetcd2001.codfw.wmnet with reason: Switch instance to DRBD, T311686 [production]
09:41 <ladsgroup@deploy1002> Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:827953|Stop writing to old templatelinks fields in s6 (T312865)]] (duration: 03m 57s) [production]
09:41 <mwdebug-deploy@deploy1002> helmfile [codfw] DONE helmfile.d/services/mwdebug: apply [production]
09:40 <mwdebug-deploy@deploy1002> helmfile [codfw] START helmfile.d/services/mwdebug: apply [production]
09:40 <mwdebug-deploy@deploy1002> helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply [production]
09:39 <mwdebug-deploy@deploy1002> helmfile [eqiad] START helmfile.d/services/mwdebug: apply [production]
09:34 <mwdebug-deploy@deploy1002> helmfile [codfw] DONE helmfile.d/services/mwdebug: apply [production]
09:34 <marostegui@deploy1002> Synchronized wmf-config/ProductionServices.php: Promote pc1014 to pc1 master (duration: 03m 50s) [production]
09:33 <mwdebug-deploy@deploy1002> helmfile [codfw] START helmfile.d/services/mwdebug: apply [production]
09:33 <mwdebug-deploy@deploy1002> helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply [production]
09:32 <mwdebug-deploy@deploy1002> helmfile [eqiad] START helmfile.d/services/mwdebug: apply [production]
09:31 <moritzm> draining ganeti2022 for eventual reimage T311686 [production]
09:18 <moritzm> installing perf updates on Bullseye hosts [production]
09:12 <moritzm> upgrading ganeti2027,ganeti2028 to 3.0.2 T312637 [production]
09:07 <jynus> restart dbprov* hosts [production]
08:58 <_joe_> powercycling parse1002, blank console [production]
08:58 <moritzm> upgrading ganeti2010,ganeti2012,ganeti2024 to 3.0.2 T312637 [production]
08:53 <moritzm> failover Ganeti master in codfw to ganeti2020 T311686 [production]
08:49 <marostegui@cumin1001> dbctl commit (dc=all): 'Give some weight to current x1 codfw master', diff saved to https://phabricator.wikimedia.org/P33661 and previous config saved to /var/cache/conftool/dbconfig/20220830-084945-root.json [production]
08:41 <ladsgroup@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2096.codfw.wmnet with reason: Maintenance [production]
08:41 <ladsgroup@cumin1001> START - Cookbook sre.hosts.downtime for 12:00:00 on db2096.codfw.wmnet with reason: Maintenance [production]
08:38 <marostegui@cumin1001> dbctl commit (dc=all): 'Depool db2096 T316522', diff saved to https://phabricator.wikimedia.org/P33660 and previous config saved to /var/cache/conftool/dbconfig/20220830-083845-root.json [production]
08:36 <marostegui@cumin1001> dbctl commit (dc=all): 'Promote db2115 to x1 codfw primary T316522', diff saved to https://phabricator.wikimedia.org/P33659 and previous config saved to /var/cache/conftool/dbconfig/20220830-083654-root.json [production]
08:36 <marostegui> Starting x1 codfw failover from db2096 to db2115 - T316522 [production]
08:31 <marostegui@cumin1001> dbctl commit (dc=all): 'Set db2115 with weight 0 T316522', diff saved to https://phabricator.wikimedia.org/P33658 and previous config saved to /var/cache/conftool/dbconfig/20220830-083103-root.json [production]
08:30 <marostegui@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 10 hosts with reason: DC switchover x1 T316522 [production]
08:30 <marostegui@cumin1001> START - Cookbook sre.hosts.downtime for 1:00:00 on 10 hosts with reason: DC switchover x1 T316522 [production]
08:24 <vgutierrez> ATS: enforce per-request timeout globally (205 secs) - T315533 [production]
07:37 <ladsgroup@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1173.eqiad.wmnet with reason: Maintenance [production]
07:37 <ladsgroup@cumin1001> START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1173.eqiad.wmnet with reason: Maintenance [production]
07:20 <mwdebug-deploy@deploy1002> helmfile [codfw] DONE helmfile.d/services/mwdebug: apply [production]
07:18 <mwdebug-deploy@deploy1002> helmfile [codfw] START helmfile.d/services/mwdebug: apply [production]