2022-08-03
18:15 <mwdebug-deploy@deploy1002> helmfile [eqiad] START helmfile.d/services/mwdebug: apply [production]
18:12 <dancy@deploy1002> rebuilt and synchronized wikiversions files: group1 wikis to 1.39.0-wmf.23 refs T308076 [production]
17:58 <rzl@cumin1001> END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for kubestage2002.codfw.wmnet [production]
17:58 <rzl@cumin1001> START - Cookbook sre.hosts.remove-downtime for kubestage2002.codfw.wmnet [production]
17:57 <rzl@cumin1001> END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for mc[2025-2026].codfw.wmnet [production]
17:57 <rzl@cumin1001> START - Cookbook sre.hosts.remove-downtime for mc[2025-2026].codfw.wmnet [production]
17:57 <bking@cumin1001> END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for elastic2044.codfw.wmnet [production]
17:57 <bking@cumin1001> START - Cookbook sre.hosts.remove-downtime for elastic2044.codfw.wmnet [production]
17:56 <bking@cumin1001> END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for elastic2043.codfw.wmnet [production]
17:56 <bking@cumin1001> START - Cookbook sre.hosts.remove-downtime for elastic2043.codfw.wmnet [production]
17:55 <ottomata> increasing partitions from 5 to 6 for *.cpjobqueue.partitioned.mediawiki.job.cirrusSearchElasticaWrite topics in Kafka main-eqiad and main-codfw - T314426 [production]
17:55 <mvernon@cumin1001> END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for ms-be2055.codfw.wmnet [production]
17:55 <mvernon@cumin1001> START - Cookbook sre.hosts.remove-downtime for ms-be2055.codfw.wmnet [production]
17:50 <rzl@cumin1001> conftool action : set/pooled=yes; selector: name=kubestage2002.codfw.wmnet [production]
17:38 <rzl@cumin1001> END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for parse[2008-2010].codfw.wmnet [production]
17:38 <rzl@cumin1001> START - Cookbook sre.hosts.remove-downtime for parse[2008-2010].codfw.wmnet [production]
17:23 <hnowlan@puppetmaster1001> conftool action : set/pooled=yes; selector: name=restbase20[12]4.codfw.wmnet [production]
17:14 <mvernon@cumin1001> END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for 6 hosts [production]
17:14 <mvernon@cumin1001> START - Cookbook sre.hosts.remove-downtime for 6 hosts [production]
17:08 <ryankemper> T310145 `elastic2031` and `wcqs2002` powered off in preparation for C1 maintenance [production]
17:06 <jayme@cumin1001> conftool action : set/pooled=yes; selector: name=(kubernetes2020.codfw.wmnet|kubernetes2009.codfw.wmnet|kubernetes2010.codfw.wmnet) [production]
17:00 <btullis@cumin1001> END (FAIL) - Cookbook sre.aqs.roll-restart (exit_code=99) for AQS aqs cluster: Roll restart of all AQS's nodejs daemons. [production]
16:48 <Emperor> shutdown moss-fe2001.codfw.wmnet,ms-fe2011.codfw.wmnet,ms-be20[34,35,42,48,55,68].codfw.wmnet PDU work T310145 [production]
16:47 <mvernon@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on 8 hosts with reason: PDU work [production]
16:47 <dzahn@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on gerrit2002.wikimedia.org with reason: in setup / flapping [production]
16:47 <mvernon@cumin1001> START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on 8 hosts with reason: PDU work [production]
16:47 <dzahn@cumin1001> START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on gerrit2002.wikimedia.org with reason: in setup / flapping [production]
16:46 <mvernon@cumin1001> END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for ms-be[2033,2047].codfw.wmnet,thanos-be2002.codfw.wmnet [production]
16:46 <mvernon@cumin1001> START - Cookbook sre.hosts.remove-downtime for ms-be[2033,2047].codfw.wmnet,thanos-be2002.codfw.wmnet [production]
16:40 <jayme@cumin1001> END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for mc2046.codfw.wmnet [production]
16:40 <jayme@cumin1001> START - Cookbook sre.hosts.remove-downtime for mc2046.codfw.wmnet [production]
16:39 <jayme@cumin1001> END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for 10 hosts [production]
16:39 <jayme@cumin1001> START - Cookbook sre.hosts.remove-downtime for 10 hosts [production]
16:38 <jelto@cumin1001> END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for mc2023.codfw.wmnet [production]
16:38 <jelto@cumin1001> START - Cookbook sre.hosts.remove-downtime for mc2023.codfw.wmnet [production]
16:37 <jelto@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on gitlab-runner2002.codfw.wmnet with reason: PDU swap [production]
16:37 <jelto@cumin1001> START - Cookbook sre.hosts.downtime for 0:30:00 on gitlab-runner2002.codfw.wmnet with reason: PDU swap [production]
16:35 <jayme@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on mc[2025-2026].codfw.wmnet with reason: PDU swap [production]
16:35 <jayme@cumin1001> START - Cookbook sre.hosts.downtime for 0:30:00 on mc[2025-2026].codfw.wmnet with reason: PDU swap [production]
16:32 <jelto> power off mc2025-2026 [production]
16:31 <jayme@cumin1001> END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for rdb2008.codfw.wmnet [production]
16:30 <jayme@cumin1001> START - Cookbook sre.hosts.remove-downtime for rdb2008.codfw.wmnet [production]
16:28 <btullis@cumin1001> START - Cookbook sre.aqs.roll-restart for AQS aqs cluster: Roll restart of all AQS's nodejs daemons. [production]
16:28 <jayme@cumin1001> END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for kubernetes[2009-2010,2020].codfw.wmnet [production]
16:27 <jayme@cumin1001> START - Cookbook sre.hosts.remove-downtime for kubernetes[2009-2010,2020].codfw.wmnet [production]
16:11 <jelto@cumin1001> END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for 12 hosts [production]
16:11 <jelto@cumin1001> START - Cookbook sre.hosts.remove-downtime for 12 hosts [production]
16:08 <jayme@cumin1001> END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for 15 hosts [production]
16:08 <jayme@cumin1001> START - Cookbook sre.hosts.remove-downtime for 15 hosts [production]
16:08 <mvernon@cumin1001> END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for aqs[2005-2008].codfw.wmnet [production]