2023-10-12 §
11:59 <taavi@cloudcumin1001> START - Cookbook wmcs.toolforge.grid.cleanup_queue_errors [toolsbeta]
11:52 <taavi> reboot tools-sgeweblight-10-22, 28 [tools]
11:49 <ladsgroup@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2107.codfw.wmnet with reason: Maintenance [production]
11:49 <ladsgroup@cumin1001> START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2107.codfw.wmnet with reason: Maintenance [production]
11:49 <ladsgroup@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1162.eqiad.wmnet with reason: Maintenance [production]
11:49 <ladsgroup@cumin1001> START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1162.eqiad.wmnet with reason: Maintenance [production]
11:41 <taavi> configure keepalived ip for main project-proxy service T316982 [project-proxy]
11:37 <jayme@deploy2002> helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply [production]
11:36 <jayme@deploy2002> helmfile [eqiad] START helmfile.d/services/wikifunctions: apply [production]
11:34 <jayme@deploy2002> helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply [production]
11:33 <jayme@deploy2002> helmfile [codfw] START helmfile.d/services/wikifunctions: apply [production]
11:30 <jmm@cumin2002> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest1001.eqiad.wmnet with reason: testing [production]
11:30 <jmm@cumin2002> START - Cookbook sre.hosts.downtime for 2:00:00 on sretest1001.eqiad.wmnet with reason: testing [production]
11:27 <arnaudb@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2102.codfw.wmnet with reason: Maintenance [production]
11:27 <arnaudb@cumin1001> START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2102.codfw.wmnet with reason: Maintenance [production]
11:21 <jayme@deploy2002> helmfile [staging] DONE helmfile.d/services/wikifunctions: apply [production]
11:20 <jayme@deploy2002> helmfile [staging] START helmfile.d/services/wikifunctions: apply [production]
11:05 <taavi> resize proxy-04 g3.cores2.ram4.disk20 to match proxy-03 [project-proxy]
10:52 <elukey@deploy2002> helmfile [staging] DONE helmfile.d/services/eventstreams: sync [production]
10:51 <elukey@deploy2002> helmfile [staging] START helmfile.d/services/eventstreams: sync [production]
10:50 <elukey@deploy2002> helmfile [staging] DONE helmfile.d/services/eventstreams-internal: sync [production]
10:49 <elukey@deploy2002> helmfile [staging] START helmfile.d/services/eventstreams-internal: sync [production]
10:38 <elukey> delete ores proxy and instance in deployment-prep - T347278 [releng]
10:26 <elukey@deploy2002> helmfile [staging] DONE helmfile.d/services/eventstreams-internal: sync [production]
10:26 <elukey@deploy2002> helmfile [staging] START helmfile.d/services/eventstreams-internal: sync [production]
10:26 <elukey@deploy2002> helmfile [staging] DONE helmfile.d/services/eventstreams-internal: sync [production]
10:15 <elukey@deploy2002> helmfile [staging] START helmfile.d/services/eventstreams-internal: sync [production]
10:13 <elukey@deploy2002> helmfile [staging] DONE helmfile.d/services/eventstreams-internal: sync [production]
10:03 <elukey@deploy2002> helmfile [staging] START helmfile.d/services/eventstreams-internal: sync [production]
09:40 <fabfur> repooling cp4040 (depooled for T347837 and forgot) [production]
09:37 <btullis@cumin1001> END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-coord1002.eqiad.wmnet [production]
09:31 <btullis> rebooting an-coord1002 for T344671 [analytics]
09:31 <btullis@cumin1001> START - Cookbook sre.hosts.reboot-single for host an-coord1002.eqiad.wmnet [production]
09:31 <btullis@cumin1001> END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for an-master1002.eqiad.wmnet [production]
09:31 <btullis@cumin1001> START - Cookbook sre.hosts.remove-downtime for an-master1002.eqiad.wmnet [production]
09:18 <btullis> power cycling an-master1002 to address unresponsiveness [analytics]
09:17 <btullis@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:20:00 on an-master1002.eqiad.wmnet with reason: Rebooting misbehaving an-master1002 [production]
09:16 <btullis@cumin1001> START - Cookbook sre.hosts.downtime for 0:20:00 on an-master1002.eqiad.wmnet with reason: Rebooting misbehaving an-master1002 [production]
08:53 <hashar@deploy2002> rebuilt and synchronized wikiversions files: Revert "group2 wikis to 1.41.0-wmf.30" # T347081 [production]
08:49 <ayounsi@cumin1001> END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 56099 [production]
08:45 <ayounsi@cumin1001> START - Cookbook sre.network.peering with action 'configure' for AS: 56099 [production]
08:44 <ayounsi@cumin1001> END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 38195 [production]
08:41 <ayounsi@cumin1001> START - Cookbook sre.network.peering with action 'configure' for AS: 38195 [production]
08:40 <ayounsi@cumin1001> END (FAIL) - Cookbook sre.network.peering (exit_code=99) with action 'configure' for AS: 38195 [production]
08:39 <ayounsi@cumin1001> START - Cookbook sre.network.peering with action 'configure' for AS: 38195 [production]
08:38 <jayme@deploy2002> helmfile [eqiad] DONE helmfile.d/services/mw-wikifunctions: apply [production]
08:38 <jayme@deploy2002> helmfile [eqiad] START helmfile.d/services/mw-wikifunctions: apply [production]
08:38 <jayme@deploy2002> helmfile [codfw] DONE helmfile.d/services/mw-wikifunctions: apply [production]
08:38 <jayme@deploy2002> helmfile [codfw] START helmfile.d/services/mw-wikifunctions: apply [production]
08:35 <godog> add 200G to prometheus/ops in eqiad [production]