2024-06-24
21:00 <eevans@cumin1002> END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:restbase-codfw: Apply Cassandra upgrade to 4.1.5 — T354970 - eevans@cumin1002 [production]
20:53 <btullis@cumin1002> START - Cookbook sre.hosts.reboot-single for host an-redacteddb1001.eqiad.wmnet [production]
20:36 <inflatador> bking@alert1001 install `ripgrep` deb pkg T368107 [production]
20:22 <ladsgroup@deploy1002> Synchronized php-1.43.0-wmf.10/includes/libs/rdbms/loadbalancer/LoadBalancer.php: (no justification provided) (duration: 11m 04s) [production]
20:21 <mutante> snapshot1017 - systemctl mask commonsrdf-dump ; systemctl mask commonsjson-dump T368098 [production]
20:18 <taavi> taavi@snapshot1017 ~ $ sudo systemctl stop commons*.service [production]
20:01 <andrew@cumin1002> END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1056.eqiad.wmnet with OS bookworm [production]
19:35 <andrew@cumin1002> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1056.eqiad.wmnet with reason: host reimage [production]
19:32 <andrew@cumin1002> START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1056.eqiad.wmnet with reason: host reimage [production]
19:27 <bking@deploy1002> helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply [production]
19:27 <bking@deploy1002> helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply [production]
19:26 <bking@deploy1002> helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply [production]
19:26 <bking@deploy1002> helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply [production]
19:08 <mutante> LDAP - added daphnesmit to group 'wmf' - Phabricator: added dsmit-wmf to WMF-NDA group T368140 [production]
19:02 <sukhe> ms-fe1009: restart swift-proxy: T360913 [production]
18:59 <mutante> ms-fe1011 - restarted swift-proxy [production]
18:53 <eevans@cumin1002> START - Cookbook sre.cassandra.roll-restart for nodes matching A:restbase-codfw: Apply Cassandra upgrade to 4.1.5 — T354970 - eevans@cumin1002 [production]
18:52 <eevans@cumin1002> END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for 15 hosts [production]
18:52 <eevans@cumin1002> START - Cookbook sre.hosts.remove-downtime for 15 hosts [production]
18:50 <eevans@cumin1002> END (ERROR) - Cookbook sre.cassandra.roll-restart (exit_code=97) for nodes matching A:restbase-eqiad: Apply Cassandra upgrade to 4.1.5 — T354970 - eevans@cumin1002 [production]
18:50 <eevans@cumin1002> START - Cookbook sre.cassandra.roll-restart for nodes matching A:restbase-eqiad: Apply Cassandra upgrade to 4.1.5 — T354970 - eevans@cumin1002 [production]
18:50 <sukhe> sudo cumin -s1 -b60 'ms-fe1010*,ms-fe1013*' 'systemctl restart swift-proxy' [production]
18:50 <mutante> ms-fe1010,ms-fe1013 - restart swift-proxy - T360913 [production]
18:48 <ladsgroup@deploy1002> Synchronized private/PrivateSettings.php: Rotate ChronologyProtector secret (duration: 11m 33s) [production]
18:46 <eevans@cumin1002> END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching P{P:cassandra%rack = "d"} and A:restbase and A:eqiad: Apply Cassandra upgrade to 4.1.5 — T354970 - eevans@cumin1002 [production]
18:43 <ladsgroup@deploy1002> ladsgroup: Continuing with sync [production]
18:41 <ladsgroup@deploy1002> ladsgroup: Rotate ChronologyProtector secret synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) [production]
18:17 <andrew@cumin1002> END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1055.eqiad.wmnet with OS bookworm [production]
18:16 <mutante> ms-fe1012:~] $ sudo systemctl restart swift-proxy T360913 [production]
18:16 <mutante> ms-fe1012:~] $ sudo systemctl restart swift-proxy T360931 [production]
18:07 <swfrench@deploy1002> helmfile [codfw] DONE helmfile.d/services/mw-debug: apply [production]
18:06 <swfrench@deploy1002> helmfile [codfw] START helmfile.d/services/mw-debug: apply [production]
18:06 <swfrench@deploy1002> helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply [production]
18:05 <swfrench@deploy1002> helmfile [eqiad] START helmfile.d/services/mw-debug: apply [production]
18:04 <sukhe@cumin1002> END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for lvs1020.eqiad.wmnet [production]
18:04 <sukhe@cumin1002> START - Cookbook sre.hosts.remove-downtime for lvs1020.eqiad.wmnet [production]
18:02 <eevans@cumin1002> START - Cookbook sre.cassandra.roll-restart for nodes matching P{P:cassandra%rack = "d"} and A:restbase and A:eqiad: Apply Cassandra upgrade to 4.1.5 — T354970 - eevans@cumin1002 [production]
18:02 <eevans@cumin1002> END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching P{P:cassandra%rack = "b"} and A:restbase and A:eqiad: Apply Cassandra upgrade to 4.1.5 — T354970 - eevans@cumin1002 [production]
17:57 <sukhe> restart on pybal lvs1019 [production]
17:56 <sukhe@cumin1002> END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on A:lvs-secondary-eqiad and A:lvs [production]
17:53 <sukhe@puppetmaster1001> conftool action : set/pooled=no; selector: cluster=apus,dc=eqiad [production]
17:50 <sukhe@cumin1002> START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on A:lvs-secondary-eqiad and A:lvs [production]
17:50 <sukhe@cumin1002> END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on A:lvs-low-traffic-codfw and A:lvs [production]
17:49 <sukhe@cumin1002> START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on A:lvs-low-traffic-codfw and A:lvs [production]
17:48 <sbassett@deploy1002> helmfile [eqiad] DONE helmfile.d/services/miscweb: apply [production]
17:48 <sbassett@deploy1002> helmfile [eqiad] START helmfile.d/services/miscweb: apply [production]
17:48 <sbassett@deploy1002> helmfile [codfw] DONE helmfile.d/services/miscweb: apply [production]
17:48 <sukhe@cumin1002> END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on A:lvs-secondary-codfw and A:lvs [production]
17:47 <sbassett@deploy1002> helmfile [codfw] START helmfile.d/services/miscweb: apply [production]
17:47 <andrew@cumin1002> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1055.eqiad.wmnet with reason: host reimage [production]