2023-04-24
08:44 <marostegui> Enable replication eqiad -> codfw on s8 dbmaint eqiad T335266 [production]
08:44 <marostegui@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on 34 hosts with reason: Enabling replication T335266 [production]
08:43 <marostegui@cumin1001> START - Cookbook sre.hosts.downtime for 0:15:00 on 34 hosts with reason: Enabling replication T335266 [production]
08:33 <marostegui> Enable replication eqiad -> codfw on s5 dbmaint eqiad T335266 [production]
08:32 <cgoubert@deploy2002> Finished scap: testing T329857 (duration: 14m 29s) [production]
08:32 <marostegui@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on 26 hosts with reason: Enabling replication T335266 [production]
08:32 <marostegui@cumin1001> START - Cookbook sre.hosts.downtime for 0:15:00 on 26 hosts with reason: Enabling replication T335266 [production]
08:29 <marostegui@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on 27 hosts with reason: Enabling replication T335266 [production]
08:28 <marostegui@cumin1001> START - Cookbook sre.hosts.downtime for 0:15:00 on 27 hosts with reason: Enabling replication T335266 [production]
08:28 <marostegui> Enable replication eqiad -> codfw on s6 dbmaint eqiad T335266 [production]
08:27 <marostegui@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on 27 hosts with reason: Enabling replication T335266 [production]
08:26 <marostegui@cumin1001> START - Cookbook sre.hosts.downtime for 0:15:00 on 27 hosts with reason: Enabling replication T335266 [production]
08:26 <marostegui> Enable replication eqiad -> codfw on s2 dbmaint eqiad T335266 [production]
08:25 <btullis@cumin1001> START - Cookbook sre.hosts.dhcp for host an-worker1110.eqiad.wmnet [production]
08:21 <btullis@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on an-worker1110.eqiad.wmnet with reason: Upgrading RAID controller firmware [production]
08:21 <btullis@cumin1001> START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on an-worker1110.eqiad.wmnet with reason: Upgrading RAID controller firmware [production]
08:20 <marostegui@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on 10 hosts with reason: Enabling replication T335266 [production]
08:20 <marostegui@cumin1001> START - Cookbook sre.hosts.downtime for 0:15:00 on 10 hosts with reason: Enabling replication T335266 [production]
08:20 <marostegui> Enable replication eqiad -> codfw on x1 dbmaint eqiad T335266 [production]
08:18 <cgoubert@deploy2002> Started scap: testing T329857 [production]
08:17 <marostegui> Enable replication eqiad -> codfw on es5 dbmaint eqiad T335266 [production]
08:14 <claime> Deploying 909302 on deploy2002 for T329857 [production]
08:10 <claime> Disabling puppet on deploy2002 - T329857 [production]
08:09 <claime> Deploying 909302 on deploy1002 for T329857 [production]
08:08 <marostegui@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on 6 hosts with reason: Enabling replication T335266 [production]
08:08 <marostegui> Enable replication eqiad -> codfw on es4 dbmaint eqiad T335266 [production]
08:08 <marostegui@cumin1001> START - Cookbook sre.hosts.downtime for 0:15:00 on 6 hosts with reason: Enabling replication T335266 [production]
08:07 <marostegui> Enable replication eqiad -> codfw on pc3 dbmaint eqiad T335266 [production]
08:06 <marostegui> Enable replication eqiad -> codfw on pc2 dbmaint eqiad T335266 [production]
08:05 <marostegui> Enable replication eqiad -> codfw on pc1 dbmaint eqiad T335266 [production]
07:53 <mvernon@cumin2002> END (PASS) - Cookbook sre.swift.remove-ghost-objects (exit_code=0) from container wikipedia-commons-local-public.41 in codfw [production]
07:51 <mvernon@cumin2002> START - Cookbook sre.swift.remove-ghost-objects from container wikipedia-commons-local-public.41 in codfw [production]
07:45 <jelto@cumin2002> END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host gitlab1004.wikimedia.org with OS bullseye [production]
07:44 <mvernon@cumin2002> END (PASS) - Cookbook sre.swift.remove-ghost-objects (exit_code=0) from container wikipedia-commons-local-public.59 in codfw [production]
07:42 <mvernon@cumin2002> START - Cookbook sre.swift.remove-ghost-objects from container wikipedia-commons-local-public.59 in codfw [production]
07:39 <dcausse> restarting blazegraph on wdqs1005 (stuck for 3+days) [production]
07:38 <mvernon@cumin2002> END (PASS) - Cookbook sre.swift.remove-ghost-objects (exit_code=0) from container wikipedia-commons-local-public.4a in codfw [production]
07:36 <mvernon@cumin2002> START - Cookbook sre.swift.remove-ghost-objects from container wikipedia-commons-local-public.4a in codfw [production]
07:24 <jelto@cumin2002> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on gitlab1004.wikimedia.org with reason: host reimage [production]
07:21 <jelto@cumin2002> START - Cookbook sre.hosts.downtime for 2:00:00 on gitlab1004.wikimedia.org with reason: host reimage [production]
07:06 <jelto@cumin2002> START - Cookbook sre.hosts.reimage for host gitlab1004.wikimedia.org with OS bullseye [production]
2023-04-22
05:41 <joe> <thumbor/codfw>$ helmfile --state-values-set roll_restart=1 -e codfw sync [production]
05:40 <oblivian@deploy2002> helmfile [codfw] DONE helmfile.d/services/thumbor: sync [production]
05:39 <oblivian@deploy2002> helmfile [codfw] START helmfile.d/services/thumbor: sync [production]
05:39 <oblivian@deploy2002> helmfile [codfw] DONE helmfile.d/services/thumbor: apply [production]
05:39 <oblivian@deploy2002> helmfile [codfw] START helmfile.d/services/thumbor: apply [production]
05:15 <hashar@deploy2002> Finished deploy [integration/docroot@b816911]: Update Grafana URL (duration: 00m 11s) [production]
05:15 <hashar@deploy2002> Started deploy [integration/docroot@b816911]: Update Grafana URL [production]
05:10 <joe> sudo cumin -b 1 -s 20 'A:swift-fe-codfw' 'systemctl restart swift-proxy.service' [production]
04:33 <vgutierrez> restart haproxy on cp1087 - T334448 [production]