2023-04-24
08:43 |
<marostegui@cumin1001> |
START - Cookbook sre.hosts.downtime for 0:15:00 on 34 hosts with reason: Enabling replication T335266 |
[production] |
08:33 |
<marostegui> |
Enable replication eqiad -> codfw on s5 dbmaint eqiad T335266 |
[production] |
08:32 |
<cgoubert@deploy2002> |
Finished scap: testing T329857 (duration: 14m 29s) |
[production] |
08:32 |
<marostegui@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on 26 hosts with reason: Enabling replication T335266 |
[production] |
08:32 |
<marostegui@cumin1001> |
START - Cookbook sre.hosts.downtime for 0:15:00 on 26 hosts with reason: Enabling replication T335266 |
[production] |
08:29 |
<marostegui@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on 27 hosts with reason: Enabling replication T335266 |
[production] |
08:28 |
<marostegui@cumin1001> |
START - Cookbook sre.hosts.downtime for 0:15:00 on 27 hosts with reason: Enabling replication T335266 |
[production] |
08:28 |
<marostegui> |
Enable replication eqiad -> codfw on s6 dbmaint eqiad T335266 |
[production] |
08:27 |
<marostegui@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on 27 hosts with reason: Enabling replication T335266 |
[production] |
08:26 |
<marostegui@cumin1001> |
START - Cookbook sre.hosts.downtime for 0:15:00 on 27 hosts with reason: Enabling replication T335266 |
[production] |
08:26 |
<marostegui> |
Enable replication eqiad -> codfw on s2 dbmaint eqiad T335266 |
[production] |
08:25 |
<btullis@cumin1001> |
START - Cookbook sre.hosts.dhcp for host an-worker1110.eqiad.wmnet |
[production] |
08:21 |
<btullis@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on an-worker1110.eqiad.wmnet with reason: Upgrading RAID controller firmware |
[production] |
08:21 |
<btullis@cumin1001> |
START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on an-worker1110.eqiad.wmnet with reason: Upgrading RAID controller firmware |
[production] |
08:20 |
<marostegui@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on 10 hosts with reason: Enabling replication T335266 |
[production] |
08:20 |
<marostegui@cumin1001> |
START - Cookbook sre.hosts.downtime for 0:15:00 on 10 hosts with reason: Enabling replication T335266 |
[production] |
08:20 |
<marostegui> |
Enable replication eqiad -> codfw on x1 dbmaint eqiad T335266 |
[production] |
08:18 |
<cgoubert@deploy2002> |
Started scap: testing T329857 |
[production] |
08:17 |
<marostegui> |
Enable replication eqiad -> codfw on es5 dbmaint eqiad T335266 |
[production] |
08:14 |
<claime> |
Deploying 909302 on deploy2002 for T329857 |
[production] |
08:10 |
<claime> |
Disabling puppet on deploy2002 - T329857 |
[production] |
08:09 |
<claime> |
Deploying 909302 on deploy1002 for T329857 |
[production] |
08:08 |
<marostegui@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on 6 hosts with reason: Enabling replication T335266 |
[production] |
08:08 |
<marostegui> |
Enable replication eqiad -> codfw on es4 dbmaint eqiad T335266 |
[production] |
08:08 |
<marostegui@cumin1001> |
START - Cookbook sre.hosts.downtime for 0:15:00 on 6 hosts with reason: Enabling replication T335266 |
[production] |
08:07 |
<marostegui> |
Enable replication eqiad -> codfw on pc3 dbmaint eqiad T335266 |
[production] |
08:06 |
<marostegui> |
Enable replication eqiad -> codfw on pc2 dbmaint eqiad T335266 |
[production] |
08:05 |
<marostegui> |
Enable replication eqiad -> codfw on pc1 dbmaint eqiad T335266 |
[production] |
07:53 |
<mvernon@cumin2002> |
END (PASS) - Cookbook sre.swift.remove-ghost-objects (exit_code=0) from container wikipedia-commons-local-public.41 in codfw |
[production] |
07:51 |
<mvernon@cumin2002> |
START - Cookbook sre.swift.remove-ghost-objects from container wikipedia-commons-local-public.41 in codfw |
[production] |
07:45 |
<jelto@cumin2002> |
END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host gitlab1004.wikimedia.org with OS bullseye |
[production] |
07:44 |
<mvernon@cumin2002> |
END (PASS) - Cookbook sre.swift.remove-ghost-objects (exit_code=0) from container wikipedia-commons-local-public.59 in codfw |
[production] |
07:42 |
<mvernon@cumin2002> |
START - Cookbook sre.swift.remove-ghost-objects from container wikipedia-commons-local-public.59 in codfw |
[production] |
07:39 |
<dcausse> |
restarting blazegraph on wdqs1005 (stuck for 3+ days) |
[production] |
07:38 |
<mvernon@cumin2002> |
END (PASS) - Cookbook sre.swift.remove-ghost-objects (exit_code=0) from container wikipedia-commons-local-public.4a in codfw |
[production] |
07:36 |
<mvernon@cumin2002> |
START - Cookbook sre.swift.remove-ghost-objects from container wikipedia-commons-local-public.4a in codfw |
[production] |
07:24 |
<jelto@cumin2002> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on gitlab1004.wikimedia.org with reason: host reimage |
[production] |
07:21 |
<jelto@cumin2002> |
START - Cookbook sre.hosts.downtime for 2:00:00 on gitlab1004.wikimedia.org with reason: host reimage |
[production] |
07:06 |
<jelto@cumin2002> |
START - Cookbook sre.hosts.reimage for host gitlab1004.wikimedia.org with OS bullseye |
[production] |
2023-04-22
05:41 |
<joe> |
<thumbor/codfw>$ helmfile --state-values-set roll_restart=1 -e codfw sync |
[production] |
05:40 |
<oblivian@deploy2002> |
helmfile [codfw] DONE helmfile.d/services/thumbor: sync |
[production] |
05:39 |
<oblivian@deploy2002> |
helmfile [codfw] START helmfile.d/services/thumbor: sync |
[production] |
05:39 |
<oblivian@deploy2002> |
helmfile [codfw] DONE helmfile.d/services/thumbor: apply |
[production] |
05:39 |
<oblivian@deploy2002> |
helmfile [codfw] START helmfile.d/services/thumbor: apply |
[production] |
05:15 |
<hashar@deploy2002> |
Finished deploy [integration/docroot@b816911]: Update Grafana URL (duration: 00m 11s) |
[production] |
05:15 |
<hashar@deploy2002> |
Started deploy [integration/docroot@b816911]: Update Grafana URL |
[production] |
05:10 |
<joe> |
sudo cumin -b 1 -s 20 'A:swift-fe-codfw' 'systemctl restart swift-proxy.service' |
[production] |
04:33 |
<vgutierrez> |
restart haproxy on cp1087 - T334448 |
[production] |