2022-10-06
13:11 <mwdebug-deploy@deploy1002> helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply [production]
13:11 <mwdebug-deploy@deploy1002> helmfile [eqiad] START helmfile.d/services/mwdebug: apply [production]
13:08 <btullis@cumin1001> START - Cookbook sre.dns.netbox [production]
13:06 <aborrero@cumin1001> START - Cookbook sre.hosts.reimage for host cloudnet1006.eqiad.wmnet with OS bullseye [production]
13:06 <urbanecm@deploy1002> urbanecm and sbisson: Backport for [[gerrit:826882|Explicit config for Wikistories discovery module (T314582)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet [production]
13:06 <aborrero@cumin1001> END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudnet1006.eqiad.wmnet with OS bullseye [production]
13:05 <urbanecm@deploy1002> Started scap: Backport for [[gerrit:826882|Explicit config for Wikistories discovery module (T314582)]] [production]
13:02 <jelto> update gitlab-settings to enable admin_mode on gitlab production instances - T316419 [releng]
13:00 <James_F> Docker: Building and publishing php74:0.3.2 and cascade for T318918 [releng]
12:59 <jelto> update gitlab-settings to enable admin_mode on gitlab replica instances - T316419 [releng]
12:59 <aborrero@cumin1001> START - Cookbook sre.hosts.reimage for host cloudnet1006.eqiad.wmnet with OS bullseye [production]
12:58 <aborrero@cumin1001> END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudnet1006.eqiad.wmnet with OS bullseye [production]
12:56 <jmm@cumin2002> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4 days, 0:00:00 on ganeti1026.eqiad.wmnet with reason: Downtime for removal from Ganeti cluster and eventual bullseye reimage [production]
12:56 <jmm@cumin2002> START - Cookbook sre.hosts.downtime for 4 days, 0:00:00 on ganeti1026.eqiad.wmnet with reason: Downtime for removal from Ganeti cluster and eventual bullseye reimage [production]
12:55 <jelto> update gitlab-settings to enable admin_mode on gitlab test instance - T316419 [releng]
12:54 <btullis@cumin1001> START - Cookbook sre.hosts.decommission for hosts aqs1006.eqiad.wmnet [production]
12:45 <jmm@cumin2002> END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host ganeti1029.eqiad.wmnet [production]
12:43 <aborrero@cumin1001> START - Cookbook sre.hosts.reimage for host cloudnet1006.eqiad.wmnet with OS bullseye [production]
12:42 <aborrero@cumin1001> END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudnet1006.eqiad.wmnet with OS bullseye [production]
12:40 <elukey@cumin1001> START - Cookbook sre.kafka.roll-restart-brokers for Kafka A:kafka-logging-codfw cluster: Roll restart of jvm daemons. [production]
12:39 <cmooney@cumin1001> END (PASS) - Cookbook sre.dns.netbox (exit_code=0) [production]
12:36 <cmooney@cumin1001> START - Cookbook sre.dns.netbox [production]
12:34 <aborrero@cumin1001> START - Cookbook sre.hosts.reimage for host cloudnet1006.eqiad.wmnet with OS bullseye [production]
12:31 <jmm@cumin2002> START - Cookbook sre.hosts.reboot-single for host ganeti1029.eqiad.wmnet [production]
12:24 <btullis@cumin1001> END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts aqs1005.eqiad.wmnet [production]
12:24 <btullis@cumin1001> END (PASS) - Cookbook sre.dns.netbox (exit_code=0) [production]
12:21 <btullis@cumin1001> START - Cookbook sre.dns.netbox [production]
12:15 <btullis@cumin1001> START - Cookbook sre.hosts.decommission for hosts aqs1005.eqiad.wmnet [production]
12:15 <btullis> decommissioning aqs1005 [analytics]
12:09 <jmm@cumin2002> END (PASS) - Cookbook sre.ganeti.addnode (exit_code=0) for new host ganeti1012.eqiad.wmnet to cluster eqiad and group C [production]
11:54 <arturo> rebooting cloudnet1005/1006 to see if they have the right network config (T316284) [admin]
11:50 <arturo> set neutron l3 agents on cloudnet1005/1006 as down `root@cloudcontrol1005:~# neutron agent-update --admin-state-down <uuid>` (T316284) [admin]
11:40 <arturo> [codfw1dev] rebooting both network nodes to test https://gerrit.wikimedia.org/r/c/operations/puppet/+/839492 [admin]
11:32 <btullis@cumin1001> END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts aqs1004.eqiad.wmnet [production]
11:32 <btullis@cumin1001> END (PASS) - Cookbook sre.dns.netbox (exit_code=0) [production]
11:28 <jbond> enable puppet post deploy puppetdb change 814824 [production]
11:27 <jbond> switch puppetdb replication to use replication slots [production]
11:27 <btullis@cumin1001> START - Cookbook sre.dns.netbox [production]
11:27 <btullis> cold-reset the BMC on analytics1076 [production]
11:23 <btullis> decommissioning aqs1004 [analytics]
11:22 <btullis@cumin1001> START - Cookbook sre.hosts.decommission for hosts aqs1004.eqiad.wmnet [production]
10:58 <jbond> disable puppet temporarily to deploy a puppetdb change 814824 [production]
10:51 <_joe_> installing the upgraded php package everywhere, T318918 [production]
10:30 <elukey> restart kafka on kafka-logging1003 to reload the config (cleanup old super.users related to past keystore) [production]
10:16 <moritzm> installing ruby-rack security updates [production]
10:14 <arturo> [codfw1dev] restart neutron-l3-agent on cloudnet2006-dev, it was dead [admin]
10:11 <hoo> Running extensions/Wikibase/client/maintenance/populateUnexpectedUnconnectedPagePageProp.php for all remaining wikis [production]
10:07 <jmm@cumin2002> END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging NOkafor out of all services on: 1213 hosts [production]
10:07 <jmm@cumin2002> START - Cookbook sre.idm.logout Logging NOkafor out of all services on: 1213 hosts [production]
10:07 <jmm@cumin2002> END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging NOkafor out of all services on: 799 hosts [production]