2024-01-15 §
12:00 <btullis@cumin1002> START - Cookbook sre.hosts.remove-downtime for 92 hosts [production]
11:57 <btullis> un-pausing all previously paused DAGS on all airflow instances for T332573 [analytics]
11:55 <btullis> re-enabling gobblin jobs [analytics]
11:41 <brouberol@deploy2002> helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/spark-history: apply [production]
11:38 <brouberol@deploy2002> helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/spark-history: apply [production]
11:38 <brouberol> redeploying the Spark History Server to pick up the new HDFS namenodes - T332573 [analytics]
11:29 <btullis> puppet runs cleanly on an-master1003 and it is the active namenode - running puppet on an-master1004. [analytics]
11:20 <btullis> running puppet on an-master1003 to set it to active for T332573 [analytics]
11:16 <btullis> running puppet on journal nodes first for T332573 [analytics]
11:10 <btullis@cumin1002> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on an-coord[1001-1004].eqiad.wmnet with reason: Bringing new nameservers into service [production]
11:10 <btullis@cumin1002> START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on an-coord[1001-1004].eqiad.wmnet with reason: Bringing new nameservers into service [production]
11:10 <btullis@cumin1002> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on an-master[1001-1004].eqiad.wmnet with reason: Bringing new nameservers into service [production]
11:10 <btullis@cumin1002> START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on an-master[1001-1004].eqiad.wmnet with reason: Bringing new nameservers into service [production]
11:09 <jiji@cumin1002> END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc1037.eqiad.wmnet [production]
11:08 <btullis@cumin1002> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on 8 hosts with reason: Bringing new nameservers into service [production]
11:08 <btullis@cumin1002> START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on 8 hosts with reason: Bringing new nameservers into service [production]
11:08 <btullis@cumin1002> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on 97 hosts with reason: Bringing new nameservers into service [production]
11:07 <btullis@cumin1002> START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on 97 hosts with reason: Bringing new nameservers into service [production]
11:03 <btullis> stopping all hadoop services [analytics]
11:03 <jiji@cumin1002> START - Cookbook sre.hosts.reboot-single for host mc1037.eqiad.wmnet [production]
10:59 <btullis> disabling puppet on all hadoop nodes [analytics]
10:58 <jiji@cumin1002> END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc-gp1002.eqiad.wmnet [production]
10:54 <btullis> putting HDFS into safe mode for T332573 [analytics]
10:51 <jiji@cumin1002> START - Cookbook sre.hosts.reboot-single for host mc-gp1002.eqiad.wmnet [production]
10:48 <moritzm> installing systemd bugfix updates from Bullseye point release [production]
10:30 <jmm@cumin2002> END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host mc1037.eqiad.wmnet [production]
10:13 <jmm@cumin2002> START - Cookbook sre.puppet.migrate-host for host mc1037.eqiad.wmnet [production]
10:08 <jmm@cumin2002> END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host mc-gp1002.eqiad.wmnet [production]
10:02 <ladsgroup@deploy2002> Finished scap: Backport for [[gerrit:990424|SecurePoll: Adding updated voterlist files (T349263)]] (duration: 16m 04s) [production]
09:58 <jmm@cumin2002> START - Cookbook sre.puppet.migrate-host for host mc-gp1002.eqiad.wmnet [production]
09:56 <ladsgroup@deploy2002> ladsgroup: Continuing with sync [production]
09:48 <ladsgroup@deploy2002> ladsgroup: Backport for [[gerrit:990424|SecurePoll: Adding updated voterlist files (T349263)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) [production]
09:46 <ladsgroup@deploy2002> Started scap: Backport for [[gerrit:990424|SecurePoll: Adding updated voterlist files (T349263)]] [production]
09:18 <taavi> reboot stuck tools-k8s-worker-84 [tools]
09:16 <pfischer@deploy2002> helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply [production]
09:16 <pfischer@deploy2002> helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply [production]
09:15 <pfischer@deploy2002> helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply [production]
09:15 <pfischer@deploy2002> helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply [production]
09:15 <pfischer@deploy2002> helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply [production]
09:14 <pfischer@deploy2002> helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply [production]
08:45 <filippo@deploy2002> Finished deploy [performance/arc-lamp@67389a0]: (no justification provided) (duration: 00m 05s) [production]
08:45 <filippo@deploy2002> Started deploy [performance/arc-lamp@67389a0]: (no justification provided) [production]
08:23 <dcausse@deploy2002> Finished scap: Backport for [[gerrit:990029|enable page_rerender for 5th batch of wikis (T351503)]] (duration: 11m 40s) [production]
08:17 <dcausse@deploy2002> pfischer and dcausse: Continuing with sync [production]
08:13 <dcausse@deploy2002> pfischer and dcausse: Backport for [[gerrit:990029|enable page_rerender for 5th batch of wikis (T351503)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) [production]
08:12 <dcausse@deploy2002> Started scap: Backport for [[gerrit:990029|enable page_rerender for 5th batch of wikis (T351503)]] [production]
04:57 <andrewbogott> restarting wikitech-static, oom [production]
2024-01-14 §
15:47 <taavi@deploy2002> Finished scap: Backport for [[gerrit:990396|Log IpReputation channel as debug (T354928)]] (duration: 26m 49s) [production]
15:36 <taavi@deploy2002> taavi: Continuing with sync [production]
15:35 <taavi@deploy2002> taavi: Backport for [[gerrit:990396|Log IpReputation channel as debug (T354928)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) [production]