2024-02-07
18:30 <btullis@cumin1002> START - Cookbook sre.kafka.roll-restart-mirror-maker restart MirrorMaker for Kafka A:kafka-mirror-maker-jumbo-eqiad cluster: Roll restart of jvm daemons. [production]
18:25 <btullis@cumin1002> END (PASS) - Cookbook sre.kafka.roll-restart-reboot-brokers (exit_code=0) rolling restart_daemons on A:kafka-jumbo-eqiad [production]
18:23 <marostegui@cumin1002> dbctl commit (dc=all): 'Repooling after maintenance db1191', diff saved to https://phabricator.wikimedia.org/P56463 and previous config saved to /var/cache/conftool/dbconfig/20240207-182342-marostegui.json [production]
18:08 <marostegui@cumin1002> dbctl commit (dc=all): 'Repooling after maintenance db1191', diff saved to https://phabricator.wikimedia.org/P56462 and previous config saved to /var/cache/conftool/dbconfig/20240207-180835-marostegui.json [production]
18:00 <andrew@cloudcumin1001> START - Cookbook wmcs.toolforge.k8s.reboot for all workers [tools]
17:59 <wmbot~lucaswerkmeister@tools-sgebastion-10> started webservice again (and patched the startup probe into it); took a while to come up but now it seems to be working [tools.lexeme-forms]
17:58 <andrew@cloudcumin1001> END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-9 [tools]
17:58 <andrew@cloudcumin1001> START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-9 [tools]
17:53 <marostegui@cumin1002> dbctl commit (dc=all): 'Repooling after maintenance db1191 (T355609)', diff saved to https://phabricator.wikimedia.org/P56461 and previous config saved to /var/cache/conftool/dbconfig/20240207-175328-marostegui.json [production]
17:52 <bking@cumin2002> END (PASS) - Cookbook sre.elasticsearch.ban (exit_code=0) Unbanning all hosts in search_codfw [production]
17:52 <bking@cumin2002> START - Cookbook sre.elasticsearch.ban Unbanning all hosts in search_codfw [production]
17:49 <wmbot~lucaswerkmeister@tools-sgebastion-10> stopped webservice, restart wasn’t working so let’s try harder [tools.lexeme-forms]
17:48 <marostegui@cumin1002> dbctl commit (dc=all): 'Depooling db1191 (T355609)', diff saved to https://phabricator.wikimedia.org/P56460 and previous config saved to /var/cache/conftool/dbconfig/20240207-174807-marostegui.json [production]
17:48 <marostegui@cumin1002> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1191.eqiad.wmnet with reason: Maintenance [production]
17:47 <marostegui@cumin1002> START - Cookbook sre.hosts.downtime for 6:00:00 on db1191.eqiad.wmnet with reason: Maintenance [production]
17:47 <marostegui@cumin1002> dbctl commit (dc=all): 'Repooling after maintenance db1174 (T355609)', diff saved to https://phabricator.wikimedia.org/P56459 and previous config saved to /var/cache/conftool/dbconfig/20240207-174745-marostegui.json [production]
17:45 <wmbot~lucaswerkmeister@tools-sgebastion-10> restarted webservice, log was full of various errors [tools.lexeme-forms]
17:32 <jgiannelos@deploy2002> Finished deploy [restbase/deploy@1007273]: Disabling storage for jawiki (duration: 07m 19s) [production]
17:32 <marostegui@cumin1002> dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P56458 and previous config saved to /var/cache/conftool/dbconfig/20240207-173238-marostegui.json [production]
17:26 <btullis> roll-restarting kafka-jumbo for T356382 [analytics]
17:26 <btullis@cumin1002> START - Cookbook sre.kafka.roll-restart-reboot-brokers rolling restart_daemons on A:kafka-jumbo-eqiad [production]
17:25 <jgiannelos@deploy2002> Started deploy [restbase/deploy@1007273]: Disabling storage for jawiki [production]
17:24 <taavi@cloudcumin1001> END (FAIL) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=99) for all workers [tools]
17:23 <taavi@cloudcumin1001> START - Cookbook wmcs.toolforge.k8s.reboot for all workers [tools]
17:17 <marostegui@cumin1002> dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P56457 and previous config saved to /var/cache/conftool/dbconfig/20240207-171732-marostegui.json [production]
17:11 <hnowlan@puppetmaster1001> conftool action : set/pooled=yes:weight=10; selector: service=thumbor [production]
17:05 <taavi@cloudcumin1001> END (FAIL) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=99) for all workers [tools]
17:05 <taavi@cloudcumin1001> START - Cookbook wmcs.toolforge.k8s.reboot for all workers [tools]
17:04 <sbailey@deploy2002> helmfile [codfw] DONE helmfile.d/services/wikifeeds: apply [production]
17:04 <sbailey@deploy2002> helmfile [codfw] START helmfile.d/services/wikifeeds: apply [production]
17:03 <sbailey@deploy2002> helmfile [eqiad] DONE helmfile.d/services/wikifeeds: apply [production]
17:03 <taavi@cloudcumin1001> END (ERROR) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=97) for all workers [tools]
17:03 <sbailey@deploy2002> helmfile [eqiad] START helmfile.d/services/wikifeeds: apply [production]
17:02 <taavi@cloudcumin1001> START - Cookbook wmcs.toolforge.k8s.reboot for all workers [tools]
17:02 <marostegui@cumin1002> dbctl commit (dc=all): 'Repooling after maintenance db1174 (T355609)', diff saved to https://phabricator.wikimedia.org/P56456 and previous config saved to /var/cache/conftool/dbconfig/20240207-170225-marostegui.json [production]
17:01 <taavi@cloudcumin1001> END (FAIL) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=99) for all workers [tools]
16:57 <marostegui@cumin1002> dbctl commit (dc=all): 'Depooling db1174 (T355609)', diff saved to https://phabricator.wikimedia.org/P56455 and previous config saved to /var/cache/conftool/dbconfig/20240207-165703-marostegui.json [production]
16:56 <marostegui@cumin1002> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1174.eqiad.wmnet with reason: Maintenance [production]
16:56 <marostegui@cumin1002> START - Cookbook sre.hosts.downtime for 6:00:00 on db1174.eqiad.wmnet with reason: Maintenance [production]
16:55 <sbailey@deploy2002> helmfile [staging] DONE helmfile.d/services/wikifeeds: apply [production]
16:54 <sbailey@deploy2002> helmfile [staging] START helmfile.d/services/wikifeeds: apply [production]
16:52 <hnowlan@cumin2002> conftool action : set/pooled=yes; selector: name=(mw2377.codfw.wmnet|mw2378.codfw.wmnet|mw2406.codfw.wmnet|mw2301.codfw.wmnet|mw2310.codfw.wmnet),cluster=kubernetes,service=kubesvc [production]
16:52 <hnowlan@cumin2002> conftool action : set/weight=10; selector: name=(mw2377.codfw.wmnet|mw2378.codfw.wmnet|mw2406.codfw.wmnet|mw2301.codfw.wmnet|mw2310.codfw.wmnet),cluster=kubernetes,service=kubesvc [production]
16:47 <marostegui@cumin1002> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1171.eqiad.wmnet with reason: Maintenance [production]
16:47 <marostegui@cumin1002> START - Cookbook sre.hosts.downtime for 6:00:00 on db1171.eqiad.wmnet with reason: Maintenance [production]
16:47 <marostegui@cumin1002> dbctl commit (dc=all): 'Repooling after maintenance db1170:3317 (T355609)', diff saved to https://phabricator.wikimedia.org/P56454 and previous config saved to /var/cache/conftool/dbconfig/20240207-164738-marostegui.json [production]
16:47 <cmooney@cumin1002> END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for asw-a-codfw,cr[1-2]-codfw,lsw1-a2-codfw.mgmt [production]
16:47 <cmooney@cumin1002> START - Cookbook sre.hosts.remove-downtime for asw-a-codfw,cr[1-2]-codfw,lsw1-a2-codfw.mgmt [production]
16:47 <ejegg> fundraising civicrm upgraded from c3dff157 to 98d35c79 [production]
16:46 <hnowlan> homer 'cr*codfw*' commit 'T354791' for 5 new k8s ex-appservers [production]