2024-01-02 §
10:37 <dcaro> hard reboot tools-harbor-1 [tools]
10:22 <wm-bot2> fran@wmf3169 END (FAIL) - Cookbook wmcs.openstack.cloudvirt.vm_console (exit_code=99) [admin]
10:22 <wm-bot2> fran@wmf3169 START - Cookbook wmcs.openstack.cloudvirt.vm_console [admin]
10:13 <dhinus> hard reboot tools-harbor-1 [tools]
09:24 <btullis> adding three days' downtime to dbstore1008, prior to switching its role to `mariadb::analytics_replica` for T351921 [analytics]
09:23 <btullis@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on dbstore1008.eqiad.wmnet with reason: Commissioning new database server [production]
09:23 <btullis@cumin1001> START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on dbstore1008.eqiad.wmnet with reason: Commissioning new database server [production]
09:17 <pfischer@deploy2002> Finished scap: Backport for [[gerrit:987028|configure message_key_fields for update_pipeline]] (duration: 15m 35s) [production]
09:05 <pfischer@deploy2002> pfischer: Continuing with sync [production]
09:04 <pfischer@deploy2002> pfischer: Backport for [[gerrit:987028|configure message_key_fields for update_pipeline]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) [production]
09:02 <moritzm> installing nodejs security updates on bookworm [production]
09:02 <pfischer@deploy2002> Started scap: Backport for [[gerrit:987028|configure message_key_fields for update_pipeline]] [production]
08:33 <akosiaris@cumin1001> END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mw2448.mgmt.codfw.wmnet with reboot policy GRACEFUL [production]
08:27 <jayme> restart prometheus@k8s prometheus@k8s-aux in eqiad - T343529 [production]
08:26 <akosiaris@cumin1001> START - Cookbook sre.hosts.provision for host mw2448.mgmt.codfw.wmnet with reboot policy GRACEFUL [production]
06:45 <marostegui@cumin1001> END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2144.codfw.wmnet with OS bookworm [production]
06:27 <marostegui@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2144.codfw.wmnet with reason: host reimage [production]
06:24 <marostegui@cumin1001> START - Cookbook sre.hosts.downtime for 2:00:00 on db2144.codfw.wmnet with reason: host reimage [production]
06:06 <marostegui@cumin1001> START - Cookbook sre.hosts.reimage for host db2144.codfw.wmnet with OS bookworm [production]
05:00 <mwpresync@deploy2002> Finished scap: testwikis wikis to 1.42.0-wmf.12 refs T350088 (duration: 56m 48s) [production]
04:03 <mwpresync@deploy2002> Started scap: testwikis wikis to 1.42.0-wmf.12 refs T350088 [production]
00:04 <wm-bot> <jjmc89> disable all plagiabot jobs T354145 [tools.eranbot]
2024-01-01 §
21:38 <eileen> config revision changed from 026cf508 to 21b91455 [production]
21:13 <eileen> config revision changed from 3a1a1444 to 026cf508 [production]
21:13 <eileen> fork/mapping-edit-button-fix [production]
17:11 <joal@deploy2002> Finished deploy [airflow-dags/analytics@8b8a456]: Fix monthly job [airflow-dags/analytics@8b8a4567] (duration: 00m 31s) [production]
17:11 <joal> Deploying airflow to fix pageview daily aggregated monthly job [analytics]
17:11 <joal@deploy2002> Started deploy [airflow-dags/analytics@8b8a456]: Fix monthly job [airflow-dags/analytics@8b8a4567] [production]
15:54 <andrewbogott> rebooting tools-harbor-1, T354151 [tools]
15:41 <taavi> reloading zuul for 986673 [releng]
2023-12-31 §
21:39 <andrew@cloudcumin1001> END (PASS) - Cookbook wmcs.openstack.restart_openstack (exit_code=0) [admin]
21:35 <andrewbogott> running openstack service restart cookbook in eqiad1 in response to a bunch of service down alerts [admin]
21:34 <andrew@cloudcumin1001> START - Cookbook wmcs.openstack.restart_openstack [admin]
2023-12-30 §
16:55 <otto@deploy2002> Synchronized wmf-config/ext-EventStreamConfig.php: Config: [[gerrit:984627|Add eventlogging_MediaWikiPingback stream (T323828)]] (duration: 15m 10s) [production]
16:10 <taavi> publish updated pywikibot-scripts-stable image for pywikibot 8.6 T354077 [tools.pywikibot]
15:06 <wm-bot> <lucaswerkmeister> deployed 5baa3871d0 (l10n updates: lb, zh-hans) [tools.lexeme-forms]
12:43 <taavi@cloudcumin1001> END (PASS) - Cookbook wmcs.toolforge.grid.cleanup_queue_errors (exit_code=0) [tools]
12:43 <taavi@cloudcumin1001> START - Cookbook wmcs.toolforge.grid.cleanup_queue_errors [tools]
11:09 <RhinosF1> deployed 4e79f706 (https://gitlab.wikimedia.org/cloudvps-repos/wikistats/-/merge_requests/7) for T354101 [wikistats]
11:02 <RhinosF1> starting emergency deployment [wikistats]
2023-12-29 §
22:59 <pfischer@deploy2002> helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply [production]
22:59 <pfischer@deploy2002> helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply [production]
22:57 <pfischer@deploy2002> helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply [production]
21:39 <andrewbogott> rebooting tools-sgeweblight-10-28.tools.eqiad1.wikimedia.cloud because previous reset didn't get the queue out of error state [tools]
20:33 <wm-bot> <anticomposite> SULWatcher/manage.sh restart # Not connected. [tools.stewardbots]
19:31 <andrewbogott> restarting sge_execd on tools-sgeweblight-10-28.tools.eqiad1.wikimedia.cloud in response to error state alert [tools]
08:01 <pfischer@deploy2002> helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply [production]
08:00 <pfischer@deploy2002> helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply [production]
08:00 <pfischer@deploy2002> helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply [production]
07:58 <pfischer@deploy2002> helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply [production]