2021-02-24
08:58 <jayme@deploy1001> helmfile [codfw] Ran 'sync' command on namespace 'mathoid' for release 'production' . [production]
08:53 <jayme@deploy1001> helmfile [codfw] Ran 'sync' command on namespace 'eventstreams-internal' for release 'main' . [production]
08:52 <jayme@deploy1001> helmfile [codfw] Ran 'sync' command on namespace 'eventstreams' for release 'production' . [production]
08:52 <jayme@deploy1001> helmfile [codfw] Ran 'sync' command on namespace 'eventstreams' for release 'canary' . [production]
08:50 <jayme@deploy1001> helmfile [codfw] Ran 'sync' command on namespace 'eventgate-main' for release 'production' . [production]
08:50 <jayme@deploy1001> helmfile [codfw] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' . [production]
08:48 <jayme@deploy1001> helmfile [codfw] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' . [production]
08:48 <jayme@deploy1001> helmfile [codfw] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' . [production]
08:35 <moritzm> reimaging bast1002 to Buster [production]
08:33 <jayme@deploy1001> helmfile [codfw] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' . [production]
08:32 <jayme@deploy1001> helmfile [codfw] Ran 'sync' command on namespace 'cxserver' for release 'production' . [production]
08:30 <jayme@deploy1001> helmfile [codfw] Ran 'sync' command on namespace 'citoid' for release 'production' . [production]
08:26 <jayme@deploy1001> helmfile [codfw] Ran 'sync' command on namespace 'apertium' for release 'production' . [production]
08:04 <jynus> restarting db2101, db2139, db2141 T271913 [production]
07:56 <moritzm> installing remaining openldap updates for buster [production]
07:47 <elukey> change gid/uid for druid + roll restart of all druid nodes [analytics]
06:24 <marostegui@cumin1001> END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db1090.eqiad.wmnet [production]
06:18 <marostegui@cumin1001> START - Cookbook sre.hosts.decommission for hosts db1090.eqiad.wmnet [production]
04:10 <ryankemper> T267927 [WDQS Data Reload] Running `/srv/deployment/wdqs/wdqs/loadData.sh -n wdq -d /srv/wdqs/munged/ -s 864` on `ryankemper@wdqs2008` tmux session `data_reload` [production]
04:04 <ryankemper> [WDQS] Depooled `wdqs2008` [production]
03:16 <pt1979@cumin2001> END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on db2149.codfw.wmnet with reason: REIMAGE [production]
03:13 <pt1979@cumin2001> START - Cookbook sre.hosts.downtime for 2:00:00 on db2149.codfw.wmnet with reason: REIMAGE [production]
03:03 <pt1979@cumin2001> END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on db2148.codfw.wmnet with reason: REIMAGE [production]
03:01 <pt1979@cumin2001> START - Cookbook sre.hosts.downtime for 2:00:00 on db2148.codfw.wmnet with reason: REIMAGE [production]
02:58 <ryankemper> [WDQS Data Reload] Restarting reload on test node `wdqs1009` from where it last left off: `/srv/deployment/wdqs/wdqs/loadData.sh -n wdq -d /srv/wdqs/munged/ -s 947` [production]
02:57 <ryankemper> [WDQS Deploy] Deploy complete. Successful test query placed on query.wikidata.org, there are no relevant criticals in Icinga, and Grafana looks good [production]
02:39 <pt1979@cumin2001> END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on db2147.codfw.wmnet with reason: REIMAGE [production]
02:37 <pt1979@cumin2001> START - Cookbook sre.hosts.downtime for 2:00:00 on db2147.codfw.wmnet with reason: REIMAGE [production]
02:35 <pt1979@cumin2001> END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on db2146.codfw.wmnet with reason: REIMAGE [production]
02:33 <pt1979@cumin2001> START - Cookbook sre.hosts.downtime for 2:00:00 on db2146.codfw.wmnet with reason: REIMAGE [production]
02:30 <ryankemper> [WDQS Deploy] Restarting `wdqs-categories` across lvs-managed hosts, one node at a time: `sudo -E cumin -b 1 'A:wdqs-all and not A:wdqs-test' 'depool && sleep 45 && systemctl restart wdqs-categories && sleep 45 && pool'` [production]
02:29 <ryankemper> [WDQS Deploy] Restarted `wdqs-categories` across all test hosts simultaneously: `sudo -E cumin 'A:wdqs-test' 'systemctl restart wdqs-categories'` [production]
02:29 <ryankemper> [WDQS Deploy] Restarted `wdqs-updater` across all hosts, 4 hosts at a time: `sudo -E cumin -b 4 'A:wdqs-all' 'systemctl restart wdqs-updater'` [production]
02:27 <ryankemper@deploy1001> Finished deploy [wdqs/wdqs@b5fc9d5]: 0.3.64 (duration: 06m 24s) [production]
02:24 <ebernhardson@deploy1001> Finished deploy [wikimedia/discovery/analytics@25549e7]: ores_bulk_ingest: use backoffs starting at 30sec (duration: 01m 37s) [production]
02:22 <gehel@cumin2001> END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99) [production]
02:22 <ebernhardson@deploy1001> Started deploy [wikimedia/discovery/analytics@25549e7]: ores_bulk_ingest: use backoffs starting at 30sec [production]
02:20 <ryankemper@deploy1001> Started deploy [wdqs/wdqs@b5fc9d5]: 0.3.64 [production]
02:18 <ryankemper@deploy1001> Finished deploy [wdqs/wdqs@b5fc9d5]: 0.3.64 (duration: 11m 22s) [production]
02:09 <pt1979@cumin2001> END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on db2145.codfw.wmnet with reason: REIMAGE [production]
02:07 <ryankemper> [WDQS Deploy] Tests passing following deploy of `0.3.64` on canary `wdqs1003`; proceeding to rest of fleet [production]
02:07 <pt1979@cumin2001> START - Cookbook sre.hosts.downtime for 2:00:00 on db2145.codfw.wmnet with reason: REIMAGE [production]
02:06 <ryankemper@deploy1001> Started deploy [wdqs/wdqs@b5fc9d5]: 0.3.64 [production]
02:06 <ryankemper> [WDQS Deploy] Gearing up for deploy of wdqs `0.3.64`. Pre-deploy tests passing on canary `wdqs1003` [production]
01:04 <bstorm> hard rebooting tools-k8s-worker-76 because it's in a sorry state [tools]
00:58 <volker-e@deploy1001> Finished deploy [design/style-guide@a66b5b6]: Deploy design/style-guide: a66b5b6 “Components”: Add “Dialogs” (#430) (duration: 00m 06s) [production]
00:58 <volker-e@deploy1001> Started deploy [design/style-guide@a66b5b6]: Deploy design/style-guide: a66b5b6 “Components”: Add “Dialogs” (#430) [production]
00:47 <ebernhardson@deploy1001> Finished deploy [wikimedia/discovery/analytics@4ee50e3]: ores_bulk_ingest: more retry on error (duration: 01m 37s) [production]
00:45 <ebernhardson@deploy1001> Started deploy [wikimedia/discovery/analytics@4ee50e3]: ores_bulk_ingest: more retry on error [production]
00:17 <bstorm> set --property hw_scsi_model=virtio-scsi and --property hw_disk_bus=scsi on the main stretch image in glance on eqiad1 T275430 [admin]