2021-02-24
02:37 |
<pt1979@cumin2001> |
START - Cookbook sre.hosts.downtime for 2:00:00 on db2147.codfw.wmnet with reason: REIMAGE |
[production] |
02:35 |
<pt1979@cumin2001> |
END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on db2146.codfw.wmnet with reason: REIMAGE |
[production] |
02:33 |
<pt1979@cumin2001> |
START - Cookbook sre.hosts.downtime for 2:00:00 on db2146.codfw.wmnet with reason: REIMAGE |
[production] |
02:30 |
<ryankemper> |
[WDQS Deploy] Restarting `wdqs-categories` across lvs-managed hosts, one node at a time: `sudo -E cumin -b 1 'A:wdqs-all and not A:wdqs-test' 'depool && sleep 45 && systemctl restart wdqs-categories && sleep 45 && pool'` |
[production] |
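Note on the 02:30 entry: the command runs with `-b 1`, so each lvs-managed host is depooled, restarted, and repooled before the next one is touched. The sketch below is only an illustration of that batching idea, not cumin's implementation: the helper names `batches` and `rolling_plan` and the host names are hypothetical, and cumin's own scheduling may differ (for example, it may start a new host as soon as one finishes rather than waiting for a whole group). It builds a dry-run plan and executes nothing.

```python
from typing import Iterator, List, Sequence, Tuple

# Steps copied from the 02:30 log entry. Joined with "&&", the chain stops at
# the first failing step, so a host whose restart fails is not repooled.
STEPS = (
    "depool",
    "sleep 45",
    "systemctl restart wdqs-categories",
    "sleep 45",
    "pool",
)


def batches(hosts: Sequence[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive groups of at most `size` hosts (hypothetical helper)."""
    if size < 1:
        raise ValueError("batch size must be at least 1")
    for start in range(0, len(hosts), size):
        yield list(hosts[start:start + size])


def rolling_plan(
    hosts: Sequence[str],
    size: int = 1,
    steps: Sequence[str] = STEPS,
) -> List[Tuple[List[str], str]]:
    """Return (batch, command) pairs in the order they would be run."""
    command = " && ".join(steps)
    return [(batch, command) for batch in batches(hosts, size)]


if __name__ == "__main__":
    # Hypothetical host names, used only to show the ordering.
    example_hosts = ["wdqs-a", "wdqs-b", "wdqs-c"]
    for batch, command in rolling_plan(example_hosts, size=1):
        print(batch, "->", command)
```

With `size=1` the plan touches one host per step, which matches the "one node at a time" wording of the entry; `size=4` would correspond to the `-b 4` used for the `wdqs-updater` restart at 02:29.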
02:29 |
<ryankemper> |
[WDQS Deploy] Restarted `wdqs-categories` across all test hosts simultaneously: `sudo -E cumin 'A:wdqs-test' 'systemctl restart wdqs-categories'` |
[production] |
02:29 |
<ryankemper> |
[WDQS Deploy] Restarted `wdqs-updater` across all hosts, 4 hosts at a time: `sudo -E cumin -b 4 'A:wdqs-all' 'systemctl restart wdqs-updater'` |
[production] |
02:27 |
<ryankemper@deploy1001> |
Finished deploy [wdqs/wdqs@b5fc9d5]: 0.3.64 (duration: 06m 24s) |
[production] |
02:24 |
<ebernhardson@deploy1001> |
Finished deploy [wikimedia/discovery/analytics@25549e7]: ores_bulk_ingest: use backoffs starting at 30sec (duration: 01m 37s) |
[production] |
02:22 |
<gehel@cumin2001> |
END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99) |
[production] |
02:22 |
<ebernhardson@deploy1001> |
Started deploy [wikimedia/discovery/analytics@25549e7]: ores_bulk_ingest: use backoffs starting at 30sec |
[production] |
02:20 |
<ryankemper@deploy1001> |
Started deploy [wdqs/wdqs@b5fc9d5]: 0.3.64 |
[production] |
02:18 |
<ryankemper@deploy1001> |
Finished deploy [wdqs/wdqs@b5fc9d5]: 0.3.64 (duration: 11m 22s) |
[production] |
02:09 |
<pt1979@cumin2001> |
END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on db2145.codfw.wmnet with reason: REIMAGE |
[production] |
02:07 |
<ryankemper> |
[WDQS Deploy] Tests passing following deploy of `0.3.64` on canary `wdqs1003`; proceeding to rest of fleet |
[production] |
02:07 |
<pt1979@cumin2001> |
START - Cookbook sre.hosts.downtime for 2:00:00 on db2145.codfw.wmnet with reason: REIMAGE |
[production] |
02:06 |
<ryankemper@deploy1001> |
Started deploy [wdqs/wdqs@b5fc9d5]: 0.3.64 |
[production] |
02:06 |
<ryankemper> |
[WDQS Deploy] Gearing up for deploy of wdqs `0.3.64`. Pre-deploy tests passing on canary `wdqs1003` |
[production] |
00:58 |
<volker-e@deploy1001> |
Finished deploy [design/style-guide@a66b5b6]: Deploy design/style-guide: a66b5b6 “Components”: Add “Dialogs” (#430) (duration: 00m 06s) |
[production] |
00:58 |
<volker-e@deploy1001> |
Started deploy [design/style-guide@a66b5b6]: Deploy design/style-guide: a66b5b6 “Components”: Add “Dialogs” (#430) |
[production] |
00:47 |
<ebernhardson@deploy1001> |
Finished deploy [wikimedia/discovery/analytics@4ee50e3]: ores_bulk_ingest: more retry on error (duration: 01m 37s) |
[production] |
00:45 |
<ebernhardson@deploy1001> |
Started deploy [wikimedia/discovery/analytics@4ee50e3]: ores_bulk_ingest: more retry on error |
[production] |
00:03 |
<pt1979@cumin2001> |
END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on db2145.codfw.wmnet with reason: REIMAGE |
[production] |
00:02 |
<pt1979@cumin2001> |
START - Cookbook sre.hosts.downtime for 2:00:00 on db2145.codfw.wmnet with reason: REIMAGE |
[production] |
2021-02-23
22:52 |
<chaomodus> |
Netbox 2.10 upgrade complete T265084 |
[production] |
22:28 |
<crusnov@deploy1001> |
Finished deploy [netbox/deploy@dabbf5e]: Deploying Netbox 2.10.4-wmf to production T265084 (duration: 06m 11s) |
[production] |
22:25 |
<ppchelko@deploy1001> |
helmfile [eqiad] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'staging' . |
[production] |
22:25 |
<ppchelko@deploy1001> |
helmfile [eqiad] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' . |
[production] |
22:23 |
<ppchelko@deploy1001> |
helmfile [codfw] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'staging' . |
[production] |
22:23 |
<ppchelko@deploy1001> |
helmfile [codfw] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' . |
[production] |
22:22 |
<crusnov@deploy1001> |
Started deploy [netbox/deploy@dabbf5e]: Deploying Netbox 2.10.4-wmf to production T265084 |
[production] |
22:21 |
<ppchelko@deploy1001> |
helmfile [staging] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' . |
[production] |
22:21 |
<ppchelko@deploy1001> |
helmfile [staging] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'staging' . |
[production] |
22:17 |
<chaomodus> |
deploying Netbox 2.10 to production and associated work |
[production] |
21:48 |
<otto@deploy1001> |
Synchronized wmf-config/InitialiseSettings.php: Fix typos in wgEventLoggingSchemas (duration: 01m 05s) |
[production] |
21:38 |
<jhuneidi@deploy1001> |
rebuilt and synchronized wikiversions files: group0 wikis to 1.36.0-wmf.32 refs T274936 |
[production] |
21:36 |
<ebernhardson@deploy1001> |
Finished deploy [wikimedia/discovery/analytics@1344853]: apply spark env_vars to executors too (duration: 01m 46s) |
[production] |
21:34 |
<ebernhardson@deploy1001> |
Started deploy [wikimedia/discovery/analytics@1344853]: apply spark env_vars to executors too |
[production] |
21:28 |
<jhuneidi@deploy1001> |
Finished scap: testwikis wikis to 1.36.0-wmf.32 refs T274936 (duration: 36m 52s) |
[production] |
21:03 |
<otto@deploy1001> |
helmfile [codfw] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' . |
[production] |
21:03 |
<otto@deploy1001> |
helmfile [codfw] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' . |
[production] |
21:00 |
<ebernhardson@deploy1001> |
Finished deploy [wikimedia/discovery/analytics@46a8ae1]: ores_bulk_ingest: namespace is not plural (duration: 01m 41s) |
[production] |
21:00 |
<otto@deploy1001> |
helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' . |
[production] |
20:59 |
<otto@deploy1001> |
helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' . |
[production] |
20:58 |
<ebernhardson@deploy1001> |
Started deploy [wikimedia/discovery/analytics@46a8ae1]: ores_bulk_ingest: namespace is not plural |
[production] |
20:56 |
<dzahn@cumin1001> |
END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host gitlab1002.eqiad.wmnet |
[production] |
20:52 |
<jhuneidi@deploy1001> |
Started scap: testwikis wikis to 1.36.0-wmf.32 refs T274936 |
[production] |
20:44 |
<ppchelko@deploy1001> |
Synchronized wmf-config/CommonSettings.php: No-op: math enable talking to mathoid directly in labs, T274436 (duration: 00m 57s) |
[production] |
20:38 |
<otto@deploy1001> |
Synchronized wmf-config/InitialiseSettings.php: Fix typo in visualeditortemplatedialoguse - T275015 (duration: 01m 01s) |
[production] |
20:13 |
<razzi@cumin1001> |
END (PASS) - Cookbook sre.kafka.reboot-workers (exit_code=0) for Kafka jumbo cluster: Reboot kafka nodes - razzi@cumin1001 |
[production] |
20:04 |
<dzahn@cumin1001> |
START - Cookbook sre.ganeti.makevm for new host gitlab1002.eqiad.wmnet |
[production] |