2020-10-29
01:17 |
<ryankemper> |
T266492 Beginning rolling restart of eqiad cirrus cluster, 3 nodes at a time, on `ryankemper@cumin1001` tmux session `elasticsearch_restart_eqiad` |
[production] |
01:16 |
<ryankemper@cumin1001> |
START - Cookbook sre.elasticsearch.rolling-restart |
[production] |
00:51 |
<ryankemper> |
Finished restart of wdqs categories across production hosts; wdqs deploy is complete and the service is healthy |
[production] |
00:14 |
<Amir1> |
rolling restart of ores |
[production] |
00:12 |
<dzahn@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) |
[production] |
00:10 |
<dzahn@cumin1001> |
START - Cookbook sre.hosts.downtime |
[production] |
00:04 |
<ryankemper> |
Beginning restart of wdqs categories across production hosts, one at a time: `sudo -E cumin -b 1 'A:wdqs-all and not A:wdqs-test' 'depool && sleep 60 && systemctl restart wdqs-categories && sleep 30 && pool'` |
[production] |
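A minimal sketch of the depool/restart/pool pattern in the entry above, assuming nothing about cumin's internals. It simulates batch size `-b 1` by splitting hosts into fixed batches and running the `&&` chain on each host in order. A failed step stops that host's chain, so `pool` never runs on a host whose restart failed. The sketch also stops before the next batch after any failure; that abort rule is an assumption chosen for safety. The runner `fake_run` and the hostnames `host1`..`host3` are hypothetical. Real cumin may schedule hosts differently (for example, with a sliding window or concurrently within a batch).

```python
from typing import Callable, Dict, List, Sequence

# Hypothetical executor: run(host, step) returns True if the step succeeded.
Runner = Callable[[str, str], bool]

# Steps mirroring 'depool && sleep 60 && systemctl restart wdqs-categories && sleep 30 && pool'.
STEPS = ["depool", "sleep 60", "systemctl restart wdqs-categories", "sleep 30", "pool"]


def batches(hosts: Sequence[str], size: int) -> List[List[str]]:
    """Split hosts into consecutive groups of at most `size` (simplified batch model)."""
    if size < 1:
        raise ValueError("batch size must be at least 1")
    return [list(hosts[i:i + size]) for i in range(0, len(hosts), size)]


def run_chain(host: str, steps: Sequence[str], run: Runner) -> bool:
    """Run steps in order on one host, stopping at the first failure (shell && semantics)."""
    for step in steps:
        if not run(host, step):
            return False  # later steps, including 'pool', are skipped
    return True


def rolling_restart(hosts: Sequence[str], size: int, steps: Sequence[str], run: Runner) -> Dict[str, bool]:
    """Process batches in order; hosts inside a batch run sequentially in this sketch."""
    results: Dict[str, bool] = {}
    for batch in batches(hosts, size):
        for host in batch:
            results[host] = run_chain(host, steps, run)
        if not all(results[h] for h in batch):
            break  # assumed policy: do not touch the next batch after a failure
    return results


if __name__ == "__main__":
    def fake_run(host: str, step: str) -> bool:
        print(f"{host}: {step}")
        return True

    print(rolling_restart(["host1", "host2", "host3"], 1, STEPS, fake_run))
```

Running it prints each host's five steps in order, then `{'host1': True, 'host2': True, 'host3': True}`. Keeping the host depooled on failure is the point of chaining with `&&`: a broken instance never goes back into service.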
00:03 |
<ryankemper> |
Restarted wdqs categories across test hosts: `sudo -E cumin 'A:wdqs-test' 'systemctl restart wdqs-categories'` |
[production] |
00:03 |
<ryankemper> |
Restarted wdqs updater across all hosts: `sudo -E cumin -b 4 'A:wdqs-all' 'systemctl restart wdqs-updater'` |
[production] |
00:02 |
<ryankemper> |
Following wdqs deploy, https://query.wikidata.org successfully responds to an example query |
[production] |
00:01 |
<ryankemper@deploy1001> |
Finished deploy [wdqs/wdqs@8c97b17]: 0.3.53 (duration: 09m 29s) |
[production] |
2020-10-28
23:54 |
<ryankemper> |
Canary `wdqs1003` tests pass, proceeding with wdqs deploy to rest of fleet |
[production] |
23:52 |
<ryankemper@deploy1001> |
Started deploy [wdqs/wdqs@8c97b17]: 0.3.53 |
[production] |
23:52 |
<ryankemper@deploy1001> |
deploy aborted: 0.3.53 (duration: 00m 00s) |
[production] |
23:52 |
<ryankemper@deploy1001> |
Started deploy [wdqs/wdqs@8c97b17]: 0.3.53 |
[production] |
22:54 |
<mutante> |
scandium - scap pull after reinstalling OS |
[production] |
22:14 |
<dzahn@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) |
[production] |
22:12 |
<dzahn@cumin1001> |
START - Cookbook sre.hosts.downtime |
[production] |
21:41 |
<ryankemper> |
Disabled elasticsearch "saneitizer" systemd timer in eqiad due to checker jobs falling behind: `sudo systemctl disable mediawiki_job_cirrus_sanitize_jobs.timer` on `mwmaint1002` |
[production] |
21:22 |
<herron@cumin1001> |
END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) |
[production] |
21:05 |
<hnowlan@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) |
[production] |
21:05 |
<hnowlan@cumin1001> |
START - Cookbook sre.hosts.downtime |
[production] |
20:50 |
<herron@cumin1001> |
START - Cookbook sre.ganeti.makevm |
[production] |
20:22 |
<ladsgroup@deploy1001> |
Synchronized static/images/project-logos: Changing logo of Wikidata for the birthday (duration: 00m 58s) |
[production] |
19:56 |
<jgleeson> |
updated Smashpig from 2246685626 to 09f29c1da5 |
[production] |
19:53 |
<herron@cumin1001> |
END (ERROR) - Cookbook sre.ganeti.makevm (exit_code=97) |
[production] |
19:53 |
<herron@cumin1001> |
START - Cookbook sre.ganeti.makevm |
[production] |
19:50 |
<herron@cumin1001> |
END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) |
[production] |
19:36 |
<herron@cumin1001> |
START - Cookbook sre.ganeti.makevm |
[production] |
19:36 |
<herron@cumin1001> |
END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) |
[production] |
19:36 |
<herron@cumin1001> |
START - Cookbook sre.ganeti.makevm |
[production] |
19:30 |
<hnowlan@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) |
[production] |
19:30 |
<hnowlan@cumin1001> |
START - Cookbook sre.hosts.downtime |
[production] |
19:22 |
<dzahn@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) |
[production] |
19:20 |
<dzahn@cumin1001> |
START - Cookbook sre.hosts.downtime |
[production] |
18:56 |
<tgr_> |
Morning deploys done |
[production] |
18:55 |
<tgr@deploy1001> |
Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:636983|Temporary enable 'editpage' warn logging (T251023)]] (duration: 00m 57s) |
[production] |
18:51 |
<dzahn@cumin1001> |
END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) |
[production] |
18:51 |
<dzahn@cumin1001> |
START - Cookbook sre.hosts.downtime |
[production] |
18:47 |
<volans@cumin1001> |
END (PASS) - Cookbook sre.dns.netbox (exit_code=0) |
[production] |
18:46 |
<tgr@deploy1001> |
Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:636791|Revert "cirrus: Hardcode more_like to codfw cirrus cluster"]] (duration: 00m 56s) |
[production] |
18:45 |
<tgr@deploy1001> |
Synchronized wmf-config/PoolCounterSettings.php: Config: [[gerrit:636956|Revert "Revert "Increase cirrus morelike pool counter by 20%"" ()]] (duration: 00m 57s) |
[production] |
18:43 |
<volans@cumin1001> |
START - Cookbook sre.dns.netbox |
[production] |
18:40 |
<tgr@deploy1001> |
Synchronized php-1.36.0-wmf.14/extensions/GrowthExperiments/includes/HomepageModules/SuggestedEdits.php: Backport: [[gerrit:636787|Suggested edits: Include page ID with task preview data (T266600)]] (duration: 00m 59s) |
[production] |
18:19 |
<tgr@deploy1001> |
Synchronized wmf-config/CommonSettings.php: Config: [[gerrit:619880|Removing obsolete license definition]] (duration: 01m 00s) |
[production] |
18:11 |
<cmjohnson@cumin1001> |
END (PASS) - Cookbook sre.dns.netbox (exit_code=0) |
[production] |
18:07 |
<cmjohnson@cumin1001> |
START - Cookbook sre.dns.netbox |
[production] |
18:06 |
<cmjohnson@cumin1001> |
END (FAIL) - Cookbook sre.dns.netbox (exit_code=99) |
[production] |
18:02 |
<elukey@cumin1001> |
END (ERROR) - Cookbook sre.dns.netbox (exit_code=97) |
[production] |
17:46 |
<elukey@cumin1001> |
START - Cookbook sre.dns.netbox |
[production] |