2022-03-22
07:49 |
<elukey> |
restart php-fpm on mw1448 - high cpu usage right after yesterday's deployment at 21 UTC |
[production] |
07:47 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'Repooling after maintenance db1127', diff saved to https://phabricator.wikimedia.org/P22925 and previous config saved to /var/cache/conftool/dbconfig/20220322-074748-marostegui.json |
[production] |
07:47 |
<elukey> |
depool mw1448 manually on the node (high cpu usage from php-fpm) |
[production] |
07:32 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'Repooling after maintenance db1127 (T298557)', diff saved to https://phabricator.wikimedia.org/P22924 and previous config saved to /var/cache/conftool/dbconfig/20220322-073243-marostegui.json |
[production] |
07:26 |
<urbanecm@deploy1002> |
Synchronized wmf-config/InitialiseSettings.php: 8151bf2: Allow flooders to remove the group from themselves in viwiki (T303578) (duration: 00m 50s) |
[production] |
07:21 |
<elukey@cumin1001> |
END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes1007.eqiad.wmnet with OS bullseye |
[production] |
07:17 |
<urbanecm@deploy1002> |
Synchronized wmf-config/CommonSettings.php: caad5a4df35c0daa5fd3423e4abf5aa4d5c38a7a: wgCrossSiteAJAXdomains: Add foundationwiki and {ee,ge,punjabi}wikimedia (T300978) (duration: 00m 49s) |
[production] |
07:14 |
<urbanecm@deploy1002> |
Synchronized wmf-config/InitialiseSettings.php: b4a9935: Create "editautopatrolprotected" protection level for viwiki (T303579) (duration: 00m 57s) |
[production] |
07:08 |
<elukey@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1007.eqiad.wmnet with reason: host reimage |
[production] |
07:06 |
<elukey@cumin1001> |
START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1007.eqiad.wmnet with reason: host reimage |
[production] |
06:54 |
<elukey@cumin1001> |
START - Cookbook sre.hosts.reimage for host kubernetes1007.eqiad.wmnet with OS bullseye |
[production] |
06:42 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'Depooling db1142 (T300775)', diff saved to https://phabricator.wikimedia.org/P22923 and previous config saved to /var/cache/conftool/dbconfig/20220322-064230-marostegui.json |
[production] |
06:42 |
<marostegui@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on db1142.eqiad.wmnet with reason: Maintenance |
[production] |
06:42 |
<marostegui@cumin1001> |
START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on db1142.eqiad.wmnet with reason: Maintenance |
[production] |
06:42 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'Repooling after maintenance db1143 (T300775)', diff saved to https://phabricator.wikimedia.org/P22922 and previous config saved to /var/cache/conftool/dbconfig/20220322-064222-marostegui.json |
[production] |
06:32 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'Depooling db1127 (T298557)', diff saved to https://phabricator.wikimedia.org/P22921 and previous config saved to /var/cache/conftool/dbconfig/20220322-063223-marostegui.json |
[production] |
06:32 |
<marostegui@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1127.eqiad.wmnet with reason: Maintenance |
[production] |
06:32 |
<marostegui@cumin1001> |
START - Cookbook sre.hosts.downtime for 6:00:00 on db1127.eqiad.wmnet with reason: Maintenance |
[production] |
06:27 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'Repooling after maintenance db1143', diff saved to https://phabricator.wikimedia.org/P22920 and previous config saved to /var/cache/conftool/dbconfig/20220322-062717-marostegui.json |
[production] |
06:23 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'Add db1132 to s1 with minimal weight T301879', diff saved to https://phabricator.wikimedia.org/P22919 and previous config saved to /var/cache/conftool/dbconfig/20220322-062310-marostegui.json |
[production] |
06:21 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'Add db1132 to dbctl T301879', diff saved to https://phabricator.wikimedia.org/P22918 and previous config saved to /var/cache/conftool/dbconfig/20220322-062140-marostegui.json |
[production] |
06:12 |
<marostegui@cumin1001> |
END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1175.eqiad.wmnet with OS bullseye |
[production] |
06:12 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'Repooling after maintenance db1143', diff saved to https://phabricator.wikimedia.org/P22917 and previous config saved to /var/cache/conftool/dbconfig/20220322-061212-marostegui.json |
[production] |
05:57 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'Repooling after maintenance db1143 (T300775)', diff saved to https://phabricator.wikimedia.org/P22916 and previous config saved to /var/cache/conftool/dbconfig/20220322-055707-marostegui.json |
[production] |
05:56 |
<marostegui@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1175.eqiad.wmnet with reason: host reimage |
[production] |
05:53 |
<marostegui@cumin1001> |
START - Cookbook sre.hosts.downtime for 2:00:00 on db1175.eqiad.wmnet with reason: host reimage |
[production] |
05:43 |
<marostegui@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1171.eqiad.wmnet with reason: Maintenance |
[production] |
05:43 |
<marostegui@cumin1001> |
START - Cookbook sre.hosts.downtime for 6:00:00 on db1171.eqiad.wmnet with reason: Maintenance |
[production] |
05:41 |
<marostegui@cumin1001> |
START - Cookbook sre.hosts.reimage for host db1175.eqiad.wmnet with OS bullseye |
[production] |
03:47 |
<eileen> |
civicrm revision changed from 457adec4 to b6ceb722 |
[production] |
02:56 |
<eileen> |
civicrm revision changed from 30c55f51 to 457adec4 |
[production] |
02:56 |
<eileen> |
revision changed from 30c55f51 to 457adec4 |
[production] |
02:16 |
<pt1979@cumin1001> |
START - Cookbook sre.hosts.reimage for host cloudvirt1024.eqiad.wmnet with OS bullseye |
[production] |
02:03 |
<cmjohnson@cumin1001> |
END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1047.eqiad.wmnet with OS bullseye |
[production] |
01:35 |
<cmjohnson@cumin1001> |
START - Cookbook sre.hosts.reimage for host cloudvirt1047.eqiad.wmnet with OS bullseye |
[production] |
00:35 |
<pt1979@cumin1001> |
END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1024.eqiad.wmnet with OS bullseye |
[production] |
2022-03-21
23:52 |
<eileen> |
civicrm revision changed from 52c45874 to 30c55f51 |
[production] |
22:29 |
<ryankemper> |
T301955 Lifted downtime on relforge now that cluster upgrade is complete and cluster is back to green status |
[production] |
22:26 |
<bking@cumin1001> |
END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) Operation.UPGRADE (1 nodes at a time) for ElasticSearch cluster relforge: relforge cluster restart - bking@cumin1001 - T301955 |
[production] |
22:04 |
<reedy@deploy1002> |
Synchronized php-1.39.0-wmf.2/extensions/OATHAuth/: T304350 (duration: 00m 49s) |
[production] |
22:03 |
<reedy@deploy1002> |
Synchronized php-1.39.0-wmf.1/extensions/OATHAuth/: T304350 (duration: 00m 49s) |
[production] |
21:59 |
<ryankemper> |
T301955 Downtimed relforge for 2 days; stuck in yellow status during upgrade b/c replica shards cannot be scheduled to a host of lower elasticsearch version than primary shards. Working on patch for our `rolling-operation` cookbook to disable replication during operation |
[production] |
21:46 |
<rzl@deploy1002> |
helmfile [eqiad] DONE helmfile.d/services/zotero: apply |
[production] |
21:46 |
<rzl@deploy1002> |
helmfile [eqiad] START helmfile.d/services/zotero: apply |
[production] |
21:46 |
<rzl@deploy1002> |
helmfile [eqiad] DONE helmfile.d/services/toolhub: apply |
[production] |
21:45 |
<bking@cumin1001> |
START - Cookbook sre.elasticsearch.rolling-operation Operation.UPGRADE (1 nodes at a time) for ElasticSearch cluster relforge: relforge cluster restart - bking@cumin1001 - T301955 |
[production] |
21:45 |
<rzl@deploy1002> |
helmfile [eqiad] START helmfile.d/services/toolhub: apply |
[production] |
21:45 |
<rzl@deploy1002> |
helmfile [eqiad] DONE helmfile.d/services/shellbox-media: apply |
[production] |
21:44 |
<rzl@deploy1002> |
helmfile [eqiad] START helmfile.d/services/shellbox-media: apply |
[production] |
21:44 |
<rzl@deploy1002> |
helmfile [eqiad] DONE helmfile.d/services/rdf-streaming-updater: apply |
[production] |