2022-03-22
07:49 <elukey> restart php-fpm on mw1448 - high cpu usage right after yesterday's deployment at 21 UTC [production]
07:47 <marostegui@cumin1001> dbctl commit (dc=all): 'Repooling after maintenance db1127', diff saved to https://phabricator.wikimedia.org/P22925 and previous config saved to /var/cache/conftool/dbconfig/20220322-074748-marostegui.json [production]
07:47 <elukey> depool mw1448 manually on the node (high cpu usage from php-fpm) [production]
07:32 <marostegui@cumin1001> dbctl commit (dc=all): 'Repooling after maintenance db1127 (T298557)', diff saved to https://phabricator.wikimedia.org/P22924 and previous config saved to /var/cache/conftool/dbconfig/20220322-073243-marostegui.json [production]
07:26 <urbanecm@deploy1002> Synchronized wmf-config/InitialiseSettings.php: 8151bf2: Allow flooders to remove the group from themselves in viwiki (T303578) (duration: 00m 50s) [production]
07:21 <elukey@cumin1001> END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes1007.eqiad.wmnet with OS bullseye [production]
07:17 <urbanecm@deploy1002> Synchronized wmf-config/CommonSettings.php: caad5a4df35c0daa5fd3423e4abf5aa4d5c38a7a: wgCrossSiteAJAXdomains: Add foundationwiki and {ee,ge,punjabi}wikimedia (T300978) (duration: 00m 49s) [production]
07:14 <urbanecm@deploy1002> Synchronized wmf-config/InitialiseSettings.php: b4a9935: Create "editautopatrolprotected" protection level for viwiki (T303579) (duration: 00m 57s) [production]
07:08 <elukey@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1007.eqiad.wmnet with reason: host reimage [production]
07:06 <elukey@cumin1001> START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1007.eqiad.wmnet with reason: host reimage [production]
06:54 <elukey@cumin1001> START - Cookbook sre.hosts.reimage for host kubernetes1007.eqiad.wmnet with OS bullseye [production]
06:42 <marostegui@cumin1001> dbctl commit (dc=all): 'Depooling db1142 (T300775)', diff saved to https://phabricator.wikimedia.org/P22923 and previous config saved to /var/cache/conftool/dbconfig/20220322-064230-marostegui.json [production]
06:42 <marostegui@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on db1142.eqiad.wmnet with reason: Maintenance [production]
06:42 <marostegui@cumin1001> START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on db1142.eqiad.wmnet with reason: Maintenance [production]
06:42 <marostegui@cumin1001> dbctl commit (dc=all): 'Repooling after maintenance db1143 (T300775)', diff saved to https://phabricator.wikimedia.org/P22922 and previous config saved to /var/cache/conftool/dbconfig/20220322-064222-marostegui.json [production]
06:32 <marostegui@cumin1001> dbctl commit (dc=all): 'Depooling db1127 (T298557)', diff saved to https://phabricator.wikimedia.org/P22921 and previous config saved to /var/cache/conftool/dbconfig/20220322-063223-marostegui.json [production]
06:32 <marostegui@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1127.eqiad.wmnet with reason: Maintenance [production]
06:32 <marostegui@cumin1001> START - Cookbook sre.hosts.downtime for 6:00:00 on db1127.eqiad.wmnet with reason: Maintenance [production]
06:27 <marostegui@cumin1001> dbctl commit (dc=all): 'Repooling after maintenance db1143', diff saved to https://phabricator.wikimedia.org/P22920 and previous config saved to /var/cache/conftool/dbconfig/20220322-062717-marostegui.json [production]
06:23 <marostegui@cumin1001> dbctl commit (dc=all): 'Add db1132 to s1 with minimal weight T301879', diff saved to https://phabricator.wikimedia.org/P22919 and previous config saved to /var/cache/conftool/dbconfig/20220322-062310-marostegui.json [production]
06:21 <marostegui@cumin1001> dbctl commit (dc=all): 'Add db1132 to dbctl T301879', diff saved to https://phabricator.wikimedia.org/P22918 and previous config saved to /var/cache/conftool/dbconfig/20220322-062140-marostegui.json [production]
06:12 <marostegui@cumin1001> END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1175.eqiad.wmnet with OS bullseye [production]
06:12 <marostegui@cumin1001> dbctl commit (dc=all): 'Repooling after maintenance db1143', diff saved to https://phabricator.wikimedia.org/P22917 and previous config saved to /var/cache/conftool/dbconfig/20220322-061212-marostegui.json [production]
05:57 <marostegui@cumin1001> dbctl commit (dc=all): 'Repooling after maintenance db1143 (T300775)', diff saved to https://phabricator.wikimedia.org/P22916 and previous config saved to /var/cache/conftool/dbconfig/20220322-055707-marostegui.json [production]
05:56 <marostegui@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1175.eqiad.wmnet with reason: host reimage [production]
05:53 <marostegui@cumin1001> START - Cookbook sre.hosts.downtime for 2:00:00 on db1175.eqiad.wmnet with reason: host reimage [production]
05:43 <marostegui@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1171.eqiad.wmnet with reason: Maintenance [production]
05:43 <marostegui@cumin1001> START - Cookbook sre.hosts.downtime for 6:00:00 on db1171.eqiad.wmnet with reason: Maintenance [production]
05:41 <marostegui@cumin1001> START - Cookbook sre.hosts.reimage for host db1175.eqiad.wmnet with OS bullseye [production]
03:47 <eileen> civicrm revision changed from 457adec4 to b6ceb722 [production]
02:56 <eileen> civicrm revision changed from 30c55f51 to 457adec4 [production]
02:56 <eileen> revision changed from 30c55f51 to 457adec4 [production]
02:16 <pt1979@cumin1001> START - Cookbook sre.hosts.reimage for host cloudvirt1024.eqiad.wmnet with OS bullseye [production]
02:03 <cmjohnson@cumin1001> END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1047.eqiad.wmnet with OS bullseye [production]
01:35 <cmjohnson@cumin1001> START - Cookbook sre.hosts.reimage for host cloudvirt1047.eqiad.wmnet with OS bullseye [production]
00:35 <pt1979@cumin1001> END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1024.eqiad.wmnet with OS bullseye [production]
2022-03-21
23:52 <eileen> civicrm revision changed from 52c45874 to 30c55f51 [production]
22:29 <ryankemper> T301955 Lifted downtime on relforge now that cluster upgrade is complete and cluster is back to green status [production]
22:26 <bking@cumin1001> END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) Operation.UPGRADE (1 nodes at a time) for ElasticSearch cluster relforge: relforge cluster restart - bking@cumin1001 - T301955 [production]
22:04 <reedy@deploy1002> Synchronized php-1.39.0-wmf.2/extensions/OATHAuth/: T304350 (duration: 00m 49s) [production]
22:03 <reedy@deploy1002> Synchronized php-1.39.0-wmf.1/extensions/OATHAuth/: T304350 (duration: 00m 49s) [production]
21:59 <ryankemper> T301955 Downtimed relforge for 2 days; stuck in yellow status during upgrade b/c replica shards cannot be scheduled to a host of lower elasticsearch version than primary shards. Working on patch for our `rolling-operation` cookbook to disable replication during operation [production]
21:46 <rzl@deploy1002> helmfile [eqiad] DONE helmfile.d/services/zotero: apply [production]
21:46 <rzl@deploy1002> helmfile [eqiad] START helmfile.d/services/zotero: apply [production]
21:46 <rzl@deploy1002> helmfile [eqiad] DONE helmfile.d/services/toolhub: apply [production]
21:45 <bking@cumin1001> START - Cookbook sre.elasticsearch.rolling-operation Operation.UPGRADE (1 nodes at a time) for ElasticSearch cluster relforge: relforge cluster restart - bking@cumin1001 - T301955 [production]
21:45 <rzl@deploy1002> helmfile [eqiad] START helmfile.d/services/toolhub: apply [production]
21:45 <rzl@deploy1002> helmfile [eqiad] DONE helmfile.d/services/shellbox-media: apply [production]
21:44 <rzl@deploy1002> helmfile [eqiad] START helmfile.d/services/shellbox-media: apply [production]
21:44 <rzl@deploy1002> helmfile [eqiad] DONE helmfile.d/services/rdf-streaming-updater: apply [production]