production SAL

2022-03-22
07:49	<elukey>	restart php-fpm on mw1448 - high cpu usage right after yesterday's deployment at 21 UTC	[production]
07:47	<marostegui@cumin1001>	dbctl commit (dc=all): 'Repooling after maintenance db1127', diff saved to https://phabricator.wikimedia.org/P22925 and previous config saved to /var/cache/conftool/dbconfig/20220322-074748-marostegui.json	[production]
07:47	<elukey>	depool mw1448 manually on the node (high cpu usage from php-fpm)	[production]
07:32	<marostegui@cumin1001>	dbctl commit (dc=all): 'Repooling after maintenance db1127 (T298557)', diff saved to https://phabricator.wikimedia.org/P22924 and previous config saved to /var/cache/conftool/dbconfig/20220322-073243-marostegui.json	[production]
07:26	<urbanecm@deploy1002>	Synchronized wmf-config/InitialiseSettings.php: 8151bf2: Allow flooders to remove the group from themselves in viwiki (T303578) (duration: 00m 50s)	[production]
07:21	<elukey@cumin1001>	END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes1007.eqiad.wmnet with OS bullseye	[production]
07:17	<urbanecm@deploy1002>	Synchronized wmf-config/CommonSettings.php: caad5a4df35c0daa5fd3423e4abf5aa4d5c38a7a: wgCrossSiteAJAXdomains: Add foundationwiki and {ee,ge,punjabi}wikimedia (T300978) (duration: 00m 49s)	[production]
07:14	<urbanecm@deploy1002>	Synchronized wmf-config/InitialiseSettings.php: b4a9935: Create "editautopatrolprotected" protection level for viwiki (T303579) (duration: 00m 57s)	[production]
07:08	<elukey@cumin1001>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1007.eqiad.wmnet with reason: host reimage	[production]
07:06	<elukey@cumin1001>	START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1007.eqiad.wmnet with reason: host reimage	[production]
06:54	<elukey@cumin1001>	START - Cookbook sre.hosts.reimage for host kubernetes1007.eqiad.wmnet with OS bullseye	[production]
06:42	<marostegui@cumin1001>	dbctl commit (dc=all): 'Depooling db1142 (T300775)', diff saved to https://phabricator.wikimedia.org/P22923 and previous config saved to /var/cache/conftool/dbconfig/20220322-064230-marostegui.json	[production]
06:42	<marostegui@cumin1001>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on db1142.eqiad.wmnet with reason: Maintenance	[production]
06:42	<marostegui@cumin1001>	START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on db1142.eqiad.wmnet with reason: Maintenance	[production]
06:42	<marostegui@cumin1001>	dbctl commit (dc=all): 'Repooling after maintenance db1143 (T300775)', diff saved to https://phabricator.wikimedia.org/P22922 and previous config saved to /var/cache/conftool/dbconfig/20220322-064222-marostegui.json	[production]
06:32	<marostegui@cumin1001>	dbctl commit (dc=all): 'Depooling db1127 (T298557)', diff saved to https://phabricator.wikimedia.org/P22921 and previous config saved to /var/cache/conftool/dbconfig/20220322-063223-marostegui.json	[production]
06:32	<marostegui@cumin1001>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1127.eqiad.wmnet with reason: Maintenance	[production]
06:32	<marostegui@cumin1001>	START - Cookbook sre.hosts.downtime for 6:00:00 on db1127.eqiad.wmnet with reason: Maintenance	[production]
06:27	<marostegui@cumin1001>	dbctl commit (dc=all): 'Repooling after maintenance db1143', diff saved to https://phabricator.wikimedia.org/P22920 and previous config saved to /var/cache/conftool/dbconfig/20220322-062717-marostegui.json	[production]
06:23	<marostegui@cumin1001>	dbctl commit (dc=all): 'Add db1132 to s1 with minimal weight T301879', diff saved to https://phabricator.wikimedia.org/P22919 and previous config saved to /var/cache/conftool/dbconfig/20220322-062310-marostegui.json	[production]
06:21	<marostegui@cumin1001>	dbctl commit (dc=all): 'Add db1132 to dbctl T301879', diff saved to https://phabricator.wikimedia.org/P22918 and previous config saved to /var/cache/conftool/dbconfig/20220322-062140-marostegui.json	[production]
06:12	<marostegui@cumin1001>	END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1175.eqiad.wmnet with OS bullseye	[production]
06:12	<marostegui@cumin1001>	dbctl commit (dc=all): 'Repooling after maintenance db1143', diff saved to https://phabricator.wikimedia.org/P22917 and previous config saved to /var/cache/conftool/dbconfig/20220322-061212-marostegui.json	[production]
05:57	<marostegui@cumin1001>	dbctl commit (dc=all): 'Repooling after maintenance db1143 (T300775)', diff saved to https://phabricator.wikimedia.org/P22916 and previous config saved to /var/cache/conftool/dbconfig/20220322-055707-marostegui.json	[production]
05:56	<marostegui@cumin1001>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1175.eqiad.wmnet with reason: host reimage	[production]
05:53	<marostegui@cumin1001>	START - Cookbook sre.hosts.downtime for 2:00:00 on db1175.eqiad.wmnet with reason: host reimage	[production]
05:43	<marostegui@cumin1001>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1171.eqiad.wmnet with reason: Maintenance	[production]
05:43	<marostegui@cumin1001>	START - Cookbook sre.hosts.downtime for 6:00:00 on db1171.eqiad.wmnet with reason: Maintenance	[production]
05:41	<marostegui@cumin1001>	START - Cookbook sre.hosts.reimage for host db1175.eqiad.wmnet with OS bullseye	[production]
03:47	<eileen>	civicrm revision changed from 457adec4 to b6ceb722	[production]
02:56	<eileen>	civicrm revision changed from 30c55f51 to 457adec4	[production]
02:56	<eileen>	revision changed from 30c55f51 to 457adec4	[production]
02:16	<pt1979@cumin1001>	START - Cookbook sre.hosts.reimage for host cloudvirt1024.eqiad.wmnet with OS bullseye	[production]
02:03	<cmjohnson@cumin1001>	END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1047.eqiad.wmnet with OS bullseye	[production]
01:35	<cmjohnson@cumin1001>	START - Cookbook sre.hosts.reimage for host cloudvirt1047.eqiad.wmnet with OS bullseye	[production]
00:35	<pt1979@cumin1001>	END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1024.eqiad.wmnet with OS bullseye	[production]
2022-03-21
23:52	<eileen>	civicrm revision changed from 52c45874 to 30c55f51	[production]
22:29	<ryankemper>	T301955 Lifted downtime on relforge now that cluster upgrade is complete and cluster is back to green status	[production]
22:26	<bking@cumin1001>	END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) Operation.UPGRADE (1 nodes at a time) for ElasticSearch cluster relforge: relforge cluster restart - bking@cumin1001 - T301955	[production]
22:04	<reedy@deploy1002>	Synchronized php-1.39.0-wmf.2/extensions/OATHAuth/: T304350 (duration: 00m 49s)	[production]
22:03	<reedy@deploy1002>	Synchronized php-1.39.0-wmf.1/extensions/OATHAuth/: T304350 (duration: 00m 49s)	[production]
21:59	<ryankemper>	T301955 Downtimed relforge for 2 days; stuck in yellow status during upgrade b/c replica shards cannot be scheduled to a host of lower elasticsearch version than primary shards. Working on patch for our `rolling-operation` cookbook to disable replication during operation	[production]
21:46	<rzl@deploy1002>	helmfile [eqiad] DONE helmfile.d/services/zotero: apply	[production]
21:46	<rzl@deploy1002>	helmfile [eqiad] START helmfile.d/services/zotero: apply	[production]
21:46	<rzl@deploy1002>	helmfile [eqiad] DONE helmfile.d/services/toolhub: apply	[production]
21:45	<bking@cumin1001>	START - Cookbook sre.elasticsearch.rolling-operation Operation.UPGRADE (1 nodes at a time) for ElasticSearch cluster relforge: relforge cluster restart - bking@cumin1001 - T301955	[production]
21:45	<rzl@deploy1002>	helmfile [eqiad] START helmfile.d/services/toolhub: apply	[production]
21:45	<rzl@deploy1002>	helmfile [eqiad] DONE helmfile.d/services/shellbox-media: apply	[production]
21:44	<rzl@deploy1002>	helmfile [eqiad] START helmfile.d/services/shellbox-media: apply	[production]
21:44	<rzl@deploy1002>	helmfile [eqiad] DONE helmfile.d/services/rdf-streaming-updater: apply	[production]