production SAL

2022-05-26
07:55	<ladsgroup@cumin1001>	START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1157.eqiad.wmnet with reason: Maintenance	[production]
07:44	<ladsgroup@cumin1001>	dbctl commit (dc=all): 'Repooling after maintenance db1099:3311 (T298555)', diff saved to https://phabricator.wikimedia.org/P28577 and previous config saved to /var/cache/conftool/dbconfig/20220526-074436-ladsgroup.json	[production]
07:31	<mwdebug-deploy@deploy1002>	helmfile [codfw] DONE helmfile.d/services/mwdebug: apply	[production]
07:30	<mwdebug-deploy@deploy1002>	helmfile [codfw] START helmfile.d/services/mwdebug: apply	[production]
07:30	<mwdebug-deploy@deploy1002>	helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply	[production]
07:29	<mwdebug-deploy@deploy1002>	helmfile [eqiad] START helmfile.d/services/mwdebug: apply	[production]
07:25	<ariel@deploy1002>	Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:793799|Add namespaces to Punjabi wikisource default search (T287887)]] (duration: 00m 50s)	[production]
07:24	<mwdebug-deploy@deploy1002>	helmfile [codfw] DONE helmfile.d/services/mwdebug: apply	[production]
07:23	<elukey@deploy1002>	helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .	[production]
07:20	<mwdebug-deploy@deploy1002>	helmfile [codfw] START helmfile.d/services/mwdebug: apply	[production]
07:20	<mwdebug-deploy@deploy1002>	helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply	[production]
07:19	<mwdebug-deploy@deploy1002>	helmfile [eqiad] START helmfile.d/services/mwdebug: apply	[production]
07:18	<elukey@deploy1002>	helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .	[production]
07:15	<ariel@deploy1002>	Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:796385|Enable Realtime Preview on more pilot wikis: huwiki and fiwiki (T303961)]] (duration: 00m 51s)	[production]
07:14	<mwdebug-deploy@deploy1002>	helmfile [codfw] DONE helmfile.d/services/mwdebug: apply	[production]
07:13	<mwdebug-deploy@deploy1002>	helmfile [codfw] START helmfile.d/services/mwdebug: apply	[production]
07:13	<mwdebug-deploy@deploy1002>	helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply	[production]
07:12	<mwdebug-deploy@deploy1002>	helmfile [eqiad] START helmfile.d/services/mwdebug: apply	[production]
06:15	<kart_>	Updated cxserver to 2022-05-26-052433-production (T309161, T308829, T308834)	[production]
06:13	<kartik@deploy1002>	helmfile [eqiad] DONE helmfile.d/services/cxserver: apply	[production]
06:12	<kartik@deploy1002>	helmfile [eqiad] START helmfile.d/services/cxserver: apply	[production]
06:11	<kartik@deploy1002>	helmfile [codfw] DONE helmfile.d/services/cxserver: apply	[production]
06:10	<kartik@deploy1002>	helmfile [codfw] START helmfile.d/services/cxserver: apply	[production]
06:07	<kartik@deploy1002>	helmfile [staging] DONE helmfile.d/services/cxserver: apply	[production]
06:06	<kartik@deploy1002>	helmfile [staging] START helmfile.d/services/cxserver: apply	[production]
05:31	<marostegui@cumin1001>	dbctl commit (dc=all): 'Depool db1111', diff saved to https://phabricator.wikimedia.org/P28576 and previous config saved to /var/cache/conftool/dbconfig/20220526-053155-marostegui.json	[production]
05:16	<ladsgroup@cumin1001>	dbctl commit (dc=all): 'Depooling db1099:3311 (T298555)', diff saved to https://phabricator.wikimedia.org/P28575 and previous config saved to /var/cache/conftool/dbconfig/20220526-051649-ladsgroup.json	[production]
05:16	<ladsgroup@cumin1001>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1099.eqiad.wmnet with reason: Maintenance	[production]
05:16	<ladsgroup@cumin1001>	START - Cookbook sre.hosts.downtime for 10:00:00 on db1099.eqiad.wmnet with reason: Maintenance	[production]
05:16	<ladsgroup@cumin1001>	dbctl commit (dc=all): 'Repooling after maintenance db1119 (T298555)', diff saved to https://phabricator.wikimedia.org/P28574 and previous config saved to /var/cache/conftool/dbconfig/20220526-051641-ladsgroup.json	[production]
05:01	<ladsgroup@cumin1001>	dbctl commit (dc=all): 'Repooling after maintenance db1119', diff saved to https://phabricator.wikimedia.org/P28573 and previous config saved to /var/cache/conftool/dbconfig/20220526-050136-ladsgroup.json	[production]
04:31	<ladsgroup@cumin1001>	dbctl commit (dc=all): 'Repooling after maintenance db1119 (T298555)', diff saved to https://phabricator.wikimedia.org/P28571 and previous config saved to /var/cache/conftool/dbconfig/20220526-043126-ladsgroup.json	[production]
02:23	<ladsgroup@cumin1001>	dbctl commit (dc=all): 'Depooling db1119 (T298555)', diff saved to https://phabricator.wikimedia.org/P28570 and previous config saved to /var/cache/conftool/dbconfig/20220526-022307-ladsgroup.json	[production]
02:23	<ladsgroup@cumin1001>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1119.eqiad.wmnet with reason: Maintenance	[production]
02:23	<ladsgroup@cumin1001>	START - Cookbook sre.hosts.downtime for 10:00:00 on db1119.eqiad.wmnet with reason: Maintenance	[production]
02:23	<ladsgroup@cumin1001>	dbctl commit (dc=all): 'Repooling after maintenance db1184 (T298555)', diff saved to https://phabricator.wikimedia.org/P28569 and previous config saved to /var/cache/conftool/dbconfig/20220526-022259-ladsgroup.json	[production]
02:07	<ladsgroup@cumin1001>	dbctl commit (dc=all): 'Repooling after maintenance db1184', diff saved to https://phabricator.wikimedia.org/P28568 and previous config saved to /var/cache/conftool/dbconfig/20220526-020752-ladsgroup.json	[production]
01:52	<ladsgroup@cumin1001>	dbctl commit (dc=all): 'Repooling after maintenance db1184', diff saved to https://phabricator.wikimedia.org/P28567 and previous config saved to /var/cache/conftool/dbconfig/20220526-015247-ladsgroup.json	[production]
01:51	<ladsgroup@cumin1001>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1102.eqiad.wmnet with reason: Maintenance	[production]
01:51	<ladsgroup@cumin1001>	START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1102.eqiad.wmnet with reason: Maintenance	[production]
01:46	<mutante>	T308089 T274463 - gitlab1001 - still not enough disk space to finish full backup. moved backup of May 24th to /root/ . deleted latest.tar; started full-backup service once again	[production]
01:37	<ladsgroup@cumin1001>	dbctl commit (dc=all): 'Repooling after maintenance db1184 (T298555)', diff saved to https://phabricator.wikimedia.org/P28566 and previous config saved to /var/cache/conftool/dbconfig/20220526-013741-ladsgroup.json	[production]
01:27	<mutante>	T308089 T274463 - gitlab1001 - systemctl start rsync-config-backup-gitlab1003.wikimedia.org - Succeeded - RECOVERY - Check systemd state on gitlab1001 is OK	[production]
01:20	<mutante>	gitlab1003 - T308089 T274463 - gitlab1001 - deleted backups from April 4 and April 5 from /srv/gitlab-backup AND deleted partial failed backups from May 26 from /mnt/gitlab-backup; deployed both gerrit:799016 and gerrit:799280 ; restarting full-backup service	[production]
01:01	<mutante>	gitlab1003 - T308089 T274463 - gitlab1003 - systemctl status backup-restore is failed because it's looking for /mnt/gitlab-backup/latest/latest.tar needs gerrit:799016	[production]
00:58	<mutante>	gitlab1001 - T308089 T274463 - gitlab1001 - systemctl start full-backup	[production]
00:56	<mutante>	gitlab1001 - T308089 T274463 - '<+icinga-wm> PROBLEM - Disk space on gitlab1001 is CRITICAL: DISK CRITICAL - free space: /mnt/gitlab-backup 0 MB' - manually deleted 1653294190_2022_05_23_14.10.2_gitlab_backup.tar (we have May 24 and 25, 26 could not finish writing backup) - RECOVERY - Disk space on gitlab1001 is OK	[production]
2022-05-25
23:35	<bd808@deploy1002>	helmfile [eqiad] DONE helmfile.d/services/developer-portal: apply	[production]
23:35	<ladsgroup@cumin1001>	dbctl commit (dc=all): 'Depooling db1184 (T298555)', diff saved to https://phabricator.wikimedia.org/P28563 and previous config saved to /var/cache/conftool/dbconfig/20220525-233520-ladsgroup.json	[production]
23:35	<ladsgroup@cumin1001>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1184.eqiad.wmnet with reason: Maintenance	[production]