2022-10-11
13:01 <jgiannelos@deploy1002> helmfile [codfw] START helmfile.d/services/mobileapps: apply [production]
13:01 <jgiannelos@deploy1002> helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply [production]
13:00 <jgiannelos@deploy1002> helmfile [eqiad] START helmfile.d/services/mobileapps: apply [production]
12:59 <jgiannelos@deploy1002> helmfile [staging] DONE helmfile.d/services/mobileapps: apply [production]
12:58 <jgiannelos@deploy1002> helmfile [staging] START helmfile.d/services/mobileapps: apply [production]
12:46 <vgutierrez> partitioning the ATS cache in cp[2035-2036], cp[6004,6012], cp[1083-1084], cp[5005,5011], cp[3058-3059], cp[4025,4029] - T317748 [production]
12:39 <volans@cumin2002> START - Cookbook sre.hosts.provision for host lvs4008.mgmt.ulsfo.wmnet with reboot policy FORCED [production]
12:05 <ladsgroup@cumin1001> dbctl commit (dc=all): 'Repooling after maintenance db2110 (T314041)', diff saved to https://phabricator.wikimedia.org/P35397 and previous config saved to /var/cache/conftool/dbconfig/20221011-120514-ladsgroup.json [production]
11:50 <ladsgroup@cumin1001> dbctl commit (dc=all): 'Repooling after maintenance db2110', diff saved to https://phabricator.wikimedia.org/P35396 and previous config saved to /var/cache/conftool/dbconfig/20221011-115007-ladsgroup.json [production]
11:35 <ladsgroup@cumin1001> dbctl commit (dc=all): 'Repooling after maintenance db2110', diff saved to https://phabricator.wikimedia.org/P35395 and previous config saved to /var/cache/conftool/dbconfig/20221011-113501-ladsgroup.json [production]
11:27 <jmm@cumin2002> END (PASS) - Cookbook sre.ganeti.addnode (exit_code=0) for new host ganeti1032.eqiad.wmnet to cluster eqiad and group A [production]
11:26 <jmm@cumin2002> START - Cookbook sre.ganeti.addnode for new host ganeti1032.eqiad.wmnet to cluster eqiad and group A [production]
11:19 <ladsgroup@cumin1001> dbctl commit (dc=all): 'Repooling after maintenance db2110 (T314041)', diff saved to https://phabricator.wikimedia.org/P35394 and previous config saved to /var/cache/conftool/dbconfig/20221011-111954-ladsgroup.json [production]
11:19 <jmm@cumin2002> END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1032.eqiad.wmnet [production]
11:13 <mwdebug-deploy@deploy1002> helmfile [codfw] DONE helmfile.d/services/mwdebug: apply [production]
11:12 <mwdebug-deploy@deploy1002> helmfile [codfw] START helmfile.d/services/mwdebug: apply [production]
11:12 <mwdebug-deploy@deploy1002> helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply [production]
11:11 <mwdebug-deploy@deploy1002> helmfile [eqiad] START helmfile.d/services/mwdebug: apply [production]
11:10 <jmm@cumin2002> START - Cookbook sre.hosts.reboot-single for host ganeti1032.eqiad.wmnet [production]
10:41 <volans@cumin2002> END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host lvs4008.mgmt.ulsfo.wmnet with reboot policy FORCED [production]
10:13 <volans@cumin2002> START - Cookbook sre.hosts.provision for host lvs4008.mgmt.ulsfo.wmnet with reboot policy FORCED [production]
10:12 <volans@cumin2002> END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host lvs4008.mgmt.ulsfo.wmnet with reboot policy FORCED [production]
10:08 <volans@cumin2002> START - Cookbook sre.hosts.provision for host lvs4008.mgmt.ulsfo.wmnet with reboot policy FORCED [production]
10:07 <volans@cumin2002> END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host lvs4008.mgmt.ulsfo.wmnet with reboot policy FORCED [production]
10:06 <volans@cumin2002> START - Cookbook sre.hosts.provision for host lvs4008.mgmt.ulsfo.wmnet with reboot policy FORCED [production]
10:02 <volans@cumin2002> END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host lvs4008.mgmt.ulsfo.wmnet with reboot policy FORCED [production]
09:57 <volans@cumin2002> START - Cookbook sre.hosts.provision for host lvs4008.mgmt.ulsfo.wmnet with reboot policy FORCED [production]
09:44 <jmm@cumin2002> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on ganeti1006.eqiad.wmnet with reason: Remove from cluster for decom [production]
09:44 <jmm@cumin2002> START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on ganeti1006.eqiad.wmnet with reason: Remove from cluster for decom [production]
08:53 <vgutierrez> partitioning the ATS cache in cp1085, cp1086, cp2037, cp2038, cp3060, cp3061, cp4026, cp4030, cp5006, cp5012, cp6005, cp6013 - T317748 [production]
08:37 <jmm@cumin2002> END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host ganeti4008.ulsfo.wmnet [production]
07:41 <elukey@deploy1002> helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' . [production]
07:40 <elukey@deploy1002> helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' . [production]
07:31 <elukey@deploy1002> helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' . [production]
07:30 <elukey@deploy1002> helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' . [production]
07:24 <elukey@deploy1002> helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' . [production]
07:22 <elukey@deploy1002> helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' . [production]
07:21 <elukey@deploy1002> helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' . [production]
07:21 <elukey@deploy1002> helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' . [production]
07:18 <elukey@deploy1002> helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' . [production]
07:18 <elukey@deploy1002> helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' . [production]
07:17 <ryankemper> [Elastic] Forcing recheck of elastic settings check alerts; expecting a bit of noise as the alerts resolve (hopefully) [production]
07:17 <jmm@cumin2002> START - Cookbook sre.hosts.reboot-single for host ganeti4008.ulsfo.wmnet [production]
07:17 <elukey@deploy1002> helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' . [production]
07:16 <ryankemper> [Elastic] Updated cross-cluster remote seeds (masters): `ryankemper@mwmaint1002:~/elastic$ python push_cross_cluster_conf.py https://search.svc.eqiad.wmnet:9[2,4,6]43/_cluster/settings --ccc chi=chi_eqiad_masters.lst psi=psi_eqiad_masters.lst omega=omega_eqiad_masters.lst` [production]
07:15 <elukey@deploy1002> helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' . [production]
07:12 <elukey@deploy1002> helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' . [production]
07:11 <elukey@deploy1002> helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' . [production]
07:09 <kartik@deploy1002> Finished scap: Backport for [[gerrit:839411|ContentTranslation: Make Mongolian Wikipedia MT stricter by 10% (T319156)]] (duration: 08m 56s) [production]
07:01 <kartik@deploy1002> kartik and kartik: Backport for [[gerrit:839411|ContentTranslation: Make Mongolian Wikipedia MT stricter by 10% (T319156)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet [production]