2022-02-08
23:21 <pt1979@cumin2002> START - Cookbook sre.hosts.reimage for host mc2055.codfw.wmnet with OS buster [production]
23:20 <pt1979@cumin2002> END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc2053.codfw.wmnet with OS buster [production]
23:17 <pt1979@cumin2002> START - Cookbook sre.hosts.reimage for host mc2054.codfw.wmnet with OS buster [production]
23:12 <pt1979@cumin2002> END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc2052.codfw.wmnet with OS buster [production]
22:50 <pt1979@cumin2002> START - Cookbook sre.hosts.reimage for host mc2053.codfw.wmnet with OS buster [production]
22:44 <dzahn@deploy1002> helmfile [staging] DONE helmfile.d/services/miscweb: sync on main [production]
22:42 <dzahn@deploy1002> helmfile [staging] START helmfile.d/services/miscweb: apply on main [production]
22:41 <pt1979@cumin2002> START - Cookbook sre.hosts.reimage for host mc2052.codfw.wmnet with OS buster [production]
22:15 <marostegui@cumin1001> dbctl commit (dc=all): 'Repooling after maintenance db1164 (T300402)', diff saved to https://phabricator.wikimedia.org/P20402 and previous config saved to /var/cache/conftool/dbconfig/20220208-221545-marostegui.json [production]
22:12 <topranks> doing planned 1-by-1 shutdown of ports xe-0/1/1, xe-0/1/2 and xe-0/1/9 on cr2-esams, to test reliability of each following user reports of issues at AMS-IX. [production]
22:00 <marostegui@cumin1001> dbctl commit (dc=all): 'Repooling after maintenance db1164', diff saved to https://phabricator.wikimedia.org/P20401 and previous config saved to /var/cache/conftool/dbconfig/20220208-220041-marostegui.json [production]
21:59 <ryankemper> T294805 elastic10[68-83] erroneously weren't in pybal, added them just now: `sudo confctl select 'cluster=elasticsearch' set/pooled=yes:weight=10` (there's no hosts in the `conftool-data` list that we want depooled so we're okay setting all to pooled w/ equal weight) [production]
21:59 <ryankemper@puppetmaster1001> conftool action : set/pooled=yes:weight=10; selector: cluster=elasticsearch [production]
21:58 <ryankemper@puppetmaster1001> conftool action : set/pooled=yes:weight=10; selector: cluster=elasticsearch,name=elastic1* [production]
21:53 <ryankemper@puppetmaster1001> conftool action : GET; selector: service=search [production]
21:52 <ryankemper@puppetmaster1001> conftool action : GET; selector: service=search [production]
21:47 <ryankemper> [Elastic] `ryankemper@elastic1081:~$ sudo systemctl restart elasticsearch_6*psi*` (9600 but not 9200 seemed to be having connectivity issues) [production]
21:45 <marostegui@cumin1001> dbctl commit (dc=all): 'Repooling after maintenance db1164', diff saved to https://phabricator.wikimedia.org/P20400 and previous config saved to /var/cache/conftool/dbconfig/20220208-214536-marostegui.json [production]
21:30 <marostegui@cumin1001> dbctl commit (dc=all): 'Repooling after maintenance db1164 (T300402)', diff saved to https://phabricator.wikimedia.org/P20399 and previous config saved to /var/cache/conftool/dbconfig/20220208-213031-marostegui.json [production]
21:26 <marostegui@cumin1001> dbctl commit (dc=all): 'Depooling db1164 (T300402)', diff saved to https://phabricator.wikimedia.org/P20398 and previous config saved to /var/cache/conftool/dbconfig/20220208-212558-marostegui.json [production]
21:25 <marostegui@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1164.eqiad.wmnet with reason: Maintenance [production]
21:25 <marostegui@cumin1001> START - Cookbook sre.hosts.downtime for 6:00:00 on db1164.eqiad.wmnet with reason: Maintenance [production]
21:25 <marostegui@cumin1001> dbctl commit (dc=all): 'Repooling after maintenance db1105:3311 (T300402)', diff saved to https://phabricator.wikimedia.org/P20397 and previous config saved to /var/cache/conftool/dbconfig/20220208-212550-marostegui.json [production]
21:10 <marostegui@cumin1001> dbctl commit (dc=all): 'Repooling after maintenance db1105:3311', diff saved to https://phabricator.wikimedia.org/P20396 and previous config saved to /var/cache/conftool/dbconfig/20220208-211046-marostegui.json [production]
20:56 <mwdebug-deploy@deploy1002> helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn [production]
20:55 <marostegui@cumin1001> dbctl commit (dc=all): 'Repooling after maintenance db1105:3311', diff saved to https://phabricator.wikimedia.org/P20395 and previous config saved to /var/cache/conftool/dbconfig/20220208-205541-marostegui.json [production]
20:54 <marostegui@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on dbstore1005.eqiad.wmnet with reason: Maintenance [production]
20:54 <marostegui@cumin1001> START - Cookbook sre.hosts.downtime for 12:00:00 on dbstore1005.eqiad.wmnet with reason: Maintenance [production]
20:52 <jhuneidi@deploy1002> Finished scap: sync again in attempt to deploy 1.38.0-wmf.21 to group0 (duration: 16m 17s) [production]
20:50 <mwdebug-deploy@deploy1002> helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn [production]
20:49 <mwdebug-deploy@deploy1002> helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn [production]
20:43 <pt1979@cumin2002> END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc2051.codfw.wmnet with OS buster [production]
20:43 <mwdebug-deploy@deploy1002> helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn [production]
20:40 <marostegui@cumin1001> dbctl commit (dc=all): 'Repooling after maintenance db1105:3311 (T300402)', diff saved to https://phabricator.wikimedia.org/P20394 and previous config saved to /var/cache/conftool/dbconfig/20220208-204036-marostegui.json [production]
20:36 <ladsgroup@cumin1001> dbctl commit (dc=all): 'Repooling after maintenance db1142 (T298554)', diff saved to https://phabricator.wikimedia.org/P20393 and previous config saved to /var/cache/conftool/dbconfig/20220208-203634-ladsgroup.json [production]
20:36 <jhuneidi@deploy1002> Started scap: sync again in attempt to deploy 1.38.0-wmf.21 to group0 [production]
20:35 <marostegui@cumin1001> dbctl commit (dc=all): 'Depooling db1105:3311 (T300402)', diff saved to https://phabricator.wikimedia.org/P20392 and previous config saved to /var/cache/conftool/dbconfig/20220208-203529-marostegui.json [production]
20:35 <marostegui@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1105.eqiad.wmnet with reason: Maintenance [production]
20:35 <marostegui@cumin1001> START - Cookbook sre.hosts.downtime for 6:00:00 on db1105.eqiad.wmnet with reason: Maintenance [production]
20:35 <marostegui@cumin1001> dbctl commit (dc=all): 'Repooling after maintenance db1119 (T300402)', diff saved to https://phabricator.wikimedia.org/P20391 and previous config saved to /var/cache/conftool/dbconfig/20220208-203521-marostegui.json [production]
20:33 <ryankemper> T294805 Banned `elastic10[32-47]` from main, omega, and psi elasticsearch clusters. Shards are relocating on main and omega clusters as expected, but they don't seem to be moving on psi. Investigating that currently. Might have to do with row allocation constraints, but unsure currently [production]
20:28 <pt1979@cumin2002> END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc2050.codfw.wmnet with OS buster [production]
20:22 <mwdebug-deploy@deploy1002> helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn [production]
20:21 <ladsgroup@cumin1001> dbctl commit (dc=all): 'Repooling after maintenance db1142', diff saved to https://phabricator.wikimedia.org/P20390 and previous config saved to /var/cache/conftool/dbconfig/20220208-202127-ladsgroup.json [production]
20:20 <marostegui@cumin1001> dbctl commit (dc=all): 'Repooling after maintenance db1119', diff saved to https://phabricator.wikimedia.org/P20389 and previous config saved to /var/cache/conftool/dbconfig/20220208-202016-marostegui.json [production]
20:19 <mwdebug-deploy@deploy1002> helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn [production]
20:18 <mwdebug-deploy@deploy1002> helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn [production]
20:17 <mwdebug-deploy@deploy1002> helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn [production]
20:17 <jhuneidi@deploy1002> rebuilt and synchronized wikiversions files: group0 wikis to 1.38.0-wmf.21 refs T300197 [production]
20:14 <pt1979@cumin2002> START - Cookbook sre.hosts.reimage for host mc2051.codfw.wmnet with OS buster [production]