2022-01-19
|
17:45 |
<mwdebug-deploy@deploy1002> |
helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn |
[production] |
17:45 |
<mwdebug-deploy@deploy1002> |
helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn |
[production] |
17:44 |
<mwdebug-deploy@deploy1002> |
helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn |
[production] |
17:42 |
<taavi@deploy1002> |
Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:754998|Drop CentralAuthUserMerge log channel (T216089)]] (duration: 01m 05s) |
[production] |
17:36 |
<mutante> |
- added brennen, aokoth and jelto as users and projectadmins (T297411) |
[devtools] |
17:36 |
<hnowlan@cumin1001> |
START - Cookbook sre.hosts.reimage for host restbase2016.codfw.wmnet with OS buster |
[production] |
17:36 |
<andrewbogott> |
rebooting wcqs-beta-01.wikidata-query.eqiad1.wikimedia.cloud to recover from (presumed) fallout from the scratch/nfs move |
[wikidata-query] |
17:35 |
<hnowlan@cumin1001> |
END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host restbase2016.codfw.wmnet with OS buster |
[production] |
17:34 |
<andrewbogott> |
rebooting tools-sgeexec-0913.tools.eqiad1.wikimedia.cloud to recover from (presumed) fallout from the scratch/nfs move |
[tools] |
17:33 |
<andrewbogott> |
rebooting maps-wmanew to recover from (presumed) fallout from the scratch/nfs move |
[maps] |
17:31 |
<ladsgroup@cumin1001> |
dbctl commit (dc=all): 'Repooling after maintenance db1100', diff saved to https://phabricator.wikimedia.org/P18886 and previous config saved to /var/cache/conftool/dbconfig/20220119-173145-ladsgroup.json |
[production] |
17:31 |
<hashar> |
Adding https://integration.wikimedia.org/ci/computer/contint1001/ back to the pool after the machine got powercycled # T299542 |
[releng] |
17:26 |
<_joe_> |
powercycling contint1001 via ipmi, T299542 |
[production] |
17:26 |
<Joan> |
Restarted CVNBot3 (Last message was received on RCReader 28129.031916 seconds ago) |
[cvn] |
17:25 |
<cmjohnson1> |
updating firmware, ganeti1018 T299527 |
[production] |
17:19 |
<hnowlan@cumin1001> |
START - Cookbook sre.hosts.reimage for host restbase2016.codfw.wmnet with OS buster |
[production] |
17:18 |
<hnowlan@cumin1001> |
END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host restbase2016.codfw.wmnet with OS buster |
[production] |
17:16 |
<ladsgroup@cumin1001> |
dbctl commit (dc=all): 'Repooling after maintenance db1100 (T239814)', diff saved to https://phabricator.wikimedia.org/P18885 and previous config saved to /var/cache/conftool/dbconfig/20220119-171640-ladsgroup.json |
[production] |
16:59 |
<mwdebug-deploy@deploy1002> |
helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn |
[production] |
16:58 |
<mwdebug-deploy@deploy1002> |
helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn |
[production] |
16:58 |
<mwdebug-deploy@deploy1002> |
helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn |
[production] |
16:56 |
<mwdebug-deploy@deploy1002> |
helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn |
[production] |
16:56 |
<hnowlan@cumin1001> |
START - Cookbook sre.hosts.reimage for host restbase2016.codfw.wmnet with OS buster |
[production] |
16:54 |
<hnowlan@puppetmaster1001> |
conftool action : set/pooled=yes; selector: name=restbase2015.codfw.wmnet |
[production] |
16:54 |
<hnowlan@cumin1001> |
END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host restbase2015.codfw.wmnet with OS buster |
[production] |
16:48 |
<elukey@deploy1002> |
helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'. |
[production] |
16:47 |
<elukey@deploy1002> |
helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'. |
[production] |
16:46 |
<elukey@deploy1002> |
helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'. |
[production] |
16:46 |
<elukey@deploy1002> |
helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'. |
[production] |
16:46 |
<elukey@deploy1002> |
helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'. |
[production] |
16:44 |
<elukey@deploy1002> |
helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'. |
[production] |
16:38 |
<andrewbogott> |
moving all scratch mounts to scratch.svc.cloudinfra-nfs.eqiad1.wikimedia.cloud |
[admin] |
16:36 |
<hashar> |
marking contint1001.wikimedia.org as offline in Jenkins since it is dramatically overloaded T299542 |
[production] |
16:33 |
<elukey@deploy1002> |
helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'. |
[production] |
16:32 |
<elukey@deploy1002> |
helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'. |
[production] |
16:27 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'Repooling after maintenance db1146:3312 (T285149)', diff saved to https://phabricator.wikimedia.org/P18883 and previous config saved to /var/cache/conftool/dbconfig/20220119-162717-marostegui.json |
[production] |
16:12 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'Repooling after maintenance db1146:3312', diff saved to https://phabricator.wikimedia.org/P18882 and previous config saved to /var/cache/conftool/dbconfig/20220119-161212-marostegui.json |
[production] |
16:01 |
<hnowlan@cumin1001> |
START - Cookbook sre.hosts.reimage for host restbase2015.codfw.wmnet with OS buster |
[production] |
16:00 |
<hnowlan@puppetmaster1001> |
conftool action : set/pooled=yes; selector: name=restbase201[134].codfw.wmnet |
[production] |
15:58 |
<hnowlan@cumin1001> |
END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host restbase2014.codfw.wmnet with OS buster |
[production] |
15:57 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'Repooling after maintenance db1146:3312', diff saved to https://phabricator.wikimedia.org/P18881 and previous config saved to /var/cache/conftool/dbconfig/20220119-155706-marostegui.json |
[production] |
15:54 |
<elukey@deploy1002> |
helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'. |
[production] |
15:54 |
<elukey@deploy1002> |
helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'. |
[production] |
15:48 |
<moritzm> |
installing tiff security updates on stretch |
[production] |
15:44 |
<ottomata> |
installing anaconda-wmf_2020.02~wmf6_amd64.deb on all analytics cluster nodes. - T292699 |
[analytics] |
15:42 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'Repooling after maintenance db1146:3312 (T285149)', diff saved to https://phabricator.wikimedia.org/P18879 and previous config saved to /var/cache/conftool/dbconfig/20220119-154201-marostegui.json |
[production] |
15:40 |
<mmandere> |
cp5005,cp4025: upgrade varnish to 6.0.9 T298758 |
[production] |
15:40 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'Depooling db1146:3312 (T285149)', diff saved to https://phabricator.wikimedia.org/P18878 and previous config saved to /var/cache/conftool/dbconfig/20220119-154046-marostegui.json |
[production] |
15:40 |
<marostegui@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1146.eqiad.wmnet with reason: Maintenance |
[production] |
15:40 |
<marostegui@cumin1001> |
START - Cookbook sre.hosts.downtime for 6:00:00 on db1146.eqiad.wmnet with reason: Maintenance |
[production] |