2023-04-04
|
14:28 |
<vgutierrez> |
switch cp6008 (upload) and cp6016 (text) to use a single Unix domain socket (UDS) between haproxy and varnish - T333965 |
[production] |
14:21 |
<jynus> |
stop es1022 for debugging T333961 |
[production] |
14:15 |
<Lucas_WMDE> |
UTC afternoon backport+config window done |
[production] |
14:15 |
<lucaswerkmeister-wmde@deploy2002> |
Finished scap: Backport for [[gerrit:905598|Use HookContainer to register hooks inside hooks (T333926)]] (duration: 10m 50s) |
[production] |
14:10 |
<stevemunene@puppetmaster1001> |
conftool action : set/pooled=yes; selector: name=aqs1018.eqiad.wmnet |
[production] |
14:09 |
<stevemunene@puppetmaster1001> |
conftool action : set/pooled=yes; selector: name=aqs1013.eqiad.wmnet |
[production] |
14:09 |
<stevemunene@puppetmaster1001> |
conftool action : set/pooled=yes; selector: name=aqs1012.eqiad.wmnet |
[production] |
14:09 |
<ayounsi@cumin1001> |
END (PASS) - Cookbook sre.network.debug (exit_code=0) for Netbox circuit ID 33 |
[production] |
14:09 |
<ayounsi@cumin1001> |
START - Cookbook sre.network.debug for Netbox circuit ID 33 |
[production] |
14:09 |
<stevemunene@puppetmaster1001> |
conftool action : set/pooled=yes; selector: name=datahubsearch1003.eqiad.wmnet |
[production] |
14:05 |
<lucaswerkmeister-wmde@deploy2002> |
lucaswerkmeister-wmde: Backport for [[gerrit:905598|Use HookContainer to register hooks inside hooks (T333926)]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet |
[production] |
14:04 |
<lucaswerkmeister-wmde@deploy2002> |
Started scap: Backport for [[gerrit:905598|Use HookContainer to register hooks inside hooks (T333926)]] |
[production] |
13:44 |
<ladsgroup@cumin1001> |
dbctl commit (dc=all): 'Depool es1022 T333961', diff saved to https://phabricator.wikimedia.org/P46027 and previous config saved to /var/cache/conftool/dbconfig/20230404-134415-ladsgroup.json |
[production] |
13:42 |
<Emperor> |
repool thanos-fe1003 re T331882 |
[production] |
13:41 |
<Emperor> |
repool ms-fe1011 re T331882 |
[production] |
13:38 |
<steve_munene> |
leave hdfs safemode T331882 |
[production] |
13:38 |
<inflatador> |
reboot elastic2038 to clear soft lock |
[production] |
13:34 |
<sukhe> |
run authdns-update for CR 905612, reverting depool of eqiad |
[production] |
13:30 |
<hnowlan@puppetmaster1001> |
conftool action : set/pooled=yes; selector: name=thumbor1006.eqiad.wmnet |
[production] |
13:25 |
<cgoubert@deploy2002> |
helmfile [eqiad] DONE helmfile.d/services/mw-web: apply |
[production] |
13:25 |
<cgoubert@deploy2002> |
helmfile [eqiad] START helmfile.d/services/mw-web: apply |
[production] |
13:13 |
<hnowlan@puppetmaster1001> |
conftool action : set/pooled=no; selector: name=thumbor1006.eqiad.wmnet |
[production] |
13:11 |
<hnowlan@puppetmaster1001> |
conftool action : set/pooled=no; selector: name=maps1009.eqiad.wmnet |
[production] |
13:11 |
<XioNoX> |
asw2-c-eqiad> request system reboot all-members - T331882 |
[production] |
13:10 |
<urbanecm@deploy2002> |
Finished scap: Backport for [[gerrit:905544|ckbwiktionary: Add logo (T331831)]] (duration: 07m 00s) |
[production] |
13:05 |
<akosiaris@cumin1001> |
END (PASS) - Cookbook sre.discovery.datacenter (exit_code=0) depool all active/active services in eqiad: eqiad row C switches upgrade - T331882 |
[production] |
13:03 |
<urbanecm@deploy2002> |
Started scap: Backport for [[gerrit:905544|ckbwiktionary: Add logo (T331831)]] |
[production] |
13:02 |
<ayounsi@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 227 hosts with reason: eqiad row C upgrade |
[production] |
12:57 |
<ayounsi@cumin1001> |
START - Cookbook sre.hosts.downtime for 2:00:00 on 227 hosts with reason: eqiad row C upgrade |
[production] |
12:57 |
<steve_munene> |
putting hdfs into safe mode as part of T331882 |
[production] |
12:52 |
<ayounsi@cumin1001> |
END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on 228 hosts with reason: eqiad row C upgrade |
[production] |
12:52 |
<ayounsi@cumin1001> |
START - Cookbook sre.hosts.downtime for 2:00:00 on 228 hosts with reason: eqiad row C upgrade |
[production] |
12:44 |
<akosiaris@cumin1001> |
START - Cookbook sre.discovery.datacenter depool all active/active services in eqiad: eqiad row C switches upgrade - T331882 |
[production] |
12:43 |
<Emperor> |
depool thanos-fe1003 re T331882 |
[production] |
12:38 |
<Emperor> |
depool ms-fe1011 re T331882 |
[production] |
12:32 |
<sukhe> |
[finished] run authdns-update for CR: 905603 depool eqiad |
[production] |
12:31 |
<ladsgroup@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on 38 hosts with reason: Row c switch maint T331882 |
[production] |
12:31 |
<sukhe> |
run authdns-update for CR: 905603 depool eqiad |
[production] |
12:31 |
<ladsgroup@cumin1001> |
START - Cookbook sre.hosts.downtime for 6:00:00 on 38 hosts with reason: Row c switch maint T331882 |
[production] |
12:28 |
<stevemunene@puppetmaster1001> |
conftool action : set/pooled=no; selector: name=aqs1018.eqiad.wmnet |
[production] |
12:28 |
<volans@cumin1001> |
END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling update on A:netbox |
[production] |
12:28 |
<stevemunene@puppetmaster1001> |
conftool action : set/pooled=no; selector: name=aqs1013.eqiad.wmnet |
[production] |
12:28 |
<volans@cumin1001> |
START - Cookbook sre.netbox.update-extras rolling update on A:netbox |
[production] |
12:28 |
<stevemunene@puppetmaster1001> |
conftool action : set/pooled=no; selector: name=aqs1012.eqiad.wmnet |
[production] |
12:28 |
<volans@cumin1001> |
END (FAIL) - Cookbook sre.netbox.update-extras (exit_code=1) rolling update on A:netbox-canary |
[production] |
12:27 |
<volans@cumin1001> |
START - Cookbook sre.netbox.update-extras rolling update on A:netbox-canary |
[production] |
12:26 |
<stevemunene@puppetmaster1001> |
conftool action : set/pooled=no; selector: name=datahubsearch1003.eqiad.wmnet |
[production] |
12:24 |
<TimStarling> |
I noticed that mw2382 was still talking to mwlog1002. It still had old php-fpm7.4 processes despite the scap. So I manually restarted php-fpm on it. |
[production] |
12:17 |
<tstarling@deploy2002> |
Synchronized src/Profiler.php: T331882 disable profiling for switch maintenance (duration: 05m 58s) |
[production] |
11:35 |
<hnowlan@puppetmaster1001> |
conftool action : set/pooled=inactive; selector: service=thumbor,name=kubernetes101[0123].eqiad.wmnet |
[production] |