2023-06-27
08:58 <hnowlan@puppetmaster1001> conftool action : set/weight=10; selector: service=thumbor,name=kubernetes100[0-9].eqiad.wmnet [production]
08:58 <hnowlan@puppetmaster1001> conftool action : set/pooled=yes; selector: service=thumbor,name=kubernetes100[0-9].eqiad.wmnet [production]
08:58 <hnowlan@puppetmaster1001> conftool action : set/pooled=yes; selector: service=thumbor,name=kubernetes200[0-9].codfw.wmnet [production]
08:53 <akosiaris@deploy1002> Synchronized wmf-config/CommonSettings.php: (no justification provided) (duration: 07m 21s) [production]
08:52 <kartik@deploy1002> helmfile [codfw] DONE helmfile.d/services/machinetranslation: apply [production]
08:47 <kartik@deploy1002> helmfile [codfw] START helmfile.d/services/machinetranslation: apply [production]
08:45 <kartik@deploy1002> helmfile [staging] DONE helmfile.d/services/machinetranslation: apply [production]
08:42 <kartik@deploy1002> helmfile [staging] START helmfile.d/services/machinetranslation: apply [production]
08:42 <fabfur@cumin1001> START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-upload_ulsfo [production]
08:41 <fabfur@cumin1001> START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-text_ulsfo [production]
08:41 <kart_> Updated cxserver to 2023-06-27-053435-production (T339105) [production]
08:38 <elukey> revoked puppet cert for 'varnishkafka' and cleaned up its cergen's files in puppet private - T337825 [production]
08:33 <root@cumin2002> END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Neil P. Quinn-WMF out of all services on: 19 hosts [production]
08:33 <root@cumin2002> START - Cookbook sre.idm.logout Logging Neil P. Quinn-WMF out of all services on: 19 hosts [production]
08:32 <kartik@deploy1002> helmfile [eqiad] DONE helmfile.d/services/cxserver: apply [production]
08:32 <root@cumin2002> END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Neil P. Quinn-WMF out of all services on: 767 hosts [production]
08:32 <kartik@deploy1002> helmfile [eqiad] START helmfile.d/services/cxserver: apply [production]
08:32 <root@cumin2002> START - Cookbook sre.idm.logout Logging Neil P. Quinn-WMF out of all services on: 767 hosts [production]
08:31 <root@cumin2002> END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Neil P. Quinn-WMF out of all services on: 1265 hosts [production]
08:30 <root@cumin2002> START - Cookbook sre.idm.logout Logging Neil P. Quinn-WMF out of all services on: 1265 hosts [production]
08:29 <marostegui> Failover m2-master to dbproxy1022 T337812 [production]
08:28 <kartik@deploy1002> helmfile [codfw] DONE helmfile.d/services/cxserver: apply [production]
08:28 <kartik@deploy1002> helmfile [codfw] START helmfile.d/services/cxserver: apply [production]
08:25 <kartik@deploy1002> helmfile [staging] DONE helmfile.d/services/cxserver: apply [production]
08:24 <kartik@deploy1002> helmfile [staging] START helmfile.d/services/cxserver: apply [production]
08:14 <kartik@deploy1002> Finished scap: Backport for [[gerrit:933125|Enable Content and Section Translation for 4 Wikipedias (T338123)]] (duration: 16m 17s) [production]
08:03 <moritzm> installing openjdk-8 security updates for bullseye [production]
08:02 <kartik@deploy1002> kartik: Backport for [[gerrit:933125|Enable Content and Section Translation for 4 Wikipedias (T338123)]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet [production]
07:58 <kartik@deploy1002> Started scap: Backport for [[gerrit:933125|Enable Content and Section Translation for 4 Wikipedias (T338123)]] [production]
07:54 <moritzm> uploaded openjdk-8 8u372-ga-1~deb11u1 to component/jdk8 for bullseye (forward port of Java 8 for Buster) [production]
07:48 <hashar> Restart Zuul due to stuck connection | T340518 | T309376 [production]
07:15 <elukey> `sudo kill `pgrep -u paramd`` on stat1005 to unblock puppet [production]
06:22 <marostegui> Failover m1-master to dbproxy1022 T337812 [production]
2023-06-26
23:21 <btullis@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on an-worker1092.eqiad.wmnet with reason: Replacing RAID controller battery [production]
23:21 <btullis@cumin1001> START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on an-worker1092.eqiad.wmnet with reason: Replacing RAID controller battery [production]
23:07 <btullis@deploy1002> helmfile [staging] DONE helmfile.d/services/datahub: sync on main [production]
23:02 <sbassett> Deployed updated mitigation for T336027 [production]
23:01 <ryankemper@cumin1001> END (PASS) - Cookbook sre.wdqs.restart (exit_code=0) [production]
22:55 <btullis@deploy1002> helmfile [staging] START helmfile.d/services/datahub: apply on main [production]
22:51 <ryankemper@cumin1001> END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) [production]
22:46 <btullis@deploy1002> helmfile [staging] DONE helmfile.d/services/datahub: sync on main [production]
22:33 <btullis@deploy1002> helmfile [staging] START helmfile.d/services/datahub: apply on main [production]
22:31 <btullis@deploy1002> helmfile [staging] START helmfile.d/services/datahub: apply on main [production]
22:24 <btullis@deploy1002> helmfile [staging] START helmfile.d/services/datahub: apply on main [production]
22:18 <ryankemper@cumin1001> START - Cookbook sre.wdqs.restart [production]
22:17 <ryankemper@cumin1001> END (ERROR) - Cookbook sre.wdqs.restart (exit_code=97) [production]
22:17 <ryankemper@cumin1001> START - Cookbook sre.wdqs.restart [production]
22:17 <btullis@deploy1002> helmfile [staging] DONE helmfile.d/services/datahub: sync on main [production]
22:16 <ryankemper@cumin1001> END (FAIL) - Cookbook sre.wdqs.restart (exit_code=99) [production]
22:05 <btullis@deploy1002> helmfile [staging] START helmfile.d/services/datahub: apply on main [production]