2021-04-14
11:41 <hnowlan@cumin1001> START - Cookbook sre.cassandra.roll-restart [production]
11:39 <hnowlan@cumin1001> END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) [production]
11:37 <marostegui@cumin1001> dbctl commit (dc=all): 'db1177 (re)pooling @ 100%: Slowly pool db1177 for the first time in s8 T275633', diff saved to https://phabricator.wikimedia.org/P15328 and previous config saved to /var/cache/conftool/dbconfig/20210414-113714-root.json [production]
11:35 <marostegui@cumin1001> dbctl commit (dc=all): 'db1096:3315 (re)pooling @ 25%: Repool db1096:3315 after kernel upgrade', diff saved to https://phabricator.wikimedia.org/P15327 and previous config saved to /var/cache/conftool/dbconfig/20210414-113557-root.json [production]
11:31 <marostegui> Upgrade kernel on db1096 (s5, s6) [production]
11:29 <hnowlan@cumin1001> START - Cookbook sre.cassandra.roll-restart [production]
11:26 <marostegui@cumin1001> dbctl commit (dc=all): 'Depool db1096 (s5,s6) kernel upgrade', diff saved to https://phabricator.wikimedia.org/P15326 and previous config saved to /var/cache/conftool/dbconfig/20210414-112619-marostegui.json [production]
11:25 <hnowlan> regenerated certificates for restbase1019/restbase102[0-7] [production]
11:22 <marostegui@cumin1001> dbctl commit (dc=all): 'db1177 (re)pooling @ 90%: Slowly pool db1177 for the first time in s8 T275633', diff saved to https://phabricator.wikimedia.org/P15325 and previous config saved to /var/cache/conftool/dbconfig/20210414-112211-root.json [production]
11:07 <marostegui@cumin1001> dbctl commit (dc=all): 'db1177 (re)pooling @ 80%: Slowly pool db1177 for the first time in s8 T275633', diff saved to https://phabricator.wikimedia.org/P15323 and previous config saved to /var/cache/conftool/dbconfig/20210414-110706-root.json [production]
11:06 <jiji@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wtp1036.eqiad.wmnet with reason: REIMAGE [production]
11:06 <akosiaris@deploy1002> helmfile [staging] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' . [production]
11:06 <akosiaris@deploy1002> helmfile [staging] Ran 'sync' command on namespace 'linkrecommendation' for release 'internal' . [production]
11:06 <akosiaris@deploy1002> helmfile [staging] Ran 'sync' command on namespace 'linkrecommendation' for release 'staging' . [production]
11:06 <akosiaris@deploy1002> helmfile [staging] Ran 'sync' command on namespace 'linkrecommendation' for release 'production' . [production]
11:04 <jiji@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wtp1035.eqiad.wmnet with reason: REIMAGE [production]
11:04 <akosiaris@deploy1002> helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' . [production]
11:04 <akosiaris@deploy1002> helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'internal' . [production]
11:04 <akosiaris@deploy1002> helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'staging' . [production]
11:03 <akosiaris@deploy1002> helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'production' . [production]
11:03 <jiji@cumin1001> START - Cookbook sre.hosts.downtime for 2:00:00 on wtp1036.eqiad.wmnet with reason: REIMAGE [production]
11:03 <akosiaris@deploy1002> helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' . [production]
11:03 <akosiaris@deploy1002> helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'staging' . [production]
11:02 <akosiaris@deploy1002> helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'internal' . [production]
11:02 <akosiaris@deploy1002> helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'production' . [production]
11:02 <jiji@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wtp1034.eqiad.wmnet with reason: REIMAGE [production]
11:01 <jiji@cumin1001> START - Cookbook sre.hosts.downtime for 2:00:00 on wtp1035.eqiad.wmnet with reason: REIMAGE [production]
10:59 <jiji@cumin1001> START - Cookbook sre.hosts.downtime for 2:00:00 on wtp1034.eqiad.wmnet with reason: REIMAGE [production]
10:52 <marostegui@cumin1001> dbctl commit (dc=all): 'db1177 (re)pooling @ 70%: Slowly pool db1177 for the first time in s8 T275633', diff saved to https://phabricator.wikimedia.org/P15322 and previous config saved to /var/cache/conftool/dbconfig/20210414-105202-root.json [production]
10:36 <marostegui@cumin1001> dbctl commit (dc=all): 'db1177 (re)pooling @ 60%: Slowly pool db1177 for the first time in s8 T275633', diff saved to https://phabricator.wikimedia.org/P15321 and previous config saved to /var/cache/conftool/dbconfig/20210414-103659-root.json [production]
10:30 <marostegui> Failover m1 from db1080 to db1159 - T276448 [production]
10:25 <dcaro@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on 6 hosts with reason: Upgrading ceph to octopus [production]
10:25 <dcaro@cumin1001> START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on 6 hosts with reason: Upgrading ceph to octopus [production]
10:21 <marostegui@cumin1001> dbctl commit (dc=all): 'db1177 (re)pooling @ 50%: Slowly pool db1177 for the first time in s8 T275633', diff saved to https://phabricator.wikimedia.org/P15320 and previous config saved to /var/cache/conftool/dbconfig/20210414-102153-root.json [production]
10:06 <marostegui@cumin1001> dbctl commit (dc=all): 'db1177 (re)pooling @ 40%: Slowly pool db1177 for the first time in s8 T275633', diff saved to https://phabricator.wikimedia.org/P15319 and previous config saved to /var/cache/conftool/dbconfig/20210414-100649-root.json [production]
09:51 <marostegui@cumin1001> dbctl commit (dc=all): 'db1177 (re)pooling @ 30%: Slowly pool db1177 for the first time in s8 T275633', diff saved to https://phabricator.wikimedia.org/P15318 and previous config saved to /var/cache/conftool/dbconfig/20210414-095146-root.json [production]
09:37 <ryankemper@cumin2001> END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) [production]
09:36 <marostegui@cumin1001> dbctl commit (dc=all): 'db1177 (re)pooling @ 20%: Slowly pool db1177 for the first time in s8 T275633', diff saved to https://phabricator.wikimedia.org/P15317 and previous config saved to /var/cache/conftool/dbconfig/20210414-093642-root.json [production]
09:33 <marostegui@cumin1001> dbctl commit (dc=all): 'Pool db1177 with minimal weight on s8 for the first time T275633', diff saved to https://phabricator.wikimedia.org/P15316 and previous config saved to /var/cache/conftool/dbconfig/20210414-093305-marostegui.json [production]
09:29 <gehel> depooling wdqs1004 - corrupted data after data reload [production]
09:27 <effie> disable puppet on all mediawiki servers to merge 676580 [production]
09:24 <urbanecm@deploy1002> Synchronized php-1.37.0-wmf.1/extensions/DiscussionTools/includes/Hooks/HookUtils.php: e4b2d93dcf86a336314ed09fd37844edb16f4f30: Dont allow query and cookie hacks to enable topic subscriptions (T280082) (duration: 01m 24s) [production]
09:23 <gehel> repooling wdqs1013, catched up on lag [production]
09:22 <gehel> depooling wdqs1003 - corrupted data after data reload [production]
09:19 <jmm@cumin1001> END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts kraz.wikimedia.org [production]
09:16 <gehel> restarting blazegraph on wdqs1003 [production]
09:12 <ryankemper> T267927 depooled `wdqs1004` following data transfer (catching up on lag), current round of data transfers is done so there shouldn't be any left to depool [production]
09:10 <ryankemper@cumin2001> END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) [production]
09:09 <jmm@cumin1001> START - Cookbook sre.hosts.decommission for hosts kraz.wikimedia.org [production]
09:06 <ryankemper> T267927 depool `wdqs2001` following data transfer (catching up on lag) [production]