2021-02-10
09:10 <elukey@cumin1001> START - Cookbook sre.hadoop.change-distro-from-cdh-clients for Hadoop test cluster: Change Hadoop distribution - elukey@cumin1001 [production]
09:00 <marostegui@cumin1001> dbctl commit (dc=all): 'db1076 (re)pooling @ 10%: Slowly repooling db1076 after cloning db1162', diff saved to https://phabricator.wikimedia.org/P14288 and previous config saved to /var/cache/conftool/dbconfig/20210210-090057-root.json [production]
09:00 <marostegui@cumin1001> dbctl commit (dc=all): 'db1157 (re)pooling @ 60%: Slowly repool db1127', diff saved to https://phabricator.wikimedia.org/P14287 and previous config saved to /var/cache/conftool/dbconfig/20210210-090004-root.json [production]
08:45 <marostegui@cumin1001> dbctl commit (dc=all): 'db1157 (re)pooling @ 40%: Slowly repool db1127', diff saved to https://phabricator.wikimedia.org/P14286 and previous config saved to /var/cache/conftool/dbconfig/20210210-084500-root.json [production]
08:41 <legoktm> depooling mw1404.eqiad.wmnet for perf benchmarking (T274041) [production]
08:41 <legoktm@cumin1001> conftool action : set/pooled=no; selector: name=mw1404.eqiad.wmnet [production]
08:29 <marostegui@cumin1001> dbctl commit (dc=all): 'db1157 (re)pooling @ 20%: Slowly repool db1127', diff saved to https://phabricator.wikimedia.org/P14285 and previous config saved to /var/cache/conftool/dbconfig/20210210-082957-root.json [production]
08:19 <godog> swift eqiad-prod: decrease weight for SSDs on ms-be[1019-1026] - T272836 [production]
08:14 <marostegui@cumin1001> dbctl commit (dc=all): 'db1157 (re)pooling @ 10%: Slowly repool db1127', diff saved to https://phabricator.wikimedia.org/P14284 and previous config saved to /var/cache/conftool/dbconfig/20210210-081453-root.json [production]
08:05 <marostegui@cumin1001> dbctl commit (dc=all): 'Depool db1127 T266483', diff saved to https://phabricator.wikimedia.org/P14283 and previous config saved to /var/cache/conftool/dbconfig/20210210-080512-marostegui.json [production]
06:43 <marostegui@cumin1001> dbctl commit (dc=all): 'Fully pool db1170:3312, db1170:3317 T258361', diff saved to https://phabricator.wikimedia.org/P14282 and previous config saved to /var/cache/conftool/dbconfig/20210210-064330-marostegui.json [production]
06:35 <marostegui@cumin1001> dbctl commit (dc=all): 'Give more weight to db1170:3312, db1170:3317 T258361', diff saved to https://phabricator.wikimedia.org/P14281 and previous config saved to /var/cache/conftool/dbconfig/20210210-063534-marostegui.json [production]
06:22 <marostegui@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1162.eqiad.wmnet with reason: REIMAGE [production]
06:20 <marostegui@cumin1001> START - Cookbook sre.hosts.downtime for 2:00:00 on db1162.eqiad.wmnet with reason: REIMAGE [production]
06:19 <marostegui@cumin1001> dbctl commit (dc=all): 'Pool db1170:3312, db1170:3317 with minimal weight for the first time T258361', diff saved to https://phabricator.wikimedia.org/P14279 and previous config saved to /var/cache/conftool/dbconfig/20210210-061924-marostegui.json [production]
06:16 <marostegui@cumin1001> dbctl commit (dc=all): 'Add db1170:3312 and db1170:3317 to dbctl, depooled T258361', diff saved to https://phabricator.wikimedia.org/P14278 and previous config saved to /var/cache/conftool/dbconfig/20210210-061638-marostegui.json [production]
06:11 <jiji@cumin1001> END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc1020.eqiad.wmnet [production]
06:04 <jiji@cumin1001> START - Cookbook sre.hosts.reboot-single for host mc1020.eqiad.wmnet [production]
05:58 <marostegui@cumin1001> dbctl commit (dc=all): 'Depool db1076 to clone db1162 T258361', diff saved to https://phabricator.wikimedia.org/P14277 and previous config saved to /var/cache/conftool/dbconfig/20210210-055846-marostegui.json [production]
03:46 <ryankemper> `ryankemper@wdqs1012:~$ sudo systemctl restart wdqs-blazegraph.service` [production]
01:54 <krinkle@deploy1001> Finished deploy [integration/docroot@0234db2]: Unbreak doc.wm.o (2) - Ib67da94fb1bdf0 (duration: 00m 06s) [production]
01:54 <krinkle@deploy1001> Started deploy [integration/docroot@0234db2]: Unbreak doc.wm.o (2) - Ib67da94fb1bdf0 [production]
01:43 <krinkle@deploy1001> Finished deploy [integration/docroot@fddc7c9]: Unbreak doc.wm.o - Ibf28e02ec03 (duration: 00m 06s) [production]
01:43 <krinkle@deploy1001> Started deploy [integration/docroot@fddc7c9]: Unbreak doc.wm.o - Ibf28e02ec03 [production]
01:06 <milimetric@deploy1001> Finished deploy [analytics/refinery@b539bf6] (thin): Job fixes after Hadoop upgrade (duration: 00m 06s) [production]
01:06 <milimetric@deploy1001> Started deploy [analytics/refinery@b539bf6] (thin): Job fixes after Hadoop upgrade [production]
01:06 <milimetric@deploy1001> Finished deploy [analytics/refinery@b539bf6]: Job fixes after Hadoop upgrade (duration: 10m 55s) [production]
00:58 <mutante> doc1001 - reloaded apache2 [production]
00:55 <milimetric@deploy1001> Started deploy [analytics/refinery@b539bf6]: Job fixes after Hadoop upgrade [production]
00:42 <Amir1> changing frwiki to wmf.30 in mwdebug1002 to test T264391 [production]
00:33 <ladsgroup@deploy1001> Synchronized php-1.36.0-wmf.30/extensions/FeaturedFeeds: [[gerrit:662965|Fix issues with recent caching update]] (T264391) (duration: 01m 10s) [production]
00:22 <twentyafterfour@deploy1001> Finished scap: testwikis wikis to 1.36.0-wmf.30 (duration: 24m 10s) [production]
00:01 <twentyafterfour> train status: wmf.28 and wmf.29 are undeployed. wmf.27 is everywhere with the exception of testwikis which is at wmf.30 refs T271344 [production]
2021-02-09
23:58 <twentyafterfour@deploy1001> Started scap: testwikis wikis to 1.36.0-wmf.30 [production]
23:56 <dzahn@cumin1001> conftool action : set/pooled=yes; selector: name=mw2250.codfw.wmnet [production]
23:55 <ryankemper> Depooled `wdqs1005` - it's catching up on hours of lag [production]
23:55 <twentyafterfour@deploy1001> Finished scap: (no justification provided) (duration: 08m 43s) [production]
23:53 <dzahn@cumin1001> conftool action : set/pooled=no; selector: name=mw2250.codfw.wmnet [production]
23:50 <mutante> mw1383,mw1385 - scap pull, php [production]
23:48 <dzahn@cumin1001> conftool action : set/pooled=yes; selector: name=mw1296.eqiad.wmnet [production]
23:47 <twentyafterfour> running scap sync-world [production]
23:47 <twentyafterfour@deploy1001> Started scap: (no justification provided) [production]
23:46 <twentyafterfour@deploy1001> rebuilt and synchronized wikiversions files: all wikis to 1.36.0-wmf.27 [production]
23:40 <dzahn@cumin1001> conftool action : set/pooled=no; selector: name=mw1296.eqiad.wmnet [production]
23:33 <dzahn@cumin1001> conftool action : set/pooled=yes; selector: name=mw1380.eqiad.wmnet [production]
23:32 <dzahn@cumin1001> conftool action : set/pooled=no; selector: name=mw1380.eqiad.wmnet [production]
23:28 <mutante> mw1380 - powercycling after it did not come back from normal reboot during reimaging [production]
23:23 <dzahn@cumin1001> conftool action : set/pooled=yes; selector: name=mw1372.eqiad.wmnet [production]
23:18 <dzahn@cumin1001> conftool action : set/pooled=no; selector: name=mw1372.eqiad.wmnet [production]
23:05 <dzahn@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2250.codfw.wmnet with reason: REIMAGE [production]