2021-02-10
09:10 |
<elukey@cumin1001> |
START - Cookbook sre.hadoop.change-distro-from-cdh-clients for Hadoop test cluster: Change Hadoop distribution - elukey@cumin1001 |
[production] |
09:00 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'db1076 (re)pooling @ 10%: Slowly repooling db1076 after cloning db1162', diff saved to https://phabricator.wikimedia.org/P14288 and previous config saved to /var/cache/conftool/dbconfig/20210210-090057-root.json |
[production] |
09:00 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'db1157 (re)pooling @ 60%: Slowly repool db1127', diff saved to https://phabricator.wikimedia.org/P14287 and previous config saved to /var/cache/conftool/dbconfig/20210210-090004-root.json |
[production] |
08:45 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'db1157 (re)pooling @ 40%: Slowly repool db1127', diff saved to https://phabricator.wikimedia.org/P14286 and previous config saved to /var/cache/conftool/dbconfig/20210210-084500-root.json |
[production] |
08:41 |
<legoktm> |
depooling mw1404.eqiad.wmnet for perf benchmarking (T274041) |
[production] |
08:41 |
<legoktm@cumin1001> |
conftool action : set/pooled=no; selector: name=mw1404.eqiad.wmnet |
[production] |
08:29 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'db1157 (re)pooling @ 20%: Slowly repool db1127', diff saved to https://phabricator.wikimedia.org/P14285 and previous config saved to /var/cache/conftool/dbconfig/20210210-082957-root.json |
[production] |
08:19 |
<godog> |
swift eqiad-prod: decrease weight for SSDs on ms-be[1019-1026] - T272836 |
[production] |
08:14 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'db1157 (re)pooling @ 10%: Slowly repool db1127', diff saved to https://phabricator.wikimedia.org/P14284 and previous config saved to /var/cache/conftool/dbconfig/20210210-081453-root.json |
[production] |
08:05 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'Depool db1127 T266483', diff saved to https://phabricator.wikimedia.org/P14283 and previous config saved to /var/cache/conftool/dbconfig/20210210-080512-marostegui.json |
[production] |
06:43 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'Fully pool db1170:3312, db1170:3317 T258361', diff saved to https://phabricator.wikimedia.org/P14282 and previous config saved to /var/cache/conftool/dbconfig/20210210-064330-marostegui.json |
[production] |
06:35 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'Give more weight to db1170:3312, db1170:3317 T258361', diff saved to https://phabricator.wikimedia.org/P14281 and previous config saved to /var/cache/conftool/dbconfig/20210210-063534-marostegui.json |
[production] |
06:22 |
<marostegui@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1162.eqiad.wmnet with reason: REIMAGE |
[production] |
06:20 |
<marostegui@cumin1001> |
START - Cookbook sre.hosts.downtime for 2:00:00 on db1162.eqiad.wmnet with reason: REIMAGE |
[production] |
06:19 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'Pool db1170:3312, db1170:3317 with minimal weight for the first time T258361', diff saved to https://phabricator.wikimedia.org/P14279 and previous config saved to /var/cache/conftool/dbconfig/20210210-061924-marostegui.json |
[production] |
06:16 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'Add db1170:3312 and db1170:3317 to dbctl, depooled T258361', diff saved to https://phabricator.wikimedia.org/P14278 and previous config saved to /var/cache/conftool/dbconfig/20210210-061638-marostegui.json |
[production] |
06:11 |
<jiji@cumin1001> |
END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc1020.eqiad.wmnet |
[production] |
06:04 |
<jiji@cumin1001> |
START - Cookbook sre.hosts.reboot-single for host mc1020.eqiad.wmnet |
[production] |
05:58 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'Depool db1076 to clone db1162 T258361', diff saved to https://phabricator.wikimedia.org/P14277 and previous config saved to /var/cache/conftool/dbconfig/20210210-055846-marostegui.json |
[production] |
03:46 |
<ryankemper> |
`ryankemper@wdqs1012:~$ sudo systemctl restart wdqs-blazegraph.service` |
[production] |
01:54 |
<krinkle@deploy1001> |
Finished deploy [integration/docroot@0234db2]: Unbreak doc.wm.o (2) - Ib67da94fb1bdf0 (duration: 00m 06s) |
[production] |
01:54 |
<krinkle@deploy1001> |
Started deploy [integration/docroot@0234db2]: Unbreak doc.wm.o (2) - Ib67da94fb1bdf0 |
[production] |
01:43 |
<krinkle@deploy1001> |
Finished deploy [integration/docroot@fddc7c9]: Unbreak doc.wm.o - Ibf28e02ec03 (duration: 00m 06s) |
[production] |
01:43 |
<krinkle@deploy1001> |
Started deploy [integration/docroot@fddc7c9]: Unbreak doc.wm.o - Ibf28e02ec03 |
[production] |
01:06 |
<milimetric@deploy1001> |
Finished deploy [analytics/refinery@b539bf6] (thin): Job fixes after Hadoop upgrade (duration: 00m 06s) |
[production] |
01:06 |
<milimetric@deploy1001> |
Started deploy [analytics/refinery@b539bf6] (thin): Job fixes after Hadoop upgrade |
[production] |
01:06 |
<milimetric@deploy1001> |
Finished deploy [analytics/refinery@b539bf6]: Job fixes after Hadoop upgrade (duration: 10m 55s) |
[production] |
00:58 |
<mutante> |
doc1001 - reloaded apache2 |
[production] |
00:55 |
<milimetric@deploy1001> |
Started deploy [analytics/refinery@b539bf6]: Job fixes after Hadoop upgrade |
[production] |
00:42 |
<Amir1> |
changing frwiki to wmf.30 in mwdebug1002 to test T264391 |
[production] |
00:33 |
<ladsgroup@deploy1001> |
Synchronized php-1.36.0-wmf.30/extensions/FeaturedFeeds: [[gerrit:662965|Fix issues with recent caching update]] (T264391) (duration: 01m 10s) |
[production] |
00:22 |
<twentyafterfour@deploy1001> |
Finished scap: testwikis wikis to 1.36.0-wmf.30 (duration: 24m 10s) |
[production] |
00:01 |
<twentyafterfour> |
train status: wmf.28 and wmf.29 are undeployed. wmf.27 is everywhere with the exception of testwikis which is at wmf.30 refs T271344 |
[production] |
2021-02-09
23:58 |
<twentyafterfour@deploy1001> |
Started scap: testwikis wikis to 1.36.0-wmf.30 |
[production] |
23:56 |
<dzahn@cumin1001> |
conftool action : set/pooled=yes; selector: name=mw2250.codfw.wmnet |
[production] |
23:55 |
<ryankemper> |
Depooled `wdqs1005` - it's catching up on hours of lag |
[production] |
23:55 |
<twentyafterfour@deploy1001> |
Finished scap: (no justification provided) (duration: 08m 43s) |
[production] |
23:53 |
<dzahn@cumin1001> |
conftool action : set/pooled=no; selector: name=mw2250.codfw.wmnet |
[production] |
23:50 |
<mutante> |
mw1383,mw1385 - scap pull, php |
[production] |
23:48 |
<dzahn@cumin1001> |
conftool action : set/pooled=yes; selector: name=mw1296.eqiad.wmnet |
[production] |
23:47 |
<twentyafterfour> |
running scap sync-world |
[production] |
23:47 |
<twentyafterfour@deploy1001> |
Started scap: (no justification provided) |
[production] |
23:46 |
<twentyafterfour@deploy1001> |
rebuilt and synchronized wikiversions files: all wikis to 1.36.0-wmf.27 |
[production] |
23:40 |
<dzahn@cumin1001> |
conftool action : set/pooled=no; selector: name=mw1296.eqiad.wmnet |
[production] |
23:33 |
<dzahn@cumin1001> |
conftool action : set/pooled=yes; selector: name=mw1380.eqiad.wmnet |
[production] |
23:32 |
<dzahn@cumin1001> |
conftool action : set/pooled=no; selector: name=mw1380.eqiad.wmnet |
[production] |
23:28 |
<mutante> |
mw1380 - powercycling after it did not come back from normal reboot during reimaging |
[production] |
23:23 |
<dzahn@cumin1001> |
conftool action : set/pooled=yes; selector: name=mw1372.eqiad.wmnet |
[production] |
23:18 |
<dzahn@cumin1001> |
conftool action : set/pooled=no; selector: name=mw1372.eqiad.wmnet |
[production] |
23:05 |
<dzahn@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2250.codfw.wmnet with reason: REIMAGE |
[production] |