2021-03-01
|
07:55 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'db1134 (re)pooling @ 25%: Repool db1134 after on-site maintenance', diff saved to https://phabricator.wikimedia.org/P14529 and previous config saved to /var/cache/conftool/dbconfig/20210301-075514-root.json |
[production] |
07:53 |
<marostegui> |
Upgrade pc1010 pc2008 pc200 to 10.4.18 |
[production] |
07:53 |
<elukey> |
clean up old logs + apt-get clean + puppet clientbucket on an-coord1001 to free space |
[production] |
07:48 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'db1168 (re)pooling @ 4%: Slowly pool db1168 for the first time', diff saved to https://phabricator.wikimedia.org/P14528 and previous config saved to /var/cache/conftool/dbconfig/20210301-074759-root.json |
[production] |
07:40 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'db1134 (re)pooling @ 15%: Repool db1134 after on-site maintenance', diff saved to https://phabricator.wikimedia.org/P14527 and previous config saved to /var/cache/conftool/dbconfig/20210301-074011-root.json |
[production] |
07:29 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'Give some more weight to db1168', diff saved to https://phabricator.wikimedia.org/P14526 and previous config saved to /var/cache/conftool/dbconfig/20210301-072957-marostegui.json |
[production] |
07:25 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'db1134 (re)pooling @ 10%: Repool db1134 after on-site maintenance', diff saved to https://phabricator.wikimedia.org/P14525 and previous config saved to /var/cache/conftool/dbconfig/20210301-072507-root.json |
[production] |
07:10 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'Give some more weight to db1168', diff saved to https://phabricator.wikimedia.org/P14524 and previous config saved to /var/cache/conftool/dbconfig/20210301-071047-marostegui.json |
[production] |
07:10 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'db1134 (re)pooling @ 5%: Repool db1134 after on-site maintenance', diff saved to https://phabricator.wikimedia.org/P14523 and previous config saved to /var/cache/conftool/dbconfig/20210301-071004-root.json |
[production] |
07:05 |
<marostegui> |
Stop MySQL on db2082 to clone db2152 - T275633 |
[production] |
06:55 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'db1134 (re)pooling @ 1%: Repool db1134 after on-site maintenance', diff saved to https://phabricator.wikimedia.org/P14521 and previous config saved to /var/cache/conftool/dbconfig/20210301-065500-root.json |
[production] |
06:47 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'Pool db1168 with minimal weight T258361', diff saved to https://phabricator.wikimedia.org/P14520 and previous config saved to /var/cache/conftool/dbconfig/20210301-064704-marostegui.json |
[production] |
06:46 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'Add db1168 to dbctl T258361!', diff saved to https://phabricator.wikimedia.org/P14519 and previous config saved to /var/cache/conftool/dbconfig/20210301-064603-marostegui.json |
[production] |
06:32 |
<marostegui@cumin1001> |
END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db1092.eqiad.wmnet |
[production] |
06:25 |
<marostegui@cumin1001> |
START - Cookbook sre.hosts.decommission for hosts db1092.eqiad.wmnet |
[production] |
2021-02-28
|
19:20 |
<wm-bot> |
<lucaswerkmeister> deployed bbca6e5b8e (better OAuth error handling) |
[tools.ranker] |
19:17 |
<wm-bot> |
<lucaswerkmeister> deployed 03d707756b (fix return type, should be a no-op) |
[tools.quickcategories] |
19:15 |
<wm-bot> |
<lucaswerkmeister> deployed a01dae7728 (better OAuth error handling) |
[tools.quickcategories] |
19:04 |
<wm-bot> |
<lucaswerkmeister> deployed a543196e25 (better OAuth error handling) |
[tools.speedpatrolling] |
18:55 |
<wm-bot> |
<lucaswerkmeister> deployed ae6a228597 (better OAuth error handling) |
[tools.wd-image-positions] |
18:19 |
<legoktm> |
added Majavah as a maintainer |
[tools.wikibugs] |
18:17 |
<legoktm> |
manually stopped all jobs and started them |
[tools.wikibugs] |
17:18 |
<wm-bot> |
<lucaswerkmeister> deployed 369031b945 (minifix) |
[tools.lexeme-forms] |
17:10 |
<wm-bot> |
<lucaswerkmeister> deployed 0455dc20f4 (better OAuth error handling) |
[tools.lexeme-forms] |
14:17 |
<gehel> |
repooled wdqs1011 - caught up on lag |
[production] |
04:54 |
<andrewbogott> |
restarted redis-server on tools-redis-1003 and tools-redis-1004 in an attempt to reduce replag, no real change detected |
[admin] |
2021-02-27
|
22:03 |
<Reedy> |
re-armed beta keyholder... I think... |
[releng] |
21:19 |
<dwisehaupt> |
ran the following on frdb2002 to allow replication to continue after conversion to utf8mb4 charset: set global slave_type_conversions = ALL_NON_LOSSY; |
[production] |
18:44 |
<gehel> |
depooled wdqs1011 to catch up on lag |
[production] |
18:37 |
<gehel> |
powercycling wdqs1011 |
[production] |
02:23 |
<bstorm> |
deployed typo fix to maintain-kubeusers in an innocent effort to make the weekend better T275910 |
[tools] |
02:00 |
<bstorm> |
running a script to repair the dumps mount in all podpresets T275371 |
[tools] |
00:33 |
<andrewbogott> |
sudo cumin --timeout 500 "A:all and not O{project:clouddb-services}" 'lsb_release -c | grep -i buster && uname -r | grep -v 4.19.0-14-amd64 && reboot' |
[admin] |
00:28 |
<andrewbogott> |
sudo cumin --timeout 500 "A:all and not O{project:clouddb-services}" 'lsb_release -c | grep -i buster && uname -r | grep -v 4.19.0-14-amd64 && echo reboot' |
[admin] |
00:09 |
<andrewbogott> |
sudo cumin "A:all and not O{project:clouddb-services}" 'lsb_release -c | grep -i stretch && uname -r | grep -v 4.19.0-0.bpo.14-amd64 && reboot' |
[admin] |
00:08 |
<mutante> |
deploy1002 - rsyncing home dirs from deploy1001 |
[production] |
2021-02-26
|
23:20 |
<bstorm> |
rebooting clouddb-wikilabels-02 for patches |
[clouddb-services] |
22:55 |
<bstorm> |
rebooting clouddb-wikireplicas-proxy-1 and clouddb-wikireplicas-proxy-2 before (hopefully) many people are using them |
[clouddb-services] |
22:04 |
<bstorm> |
cleaned up grid jobs 1230666,1908277,1908299,2441500,2441513 |
[tools] |
21:40 |
<Majavah> |
restarted stuck job stewardbot, sulwatcher seems to be doing fine |
[tools.stewardbots] |
21:27 |
<bstorm> |
hard rebooting tools-sgeexec-0947 |
[tools] |
21:21 |
<bstorm> |
hard rebooting tools-sgeexec-0952.tools.eqiad.wmflabs |
[tools] |
20:46 |
<andrewbogott> |
rebooting all hosts |
[cloudinfra] |
20:39 |
<andrewbogott> |
rebooting all hosts |
[toolsbeta] |
20:29 |
<mutante> |
deploy2001 - /srv/mediawiki-staging sudo find . -name '*.cdb' -delete - deleted 190 GB of old cdb files (T275826 T265963) |
[production] |
20:01 |
<bd808> |
Deleted csr in strange state for tool-ores-inspect |
[tools] |
19:47 |
<James_F> |
Zuul: [mediawiki/services/geoshapes] Add typescript service CI T274380 |
[releng] |
18:31 |
<dwisehaupt> |
starting the utf8mb4 table alters on frdb2002 under a root screen session |
[production] |
17:59 |
<pt1979@cumin2001> |
END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mwmaint2002.codfw.wmnet with reason: REIMAGE |
[production] |
17:57 |
<pt1979@cumin2001> |
START - Cookbook sre.hosts.downtime for 2:00:00 on mwmaint2002.codfw.wmnet with reason: REIMAGE |
[production] |