2021-09-06
06:23 |
<marostegui> |
Optimize table dewiki.flaggedtemplates in eqiad T290057 |
[production] |
05:34 |
<marostegui@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2090.codfw.wmnet with reason: REIMAGE |
[production] |
05:32 |
<marostegui@cumin1001> |
START - Cookbook sre.hosts.downtime for 2:00:00 on db2090.codfw.wmnet with reason: REIMAGE |
[production] |
05:07 |
<marostegui> |
Stop replication on db2090 (old s4 master) T289650 T288803 |
[production] |
05:05 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'Depool db2110 (current master) from API T289650', diff saved to https://phabricator.wikimedia.org/P17223 and previous config saved to /var/cache/conftool/dbconfig/20210906-050502-marostegui.json |
[production] |
05:04 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'Depool db2090 T289650', diff saved to https://phabricator.wikimedia.org/P17222 and previous config saved to /var/cache/conftool/dbconfig/20210906-050419-marostegui.json |
[production] |
05:01 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'Promote db2110 to s4 primary and set section read-write T289650', diff saved to https://phabricator.wikimedia.org/P17221 and previous config saved to /var/cache/conftool/dbconfig/20210906-050140-root.json |
[production] |
05:00 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'Set s4 codfw as read-only for maintenance - T289650', diff saved to https://phabricator.wikimedia.org/P17220 and previous config saved to /var/cache/conftool/dbconfig/20210906-050048-root.json |
[production] |
05:00 |
<marostegui> |
Starting s4 codfw failover from db2090 to db2110 - T289650 |
[production] |
04:07 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'Set db2110 with weight 0 T289650', diff saved to https://phabricator.wikimedia.org/P17219 and previous config saved to /var/cache/conftool/dbconfig/20210906-040740-root.json |
[production] |
04:07 |
<marostegui@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 33 hosts with reason: Primary switchover s4 T289650 |
[production] |
04:06 |
<marostegui@cumin1001> |
START - Cookbook sre.hosts.downtime for 2:00:00 on 33 hosts with reason: Primary switchover s4 T289650 |
[production] |
2021-09-04
19:50 |
<wm-bot> |
<lokal-profil> Deploy latest from Git master: b4d3e0e, 339838b (T289929), 7816a36 (T289930) |
[tools.heritage] |
13:35 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'db2137:3314 (re)pooling @ 100%: Slowly repool T290374', diff saved to https://phabricator.wikimedia.org/P17217 and previous config saved to /var/cache/conftool/dbconfig/20210904-133532-root.json |
[production] |
13:20 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'db2137:3314 (re)pooling @ 75%: Slowly repool T290374', diff saved to https://phabricator.wikimedia.org/P17216 and previous config saved to /var/cache/conftool/dbconfig/20210904-132029-root.json |
[production] |
13:05 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'db2137:3314 (re)pooling @ 50%: Slowly repool T290374', diff saved to https://phabricator.wikimedia.org/P17215 and previous config saved to /var/cache/conftool/dbconfig/20210904-130525-root.json |
[production] |
12:50 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'db2137:3314 (re)pooling @ 25%: Slowly repool T290374', diff saved to https://phabricator.wikimedia.org/P17214 and previous config saved to /var/cache/conftool/dbconfig/20210904-125021-root.json |
[production] |
12:35 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'db2137:3314 (re)pooling @ 10%: Slowly repool T290374', diff saved to https://phabricator.wikimedia.org/P17213 and previous config saved to /var/cache/conftool/dbconfig/20210904-123518-root.json |
[production] |
12:20 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'db2137:3314 (re)pooling @ 5%: Slowly repool T290374', diff saved to https://phabricator.wikimedia.org/P17212 and previous config saved to /var/cache/conftool/dbconfig/20210904-122014-root.json |
[production] |
09:03 |
<elukey> |
restart wmf_auto_restart_rsyslog.service on puppetdb1002 |
[production] |
09:00 |
<elukey> |
`systemctl reset-failed ifup@ens6.service` on puppetdb2002 - T273026 |
[production] |
03:02 |
<rzl@cumin2001> |
dbctl commit (dc=all): 'Depool db2137:3314', diff saved to https://phabricator.wikimedia.org/P17210 and previous config saved to /var/cache/conftool/dbconfig/20210904-030231-rzl.json |
[production] |
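The db2137:3314 repool entries above (T290374) step through 5%, 10%, 25%, 50%, 75% and 100% at 15-minute intervals. A minimal sketch of that schedule follows. The percentages and spacing are read off the log lines; the function name `repool_schedule` and its parameters are hypothetical and do not represent the actual tooling that produced these commits.

```python
from datetime import datetime, timedelta

# Percentages and the 15-minute spacing are taken from the log entries;
# everything else here (names, structure) is an illustrative assumption.
REPOOL_STEPS = [5, 10, 25, 50, 75, 100]


def repool_schedule(start, steps=REPOOL_STEPS, interval_minutes=15):
    """Return (time, percentage) pairs for a stepwise repool."""
    return [
        (start + timedelta(minutes=i * interval_minutes), pct)
        for i, pct in enumerate(steps)
    ]


# First logged repool step was at 12:20 on 2021-09-04.
schedule = repool_schedule(datetime(2021, 9, 4, 12, 20))
for when, pct in schedule:
    print(f"{when:%H:%M} db2137:3314 (re)pooling @ {pct}%")
```

Starting from 12:20, the six steps land on the same timestamps as the logged commits, with the final 100% step at 13:35.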
2021-09-03
23:02 |
<Krinkle> |
Creating integration-agent-qemu-1002 (Debian 11 Bullseye, g3.cores8.ram24.disk20.ephemeral40.4xiops), ref T284774 |
[releng] |
22:36 |
<bstorm> |
backfilling quotas in screen for T286784 |
[tools] |
22:34 |
<bstorm> |
backfilled quotas for T286784 |
[toolsbeta] |
21:49 |
<bd808@deploy1002> |
helmfile [staging] Ran 'sync' command on namespace 'toolhub' for release 'main' . |
[production] |
20:30 |
<bd808@deploy1002> |
helmfile [staging] Ran 'sync' command on namespace 'toolhub' for release 'main' . |
[production] |
19:33 |
<krinkle@deploy1002> |
Finished deploy [integration/docroot@6492b3d]: I48480e89e5f6 (duration: 00m 10s) |
[production] |
19:33 |
<krinkle@deploy1002> |
Started deploy [integration/docroot@6492b3d]: I48480e89e5f6 |
[production] |
19:26 |
<bd808@deploy1002> |
helmfile [staging] Ran 'sync' command on namespace 'toolhub' for release 'main' . |
[production] |
19:19 |
<bstorm> |
adding config group validation rules for postgresql and mysql T290349 |
[trove] |
19:14 |
<bstorm> |
adding config group validation rules for mariadb 10.5.10 T290349 |
[trove] |
19:04 |
<ryankemper> |
T290330 `ryankemper@cumin1001:~$ sudo -E cumin 'P{wdqs2*}' 'sudo rm -fv /etc/cron.hourly/restart-blazegraph'` (Cleaned up manually created crons now that we have [somewhat hacky] systemd timers doing the same job) |
[production] |
17:42 |
<dduvall@deploy1002> |
helmfile [codfw] Ran 'sync' command on namespace 'blubberoid' for release 'production' . |
[production] |
17:42 |
<dduvall> |
deploying blubberoid:2021-09-03-160524-production to eqiad/codfw (https://gerrit.wikimedia.org/r/c/blubber/+/716519) (T289367) |
[releng] |
17:40 |
<dduvall@deploy1002> |
helmfile [eqiad] Ran 'sync' command on namespace 'blubberoid' for release 'production' . |
[production] |
17:39 |
<andrewbogott> |
restarting celery workers and reloading web UI to pick up timeout changes |
[quarry] |
17:36 |
<dduvall> |
staging blubberoid to deploy https://gerrit.wikimedia.org/r/c/blubber/+/716519 |
[releng] |
17:35 |
<dduvall@deploy1002> |
helmfile [staging] Ran 'sync' command on namespace 'blubberoid' for release 'staging' . |
[production] |
17:17 |
<ryankemper> |
T290330 Deployed https://gerrit.wikimedia.org/r/c/operations/puppet/+/717508 across `wdqs` fleet; codfw wdqs hosts will restart on average once per hour now to address ongoing availability issues for wdqs codfw |
[production] |
16:45 |
<bstorm> |
set live wait_timeout variable to 28800 (the default) on the trove instance T290291 |
[quarry] |
16:32 |
<bd808@deploy1002> |
helmfile [staging] Ran 'sync' command on namespace 'toolhub' for release 'main' . |
[production] |
16:10 |
<gehel> |
blazegraph (public codfw cluster) will now restart every hour - T290330 |
[production] |
16:05 |
<wm-bot> |
<lucaswerkmeister> updated venv (includes mwparserfromhell 0.6.3) |
[tools.quickcategories] |
15:54 |
<wm-bot> |
<lucaswerkmeister> deployed 3698f0b79c (add passive forms to Norwegian Bokmal verbs) |
[tools.lexeme-forms] |
15:53 |
<jbond> |
enable puppet fleet wide to post puppetdb database maintenance - T263578 |
[production] |
15:35 |
<wm-bot> |
<lucaswerkmeister> deployed 8051248b60 (l10n updates) |
[tools.lexeme-forms] |
15:34 |
<bstorm> |
rebooting labstore1005 to disconnect the drives from labstore1004 T290318 |
[admin] |