2021-09-04
13:05 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'db2137:3314 (re)pooling @ 50%: Slowly repool T290374', diff saved to https://phabricator.wikimedia.org/P17215 and previous config saved to /var/cache/conftool/dbconfig/20210904-130525-root.json |
[production] |
12:50 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'db2137:3314 (re)pooling @ 25%: Slowly repool T290374', diff saved to https://phabricator.wikimedia.org/P17214 and previous config saved to /var/cache/conftool/dbconfig/20210904-125021-root.json |
[production] |
12:35 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'db2137:3314 (re)pooling @ 10%: Slowly repool T290374', diff saved to https://phabricator.wikimedia.org/P17213 and previous config saved to /var/cache/conftool/dbconfig/20210904-123518-root.json |
[production] |
12:20 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'db2137:3314 (re)pooling @ 5%: Slowly repool T290374', diff saved to https://phabricator.wikimedia.org/P17212 and previous config saved to /var/cache/conftool/dbconfig/20210904-122014-root.json |
[production] |
09:03 |
<elukey> |
restart wmf_auto_restart_rsyslog.service on puppetdb1002 |
[production] |
09:00 |
<elukey> |
`systemctl reset-failed ifup@ens6.service` on puppetdb2002 - T273026 |
[production] |
03:02 |
<rzl@cumin2001> |
dbctl commit (dc=all): 'Depool db2137:3314', diff saved to https://phabricator.wikimedia.org/P17210 and previous config saved to /var/cache/conftool/dbconfig/20210904-030231-rzl.json |
[production] |
2021-09-03
21:49 |
<bd808@deploy1002> |
helmfile [staging] Ran 'sync' command on namespace 'toolhub' for release 'main' . |
[production] |
20:30 |
<bd808@deploy1002> |
helmfile [staging] Ran 'sync' command on namespace 'toolhub' for release 'main' . |
[production] |
19:33 |
<krinkle@deploy1002> |
Finished deploy [integration/docroot@6492b3d]: I48480e89e5f6 (duration: 00m 10s) |
[production] |
19:33 |
<krinkle@deploy1002> |
Started deploy [integration/docroot@6492b3d]: I48480e89e5f6 |
[production] |
19:26 |
<bd808@deploy1002> |
helmfile [staging] Ran 'sync' command on namespace 'toolhub' for release 'main' . |
[production] |
19:04 |
<ryankemper> |
T290330 `ryankemper@cumin1001:~$ sudo -E cumin 'P{wdqs2*}' 'sudo rm -fv /etc/cron.hourly/restart-blazegraph'` (Cleaned up manually created crons now that we have [somewhat hacky] systemd timers doing the same job) |
[production] |
17:42 |
<dduvall@deploy1002> |
helmfile [codfw] Ran 'sync' command on namespace 'blubberoid' for release 'production' . |
[production] |
17:40 |
<dduvall@deploy1002> |
helmfile [eqiad] Ran 'sync' command on namespace 'blubberoid' for release 'production' . |
[production] |
17:35 |
<dduvall@deploy1002> |
helmfile [staging] Ran 'sync' command on namespace 'blubberoid' for release 'staging' . |
[production] |
17:17 |
<ryankemper> |
T290330 Deployed https://gerrit.wikimedia.org/r/c/operations/puppet/+/717508 across `wdqs` fleet; codfw wdqs hosts will restart on average once per hour now to address ongoing availability issues for wdqs codfw |
[production] |
16:32 |
<bd808@deploy1002> |
helmfile [staging] Ran 'sync' command on namespace 'toolhub' for release 'main' . |
[production] |
16:10 |
<gehel> |
blazegraph (public codfw cluster) will now restart every hour - T290330 |
[production] |
15:53 |
<jbond> |
enable puppet fleet-wide after puppetdb database maintenance - T263578 |
[production] |
15:21 |
<jbond> |
create lvm snapshot puppetdb2002_data_snapshot on ganeti2023 - T263578 |
[production] |
15:17 |
<jbond> |
create lvm snapshot puppetdb1002_data_snapshot on ganeti1012 - T263578 |
[production] |
15:00 |
<jbond> |
disable puppet fleet-wide to perform puppetdb database maintenance - T263578 |
[production] |
14:58 |
<elukey@deploy1002> |
helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'. |
[production] |
14:58 |
<elukey@deploy1002> |
helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'. |
[production] |
14:35 |
<pt1979@cumin2002> |
END (PASS) - Cookbook sre.dns.netbox (exit_code=0) |
[production] |
14:29 |
<pt1979@cumin2002> |
START - Cookbook sre.dns.netbox |
[production] |
14:20 |
<mutante> |
mw2264 - scap pull |
[production] |
14:18 |
<elukey@deploy1002> |
helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'. |
[production] |
14:18 |
<elukey@deploy1002> |
helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'. |
[production] |
13:11 |
<jiji@cumin1001> |
END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts mc1027.eqiad.wmnet |
[production] |
13:10 |
<dcausse> |
installing openjdk-8-dbg on wdqs2007 |
[production] |
13:04 |
<jiji@cumin1001> |
START - Cookbook sre.hosts.decommission for hosts mc1027.eqiad.wmnet |
[production] |
13:02 |
<jiji@cumin1001> |
END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mc1023.eqiad.wmnet |
[production] |
12:48 |
<jiji@cumin1001> |
START - Cookbook sre.hosts.decommission for hosts mc1023.eqiad.wmnet |
[production] |
12:46 |
<jiji@cumin1001> |
END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mc[1035-1036].eqiad.wmnet |
[production] |
12:32 |
<jiji@cumin1001> |
START - Cookbook sre.hosts.decommission for hosts mc[1035-1036].eqiad.wmnet |
[production] |
12:12 |
<jiji@cumin1001> |
END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mc[1028-1032].eqiad.wmnet |
[production] |
12:03 |
<joal@deploy1002> |
Finished deploy [analytics/refinery@7208d3d] (thin): Analytics hotfix deploy (bis) THIN [analytics/refinery@7208d3d] (duration: 00m 06s) |
[production] |
12:03 |
<joal@deploy1002> |
Started deploy [analytics/refinery@7208d3d] (thin): Analytics hotfix deploy (bis) THIN [analytics/refinery@7208d3d] |
[production] |
12:03 |
<joal@deploy1002> |
Finished deploy [analytics/refinery@7208d3d]: Analytics hotfix deploy (bis)[analytics/refinery@7208d3d] (duration: 19m 16s) |
[production] |
11:56 |
<dcausse@deploy1002> |
Finished deploy [wdqs/wdqs@8361ac9]: ban queries from a generic UA (duration: 19m 21s) |
[production] |
11:44 |
<joal@deploy1002> |
Started deploy [analytics/refinery@7208d3d]: Analytics hotfix deploy (bis)[analytics/refinery@7208d3d] |
[production] |
11:42 |
<marostegui> |
Remove flaggedrevs_stats2 and flaggedrevs_stats from enwiki - T289050 |
[production] |
11:37 |
<dcausse@deploy1002> |
Started deploy [wdqs/wdqs@8361ac9]: ban queries from a generic UA |
[production] |
11:36 |
<dcausse@deploy1002> |
Finished deploy [wdqs/wdqs@8361ac9]: ban queries from a generic UA (duration: 01m 07s) |
[production] |
11:35 |
<dcausse@deploy1002> |
Started deploy [wdqs/wdqs@8361ac9]: ban queries from a generic UA |
[production] |
10:58 |
<jiji@cumin1001> |
START - Cookbook sre.hosts.decommission for hosts mc[1028-1032].eqiad.wmnet |
[production] |
10:54 |
<jiji@cumin1001> |
END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts mc[1025-1026].eqiad.wmnet |
[production] |
10:47 |
<joal@deploy1002> |
Finished deploy [analytics/aqs/deploy@d273fde] (aqs-next): Deploy latest code on AQS new servers - test after failures (duration: 00m 32s) |
[production] |