2021-10-01
23:19 <bd808@deploy1002> helmfile [eqiad] Ran 'sync' command on namespace 'toolhub' for release 'main'. [production]
22:27 <mutante> puppetmaster2001 - systemctl reset-failed [production]
22:16 <mutante> puppetmaster2001 - systemctl disable geoip_update_ipinfo.timer [production]
22:15 <mutante> puppetmaster2001 - sudo /usr/local/bin/geoipupdate_job after adding new shell command and timer - successfully downloaded enterprise database for T288844 [production]
21:56 <bd808@deploy1002> helmfile [eqiad] Ran 'sync' command on namespace 'toolhub' for release 'main'. [production]
21:44 <mutante> puppetmasters - temp. disabling puppet one more time, now for a different deploy, to fetch an additional MaxMind database - T288844 [production]
21:19 <mutante> puppetmaster2001 - puppet removed cron sync_volatile and cron sync_ca - starting and verifying new timers: 'systemctl status sync-puppet-volatile', 'systemctl status sync-puppet-ca' T273673 [production]
21:12 <mutante> puppetmaster1002, puppetmaster1003, puppetmaster2002, puppetmaster2003: re-enabled puppet; they are backends. Backends don't have the sync cron/job/timer, so this was a noop as well, just like 1004/1005/2004/2005. This just leaves the actual change on 2001 - T273673 [production]
21:07 <mutante> puppetmaster1004, puppetmaster1005, puppetmaster2004, puppetmaster2005: re-enabled puppet; they are in the "insetup" role [production]
21:06 <mbsantos@deploy1002> Finished deploy [kartotherian/deploy@d309a6e] (eqiad): tegola: reduce load to 50% during the weekend (duration: 00m 54s) [production]
21:05 <mbsantos@deploy1002> Started deploy [kartotherian/deploy@d309a6e] (eqiad): tegola: reduce load to 50% during the weekend [production]
21:05 <mutante> puppetmaster1001 - re-enabled puppet, noop as expected; the passive host pulls from the active one, so only 2001 has the cron/job/timer [production]
21:05 <mwdebug-deploy@deploy1002> helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn'. [production]
21:02 <mwdebug-deploy@deploy1002> helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn'. [production]
21:01 <legoktm@deploy1002> Synchronized wmf-config/CommonSettings.php: Revert "Have PdfHandler use Shellbox on Commons for 10% of requests" (duration: 00m 59s) [production]
20:58 <mutante> temp. disabling puppet on puppetmasters - deploying gerrit:724115 (gerrit:723310) T273673 [production]
18:58 <robh@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-db1002.eqiad.wmnet with reason: REIMAGE [production]
18:56 <robh@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-db1001.eqiad.wmnet with reason: REIMAGE [production]
18:55 <robh@cumin1001> START - Cookbook sre.hosts.downtime for 2:00:00 on an-db1002.eqiad.wmnet with reason: REIMAGE [production]
18:53 <robh@cumin1001> START - Cookbook sre.hosts.downtime for 2:00:00 on an-db1001.eqiad.wmnet with reason: REIMAGE [production]
18:07 <robh@cumin1001> END (ERROR) - Cookbook sre.experimental.reimage (exit_code=97) for host an-db1001.eqiad.wmnet [production]
18:05 <robh@cumin1001> START - Cookbook sre.experimental.reimage for host an-db1001.eqiad.wmnet [production]
17:58 <effie> depool mw1025, mw1319, mw1312 for test [production]
16:20 <dancy> testing upcoming Scap 4.0.2 release on beta [production]
14:04 <bblack> C:envoyproxy (appservers and others): restarting envoyproxy [production]
14:04 <bblack> C:envoyproxy (appservers and others): ca-certificates updated via cumin to work around T292291 issues [production]
13:45 <elukey@deploy1002> helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'. [production]
13:45 <elukey@deploy1002> helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'. [production]
13:23 <bblack> manually trying LE expired root workaround on mwdebug1001 with puppet disabled ... [production]
13:12 <gehel@cumin1001> START - Cookbook sre.wdqs.data-reload [production]
13:11 <gehel@cumin1001> END (ERROR) - Cookbook sre.wdqs.data-reload (exit_code=97) [production]
13:11 <gehel@cumin1001> START - Cookbook sre.wdqs.data-reload [production]
13:10 <gehel@cumin1001> START - Cookbook sre.wdqs.data-reload [production]
11:42 <jgiannelos@deploy1002> helmfile [codfw] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main'. [production]
11:11 <jynus> manually migrating some VMs out of ganeti1009 to avoid excessive memory pressure [production]
10:58 <marostegui@cumin1001> dbctl commit (dc=all): 'db1164 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17413 and previous config saved to /var/cache/conftool/dbconfig/20211001-105849-root.json [production]
10:57 <marostegui@cumin1001> dbctl commit (dc=all): 'db1177 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17412 and previous config saved to /var/cache/conftool/dbconfig/20211001-105735-root.json [production]
10:43 <jgiannelos@deploy1002> Finished deploy [kartotherian/deploy@d4caf6d] (eqiad): Increase mirrored traffic to 100% for eqiad (duration: 00m 49s) [production]
10:43 <marostegui@cumin1001> dbctl commit (dc=all): 'db1164 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17411 and previous config saved to /var/cache/conftool/dbconfig/20211001-104345-root.json [production]
10:43 <jgiannelos@deploy1002> Started deploy [kartotherian/deploy@d4caf6d] (eqiad): Increase mirrored traffic to 100% for eqiad [production]