2021-07-20
11:06 |
<oblivian@deploy1002> |
helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' . |
[production] |
11:03 |
<oblivian@deploy1002> |
helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' . |
[production] |
10:58 |
<oblivian@deploy1002> |
helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' . |
[production] |
10:57 |
<oblivian@deploy1002> |
helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' . |
[production] |
10:53 |
<oblivian@deploy1002> |
helmfile [staging] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' . |
[production] |
10:43 |
<hnowlan@puppetmaster1001> |
conftool action : set/weight=10; selector: name=maps100[79].eqiad.wmnet |
[production] |
10:35 |
<hnowlan@puppetmaster1001> |
conftool action : set/pooled=yes; selector: name=maps100[79].eqiad.wmnet |
[production] |
10:11 |
<jgiannelos@deploy1002> |
helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' . |
[production] |
09:39 |
<kormat@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 14 hosts with reason: Deploying schema change to s6 T281058 |
[production] |
09:39 |
<kormat@cumin1001> |
START - Cookbook sre.hosts.downtime for 1:00:00 on 14 hosts with reason: Deploying schema change to s6 T281058 |
[production] |
08:27 |
<jmm@cumin2002> |
END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mw2352.codfw.wmnet |
[production] |
08:21 |
<jmm@cumin2002> |
START - Cookbook sre.hosts.reboot-single for host mw2352.codfw.wmnet |
[production] |
08:02 |
<btullis> |
racadm serveraction powercycle on an-worker1106 due to CPU soft lock-ups on host |
[production] |
07:54 |
<jmm@cumin2002> |
END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host idp-test2001.wikimedia.org |
[production] |
07:50 |
<jmm@cumin2002> |
START - Cookbook sre.hosts.reboot-single for host idp-test2001.wikimedia.org |
[production] |
07:10 |
<jmm@puppetmaster1001> |
conftool action : set/pooled=no; selector: name=ldap-replica1004.wikimedia.org |
[production] |
03:17 |
<eileen> |
civicrm revision changed from 20e9ef6bbb to 819c11307d, config revision is bb405c5232 |
[production] |
2021-07-19
20:48 |
<urbanecm> |
Deploy security patch for T286884 |
[production] |
20:29 |
<vgutierrez> |
pool text@codfw - T286921 |
[production] |
20:23 |
<volans@cumin2002> |
END (PASS) - Cookbook sre.dns.netbox (exit_code=0) |
[production] |
20:18 |
<volans@cumin2002> |
START - Cookbook sre.dns.netbox |
[production] |
20:08 |
<dancy@deploy1002> |
Synchronized php-1.37.0-wmf.14/includes/export/WikiExporter.php: Backport: [[gerrit:705467|prevent PageIdentity checks in RevisionStore from breaking xml dumps (T286877)]] (duration: 00m 58s) |
[production] |
19:21 |
<Jeff_Green> |
authdns-update to remove payments100[1-4].frack.eqiad.wmnet |
[production] |
19:14 |
<dancy@deploy1002> |
Synchronized php-1.37.0-wmf.14/includes/Revision/RevisionStore.php: Backport: [[gerrit:705448|Add sanity check to newRevisionFromRowAndSlots. (T286877)]] (duration: 00m 57s) |
[production] |
18:53 |
<vgutierrez> |
running puppet and restarting pybal on lvs2009 - T286921 |
[production] |
18:46 |
<topranks> |
Running homer to re-enable port xe-2/0/43 on asw2-a2-codfw (lvs2009) - T286921 |
[production] |
18:46 |
<brennen> |
gerrit1001: restarting gerrit |
[production] |
18:40 |
<vgutierrez> |
stop pybal on lvs2009 - T286921 |
[production] |
18:38 |
<brennen> |
re-enabling puppet on gerrit1001 |
[production] |
18:35 |
<vgutierrez> |
running puppet and restarting pybal on lvs2010 - T286921 |
[production] |
18:27 |
<ryankemper> |
T264053 Deploying fix for timer issue on relforge: `ryankemper@cumin1001:~$ sudo cumin -b 2 'P{relforge*}' 'sudo systemctl stop elasticsearch-disable-readahead.timer && sudo systemctl disable elasticsearch-disable-readahead.timer && rm -fv /etc/systemd/system/elasticsearch-disable-readahead.timer && rm -fv /usr/lib/systemd/system/elasticsearch-disable-readahead.timer && sudo run-puppet-agent'` |
[production] |
18:27 |
<topranks> |
Running homer to re-enable port xe-2/0/44 on asw2-a2-codfw (lvs2010) |
[production] |
18:27 |
<ryankemper> |
T264053 Deploying fix for timer issue on cloudelastic: `ryankemper@cumin1001:~$ sudo cumin -b 6 'P{cloudelastic*}' 'sudo systemctl stop elasticsearch-disable-readahead.timer && sudo systemctl disable elasticsearch-disable-readahead.timer && rm -fv /etc/systemd/system/elasticsearch-disable-readahead.timer && rm -fv /usr/lib/systemd/system/elasticsearch-disable-readahead.timer && sudo run-puppet-agent'` |
[production] |
18:22 |
<vgutierrez> |
disable puppet & stop pybal on lvs2010 - T286921 |
[production] |
18:20 |
<vgutierrez> |
enabling pybal on lvs2007 - T286921 |
[production] |
18:19 |
<ryankemper> |
T264053 Deploying fix for timer issue: `ryankemper@cumin1001:~$ sudo cumin -b 36 'P{elastic*}' 'sudo systemctl stop elasticsearch-disable-readahead.timer && sudo systemctl disable elasticsearch-disable-readahead.timer && rm -fv /etc/systemd/system/elasticsearch-disable-readahead.timer && rm -fv /usr/lib/systemd/system/elasticsearch-disable-readahead.timer && sudo run-puppet-agent'` |
[production] |
18:14 |
<topranks> |
Running homer to re-enable port xe-2/0/45 on asw2-a2-codfw (lvs2007) |
[production] |
18:06 |
<dancy@deploy1002> |
Synchronized .pipeline: Config: [[gerrit:705437|pipeline: Perform mergeMessageFileList and rebuildLocalisationCache separately]] (duration: 00m 56s) |
[production] |
17:54 |
<mbsantos@deploy1002> |
Finished deploy [tilerator/deploy@82e5f94]: (no justification provided) (duration: 00m 22s) |
[production] |
17:54 |
<mbsantos@deploy1002> |
Started deploy [tilerator/deploy@82e5f94]: (no justification provided) |
[production] |
17:53 |
<mbsantos@deploy1002> |
Finished deploy [tilerator/deploy@82e5f94]: (no justification provided) (duration: 00m 22s) |
[production] |
17:53 |
<mbsantos@deploy1002> |
Started deploy [tilerator/deploy@82e5f94]: (no justification provided) |
[production] |
17:53 |
<mbsantos@deploy1002> |
Finished deploy [tilerator/deploy@82e5f94]: (no justification provided) (duration: 00m 21s) |
[production] |
17:53 |
<mbsantos@deploy1002> |
Started deploy [tilerator/deploy@82e5f94]: (no justification provided) |
[production] |
17:52 |
<mbsantos@deploy1002> |
Finished deploy [tilerator/deploy@82e5f94]: (no justification provided) (duration: 00m 15s) |
[production] |
17:52 |
<mbsantos@deploy1002> |
Started deploy [tilerator/deploy@82e5f94]: (no justification provided) |
[production] |
17:52 |
<mbsantos@deploy1002> |
Finished deploy [tilerator/deploy@82e5f94]: (no justification provided) (duration: 00m 16s) |
[production] |
17:51 |
<mbsantos@deploy1002> |
Started deploy [tilerator/deploy@82e5f94]: (no justification provided) |
[production] |
17:42 |
<ryankemper> |
[Elastic] Noted `Jul 16 18:31:20 elastic2038 elasticsearch[957]: 2021-07-16 18:31:20,657 main ERROR Unknown GELF server hostname:udp:logstash.svc.eqiad.wmnet` in the elasticsearch service logs (the unit had been running for 2 days), which prompted the restart of the elasticsearch service |
[production] |
17:41 |
<ryankemper> |
[Elastic] Restarted elasticsearch services on `elastic2038`; afterwards restarted prometheus exporters; no units failed any longer |
[production] |