2021-02-22
|
23:59 |
<mutante> |
logstash2031 - systemctl reset-failed |
[production] |
23:53 |
<mutante> |
stat1007 - same problem and alerts as stat1004 |
[production] |
23:52 |
<mutante> |
stat1004 - systemctl reset-failed to clear icinga alerts for systemd state caused by jupyterhub singleuser services |
[production] |
23:47 |
<dpifke@deploy1001> |
Finished deploy [performance/arc-lamp@1f3bce1]: Revert https://gerrit.wikimedia.org/r/c/performance/arc-lamp/+/664600 (duration: 00m 05s) |
[production] |
23:47 |
<dpifke@deploy1001> |
Started deploy [performance/arc-lamp@1f3bce1]: Revert https://gerrit.wikimedia.org/r/c/performance/arc-lamp/+/664600 |
[production] |
23:37 |
<dzahn@cumin1001> |
conftool action : set/pooled=yes; selector: name=mw1286.eqiad.wmnet |
[production] |
23:36 |
<dzahn@cumin1001> |
conftool action : set/pooled=no; selector: name=mw1286.eqiad.wmnet |
[production] |
23:34 |
<milimetric@deploy1001> |
Finished deploy [analytics/refinery@3de01b5] (thin): Fix camus (duration: 00m 07s) |
[production] |
23:34 |
<milimetric@deploy1001> |
Started deploy [analytics/refinery@3de01b5] (thin): Fix camus |
[production] |
23:33 |
<milimetric@deploy1001> |
Finished deploy [analytics/refinery@3de01b5]: Fix camus (duration: 14m 03s) |
[production] |
23:27 |
<robh@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aqs1014.eqiad.wmnet with reason: REIMAGE |
[production] |
23:25 |
<robh@cumin1001> |
START - Cookbook sre.hosts.downtime for 2:00:00 on aqs1014.eqiad.wmnet with reason: REIMAGE |
[production] |
23:22 |
<oblivian@deploy1001> |
helmfile [eqiad] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'staging' . |
[production] |
23:22 |
<oblivian@deploy1001> |
helmfile [eqiad] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' . |
[production] |
23:19 |
<milimetric@deploy1001> |
Started deploy [analytics/refinery@3de01b5]: Fix camus |
[production] |
23:18 |
<oblivian@deploy1001> |
helmfile [eqiad] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'staging' . |
[production] |
23:18 |
<oblivian@deploy1001> |
helmfile [eqiad] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' . |
[production] |
23:09 |
<ppchelko@deploy1001> |
helmfile [eqiad] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' . |
[production] |
23:09 |
<ppchelko@deploy1001> |
helmfile [eqiad] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'staging' . |
[production] |
23:06 |
<dzahn@cumin1001> |
conftool action : set/pooled=yes; selector: name=mw1410.eqiad.wmnet |
[production] |
23:06 |
<dzahn@cumin1001> |
conftool action : set/pooled=yes; selector: name=mw1412.eqiad.wmnet |
[production] |
23:02 |
<dzahn@cumin1001> |
conftool action : set/pooled=no; selector: name=mw1412.eqiad.wmnet |
[production] |
23:00 |
<dzahn@cumin1001> |
conftool action : set/pooled=no; selector: name=mw1410.eqiad.wmnet |
[production] |
22:52 |
<dzahn@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1286.eqiad.wmnet with reason: REIMAGE |
[production] |
22:50 |
<legoktm> |
disabling puppet on mwdebug1001 to test https://gerrit.wikimedia.org/r/c/operations/puppet/+/664903 |
[production] |
22:49 |
<dzahn@cumin1001> |
START - Cookbook sre.hosts.downtime for 2:00:00 on mw1286.eqiad.wmnet with reason: REIMAGE |
[production] |
22:45 |
<dzahn@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1412.eqiad.wmnet with reason: REIMAGE |
[production] |
22:43 |
<dzahn@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1410.eqiad.wmnet with reason: REIMAGE |
[production] |
22:42 |
<krinkle@deploy1001> |
Synchronized w/fatal-error.php: df694d695 (duration: 00m 56s) |
[production] |
22:42 |
<dzahn@cumin1001> |
START - Cookbook sre.hosts.downtime for 2:00:00 on mw1412.eqiad.wmnet with reason: REIMAGE |
[production] |
22:41 |
<dzahn@cumin1001> |
START - Cookbook sre.hosts.downtime for 2:00:00 on mw1410.eqiad.wmnet with reason: REIMAGE |
[production] |
22:31 |
<ppchelko@deploy1001> |
helmfile [eqiad] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'staging' . |
[production] |
22:31 |
<ppchelko@deploy1001> |
helmfile [eqiad] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' . |
[production] |
22:18 |
<dzahn@cumin1001> |
conftool action : set/pooled=yes; selector: name=mw1279.eqiad.wmnet |
[production] |
22:18 |
<dzahn@cumin1001> |
conftool action : set/pooled=yes; selector: name=mw1312.eqiad.wmnet |
[production] |
22:16 |
<dzahn@cumin1001> |
conftool action : set/pooled=yes; selector: name=mw1314.eqiad.wmnet |
[production] |
21:46 |
<dzahn@cumin1001> |
conftool action : set/pooled=no; selector: name=mw1314.eqiad.wmnet |
[production] |
21:46 |
<dzahn@cumin1001> |
conftool action : set/pooled=no; selector: name=mw1279.eqiad.wmnet |
[production] |
21:45 |
<dzahn@cumin1001> |
conftool action : set/pooled=no; selector: name=mw1312.eqiad.wmnet |
[production] |
21:00 |
<Amir1> |
end of foreachwikiindblist wikidataclient extensions/Wikibase/lib/maintenance/populateSitesTable.php --force-protocol https (T273463 T271985 T273468) |
[production] |
20:59 |
<sbassett> |
Deployed security patch for T274883 |
[production] |
20:59 |
<dzahn@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1279.eqiad.wmnet with reason: REIMAGE |
[production] |
20:57 |
<dzahn@cumin1001> |
START - Cookbook sre.hosts.downtime for 2:00:00 on mw1279.eqiad.wmnet with reason: REIMAGE |
[production] |