2020-06-01

09:05 <filippo@cumin1001> START - Cookbook sre.hosts.decommission [production]
09:04 <filippo@cumin1001> END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) [production]
09:03 <filippo@cumin1001> START - Cookbook sre.hosts.decommission [production]
08:58 <godog> prometheus eqiad lvextend --resizefs --size +100G vg-ssd/prometheus-ops [production]
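The lvextend above grows the prometheus-ops logical volume by 100G and resizes its filesystem in the same step. A minimal follow-up check, sketched here with an assumed mount point that is not taken from the log:

  sudo lvs vg-ssd/prometheus-ops    # confirm the new LV size
  df -h /srv/prometheus             # confirm the filesystem grew (path assumed)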
08:43 <mutante> deneb - apt-get remove --purge apt-listchanges (package was in status "rc", causing a DPKG alert; it should be removed but its config was not purged) [production]
08:41 <mutante> deneb - apt-get remove python3-debconf (package was in status "ri", causing a DPKG Icinga alert; "ri" means it should be removed but is not) [production]
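Both status codes above come from dpkg -l, where the first letter is the desired action and the second the current state: "rc" is removed with config files left behind, "ri" is marked for removal but still installed. A minimal sketch for finding and purging such leftovers (commands assumed, not taken from the log):

  dpkg -l | awk '/^rc/ {print $2}'         # list packages left in "rc" state
  sudo apt-get remove --purge <package>    # purge the remaining config files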
08:33 <XioNoX> restart cr1-codfw:fpc0 - T254110 [production]
08:22 <mutante> mw1331 re-enabled puppet (SAL told me about an experiment a little while ago) [production]
08:19 <jynus> disabling puppet on all db/es/pc hosts for deploy of gerrit:599596 [production]
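A minimal sketch of how a fleet-wide puppet disable like the one above is typically issued from a cumin host; the alias and the exact reason string are assumptions, not taken from the log. enable-puppet expects the same reason that was used to disable:

  sudo cumin 'A:db-all' 'disable-puppet "deploy of gerrit:599596"'
  sudo cumin 'A:db-all' 'enable-puppet "deploy of gerrit:599596"'    # once the change is rolled out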
07:05 <marostegui@cumin1001> dbctl commit (dc=all): 'Depool db1142 to clone db1147 T252512', diff saved to https://phabricator.wikimedia.org/P11339 and previous config saved to /var/cache/conftool/dbconfig/20200601-070519-marostegui.json [production]
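A minimal sketch of the depool-and-commit flow behind dbctl entries like the ones above and below; the exact subcommands and flags are assumptions, not taken from the log:

  sudo dbctl instance db1142 depool
  sudo dbctl config commit -m 'Depool db1142 to clone db1147 T252512'
  # and later, to put the host back in service:
  sudo dbctl instance db1142 pool
  sudo dbctl config commit -m 'Repool db1142 T252512'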
05:03 <marostegui@cumin1001> dbctl commit (dc=all): 'Depool enwiki db2071 slave to test new index - T238966', diff saved to https://phabricator.wikimedia.org/P11338 and previous config saved to /var/cache/conftool/dbconfig/20200601-050354-marostegui.json [production]
04:54 <marostegui> Drop testreduce_0715 from m5 master T245408 [production]
04:44 <marostegui> Depool db1141 from Analytics role - T249188 [production]

2020-05-29

22:32 <bstorm_> updated views on labsdb1010 T252219 [production]
20:55 <bstorm_> updating views on labsdb1011 T252219 [production]
19:27 <ryankemper> Successfully finished a rolling restart of the `cloudelastic` clusters (chi, psi, omega) as part of the elasticsearch plugins upgrade. Host and service checks re-enabled. [production]
17:28 <bstorm_> updating views on labsdb1009 T252219 [production]
16:50 <ryankemper> Performing a rolling restart of the `cloudelastic` clusters (chi, psi, omega) as part of the elasticsearch plugins upgrade. Host and service checks disabled. [production]
16:00 <bstorm_> Updating views on labsdb1012 T252219 [production]
15:59 <ryankemper> Concluded rolling restart of the `relforge` clusters as part of the elasticsearch plugins upgrade. Both hosts `relforge1001` and `relforge1002` are back up. Downtime lifted. [production]
15:29 <ryankemper> Performing a rolling restart of the `relforge` clusters as part of the elasticsearch plugins upgrade [production]
14:59 <cdanis> disabling puppet on netflow* to deploy Ic71e96f0 T253128 [production]
14:47 <akosiaris@deploy1001> helmfile [CODFW] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller'. [production]
14:47 <akosiaris@deploy1001> helmfile [CODFW] Ran 'sync' command on namespace 'kube-system' for release 'coredns'. [production]
14:41 <akosiaris@deploy1001> helmfile [EQIAD] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller'. [production]
14:41 <akosiaris@deploy1001> helmfile [EQIAD] Ran 'sync' command on namespace 'kube-system' for release 'coredns'. [production]
14:35 <akosiaris@deploy1001> helmfile [STAGING] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller'. [production]
14:35 <akosiaris@deploy1001> helmfile [STAGING] Ran 'sync' command on namespace 'kube-system' for release 'coredns'. [production]
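A minimal sketch of the kind of per-environment helmfile invocation behind the admin sync entries above; the working directory, environment names, and selectors are assumptions, not taken from the log:

  cd /srv/deployment-charts/helmfile.d/admin    # path assumed
  helmfile -e codfw --selector name=coredns sync
  helmfile -e codfw --selector name=calico-policy-controller sync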
14:27 <hnowlan@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) [production]
14:24 <hnowlan@cumin1001> START - Cookbook sre.hosts.downtime [production]
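A minimal sketch of how a downtime cookbook like the one above is usually invoked from a cumin host; the target host, duration, reason, and flag names are hypothetical, not taken from the log:

  sudo cookbook sre.hosts.downtime --hours 2 -r 'maintenance' 'restbase2009.codfw.wmnet'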
14:15 <mdholloway> ran extensions/MachineVision/maintenance/removeBlacklistedSuggestions.php on commonswiki (T253821) [production]
12:49 <hnowlan> reimaging restbase2009 after disk replacement [production]
12:37 <elukey@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) [production]
12:35 <elukey@cumin1001> START - Cookbook sre.hosts.downtime [production]
12:15 <godog> roll-restart to upgrade thanos to 0.13.0rc0 - T252186 T233956 [production]
11:32 <moritzm> installing cups security updates (client-side libs/tools) [production]
11:01 <ema> upload prometheus-rdkafka-exporter 0.2 to buster-wikimedia T253551 [production]
10:53 <moritzm> updating mwdebug2002 to 7.2.31 [production]
10:02 <marostegui> Compress InnoDB on db1138 T232446 [production]
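A minimal sketch of the per-table InnoDB compression that an entry like the one above typically involves; the schema, table name, and block size are hypothetical, not taken from the log:

  sudo mysql -e "ALTER TABLE enwiki.revision ROW_FORMAT=COMPRESSED KEY_BLOCK_SIZE=8;"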
08:30 <godog> update swift uid/gid on thanos hosts - T123918 [production]
08:04 <mutante> phabricator - restarted apache2 - back for me now [production]
08:03 <XioNoX> add new AMS-IX link to LACP bundle [production]
08:01 <mutante> phabricator - broken due to "PhabricatorRepositoryMirrorEngine::pushToGitRepository" starting a git process that uses 100% CPU; stopped phd service [production]
07:56 <mutante> phabricator - killed pid 25070 (git) which was using 100% CPU; restarted phd service [production]
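A minimal sketch of the triage pattern in the two phabricator entries above: find the runaway git process, kill it, and bounce the daemon service. The systemd unit name is an assumption, not taken from the log:

  ps aux --sort=-%cpu | head -n 5    # identify the git process pinning a CPU
  sudo kill 25070                    # pid from the log entry above
  sudo systemctl restart phd         # restart the phabricator daemons (unit name assumed)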
07:25 <moritzm> updating perf on buster systems to the new version from the 10.4 point release [production]
07:15 <moritzm> installing el-api update from latest Buster point release [production]
07:12 <moritzm> installing xdg-utils update from latest Buster point release [production]
07:11 <mutante> mw1293 (canary jobrunner) replaced apache2.conf with the version from mwdebug1001, restarted apache, to debug T190111 [production]
07:00 <moritzm> installing rake security updates [production]