2020-06-01
|
11:30 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'Slowly repool db1142, db1147 T252512', diff saved to https://phabricator.wikimedia.org/P11343 and previous config saved to /var/cache/conftool/dbconfig/20200601-113032-marostegui.json |
[production] |
10:49 |
<jdrewniak@deploy1001> |
Synchronized portals: Wikimedia Portals Update: [[gerrit:601328| Bumping portals to master (601328)]] (duration: 00m 59s) |
[production] |
10:48 |
<jdrewniak@deploy1001> |
Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: [[gerrit:601328| Bumping portals to master (601328)]] (duration: 01m 03s) |
[production] |
10:28 |
<Reedy> |
webservice restart |
[tools.meta] |
10:08 |
<RhinosF1> |
END MAINT -- T254046 |
[tools.zppixbot] |
10:00 |
<RhinosF1> |
sopel.bot re-created for T254046 |
[tools.zppixbot] |
09:56 |
<RhinosF1> |
webservice --backend=kubernetes php7.2 start --canonical for T254046 |
[tools.zppixbot] |
09:55 |
<RhinosF1> |
revert sitenotice for T254046 |
[tools.zppixbot] |
09:45 |
<RhinosF1> |
empty pycache and switch to new deployment config |
[tools.zppixbot] |
09:37 |
<volans@cumin1001> |
END (PASS) - Cookbook sre.dns.netbox (exit_code=0) |
[production] |
09:34 |
<RhinosF1> |
tar+gzip sopel logs, empty non sopel rubbish |
[tools.zppixbot] |
09:31 |
<RhinosF1> |
cleaned known_users |
[tools.zppixbot] |
09:30 |
<volans@cumin1001> |
START - Cookbook sre.dns.netbox |
[production] |
09:27 |
<RhinosF1> |
cleaned up channel_values |
[tools.zppixbot] |
09:26 |
<jynus> |
reenabling puppet on all db/es/pc hosts after deploy of gerrit:599596 |
[production] |
09:26 |
<RhinosF1> |
dropped channels not watched from seen database |
[tools.zppixbot] |
09:22 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'Slowly repool db1142, db1147 T252512', diff saved to https://phabricator.wikimedia.org/P11342 and previous config saved to /var/cache/conftool/dbconfig/20200601-092220-marostegui.json |
[production] |
09:18 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'Add db1147 to dbctl, depooled T252512', diff saved to https://phabricator.wikimedia.org/P11341 and previous config saved to /var/cache/conftool/dbconfig/20200601-091809-marostegui.json |
[production] |
09:06 |
<filippo@cumin1001> |
END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) |
[production] |
09:05 |
<filippo@cumin1001> |
START - Cookbook sre.hosts.decommission |
[production] |
09:05 |
<filippo@cumin1001> |
END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) |
[production] |
09:05 |
<XioNoX> |
offline cr1-codfw:fpc0 - T254110 |
[production] |
09:05 |
<filippo@cumin1001> |
START - Cookbook sre.hosts.decommission |
[production] |
09:04 |
<filippo@cumin1001> |
END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) |
[production] |
09:03 |
<filippo@cumin1001> |
START - Cookbook sre.hosts.decommission |
[production] |
09:01 |
<RhinosF1> |
deleted sopel.bot deployment and stopped webservice - START T254046 |
[tools.zppixbot] |
08:58 |
<godog> |
prometheus eqiad lvextend --resizefs --size +100G vg-ssd/prometheus-ops |
[production] |
08:43 |
<mutante> |
deneb - apt-get remove --purge apt-listchanges (package was in status "rc" causing DPKG alert, should be removed but config was not purged) |
[production] |
08:41 |
<mutante> |
deneb - apt-get remove python3-debconf (package was in status "ri" causing DPKG icinga alert. ri means it should be removed but is not) |
[production] |
08:33 |
<XioNoX> |
restart cr1-codfw:fpc0 - T254110 |
[production] |
08:22 |
<mutante> |
mw1331 re-enabled puppet (SAL told me about an experiment a little while ago) |
[production] |
08:19 |
<jynus> |
disabling puppet on all db/es/pc hosts for deploy of gerrit:599596 |
[production] |
08:17 |
<RhinosF1> |
upload starter-new.sh and switched sopelbot.yaml for T254046 |
[tools.zppixbot] |
07:46 |
<RF1dle> |
add notice for T254046 to wiki index about |
[tools.zppixbot] |
07:05 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'Depool db1142 to clone db1147 T252512', diff saved to https://phabricator.wikimedia.org/P11339 and previous config saved to /var/cache/conftool/dbconfig/20200601-070519-marostegui.json |
[production] |
06:53 |
<elukey> |
re-run virtualpageview-hourly-wf-2020-5-31-19 |
[analytics] |
06:28 |
<elukey> |
temporary stop of all RU jobs on an-launcher1001 to privilege camus and others |
[analytics] |
06:03 |
<elukey> |
kill all airflow-related processes on an-launcher1001 - host killing tasks due to OOM |
[analytics] |
05:03 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'Depool enwiki db2071 slave to test new index - T238966', diff saved to https://phabricator.wikimedia.org/P11338 and previous config saved to /var/cache/conftool/dbconfig/20200601-050354-marostegui.json |
[production] |
04:54 |
<marostegui> |
Drop testreduce_0715 from m5 master T245408 |
[production] |
04:44 |
<marostegui> |
Depool db1141 from Analytics role - T249188 |
[production] |
00:39 |
<bd808> |
Ugh. Prior SAL message was about tools-sgeexec-0940 |
[tools] |
00:39 |
<bd808> |
Compressed /var/log/account/pacct.0 ahead of rotation schedule to free some space on the root partition |
[tools] |
00:31 |
<bd808> |
Also, why is tools.squirrelnestbot running a job for tools.unblockbot? |
[tools.squirrelnestbot] |
00:31 |
<bd808> |
Stopped grid job running tools.unblockbot/unblockbot.sh. Script is in an infinite crash loop because it does not handle https properly. |
[tools.squirrelnestbot] |