2021-06-15
|
16:31 |
<bstorm> |
truncated 26GB error.log T284964 |
[tools.stimmberechtigung] |
16:16 |
<razzi> |
sudo systemctl stop 'hadoop-*' on an-master1002 |
[analytics] |
16:15 |
<majavah> |
deleting unused shutdown nodes: tools-checker-03 tools-k8s-haproxy-1 tools-k8s-haproxy-2 |
[tools] |
16:14 |
<razzi> |
sudo systemctl stop hadoop-* on an-master1001, then realized I meant to do this on an-master1002, so started hadoop-* again |
[analytics] |
16:12 |
<balloons> |
add 8 CPU/16G RAM to quota T284973 |
[metricsinfra] |
16:11 |
<razzi> |
downtime an-master1002 |
[analytics] |
16:11 |
<razzi@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 60 days, 0:00:00 on an-master1002.eqiad.wmnet with reason: Update operating system to bullseye |
[production] |
16:11 |
<razzi@cumin1001> |
START - Cookbook sre.hosts.downtime for 60 days, 0:00:00 on an-master1002.eqiad.wmnet with reason: Update operating system to bullseye |
[production] |
16:09 |
<majavah> |
set toolsbeta-bastion-05 as grid submit host |
[toolsbeta] |
16:08 |
<bstorm> |
truncated 28GB person_bkl2.out T284964 |
[tools.persondata] |
15:55 |
<razzi> |
sudo transfer.py an-master1001.eqiad.wmnet:/srv/hadoop/backup/hdfs-namenode-snapshot-buster-reimage-2021-06-15.tar.gz stat1004.eqiad.wmnet:/home/razzi/hdfs-namenode-fsimage |
[analytics] |
15:54 |
<bstorm> |
truncated 42GB virgule.err file T284964 |
[tools.robokobot] |
15:42 |
<razzi> |
tar -czf /srv/hadoop/backup/hdfs-namenode-snapshot-buster-reimage-$(date --iso-8601).tar.gz current on an-master1001 |
[analytics] |
15:38 |
<razzi> |
backup /srv/hadoop/name/current to /home/razzi/hdfs-namenode-snapshot-buster-reimage-2021-06-15.tar.gz on an-master1001 |
[analytics] |
15:33 |
<razzi> |
sudo -u hdfs kerberos-run-command hdfs hdfs dfsadmin -saveNamespace |
[analytics] |
15:28 |
<MacFan4000> |
copied freenode channel config for #wikimedia-fundraising to libera |
[wm-bot] |
15:27 |
<razzi> |
sudo -u hdfs kerberos-run-command hdfs hdfs dfsadmin -safemode enter |
[analytics] |
15:25 |
<razzi> |
kill running yarn applications via for loop |
[analytics] |
15:11 |
<razzi> |
sudo -u yarn kerberos-run-command yarn yarn rmadmin -refreshQueues |
[analytics] |
15:09 |
<razzi> |
disable puppet on an-masters |
[analytics] |
15:08 |
<razzi> |
run puppet on an-masters to update capacity-scheduler.xml |
[analytics] |
15:02 |
<razzi> |
disable puppet on an-masters |
[analytics] |
15:01 |
<razzi> |
sudo -u yarn kerberos-run-command yarn yarn rmadmin -refreshQueues to stop queues |
[analytics] |
14:55 |
<cmjohnson@cumin1001> |
END (PASS) - Cookbook sre.dns.netbox (exit_code=0) |
[production] |
14:51 |
<cmjohnson@cumin1001> |
START - Cookbook sre.dns.netbox |
[production] |
14:35 |
<razzi> |
disable jobs that use hadoop on an-launcher1002 following https://phabricator.wikimedia.org/T278423#7094641 |
[analytics] |
14:25 |
<XioNoX> |
re-enable cr1-codfw:xe-5/1/2 |
[production] |
13:23 |
<marostegui> |
Upgrade clouddb1018 |
[production] |
13:15 |
<effie> |
enable puppet on canaries |
[production] |
13:10 |
<effie> |
disable puppet on canaries to deploy 699908 |
[production] |
12:54 |
<MacFan4000> |
killed a few lingering connections to freenode (wm-bot on freenode is now discontinued) |
[wm-bot] |
10:45 |
<XioNoX> |
re-enable cr1-codfw:xe-5/1/2 |
[production] |
09:42 |
<XioNoX> |
cr1-codfw# set interfaces xe-5/1/2 disable |
[production] |
09:25 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'Repool db2080', diff saved to https://phabricator.wikimedia.org/P16533 and previous config saved to /var/cache/conftool/dbconfig/20210615-092511-marostegui.json |
[production] |
09:24 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'Repool db2086:3318, db2082', diff saved to https://phabricator.wikimedia.org/P16532 and previous config saved to /var/cache/conftool/dbconfig/20210615-092409-marostegui.json |
[production] |
09:08 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'Depool db2086:3318', diff saved to https://phabricator.wikimedia.org/P16531 and previous config saved to /var/cache/conftool/dbconfig/20210615-090802-marostegui.json |
[production] |
09:06 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'Pool db2083', diff saved to https://phabricator.wikimedia.org/P16530 and previous config saved to /var/cache/conftool/dbconfig/20210615-090650-marostegui.json |
[production] |
09:02 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'Pool db2084', diff saved to https://phabricator.wikimedia.org/P16529 and previous config saved to /var/cache/conftool/dbconfig/20210615-090243-marostegui.json |
[production] |
09:02 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'Pool db2081', diff saved to https://phabricator.wikimedia.org/P16528 and previous config saved to /var/cache/conftool/dbconfig/20210615-090206-marostegui.json |
[production] |
08:59 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'Depool db2082', diff saved to https://phabricator.wikimedia.org/P16527 and previous config saved to /var/cache/conftool/dbconfig/20210615-085953-marostegui.json |
[production] |
08:59 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'Repool db2091', diff saved to https://phabricator.wikimedia.org/P16526 and previous config saved to /var/cache/conftool/dbconfig/20210615-085938-marostegui.json |
[production] |
08:32 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'Depool db2080 db2083 db2084 db2091', diff saved to https://phabricator.wikimedia.org/P16525 and previous config saved to /var/cache/conftool/dbconfig/20210615-083233-marostegui.json |
[production] |
08:28 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'Depool db2081', diff saved to https://phabricator.wikimedia.org/P16524 and previous config saved to /var/cache/conftool/dbconfig/20210615-082857-marostegui.json |
[production] |
06:10 |
<XioNoX> |
roll OSPF link-protection to all routers - T167306 |
[production] |
02:30 |
<eileen> |
civicrm revision changed from d9d61dad0b to acbcce94a2, config revision is 2aed6ff89b |
[production] |
01:22 |
<eileen> |
civicrm revision changed from 28ace1b86f to d9d61dad0b, config revision is 2aed6ff89b |
[production] |
01:18 |
<bstorm> |
running a modified version of the prometheus dir size cron in screen T284964 |
[admin] |
00:40 |
<bstorm> |
truncated 4GB uwsgi.log to free space T284968 |
[tools.eatchabot] |
00:37 |
<eileen> |
civicrm revision changed from 31d07115a0 to 28ace1b86f, config revision is 2aed6ff89b |
[production] |