2021-05-25
19:20 <cmjohnson@cumin1001> END (PASS) - Cookbook sre.dns.netbox (exit_code=0) [production]
19:17 <cmjohnson@cumin1001> START - Cookbook sre.dns.netbox [production]
19:17 <cmjohnson@cumin1001> END (PASS) - Cookbook sre.dns.netbox (exit_code=0) [production]
19:12 <twentyafterfour@deploy1002> Finished scap: testwikis wikis to 1.37.0-wmf.7 (duration: 33m 29s) [production]
19:12 <cmjohnson@cumin1001> START - Cookbook sre.dns.netbox [production]
18:38 <twentyafterfour@deploy1002> Started scap: testwikis wikis to 1.37.0-wmf.7 [production]
18:16 <razzi> sudo systemctl start all failed units from `systemctl list-units --state=failed` on an-launcher1002 [analytics]
18:14 <razzi> sudo systemctl start eventlogging_to_druid_navigationtiming_hourly.service [analytics]
18:08 <krinkle@deploy1002> Synchronized wmf-config/CommonSettings.php: I2ebe9674fb109f (duration: 00m 56s) [production]
18:01 <razzi> manually edit /etc/hadoop/conf/capacity-scheduler.xml to make queues running and sudo -u yarn kerberos-run-command yarn yarn rmadmin -refreshQueues [analytics]
17:52 <razzi> sudo -u yarn kerberos-run-command yarn yarn rmadmin -refreshQueues on an-master1001 and an-master1002 [analytics]
17:34 <Krinkle> mwmaint1002: Running purge-parsercache-now.php on server 2/4 (pc1007, depooled spare). Ref P16060, T280605, T282761. [production]
17:30 <marostegui@cumin1001> dbctl commit (dc=all): 'db1164 (re)pooling @ 100%: Repool db1164', diff saved to https://phabricator.wikimedia.org/P16207 and previous config saved to /var/cache/conftool/dbconfig/20210525-173031-root.json [production]
17:28 <razzi> sudo systemctl restart refine_eventlogging_legacy [analytics]
17:28 <razzi> sudo -u yarn kerberos-run-command yarn yarn rmadmin -refreshQueues to enable submitting jobs once again [analytics]
17:22 <effie> disable puppet on mc2019 (for tests) [production]
17:15 <marostegui@cumin1001> dbctl commit (dc=all): 'db1164 (re)pooling @ 75%: Repool db1164', diff saved to https://phabricator.wikimedia.org/P16206 and previous config saved to /var/cache/conftool/dbconfig/20210525-171527-root.json [production]
17:14 <andrewbogott> deleting old ingress controllers toolsbeta-test-k8s-ingress-1 and toolsbeta-test-k8s-ingress-2 [toolsbeta]
17:13 <andrewbogott> created two new ingress nodes, toolsbeta-test-k8s-ingress-4 and toolsbeta-test-k8s-ingress-5 [toolsbeta]
17:07 <razzi> re-enabled puppet on an-masters and an-launcher [analytics]
17:04 <razzi> sudo -u hdfs kerberos-run-command hdfs hdfs dfsadmin -safemode leave [analytics]
17:03 <razzi> sudo -u hdfs /usr/bin/hdfs haadmin -failover an-master1002-eqiad-wmnet an-master1001-eqiad-wmnet [analytics]
17:00 <marostegui@cumin1001> dbctl commit (dc=all): 'db1164 (re)pooling @ 50%: Repool db1164', diff saved to https://phabricator.wikimedia.org/P16205 and previous config saved to /var/cache/conftool/dbconfig/20210525-170024-root.json [production]
16:45 <marostegui@cumin1001> dbctl commit (dc=all): 'db1164 (re)pooling @ 25%: Repool db1164', diff saved to https://phabricator.wikimedia.org/P16203 and previous config saved to /var/cache/conftool/dbconfig/20210525-164520-root.json [production]
16:43 <razzi> sudo systemctl restart hadoop-hdfs-namenode on an-master1001 [analytics]
16:38 <razzi> sudo -u hdfs kerberos-run-command hdfs hdfs dfsadmin -saveNamespace [analytics]
16:35 <razzi> sudo -u hdfs kerberos-run-command hdfs hdfs dfsadmin -safemode enter [analytics]
16:28 <razzi> sudo -u hdfs /usr/bin/hdfs haadmin -failover an-master1002-eqiad-wmnet an-master1001-eqiad-wmnet [analytics]
16:23 <razzi> sudo -u hdfs kerberos-run-command hdfs hdfs dfsadmin -safemode leave [analytics]
16:14 <bd808> Closed #wikimedia-cloud-admin on f***node [admin]
16:11 <bd808> Closed #wikimedia-cloud-feed on f***node [admin]
16:06 <razzi> sudo systemctl restart hadoop-hdfs-namenode [analytics]
15:52 <razzi> checkpoint hdfs with sudo -u hdfs kerberos-run-command hdfs hdfs dfsadmin -saveNamespace [analytics]
15:51 <razzi> enable safe mode on an-master1001 with sudo -u hdfs kerberos-run-command hdfs hdfs dfsadmin -safemode enter [analytics]
15:36 <razzi> disable puppet on an-master1001.eqiad.wmnet and an-master1002.eqiad.wmnet again [analytics]
15:35 <razzi> re-enable puppet on an-masters, run puppet, and sudo -u yarn kerberos-run-command yarn yarn rmadmin -refreshQueues [analytics]
15:32 <razzi> disable puppet on an-master1001.eqiad.wmnet and an-master1002.eqiad.wmnet [analytics]
15:19 <dcaro> rebooted cloudvirt1020, starting VMs (T275893) [admin]
15:13 <dcaro> rebooting cloudvirt1020 (T275893) [admin]
15:09 <dcaro> turning off VM toolsbeta-test-k8s-etcd-14 to be able to reboot cloudvirt1020 [toolsbeta]
14:42 <dcaro> taking cloudvirt1020 out for maintenance (openstack wise) so no new VMs are scheduled on it (T275893) [admin]
14:39 <razzi> stop puppet on an-launcher and stop hadoop-related timers [analytics]
14:38 <wm-bot> <bd808> Restart to fix irc connections. This is getting really boring. [tools.bridgebot]
14:35 <dcaro> taking down clouddb1002 replica for reboot of cloudvirt1020 (T275893) [clouddb-services]
12:55 <urbanecm@deploy1002> Synchronized static/images/project-logos/: 63ad5fda: Revert "Add svwiki 20th anniversary logos" (T282389) (duration: 00m 56s) [production]
12:52 <urbanecm@deploy1002> Synchronized wmf-config/logos.php: 94ede526: Revert "Use svwiki 20th anniversary logos" (T282389) (duration: 00m 56s) [production]
12:21 <marostegui@cumin1001> dbctl commit (dc=all): 'Depool db1164', diff saved to https://phabricator.wikimedia.org/P16200 and previous config saved to /var/cache/conftool/dbconfig/20210525-122127-marostegui.json [production]
12:07 <marostegui@cumin1001> dbctl commit (dc=all): 'remove db1124 from dbctl', diff saved to https://phabricator.wikimedia.org/P16199 and previous config saved to /var/cache/conftool/dbconfig/20210525-120718-marostegui.json [production]
11:35 <marostegui@cumin1001> dbctl commit (dc=all): 'Depool db1124 will be moved to the test cluster', diff saved to https://phabricator.wikimedia.org/P16198 and previous config saved to /var/cache/conftool/dbconfig/20210525-113521-marostegui.json [production]
11:26 <hnowlan@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on maps1009.eqiad.wmnet with reason: Planet reimport [production]