2021-05-25
ยง
|
16:43 <razzi> sudo systemctl restart hadoop-hdfs-namenode on an-master1001 [analytics]
16:38 <razzi> sudo -u hdfs kerberos-run-command hdfs hdfs dfsadmin -saveNamespace [analytics]
16:35 <razzi> sudo -u hdfs kerberos-run-command hdfs hdfs dfsadmin -safemode enter [analytics]
16:28 <razzi> sudo -u hdfs /usr/bin/hdfs haadmin -failover an-master1002-eqiad-wmnet an-master1001-eqiad-wmnet [analytics]
16:23 <razzi> sudo -u hdfs kerberos-run-command hdfs hdfs dfsadmin -safemode leave [analytics]
16:14 <bd808> Closed #wikimedia-cloud-admin on f***node [admin]
16:11 <bd808> Closed #wikimedia-cloud-feed on f***node [admin]
16:06 <razzi> sudo systemctl restart hadoop-hdfs-namenode [analytics]
15:52 <razzi> checkpoint hdfs with sudo -u hdfs kerberos-run-command hdfs hdfs dfsadmin -saveNamespace [analytics]
15:51 <razzi> enable safe mode on an-master1001 with sudo -u hdfs kerberos-run-command hdfs hdfs dfsadmin -safemode enter [analytics]
15:36 <razzi> disable puppet on an-master1001.eqiad.wmnet and an-master1002.eqiad.wmnet again [analytics]
15:35 <razzi> re-enable puppet on an-masters, run puppet, and sudo -u yarn kerberos-run-command yarn yarn rmadmin -refreshQueues [analytics]
15:32 <razzi> disable puppet on an-master1001.eqiad.wmnet and an-master1002.eqiad.wmnet [analytics]
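The 15:32–16:43 razzi entries above amount to a checkpoint-and-restart of the Hadoop HDFS NameNode pair. A condensed sketch of that sequence in chronological order, reusing the hosts and the kerberos-run-command wrapper from the log; the disable-puppet step is an assumed wrapper and the per-host ordering is illustrative rather than a verbatim runbook:

  # keep puppet from restarting services mid-procedure (assumed wrapper on the an-master hosts)
  sudo disable-puppet "hdfs namenode maintenance"
  # checkpoint the namespace under safe mode, then restart the standby NameNode
  sudo -u hdfs kerberos-run-command hdfs hdfs dfsadmin -safemode enter
  sudo -u hdfs kerberos-run-command hdfs hdfs dfsadmin -saveNamespace
  sudo systemctl restart hadoop-hdfs-namenode
  sudo -u hdfs kerberos-run-command hdfs hdfs dfsadmin -safemode leave
  # fail the active role over to an-master1001, then repeat the checkpoint/restart on the other host
  sudo -u hdfs /usr/bin/hdfs haadmin -failover an-master1002-eqiad-wmnet an-master1001-eqiad-wmnet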
15:19 <dcaro> rebooted cloudvirt1020, starting VMs (T275893) [admin]
15:13 <dcaro> rebooting cloudvirt1020 (T275893) [admin]
15:09 <dcaro> turning off VM toolsbeta-test-k8s-etcd-14 to be able to reboot cloudvirt1020 [toolsbeta]
14:42 <dcaro> taking cloudvirt1020 out for maintenance (openstack wise) so no new VMs are scheduled on it (T275893) [admin]
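Taking a hypervisor "out for maintenance (openstack wise)", as in the 14:42 entry, usually means disabling its nova-compute service so the scheduler places no new VMs on it. A minimal sketch with the standard OpenStack CLI; the disable reason and the re-enable step are illustrative:

  # stop scheduling new VMs onto the hypervisor before the reboot
  openstack compute service set --disable --disable-reason "reboot for T275893" cloudvirt1020 nova-compute
  # once the host is back and verified, put it back into the scheduling pool
  openstack compute service set --enable cloudvirt1020 nova-compute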
14:39 <razzi> stop puppet on an-launcher and stop hadoop-related timers [analytics]
14:38 <wm-bot> <bd808> Restart to fix irc connections. This is getting really boring. [tools.bridgebot]
14:35 <dcaro> taking down clouddb1002 replica for reboot of cloudvirt1020 (T275893) [clouddb-services]
12:55 <urbanecm@deploy1002> Synchronized static/images/project-logos/: 63ad5fda: Revert "Add svwiki 20th anniversary logos" (T282389) (duration: 00m 56s) [production]
12:52 <urbanecm@deploy1002> Synchronized wmf-config/logos.php: 94ede526: Revert "Use svwiki 20th anniversary logos" (T282389) (duration: 00m 56s) [production]
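The "Synchronized ..." lines above are what scap logs when a path is synced out from the deployment host. A minimal sketch of the kind of invocation behind the 12:52 and 12:55 entries, run on deploy1002; treating sync-file as also covering the directory sync is an assumption:

  # sync the reverted config file and the logo directory, each with a log message
  scap sync-file wmf-config/logos.php 'Revert "Use svwiki 20th anniversary logos" (T282389)'
  scap sync-file static/images/project-logos/ 'Revert "Add svwiki 20th anniversary logos" (T282389)'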
12:21 <marostegui@cumin1001> dbctl commit (dc=all): 'Depool db1164', diff saved to https://phabricator.wikimedia.org/P16200 and previous config saved to /var/cache/conftool/dbconfig/20210525-122127-marostegui.json [production]
12:07 <marostegui@cumin1001> dbctl commit (dc=all): 'remove db1124 from dbctl', diff saved to https://phabricator.wikimedia.org/P16199 and previous config saved to /var/cache/conftool/dbconfig/20210525-120718-marostegui.json [production]
11:35 <marostegui@cumin1001> dbctl commit (dc=all): 'Depool db1124 will be moved to the test cluster', diff saved to https://phabricator.wikimedia.org/P16198 and previous config saved to /var/cache/conftool/dbconfig/20210525-113521-marostegui.json [production]
11:26 <hnowlan@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on maps1009.eqiad.wmnet with reason: Planet reimport [production]
11:26 <hnowlan@cumin1001> START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on maps1009.eqiad.wmnet with reason: Planet reimport [production]
11:21 <Lucas_WMDE> EU backport&config window done [production]
11:20 <lucaswerkmeister-wmde@deploy1002> Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:679327|Change HTTP to HTTPS for concept URIs on Commons (T258590)]] (duration: 00m 56s) [production]
11:17 <marostegui@cumin1001> dbctl commit (dc=all): 'db1169 (re)pooling @ 100%: Repool db1169', diff saved to https://phabricator.wikimedia.org/P16196 and previous config saved to /var/cache/conftool/dbconfig/20210525-111719-root.json [production]
11:02 <marostegui@cumin1001> dbctl commit (dc=all): 'db1169 (re)pooling @ 75%: Repool db1169', diff saved to https://phabricator.wikimedia.org/P16195 and previous config saved to /var/cache/conftool/dbconfig/20210525-110215-root.json [production]
10:47 <marostegui@cumin1001> dbctl commit (dc=all): 'db1169 (re)pooling @ 50%: Repool db1169', diff saved to https://phabricator.wikimedia.org/P16194 and previous config saved to /var/cache/conftool/dbconfig/20210525-104711-root.json [production]
10:32 <marostegui@cumin1001> dbctl commit (dc=all): 'db1169 (re)pooling @ 25%: Repool db1169', diff saved to https://phabricator.wikimedia.org/P16193 and previous config saved to /var/cache/conftool/dbconfig/20210525-103208-root.json [production]
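The dbctl commits above (and the 07:49 depool further down) follow the usual depool / staged-repool pattern for MariaDB replicas: take the host out of rotation, then pool it back at increasing percentages. A minimal sketch of the commands that produce such log lines, based on the dbctl CLI as documented on Wikitech; flags may differ between versions:

  # take the replica out of rotation and commit the change
  dbctl instance db1169 depool
  dbctl config commit -m 'Depool db1169'
  # repool gradually, committing after each step (25% -> 50% -> 75% -> 100%)
  dbctl instance db1169 pool -p 25
  dbctl config commit -m 'db1169 (re)pooling @ 25%: Repool db1169'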
09:58 <ema> cp3054: upgrade varnish to latest LTS (6.0.7-1wm1) T264398 [production]
09:28 <jynus> updating puppet facts on cloud from puppetmaster1001 [production]
09:05 <kormat@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on pc[2007,2010].codfw.wmnet,pc1007.eqiad.wmnet with reason: Purging parsercache T282761 [production]
09:05 <kormat@cumin1001> START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on pc[2007,2010].codfw.wmnet,pc1007.eqiad.wmnet with reason: Purging parsercache T282761 [production]
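The START/END pairs are logged automatically by Spicerack cookbooks run from the cumin hosts. A sketch of an invocation matching the 09:05 downtime entry; the argument names are inferred from the logged output and may not match the cookbook's exact flags:

  # downtime the parsercache hosts for 7 days while they are purged (T282761)
  sudo cookbook sre.hosts.downtime --days 7 --reason 'Purging parsercache T282761' 'pc[2007,2010].codfw.wmnet,pc1007.eqiad.wmnet'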
09:01 <kormat> stopping replication on pc1010 T282761 [production]
09:00 <kormat@deploy1002> Synchronized wmf-config/db-eqiad.php: Set pc1010 as pc1 primary T282761 (duration: 00m 58s) [production]
08:57 <marostegui@cumin1001> END (PASS) - Cookbook sre.dns.netbox (exit_code=0) [production]
08:52 <marostegui@cumin1001> START - Cookbook sre.dns.netbox [production]
08:34 <wm-bot> <poslovitch> [job] Refactored the script to query recordings up to 2 days old, since recordings' 'date' is stored with 1-day precision [tools.lingua-libre-bot]
08:20 <jynus@cumin2001> END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on backup2007.codfw.wmnet with reason: REIMAGE [production]
08:18 <jynus@cumin2001> END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on backup2006.codfw.wmnet with reason: REIMAGE [production]
08:17 <jynus@cumin2001> START - Cookbook sre.hosts.downtime for 2:00:00 on backup2007.codfw.wmnet with reason: REIMAGE [production]
08:16 <jynus@cumin2001> END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on backup2005.codfw.wmnet with reason: REIMAGE [production]
08:16 <jynus@cumin2001> START - Cookbook sre.hosts.downtime for 2:00:00 on backup2006.codfw.wmnet with reason: REIMAGE [production]
08:14 <jynus@cumin2001> START - Cookbook sre.hosts.downtime for 2:00:00 on backup2005.codfw.wmnet with reason: REIMAGE [production]
08:02 <marostegui@cumin1001> dbctl commit (dc=all): 'db1184 (re)pooling @ 100%: Repool db1184', diff saved to https://phabricator.wikimedia.org/P16192 and previous config saved to /var/cache/conftool/dbconfig/20210525-080234-root.json [production]
07:49 <marostegui@cumin1001> dbctl commit (dc=all): 'Depool db1169', diff saved to https://phabricator.wikimedia.org/P16191 and previous config saved to /var/cache/conftool/dbconfig/20210525-074950-marostegui.json [production]