2021-03-09 §
22:00 <razzi> rebalance kafka partitions for webrequest_upload partition 14 [analytics]
20:42 <elukey> reimaged an-worker1091 to buster [analytics]
18:26 <elukey> reimage an-worker1087 to buster [analytics]
16:40 <elukey> reimage analytics1077 to buster [analytics]
15:36 <razzi> rebalance kafka partitions for webrequest_upload partition 13 [analytics]
15:18 <elukey> reimage analytics1072 (hadoop hdfs journal node) to buster [analytics]
14:29 <elukey> drain + reimage an-worker1090/89 to Buster [analytics]
13:26 <elukey> reimage an-worker1102 and an-worker1080 (hdfs journal node) to Buster [analytics]
12:59 <elukey> drain + reimage an-worker1103 to Buster [analytics]
09:14 <elukey> drain + reimage analytics1076 and an-worker1112 to Buster [analytics]
07:01 <elukey> drain + reimage an-worker109[4,5] to Buster [analytics]
2021-03-08 §
23:22 <razzi> rebalance kafka partitions for webrequest_upload partition 12 [analytics]
18:49 <razzi> rebalance kafka partitions for webrequest_upload partition 11 [analytics]
18:11 <elukey> drain + reimage an-worker11[15,16] to Buster [analytics]
17:12 <elukey> drain + reimage an-worker11[13,14] to Buster [analytics]
16:17 <elukey> drain + reimage an-worker1109/1110 to Buster [analytics]
14:54 <elukey> drain + reimage an-worker110[7,8] to Buster [analytics]
14:52 <ottomata> altered topics (eqiad|codfw).mediawiki.client.session_tick to have 2 partitions - T276502 [analytics]
13:51 <elukey> drain + reimage an-worker110[4,5] to Buster [analytics]
10:41 <elukey> drain + reimage an-worker1104/1089 to Debian Buster [analytics]
09:19 <elukey> drain + reimage an-worker108[3,4] to Buster [analytics]
08:20 <elukey> drain + reimage an-worker108[1,2] to Buster [analytics]
07:23 <elukey> drain + reimage analytics107[4,5] to Buster [analytics]
2021-03-07 §
08:00 <elukey> "megacli -LDSetProp -ForcedWB -Immediate -Lall -aAll" on analytics1066 [analytics]
07:49 <elukey> umount /var/lib/hadoop/data/e on analytics1059 and restart hadoop daemons to exclude failed disk - T276696 [analytics]
2021-03-05 §
18:30 <razzi> run again sudo -i wmf-auto-reimage-host -p T269211 clouddb1021.eqiad.wmnet --new [analytics]
18:18 <razzi> sudo cookbook sre.dns.netbox -t T269211 "Move clouddb1021 to private vlan" [analytics]
18:17 <razzi> re-run interface_automation.ProvisionServerNetwork with private vlan [analytics]
18:16 <razzi> delete non-mgmt interface for clouddb1021 [analytics]
17:07 <razzi> sudo -i wmf-auto-reimage-host -p T269211 clouddb1021.eqiad.wmnet --new [analytics]
16:54 <razzi> sudo cookbook sre.dns.netbox -t T269211 "Reimage and rename labsdb1012 to clouddb1021" [analytics]
16:52 <razzi> run script at https://netbox.wikimedia.org/extras/scripts/interface_automation.ProvisionServerNetwork/ [analytics]
16:47 <razzi> edit https://netbox.wikimedia.org/dcim/devices/2078/ device name from labsdb1012 to clouddb1021 [analytics]
16:30 <razzi> delete non-mgmt interfaces for labsdb1012 at https://netbox.wikimedia.org/dcim/devices/2078/interfaces/ [analytics]
16:28 <razzi> rename https://netbox.wikimedia.org/ipam/ip-addresses/734/ DNS name from labsdb1012.mgmt.eqiad.wmnet to clouddb1021.mgmt.eqiad.wmnet [analytics]
16:08 <razzi> sudo cookbook sre.hosts.decommission labsdb1012.eqiad.wmnet -t T269211 [analytics]
15:52 <razzi> stop mariadb on labsdb1012 [analytics]
15:39 <razzi> rebalance kafka partitions for webrequest_upload partition 10 [analytics]
15:07 <elukey> drain + reimage analytics1073 and an-worker1086 to Debian Buster [analytics]
13:36 <elukey> roll restart HDFS Namenodes for the Hadoop cluster to pick up new Xmx settings (https://gerrit.wikimedia.org/r/c/operations/puppet/+/668659) [analytics]
10:20 <elukey> force run of refinery-druid-drop-public-snapshots to check Druid public's performance [analytics]
10:06 <elukey> failover HDFS Namenode from 1002 to 1001 (high GC pauses triggered the HDFS zkfc daemon on 1001 and the failover to 1002) [analytics]
08:32 <elukey> drain + reimage an-worker107[8,9] to Debian Buster (one Journal node included) [analytics]
07:22 <elukey> drain + reimage analytics107[0-1] to Debian Buster [analytics]
07:13 <elukey> add analytics1066 back with /dev/sdb removed [analytics]
07:01 <elukey> stop hadoop daemons on analytics1066 - disk errors on /dev/sdb after reimage [analytics]
2021-03-04 §
21:19 <razzi> rebalance kafka partitions for webrequest_upload partition 9 [analytics]
16:27 <elukey> drain + reimage analytics106[8,9] to Debian Buster (one is a journalnode) [analytics]
15:12 <elukey> drain + reimage analytics106[6,7] to Debian Buster [analytics]
14:21 <elukey> drain + reimage analytics1065 to Debian Buster [analytics]