2021-03-08
14:54 |
<elukey> |
drain + reimage an-worker110[7,8] to Buster |
[analytics] |
14:52 |
<ottomata> |
altered topics (eqiad|codfw).mediawiki.client.session_tick to have 2 partitions - T276502 |
[analytics] |
13:51 |
<elukey> |
drain + reimage an-worker110[4,5] to Buster |
[analytics] |
10:41 |
<elukey> |
drain + reimage an-worker1104/1089 to Debian Buster |
[analytics] |
09:19 |
<elukey> |
drain + reimage an-worker108[3,4] to Buster |
[analytics] |
08:20 |
<elukey> |
drain + reimage an-worker108[1,2] to Buster |
[analytics] |
07:23 |
<elukey> |
drain + reimage analytics107[4,5] to Buster |
[analytics] |
2021-03-05
18:30 |
<razzi> |
re-run sudo -i wmf-auto-reimage-host -p T269211 clouddb1021.eqiad.wmnet --new |
[analytics] |
18:18 |
<razzi> |
sudo cookbook sre.dns.netbox -t T269211 "Move clouddb1021 to private vlan" |
[analytics] |
18:17 |
<razzi> |
re-run interface_automation.ProvisionServerNetwork with private vlan |
[analytics] |
18:16 |
<razzi> |
delete non-mgmt interface for clouddb1021 |
[analytics] |
17:07 |
<razzi> |
sudo -i wmf-auto-reimage-host -p T269211 clouddb1021.eqiad.wmnet --new |
[analytics] |
16:54 |
<razzi> |
sudo cookbook sre.dns.netbox -t T269211 "Reimage and rename labsdb1012 to clouddb1021" |
[analytics] |
16:52 |
<razzi> |
run script at https://netbox.wikimedia.org/extras/scripts/interface_automation.ProvisionServerNetwork/ |
[analytics] |
16:47 |
<razzi> |
edit https://netbox.wikimedia.org/dcim/devices/2078/ device name from labsdb1012 to clouddb1021 |
[analytics] |
16:30 |
<razzi> |
delete non-mgmt interfaces for labsdb1012 at https://netbox.wikimedia.org/dcim/devices/2078/interfaces/ |
[analytics] |
16:28 |
<razzi> |
rename https://netbox.wikimedia.org/ipam/ip-addresses/734/ DNS name from labsdb1012.mgmt.eqiad.wmnet to clouddb1021.mgmt.eqiad.wmnet |
[analytics] |
16:08 |
<razzi> |
sudo cookbook sre.hosts.decommission labsdb1012.eqiad.wmnet -t T269211 |
[analytics] |
15:52 |
<razzi> |
stop mariadb on labsdb1012 |
[analytics] |
15:39 |
<razzi> |
rebalance kafka partitions for webrequest_upload partition 10 |
[analytics] |
15:07 |
<elukey> |
drain + reimage analytics1073 and an-worker1086 to Debian Buster |
[analytics] |
13:36 |
<elukey> |
roll restart HDFS Namenodes for the Hadoop cluster to pick up new Xmx settings (https://gerrit.wikimedia.org/r/c/operations/puppet/+/668659) |
[analytics] |
10:20 |
<elukey> |
force run of refinery-druid-drop-public-snapshots to check Druid public's performance |
[analytics] |
10:06 |
<elukey> |
fail over HDFS Namenode from 1002 to 1001 (high GC pauses on 1001 had triggered the HDFS zkfc daemon there and caused the failover to 1002) |
[analytics] |
08:32 |
<elukey> |
drain + reimage an-worker107[8,9] to Debian Buster (one Journal node included) |
[analytics] |
07:22 |
<elukey> |
drain + reimage analytics107[0-1] to Debian Buster |
[analytics] |
07:13 |
<elukey> |
add analytics1066 back with /dev/sdb removed |
[analytics] |
07:01 |
<elukey> |
stop Hadoop daemons on analytics1066 (disk errors on /dev/sdb after reimage) |
[analytics] |
2021-03-04
21:19 |
<razzi> |
rebalance kafka partitions for webrequest_upload partition 9 |
[analytics] |
16:27 |
<elukey> |
drain + reimage analytics106[8,9] to Debian Buster (one is a journalnode) |
[analytics] |
15:12 |
<elukey> |
drain + reimage analytics106[6,7] to Debian Buster |
[analytics] |
14:21 |
<elukey> |
drain + reimage analytics1065 to Debian Buster |
[analytics] |
13:32 |
<elukey> |
drain + reimage analytics10[63,64] to Debian Buster |
[analytics] |
12:48 |
<elukey> |
drain + reimage analytics10[61,62] to Debian Buster |
[analytics] |
10:40 |
<elukey> |
drain + reimage analytics1059/1060 to Debian Buster |
[analytics] |
09:32 |
<elukey> |
reboot an-worker[1097-1101] (GPU workers) to pick up the new kernel (5.10) |
[analytics] |
09:02 |
<elukey> |
kill/start mediawiki-geoeditors-monthly to apply backtick change (hive script) |
[analytics] |
08:48 |
<elukey> |
deploy refinery to hdfs |
[analytics] |
08:34 |
<elukey> |
deploy refinery to apply the fix in https://gerrit.wikimedia.org/r/c/analytics/refinery/+/668111 |
[analytics] |
07:38 |
<elukey> |
reboot an-worker1096 to pick up 5.10 kernel |
[analytics] |
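Many entries in this log abbreviate host lists with bracket notation, e.g. an-worker110[7,8], analytics10[63,64], analytics107[0-1] and an-worker[1097-1101]. This appears to follow the ClusterShell-style node-set convention used by Cumin, where a bracket holds comma-separated items and inclusive numeric ranges. The following is a minimal Python sketch, assuming that convention, that expands one such pattern into explicit host names. The function name `expand_hosts` is hypothetical. The sketch handles only a single bracket group, and it does not handle the slash form used in "an-worker1104/1089".

```python
import re

# One optional bracket group between a literal prefix and suffix,
# e.g. "an-worker110[7,8]" -> prefix "an-worker110", body "7,8".
_BRACKET = re.compile(r"^(?P<prefix>[^\[\]]*)\[(?P<body>[^\[\]]+)\](?P<suffix>[^\[\]]*)$")


def expand_hosts(pattern: str) -> list[str]:
    """Expand a single bracketed host pattern (hypothetical helper).

    Assumes ClusterShell-style syntax: comma-separated items, each either a
    literal ("7") or an inclusive numeric range ("0-1"). The width of a
    range's start is kept as zero-padding, so "[08-10]" yields 08, 09, 10.
    """
    match = _BRACKET.match(pattern)
    if match is None:
        return [pattern]  # no brackets: already a single host name
    prefix, body, suffix = match.group("prefix", "body", "suffix")
    hosts: list[str] = []
    for item in body.split(","):
        item = item.strip()
        if "-" in item:
            start, end = item.split("-", 1)
            width = len(start)
            for n in range(int(start), int(end) + 1):
                hosts.append(f"{prefix}{n:0{width}d}{suffix}")
        else:
            hosts.append(f"{prefix}{item}{suffix}")
    return hosts


if __name__ == "__main__":
    for p in ["an-worker110[7,8]", "analytics107[0-1]", "an-worker[1097-1101]"]:
        print(p, "->", expand_hosts(p))
```

For example, `expand_hosts("analytics10[63,64]")` gives analytics1063 and analytics1064. This matches how the 13:32 entry on 2021-03-04 is read alongside the neighbouring analytics1062 and analytics1065 reimages.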