2021-03-05
16:47 |
<razzi> |
edit https://netbox.wikimedia.org/dcim/devices/2078/ device name from labsdb1012 to clouddb1021 |
[analytics] |
16:30 |
<razzi> |
delete non-mgmt interfaces for labsdb1012 at https://netbox.wikimedia.org/dcim/devices/2078/interfaces/ |
[analytics] |
16:28 |
<razzi> |
rename https://netbox.wikimedia.org/ipam/ip-addresses/734/ DNS name from labsdb1012.mgmt.eqiad.wmnet to clouddb1021.mgmt.eqiad.wmnet |
[analytics] |
16:08 |
<razzi> |
sudo cookbook sre.hosts.decommission labsdb1012.eqiad.wmnet -t T269211 |
[analytics] |
15:52 |
<razzi> |
stop mariadb on labsdb1012 |
[analytics] |
15:39 |
<razzi> |
rebalance kafka partitions for webrequest_upload partition 10 |
[analytics] |
15:07 |
<elukey> |
drain + reimage analytics1073 and an-worker1086 to Debian Buster |
[analytics] |
13:36 |
<elukey> |
roll restart HDFS Namenodes for the Hadoop cluster to pick up new Xmx settings (https://gerrit.wikimedia.org/r/c/operations/puppet/+/668659) |
[analytics] |
10:20 |
<elukey> |
force run of refinery-druid-drop-public-snapshots to check Druid public's performance |
[analytics] |
10:06 |
<elukey> |
failover HDFS Namenode from 1002 to 1001 (high GC pauses had triggered the HDFS zkfc daemon on 1001, causing a failover to 1002) |
[analytics] |
08:32 |
<elukey> |
drain + reimage an-worker107[8,9] to Debian Buster (one Journal node included) |
[analytics] |
07:22 |
<elukey> |
drain + reimage analytics107[0-1] to Debian Buster |
[analytics] |
07:13 |
<elukey> |
add analytics1066 back with /dev/sdb removed |
[analytics] |
07:01 |
<elukey> |
stop Hadoop daemons on analytics1066 (disk errors on /dev/sdb after reimage) |
[analytics] |
2021-03-04
21:19 |
<razzi> |
rebalance kafka partitions for webrequest_upload partition 9 |
[analytics] |
16:27 |
<elukey> |
drain + reimage analytics106[8,9] to Debian Buster (one is a journalnode) |
[analytics] |
15:12 |
<elukey> |
drain + reimage analytics106[6,7] to Debian Buster |
[analytics] |
14:21 |
<elukey> |
drain + reimage analytics1065 to Debian Buster |
[analytics] |
13:32 |
<elukey> |
drain + reimage analytics10[63,64] to Debian Buster |
[analytics] |
12:48 |
<elukey> |
drain + reimage analytics10[61,62] to Debian Buster |
[analytics] |
10:40 |
<elukey> |
drain + reimage analytics1059/1060 to Debian Buster |
[analytics] |
09:32 |
<elukey> |
reboot an-worker[1097-1101] (GPU workers) to pick up the new kernel (5.10) |
[analytics] |
09:02 |
<elukey> |
kill/start mediawiki-geoeditors-monthly to apply backtick change (hive script) |
[analytics] |
08:48 |
<elukey> |
deploy refinery to hdfs |
[analytics] |
08:34 |
<elukey> |
deploy refinery to fix https://gerrit.wikimedia.org/r/c/analytics/refinery/+/668111 |
[analytics] |
07:38 |
<elukey> |
reboot an-worker1096 to pick up 5.10 kernel |
[analytics] |
2021-03-02
23:15 |
<mforns> |
finished deployment of refinery to hdfs |
[analytics] |
21:59 |
<mforns> |
starting refinery deployment using scap |
[analytics] |
21:48 |
<mforns> |
deployed refinery-source v0.1.2 |
[analytics] |
17:26 |
<razzi> |
rebalance kafka partitions for webrequest_upload partition 7 |
[analytics] |
13:42 |
<elukey> |
Add an-worker11[19,20-28,30,31] to Analytics Hadoop |
[analytics] |
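Several entries refer to hosts with bracketed range shorthand (e.g. `an-worker11[19,20-28,30,31]`, `analytics107[0-1]`, `an-worker[1097-1101]`). The shorthand resembles ClusterShell node-set syntax. Below is a minimal Python sketch of how it expands, assuming a single bracket pair per name, comma-separated items, and inclusive `start-end` ranges; `expand_hosts` is a hypothetical helper for reading the log, not a tool used in these operations.

```python
import re


def expand_hosts(pattern: str) -> list[str]:
    """Expand one bracketed host range, e.g. 'an-worker11[19,20-28]'.

    Assumption: at most one [...] group; items are comma-separated and
    each item is either a single suffix or an inclusive 'start-end' range.
    """
    m = re.fullmatch(r"([^\[\]]*)\[([^\[\]]+)\]([^\[\]]*)", pattern)
    if m is None:
        return [pattern]  # no brackets: already a plain hostname
    prefix, body, suffix = m.groups()
    hosts = []
    for item in body.split(","):
        if "-" in item:
            start, end = item.split("-")
            width = len(start)  # preserve zero padding such as '07-09'
            for n in range(int(start), int(end) + 1):
                hosts.append(f"{prefix}{n:0{width}d}{suffix}")
        else:
            hosts.append(f"{prefix}{item}{suffix}")
    return hosts


if __name__ == "__main__":
    print(expand_hosts("an-worker11[19,20-28,30,31]"))
```

Under these assumptions, the 13:42 entry on 2021-03-02 covers 12 hosts, an-worker1119 through an-worker1131 with an-worker1129 excluded.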
10:21 |
<elukey> |
roll restart druid historicals on druid public to pick up new cache settings (enable segment caching) |
[analytics] |
10:14 |
<elukey> |
roll restart druid brokers on druid public to pick up new cache settings (no segment caching, only query caching) |
[analytics] |
08:01 |
<elukey> |
manual start of performance-asotranking on stat1007 (requested by Gilles) - T276121 |
[analytics] |
2021-03-01
21:24 |
<razzi> |
rebalance kafka partitions for webrequest_upload partition 6 |
[analytics] |
18:14 |
<razzi> |
restart timer that wasn't running on an-worker1101: sudo systemctl restart prometheus-debian-version-textfile.timer |
[analytics] |
17:40 |
<elukey> |
reimage an-worker1098 (GPU worker node) to Buster |
[analytics] |
14:48 |
<elukey> |
reimage an-worker1097 (GPU node) to Debian Buster |
[analytics] |
11:55 |
<elukey> |
roll restart druid broker on druid-analytics (again) to enable query cache settings (missing config due to typo) |
[analytics] |
11:34 |
<elukey> |
roll restart historical daemons (again) on druid-analytics to remove stale config and finally enable segment caching |
[analytics] |
11:02 |
<elukey> |
roll restart druid-broker and druid-historical daemons on druid-analytics to pick up new cache settings (disable segment caching on broker and enable it on historicals) |
[analytics] |
09:11 |
<elukey> |
restart hadoop daemons on an-worker1112 to pick up the new disk |
[analytics] |
09:11 |
<elukey> |
remount /dev/sdl on an-worker1112 (wasn't able to make it fail) |
[analytics] |