2020-06-17 §
06:40 <elukey> reboot krb1001 for kernel upgrades [analytics]
06:24 <elukey> reboot an-master100[1,2] for kernel upgrades [analytics]
06:03 <elukey> reboot an-conf100[1-3] for kernel upgrades [analytics]
05:45 <elukey> reboot stat1007/8 for kernel upgrades [analytics]
2020-06-16 §
19:58 <ottomata> evolving event.SearchSatisfaction Hive table using /analytics/legacy/searchsatisfaction/latest schema [analytics]
19:41 <ottomata> bumping Refine refinery jar version to 0.0.127 - T238230 [analytics]
19:17 <ottomata> deploying refinery source 0.0.127 for eventlogging -> eventgate migration - T249261 [analytics]
16:02 <elukey> reboot kafka-jumbo1008 for kernel upgrades [analytics]
15:33 <milimetric> refinery deployed and synced to hdfs, with refinery-source at 0.0.126 [analytics]
15:20 <elukey> reboot kafka-jumbo1007 for kernel upgrades [analytics]
15:13 <elukey> re-enabling timers on launcher after maintenance [analytics]
15:06 <elukey> reboot an-coord1001 for kernel upgrades [analytics]
14:27 <elukey> stop timers on an-launcher1001, prep before rebooting an-coord1001 [analytics]
14:23 <elukey> reboot druid100[7,8] for kernel upgrades [analytics]
11:51 <elukey> re-run webrequest-druid-hourly-coord 16/06T10 [analytics]
11:36 <elukey> reboot an-druid100[1,2] for kernel upgrades [analytics]
2020-06-15 §
09:37 <elukey> restart refinery-druid-drop-public-snapshots.service after change in vlan firewall rules (added druid100[7,8] to term druid) [analytics]
2020-06-11 §
15:01 <mforns> started refinery deploy for v0.0.126 [analytics]
14:58 <mforns> deployed refinery-source v0.0.126 [analytics]
13:57 <ottomata> removed accidentally added page_restrictions column(s) on Hive table event.mediawiki_user_blocks_change after an incorrect schema change was merged (no data was ever set in this column) [analytics]
2020-06-09 §
07:32 <elukey> upgrade ROCm to 3.3 on stat1005 [analytics]
2020-06-08 §
15:42 <elukey> remove access to notebook100[3,4] - T249752 [analytics]
14:07 <elukey> move matomo cron archiver to systemd timer archiver (with nagios alarming) [analytics]
14:02 <elukey> re-enable timers on an-coord1001 [analytics]
14:01 <elukey> restart hive/oozie on an-coord1001 for openjdk upgrades [analytics]
13:42 <elukey> roll restart kafka jumbo brokers for openjdk upgrades [analytics]
13:26 <elukey> stop timers on an-launcher to drain jobs and restart hive/oozie for openjdk upgrades [analytics]
2020-06-05 §
17:56 <elukey> roll restart presto server on an-presto* to pick up new openjdk upgrades [analytics]
16:45 <elukey> upgrade turnilo to 1.24.0 [analytics]
13:26 <elukey> reimage druid1006 to debian buster [analytics]
09:26 <elukey> roll restart cassandra on AQS to pick up openjdk upgrades [analytics]
2020-06-04 §
19:12 <elukey> roll restart of aqs to pick up new druid settings [analytics]
18:39 <mforns> deployed wikistats2 2.7.5 [analytics]
13:33 <elukey> re-enable netflow hive2druid jobs after https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/602356/ [analytics]
10:56 <elukey> depool and reimage druid1004 to Debian Buster (Druid public cluster) [analytics]
07:31 <elukey> stop netflow hive2druid timers to do some experiments [analytics]
06:13 <elukey> kill application_1589903254658_75731 (druid indexation for netflow still running after 12h) [analytics]
05:36 <elukey> restart druid middlemanager on druid1002 - strange protobuf warnings, netflow hive2druid indexation job stuck for hours [analytics]
05:13 <elukey> reimage druid1003 to Buster [analytics]
2020-06-03 §
17:10 <elukey> restart RU jobs after adding memory to an-launcher1001 [analytics]
16:57 <elukey> reboot an-launcher1001 to get new memory [analytics]
16:01 <elukey> stop timers on an-launcher, prep for reboot [analytics]
09:35 <elukey> re-run webrequest-druid-hourly-coord 03/06T7 (failed due to druid1002 moving to buster) [analytics]
08:50 <elukey> reimage druid1002 to Buster [analytics]
2020-06-01 §
14:54 <elukey> stop all timers on an-launcher1001, prep step for reboot [analytics]
12:54 <elukey> /user/dedcode/.Trash/* -skipTrash [analytics]
06:53 <elukey> re-run virtualpageview-hourly-wf-2020-5-31-19 [analytics]
06:28 <elukey> temporary stop of all RU jobs on an-launcher1001 to prioritize camus and others [analytics]
06:03 <elukey> kill all airflow-related processes on an-launcher1001 - host killing tasks due to OOM [analytics]
2020-05-30 §
08:15 <elukey> manual reset-failed of monitor_refine_mediawiki_job_events_failure_flags [analytics]