2020-06-08 §
14:07 <elukey> move matomo cron archiver to systemd timer archiver (with nagios alarming) [analytics]
14:02 <elukey> re-enable timers on an-coord1001 [analytics]
14:01 <elukey> restart hive/oozie on an-coord1001 for openjdk upgrades [analytics]
13:42 <elukey> roll restart kafka jumbo brokers for openjdk upgrades [analytics]
13:26 <elukey> stop timers on an-launcher to drain jobs and restart hive/oozie for openjdk upgrades [analytics]
2020-06-05 §
17:56 <elukey> roll restart presto server on an-presto* to pick up new openjdk upgrades [analytics]
16:45 <elukey> upgrade turnilo to 1.24.0 [analytics]
13:26 <elukey> reimage druid1006 to debian buster [analytics]
09:26 <elukey> roll restart cassandra on AQS to pick up openjdk upgrades [analytics]
2020-06-04 §
19:12 <elukey> roll restart of aqs to pick up new druid settings [analytics]
18:39 <mforns> deployed wikistats2 2.7.5 [analytics]
13:33 <elukey> re-enable netflow hive2druid jobs after https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/602356/ [analytics]
10:56 <elukey> depooled and reimage druid1004 to Debian Buster (Druid public cluster) [analytics]
07:31 <elukey> stop netflow hive2druid timers to do some experiments [analytics]
06:13 <elukey> kill application_1589903254658_75731 (druid indexation for netflow still running after 12h) [analytics]
05:36 <elukey> restart druid middlemanager on druid1002 - strange protobuf warnings, netflow hive2druid indexation job stuck for hours [analytics]
05:13 <elukey> reimage druid1003 to Buster [analytics]
2020-06-03 §
17:10 <elukey> restart RU jobs after adding memory to an-launcher1001 [analytics]
16:57 <elukey> reboot an-launcher1001 to get new memory [analytics]
16:01 <elukey> stop timers on an-launcher, prep for reboot [analytics]
09:35 <elukey> re-run webrequest-druid-hourly-coord 03/06T7 (failed due to druid1002 moving to buster) [analytics]
08:50 <elukey> reimage druid1002 to Buster [analytics]
2020-06-01 §
14:54 <elukey> stop all timers on an-launcher1001, prep step for reboot [analytics]
12:54 <elukey> /user/dedcode/.Trash/* -skipTrash [analytics]
06:53 <elukey> re-run virtualpageview-hourly-wf-2020-5-31-19 [analytics]
06:28 <elukey> temporary stop of all RU jobs on an-launcher1001 to privilege camus and others [analytics]
06:03 <elukey> kill all airflow-related processes on an-launcher1001 - host killing tasks due to OOM [analytics]
2020-05-30 §
08:15 <elukey> manual reset-failed of monitor_refine_mediawiki_job_events_failure_flags [analytics]
2020-05-29 §
13:19 <elukey> re-run druid webrequest hourly 29/05T11 (failed due to a host reimage in progress) [analytics]
12:19 <elukey> reimage druid1001 to Debian Buster [analytics]
10:05 <elukey> move el2druid config from druid1001 to an-druid1001 [analytics]
2020-05-28 §
18:31 <milimetric> after deployment, restarted four oozie jobs with new SLAs and fixed datasets definitions [analytics]
06:40 <elukey> slowly restarting all RU units on an-launcher1001 [analytics]
06:32 <elukey> delete old RU pid files with timestamp May 27 19:00 (scap deployment failed to an-launcher due to disk issues) except ./jobs/reportupdater-queries/pingback/.reportupdater.pid that was working fine [analytics]
2020-05-27 §
19:53 <joal> Start pageview-complete dump oozie job after deploy [analytics]
19:24 <joal> Deploy refinery onto hdfs [analytics]
19:22 <joal> restart failed services on an-launcher1001 [analytics]
19:06 <joal> Deploy refinery using scap to an-launcher1001 only [analytics]
18:41 <joal> Deploying refinery with scap [analytics]
13:42 <ottomata> increased Kafka topic retention in jumbo-eqiad to 31 days for (eqiad|codfw).mediawiki.revision-create - T253753 [analytics]
07:09 <joal> Rerun webrequest-druid-hourly-wf-2020-5-26-17 [analytics]
07:04 <elukey> matomo upgraded to 3.13.5 on matomo1001 [analytics]
06:17 <elukey> superset upgraded to 0.36 [analytics]
05:52 <elukey> attempt to upgrade Superset to 0.36 - downtime expected [analytics]
2020-05-24 §
10:04 <elukey> re-run virtualpageview-hourly 23/05T15 - failed due to a sporadic kerberos/hive issue [analytics]
2020-05-22 §
09:11 <elukey> superset upgrade attempt to 0.36 failed due to a db upgrade error (not seen in staging), rollback to 0.35.2 [analytics]
08:15 <elukey> superset down for maintenance [analytics]
07:09 <elukey> add druid100[7,8] to the LVS druid-public-brokers service (serving AQS's traffic) [analytics]
2020-05-21 §
17:24 <elukey> add druid100[7,8] to the druid public cluster (not serving load balancer traffic for the moment, only joining the cluster) - T252771 [analytics]
16:44 <elukey> roll restart druid historical nodes on druid100[4-6] (public cluster) to pick up new settings - T252771 [analytics]