2018-03-06
09:41 <elukey> stop eventlogging's mysql consumers for db1107 (el master) kernel updates [analytics]
2018-03-05
18:22 <elukey> restart webrequest-load-wf-upload-2018-3-5-16 via Hue (failed due to reboots) [analytics]
18:21 <elukey> restart webrequest-load-wf-text-2018-3-5-16 via Hue (failed due to reboots) [analytics]
15:00 <mforns> rerun mediacounts-load-wf-2018-3-5-9 [analytics]
10:57 <joal> Relaunch Mediawiki-history job manually from spark2 to see if the new version helps [analytics]
10:57 <joal> Killing failing Mediawiki-History job for 2018-03 [analytics]
2018-03-02
15:33 <mforns> rerun webrequest-load-wf-text-2018-3-2-12 [analytics]
2018-03-01
14:59 <elukey> shut down deployment-eventlog02 in favor of deployment-eventlog05 in deployment-prep (Ubuntu -> Debian EL migration) [analytics]
09:45 <elukey> rerun webrequest-load-wf-text-2018-3-1-6 manually, failed due to analytics1030's reboot [analytics]
2018-02-28
22:09 <milimetric> re-deployed refinery for a small docs fix in the sqoop script [analytics]
17:55 <milimetric> Refinery synced to HDFS, deploy completed [analytics]
17:40 <milimetric> deploying Refinery [analytics]
08:38 <joal> rerun cassandra-hourly-wf-local_group_default_T_pageviews_per_project_v2-2018-2-27-15 [analytics]
2018-02-27
19:12 <ottomata> updating spark2-* CLIs to spark 2.2.1: T185581 [analytics]
2018-02-21
20:48 <ottomata> now running 2 camus webrequest jobs, one consuming from jumbo (no data yet), the other from analytics. these should be fine to run in parallel. [analytics]
07:21 <elukey> reboot db1108 (analytics-slave.eqiad.wmnet) for mariadb+kernel updates [analytics]
2018-02-19
17:14 <elukey> deployed eventlogging - https://gerrit.wikimedia.org/r/#/c/405687/ [analytics]
07:35 <elukey> re-run wikidata-specialentitydata_metrics-wf-2018-2-17 via Hue [analytics]
2018-02-16
15:41 <elukey> add analytics1057 back in the Hadoop worker pool after disk swap [analytics]
10:55 <elukey> increased topic partitions for netflow to 3 [analytics]
2018-02-15
13:54 <milimetric> deployment of refinery and refinery-source done [analytics]
12:52 <joal> Killing webrequest-load bundle (next restart should be at hour 12:00) [analytics]
08:18 <elukey> removed jmxtrans and java 7 from analytics1003 and re-launched refinery-drop-mediawiki-snapshots [analytics]
07:51 <elukey> removed default-java packages from analytics1003 and re-launched refinery-drop-mediawiki-snapshots [analytics]
2018-02-14
13:44 <elukey> roll back java 8 upgrade for archiva - issues with Analytics builds [analytics]
13:35 <elukey> installed openjdk-8 on meitnerium, manually switched the default java to java8 with update-java-alternatives, restarted archiva [analytics]
13:14 <elukey> removed java 7 packages from analytics100[12] [analytics]
12:43 <elukey> jmxtrans removed from all the Hadoop workers [analytics]
12:43 <elukey> openjdk-7-* packages removed from all the Hadoop workers [analytics]
2018-02-13
11:42 <elukey> force kill of yarn nodemanager + other containers on analytics1057 (node failed, unit masked, processes still around) [analytics]
2018-02-12
23:16 <elukey> re-run webrequest-load-wf-upload-2018-2-12-21 via Hue (node managers failure) [analytics]
23:13 <elukey> manual restart of Yarn Node Managers on analytics1058 and analytics1031 [analytics]
23:09 <elukey> cleaned up tmp files on all analytics Hadoop worker nodes (a job was filling up tmp) [analytics]
17:18 <elukey> home dirs on stat1004 moved to /srv/home (/home symlinks to it) [analytics]
17:15 <ottomata> restarting eventlogging-processors to blacklist Print schema in eventlogging-valid-mixed (MySQL) [analytics]
14:46 <ottomata> deploying eventlogging for T186833 with EventCapsule in code and IP NO_DB_PROPERTIES [analytics]
2018-02-09
12:19 <joal> Rerun wikidata-articleplaceholder_metrics-wf-2018-2-8 [analytics]
2018-02-08
16:23 <elukey> stop archiva on meitnerium to swap /var/lib/archiva from the root partition to a new separate one [analytics]
2018-02-07
13:55 <joal> Manually restarted druid indexation after weird failure of mediawiki-history-reduced-wf-2018-01 [analytics]
13:49 <elukey> restart overlord/middlemanager on druid1005 [analytics]
2018-02-06
19:40 <joal> Manually restarted druid indexation after weird failure of mediawiki-history-reduced-wf-2018-01 [analytics]
15:36 <elukey> drain + shutdown of analytics1038 to replace faulty BBU [analytics]
09:58 <elukey> applied https://gerrit.wikimedia.org/r/c/405687/ manually on deployment-eventlog02 for testing [analytics]
2018-02-05
15:51 <elukey> live-hacked deployment-eventlog02's /srv/deployment/eventlogging/analytics/eventlogging/handlers.py to add poll(0) to the Confluent Kafka producer - T185291 [analytics]
11:03 <elukey> restart eventlogging/forwarder legacy-zmq on eventlog1001 due to a slow memory leak over time (cached memory down to zero) [analytics]
2018-02-02
17:09 <joal> Webrequest upload 2018-02-02 hours 9 and 11 dataloss warnings have been checked - they are false positives [analytics]
09:56 <joal> unique_devices-per_project_family-monthly-wf-2018-1 after failure [analytics]
2018-02-01
17:00 <ottomata> killing stuck JsonRefine eventlogging analytics job application_1515441536446_52892, not sure why this is stuck. [analytics]
14:06 <joal> Dataloss alerts for upload 2018-02-01 hours 1, 2, 3 and 5 were false positives [analytics]
12:17 <joal> Restart cassandra monthly bundle after January deploy [analytics]