2020-05-11
09:31 <hashar> contint2001 started zuul-merger again (had permission issues in /var/lib/zuul) [production]
09:07 <mutante> contint1001 - rsync -avpz --delete /srv/jenkins/ rsync://contint2001.wikimedia.org/ci--srv-/jenkins/ (T224591) [production]
09:05 <mutante> contint2001 - mkdir /srv/jenkins [production]
08:55 <hashar> contint2001 stopping zuul-merger, permission problem [production]
08:46 <godog> bounce ferm on kubernetes1007 to resolve icinga UNKNOWN [production]
08:40 <mutante> rsyncing /var/lib/jenkins from contint1001 to contint2001 with --delete [production]
08:32 <mutante> rsynced data from contint1001 to contint2001 - paths per T224591#6039192 for the migration later today [production]
08:30 <ema> cp3050: upgrade atskafka to 0.6 T237993 [production]
08:30 <_joe_> removing the iptables DROP rule on mc1020 T251378 [production]
07:54 <moritzm> installing squid security updates [production]
07:21 <moritzm> updated buster netboot images to 10.4 (updated to latest point release) [production]
07:08 <_joe_> dropping requests to mc1020 via a firewall rule T251378 [production]
06:04 <elukey> restart wikimedia-discovery-golden on stat1007 - apparently killed by no memory left to allocate on the system [production]
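The 09:07 contint1001 → contint2001 Jenkins copy above boils down to a pull-style rsync against an rsyncd module. A minimal sketch that only assembles the command string (the `ci_rsync_cmd` helper name is invented for illustration; the paths and the `ci--srv-` module name come from the log entry):

```shell
# Hypothetical helper: prints the pull-style rsync used for the CI migration.
# Flags kept as logged: -a archive, -v verbose, -p perms (already implied
# by -a), -z compress; --delete makes the destination an exact mirror.
ci_rsync_cmd() {
  local src="$1" host="$2" module_path="$3"
  printf 'rsync -avpz --delete %s rsync://%s/%s\n' "$src" "$host" "$module_path"
}

ci_rsync_cmd /srv/jenkins/ contint2001.wikimedia.org ci--srv-/jenkins/
```

On a real migration one would typically add `--dry-run` first to preview what `--delete` would remove on the destination.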
2020-05-10
12:18 <marostegui> Start event scheduler on db1115 after a massive delete - T252324 [production]
11:05 <marostegui> Stop event scheduler on db1115 to perform a massive delete - T252324 [production]
10:27 <dcausse> restarting blazegraph on wdqs1004: T242453 [production]
09:56 <marostegui> Change scaling_governor from powersave to performance on db1115 - T252324 [production]
09:25 <marostegui> Stop MySQL and restart db1115 - T252324 [production]
08:50 <marostegui> Restart mysql on db1115 to change buffer pool size from 20GB to 40GB T252324 [production]
08:44 <elukey> Power cycle analytics1052 after eno1 issue [production]
08:01 <marostegui> Disable unused events like %_schema T252324 T231185 [production]
07:11 <marostegui> Restart mysql on db1115 T231185 [production]
07:11 <marostegui> Truncate tendril.processlist_query_log T231185 [production]
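The 09:56 scaling_governor change above is normally done by writing to the cpufreq sysfs files. A minimal sketch, assuming the usual `/sys/devices/system/cpu` layout; the `set_governor` helper is invented for illustration and is demoed here against a throwaway directory rather than a live host:

```shell
# Hypothetical helper: write a cpufreq governor value to every
# cpu*/cpufreq/scaling_governor file under the given root.
# On a real host, ROOT would be /sys/devices/system/cpu (run as root).
set_governor() {
  local root="$1" value="$2" f
  for f in "$root"/cpu*/cpufreq/scaling_governor; do
    [ -e "$f" ] || continue    # skip when the glob matches nothing
    echo "$value" > "$f"
  done
}

# Demo against a throwaway sysfs-like tree:
demo=$(mktemp -d)
mkdir -p "$demo/cpu0/cpufreq"
echo powersave > "$demo/cpu0/cpufreq/scaling_governor"
set_governor "$demo" performance
cat "$demo/cpu0/cpufreq/scaling_governor"   # performance
```

On db1115 this corresponds to the logged switch from `powersave` to `performance`.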
2020-05-08
21:45 <bstorm_> cleaned up wb_terms_no_longer_updated view for testwikidatawiki and testcommonswiki on labsdb1010 T251598 [production]
21:45 <bstorm_> cleaned up wb_terms_no_longer_updated view on labsdb1012 T251598 [production]
21:33 <bstorm_> cleaning up wb_terms_no_longer_updated view on labsdb1009 T251598 [production]
21:06 <ottomata> running preferred replica election for kafka-jumbo to get preferred leaders back after reboot of broker earlier today - T252203 [production]
19:16 <jhuneidi@deploy1001> helmfile [EQIAD] Ran 'sync' command on namespace 'blubberoid' for release 'production' . [production]
19:12 <jhuneidi@deploy1001> helmfile [CODFW] Ran 'sync' command on namespace 'blubberoid' for release 'production' . [production]
19:07 <jhuneidi@deploy1001> helmfile [STAGING] Ran 'sync' command on namespace 'blubberoid' for release 'staging' . [production]
18:12 <andrewbogott> reprepro copy buster-wikimedia stretch-wikimedia prometheus-openstack-exporter for T252121 [production]
17:59 <marostegui> Extend /srv by 500G on labsdb1011 T249188 [production]
16:55 <pt1979@cumin2001> END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) [production]
16:53 <pt1979@cumin2001> START - Cookbook sre.hosts.downtime [production]
16:51 <cmjohnson@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) [production]
16:48 <cmjohnson@cumin1001> START - Cookbook sre.hosts.downtime [production]
16:39 <pt1979@cumin2001> END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) [production]
16:37 <pt1979@cumin2001> START - Cookbook sre.hosts.downtime [production]
16:14 <pt1979@cumin2001> END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) [production]
16:12 <pt1979@cumin2001> START - Cookbook sre.hosts.downtime [production]
15:43 <pt1979@cumin2001> END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) [production]
15:41 <pt1979@cumin2001> START - Cookbook sre.hosts.downtime [production]
15:36 <ottomata> starting kafka broker on kafka-jumbo1006, same issue on other brokers when they are leaders of offending partitions - T252203 [production]
15:31 <pt1979@cumin2001> END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) [production]
15:28 <pt1979@cumin2001> START - Cookbook sre.hosts.downtime [production]
15:27 <ottomata> stopping kafka broker on kafka-jumbo1006 to investigate camus import failures - T252203 [production]
14:50 <otto@deploy1001> Finished deploy [analytics/refinery@4a2c530]: fix for camus wrapper, deploy to an-launcher1001 only (duration: 00m 03s) [production]
14:50 <otto@deploy1001> Started deploy [analytics/refinery@4a2c530]: fix for camus wrapper, deploy to an-launcher1001 only [production]
14:05 <akosiaris> T243106 undo experiment with DROP iptables rules this time around. Use mw1331, mw1348 [production]
13:22 <vgutierrez> rolling restart of ats-tls on eqiad, codfw, ulsfo and eqsin - T249335 [production]
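The 21:06 preferred-replica election for kafka-jumbo can be expressed with Kafka's bundled election tool. A sketch that only builds the command string (tool name and flags assume Kafka ≥ 2.4's `kafka-leader-election.sh`; the broker address and the `election_cmd` helper are hypothetical):

```shell
# Hypothetical helper: prints the command that moves leadership back to
# each partition's preferred replica after a broker restart.
# Assumes Kafka >= 2.4, where kafka-leader-election.sh replaced the older
# kafka-preferred-replica-election.sh tool.
election_cmd() {
  printf 'kafka-leader-election.sh --bootstrap-server %s --election-type preferred --all-topic-partitions\n' "$1"
}

election_cmd kafka-jumbo1001:9092
```

Running the printed command restores preferred leaders cluster-wide, matching the intent of the log entry after the kafka-jumbo1006 reboot.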