2020-10-29
09:52 <elukey> add gdnsd.service to all gdnsd hosts (with LimitNOFILE=infinity as override) - no daemon restart done - T266746 [production]
09:41 <marostegui> Deploy schema change on s8 wikidata codfw master (db2079) T264109 [production]
09:33 <elukey> clean up 10.64.21.7/24 and 2620:0:861:105:10:64:21:7/64 from netbox (an-test-ui1001 already has IPs previously allocated by makevm) [production]
09:32 <elukey@cumin1001> END (ERROR) - Cookbook sre.ganeti.makevm (exit_code=97) [production]
09:23 <elukey@cumin1001> START - Cookbook sre.ganeti.makevm [production]
08:54 <vgutierrez> turn off ECDHE-ECDSA-AES128-SHA support on the main caching cluster - T258405 [production]
08:54 <moritzm> fixing up stray jenkins auto restart timers on secondary releases server [production]
08:53 <vgutierrez> A:cp (except cp3052, running varnish 5) upgrade libvmod-netmapper to 1.9-1 T266567 T264398 [production]
08:48 <moritzm> fixing up stray mcelog auto restart timers on kubestage* [production]
08:38 <moritzm> fixing up stray cas auto restart timers on secondary IDP servers [production]
08:19 <moritzm> fixing up stray pmacctd auto restart timers on netflow* [production]
08:19 <moritzm> fixing up stray pcacctd auto restart timers on netflow* [production]
08:02 <marostegui> Disconnect replication codfw -> eqiad on s1 T266663 [production]
07:56 <vgutierrez> set LimitNOFILE=500000 for gdnsd on authdns1001 [production]
07:54 <marostegui> Disconnect replication codfw -> eqiad on s4 T266663 [production]
07:50 <vgutierrez> restart haproxy on authdns2001 [production]
07:49 <marostegui> Disconnect replication codfw -> eqiad on s8 T266663 [production]
07:48 <godog> swift codfw-prod: bump object weight for ms-be2057 - T261633 [production]
07:46 <marostegui> Disconnect replication codfw -> eqiad on s3 T266663 [production]
07:43 <vgutierrez> restart anycast-healthchecker on authdns2001 [production]
07:34 <vgutierrez> set LimitNOFILE=500000 for gdnsd on authdns2001 [production]
07:27 <elukey> "sudo truncate -s 10g /var/log/daemon.log" on authdns2001 [production]
06:52 <marostegui> Disconnect replication codfw -> eqiad on s2 T266663 [production]
06:38 <marostegui> Disconnect replication codfw -> eqiad on s7 T266663 [production]
06:36 <marostegui> Disconnect replication codfw -> eqiad on s6 T266663 [production]
06:25 <elukey> execute 'truncate -s 10g /var/log/syslog.1' on authdns2001 - root partition full [production]
06:23 <marostegui> Disconnect replication codfw -> eqiad on s5 T266663 [production]
06:10 <marostegui> Disconnect replication codfw -> eqiad on es4 and es5 T266663 [production]
06:07 <marostegui> Disconnect replication codfw -> eqiad on x1 T266663 [production]
05:58 <marostegui> Disconnect replication codfw -> eqiad on pc1, pc2 and pc3 T266663 [production]
04:06 <ryankemper@cumin1001> END (PASS) - Cookbook sre.elasticsearch.rolling-restart (exit_code=0) [production]
01:41 <mutante> scandium reimaged a second time after making puppet changes to ensure nodejs/npm is NOT installed anymore (T257906) [production]
01:17 <ryankemper> T266492 Beginning rolling restart of eqiad cirrus cluster, 3 nodes at a time, on `ryankemper@cumin1001` tmux session `elasticsearch_restart_eqiad` [production]
01:16 <ryankemper@cumin1001> START - Cookbook sre.elasticsearch.rolling-restart [production]
00:51 <ryankemper> Finished restart of wdqs categories across production hosts; wdqs deploy is complete and the service is healthy [production]
00:14 <Amir1> rolling restart of ores [production]
00:12 <dzahn@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) [production]
00:10 <dzahn@cumin1001> START - Cookbook sre.hosts.downtime [production]
00:04 <ryankemper> Beginning restart of wdqs categories across production hosts, one at a time: `sudo -E cumin -b 1 'A:wdqs-all and not A:wdqs-test' 'depool && sleep 60 && systemctl restart wdqs-categories && sleep 30 && pool'` [production]
00:03 <ryankemper> Restarted wdqs categories across test hosts: `sudo -E cumin 'A:wdqs-test' 'systemctl restart wdqs-categories'` [production]
00:03 <ryankemper> Restarted wdqs updater across all hosts: `sudo -E cumin -b 4 'A:wdqs-all' 'systemctl restart wdqs-updater'` [production]
00:02 <ryankemper> Following wdqs deploy, https://query.wikidata.org successfully responds to an example query [production]
00:01 <ryankemper@deploy1001> Finished deploy [wdqs/wdqs@8c97b17]: 0.3.53 (duration: 09m 29s) [production]
2020-10-28
23:54 <ryankemper> Canary `wdqs1003` tests pass, proceeding with wdqs deploy to rest of fleet [production]
23:52 <ryankemper@deploy1001> Started deploy [wdqs/wdqs@8c97b17]: 0.3.53 [production]
23:52 <ryankemper@deploy1001> deploy aborted: 0.3.53 (duration: 00m 00s) [production]
23:52 <ryankemper@deploy1001> Started deploy [wdqs/wdqs@8c97b17]: 0.3.53 [production]
22:54 <mutante> scandium - scap pull after reinstalling OS [production]
22:14 <dzahn@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) [production]
22:12 <dzahn@cumin1001> START - Cookbook sre.hosts.downtime [production]