2017-05-01 §
19:10 <mobrovac@naos> Finished deploy [mobileapps/deploy@b5afcb8]: Forced deploy to bring the targets to the current version (duration: 02m 08s) [production]
19:08 <mobrovac@naos> Started deploy [mobileapps/deploy@b5afcb8]: Forced deploy to bring the targets to the current version [production]
18:46 <mutante> temp. re-enabling puppet on restbase1018 and running it once to fix icinga config syntax error. then disabling it again. restbase service stopped before and after. this box has a broken disk. [production]
18:35 <mutante> brought mc1018 back up, ran puppet on it and then on Icinga. parent was adjusted from asw-d-eqiad to asw2-2-eqiad. reduced icinga config errors by 50% :p (1 of 2 left, restbase1018) [production]
18:28 <mutante> powercycling mc1018 [production]
18:19 <mutante> manually removed asw-d-eqiad remnants from /etc/icinga/puppet_hosts.cfg to fix icinga config after gerrit:351167 / T148506. fixes Icinga config error. then puppet adds it back [production]
18:03 <andrewbogott> restarting nova-fullstack tests but saving instance 2d60e8c5-fb2a-4681-ac0a-ae2162bb13fb for future research [production]
17:03 <mutante> phab2001 - start/stop phd service - that fixed "systemd state" icinga check, even though phd does not run just like before [production]
16:53 <bblack> reverting inter-caching routing from codfw-switchover period: https://wikitech.wikimedia.org/wiki/Switch_Datacenter#Switchback [production]
16:52 <bblack@neodymium> conftool action : set/pooled=yes; selector: dc=eqiad,cluster=cache_upload,name=cp107[1234].eqiad.wmnet [production]
16:19 <mobrovac@naos> Finished deploy [citoid/deploy@747777f]: Remove mwDeprecated - T93514 (duration: 02m 19s) [production]
16:17 <mobrovac@naos> Started deploy [citoid/deploy@747777f]: Remove mwDeprecated - T93514 [production]
15:46 <jynus> shutting down db1063 for maintenance T164107 [production]
15:13 <bblack> restarting varnish backend on cp2002 (mailbox issues) [production]
12:58 <Amir1> cleaning ores_classification rows half an hour or so (T159753) [production]
11:31 <jynus> running alter table on categorylinks on db1054, 68, 62 T164185 [production]
11:25 <jynus> running alter table on enwiki.categorylinks on db1052 T164185 [production]
03:46 <tstarling@naos> Synchronized wmf-config: https://gerrit.wikimedia.org/r/#/c/347537/ (duration: 01m 01s) [production]
03:44 <tstarling@naos> Synchronized wmf-config/etcd.php: https://gerrit.wikimedia.org/r/#/c/347537/ (duration: 02m 39s) [production]
2017-04-30 §
16:35 <urandom> T160759: Restoring default tombstone_threshold on restbase1009 [production]
16:29 <ppchelko@naos> Finished deploy [restbase/deploy@4f96ae3]: Blacklist a zhwiki page that's causing issues (duration: 07m 27s) [production]
16:21 <ppchelko@naos> Started deploy [restbase/deploy@4f96ae3]: Blacklist a zhwiki page that's causing issues [production]
15:31 <elukey> set tombstone_failure_threshold=1000 to restbase1009-a with P5165 on restbase1009-a - T160759 [production]
15:24 <elukey> set tombstone_failure_threshold=10000 to restbase1009-a with P5165 on restbase1009-a - T160759 [production]
07:45 <elukey> deleted /srv/cassandra-a/commitlog/CommitLog-5-1490738321543.log from restbase1009-a (empty commit log file created before OOM - backup in /home/elukey) [production]
2017-04-29 §
10:50 <elukey> set sysctl -w net.netfilter.nf_conntrack_tcp_timeout_time_wait=65 to kafka[1018,1020,1022].eqiad.wmnet (was 120 - maybe related to T136094 ?) [production]
10:39 <elukey> start ferm on kafka1020/18 (nodes were previously down for maintenance, not sure why ferm wasn't started) [production]
09:59 <reedy@naos> Synchronized wmf-config/CommonSettings.php: Revert pdf processor firejails T164045 (duration: 02m 41s) [production]
2017-04-28 §
21:24 <Dereckson> End of live debug on mwdebug1001, restored previous state with a local scap pull [production]
21:00 <ejegg> updated payments-wiki from 1620b8233321099262ff4333a2269f0563107e66 to 4c5630283c57efbc454cc70d47218f7f22ea252a [production]
20:23 <Dereckson> Live debug on mwdebug1001 for T164059 [production]
19:30 <jynus> shutting down db1063 - I see high temperatures reported, and going up T164107 [production]
19:08 <urandom> T163936: reenabling puppet on restbase-dev1001 [production]
18:14 <urandom> T163936: disabling puppet on restbase-dev1001 (t-shooting c-m-c) [production]
17:08 <jynus> restarting replication on all nodes on s7-eqiad T164092 [production]
16:38 <jynus> stopping replication on all nodes on s7-eqiad in case db1062 boots up in a corrupted state [production]
16:36 <jynus> restarting db1062 once more T164092 [production]
15:56 <godog> poweroff prometheus1004 for ram upgrade - T163385 [production]
15:40 <jynus> deploying new events_coredb_slave.sql on codfw T160984 [production]
15:21 <godog> poweroff prometheus1003 for ram upgrade - T163385 [production]
14:55 <gehel> shutting down elastic2020 for mainboard replacement - T149006 [production]
14:32 <marostegui@naos> Synchronized wmf-config/db-eqiad.php: Change db1063 IP and rack - T163895 (duration: 00m 48s) [production]
14:31 <marostegui@naos> Synchronized wmf-config/db-codfw.php: Change db1063 IP and rack - T163895 (duration: 00m 50s) [production]
14:04 <marostegui> Stop and shutdown db1063 - T163895 [production]
14:04 <marostegui@naos> Synchronized wmf-config/db-eqiad.php: Change db1062 rack location - T163895 (duration: 00m 52s) [production]
13:59 <moritzm> installing ghostscript security updates [production]
13:56 <urandom> T163936: restarting cassandra-metrics-collector, restbase production [production]
13:55 <urandom> $ readlink /usr/local/lib/cassandra-metrics-collector/cassandra-metrics-collector.jar [production]
13:50 <ema> varnish 4.1.6-1wm1 uploaded to apt.w.o [production]
13:46 <urandom> T163936: restarting cassandra-metrics-collector on restbase1007 [production]