2017-05-02 §
08:04 <hashar> Upgrading Jenkins to 2.7.4 - T144106 [production]
07:59 <elukey> Swap mc1001->mc1012 with mc1019->mc2030 - T137345 (more informative :) [production]
07:58 <elukey> Swap mc1001->mc1012 with mc1019->mc2030 [production]
07:36 <_joe_> starting etcd replication codfw => eqiad [production]
06:46 <_joe_> disabling etcd auth on conf1*, converting to use nginx for TLS/auth T159687 [production]
03:10 <mattflaschen@naos> Synchronized php-1.29.0-wmf.21/extensions/FlaggedRevs/: Urgent deploy: Fix FlaggedRevs fatal, and also a filter issue: T164096 and T164049 (duration: 00m 56s) [production]
02:45 <tstarling@naos> Synchronized php-1.29.0-wmf.21/includes/config/EtcdConfig.php: EtcdConfig backported bug fixes (duration: 01m 02s) [production]
02:34 <tstarling@naos> Synchronized wmf-config/CommonSettings.php: siteinfo hook (duration: 02m 39s) [production]
00:33 <tstarling@puppetmaster1001> conftool action : set/@read-write.yaml; selector: name=ReadOnly [production]
00:33 <tstarling@puppetmaster1001> conftool action : set/@dc-codfw.yaml; selector: name=WMFMasterDatacenter [production]
00:25 <TimStarling> populating production etcd with initial mediawiki config keys [production]
2017-05-01 §
23:41 <mutante> netmon1002 - signed puppet cert, initial puppet run, accept salt-key,.. (T159756) [production]
23:15 <mutante> netmon1002 - boot into PXE, initial OS install (T159756) [production]
23:06 <bd808> Ran puppet cert clean striker-deploy03.striker.eqiad.wmflabs on labcontrol1001 [production]
19:43 <ejegg> updated payments-wiki from 4c5630283c57efbc454cc70d47218f7f22ea252a to 57451dee67e498d445a6f9bc10d40acf3df65f38 [production]
19:10 <mobrovac@naos> Finished deploy [mobileapps/deploy@b5afcb8]: Forced deploy to bring the targets to the current version (duration: 02m 08s) [production]
19:08 <mobrovac@naos> Started deploy [mobileapps/deploy@b5afcb8]: Forced deploy to bring the targets to the current version [production]
18:46 <mutante> temp. re-enabling puppet on restbase1018 and running it once to fix icinga config syntax error. then disabling it again. restbase service stopped before and after. this box has a broken disk. [production]
18:35 <mutante> brought mc1018 back up, ran puppet on it and then on Icinga. parent was adjusted from asw-d-eqiad to asw2-2-eqiad. reduced icinga config errors by 50% :p (1 of 2 left, restbase1018) [production]
18:28 <mutante> powercycling mc1018 [production]
18:19 <mutante> manually removed asw-d-eqiad remnants from /etc/icinga/puppet_hosts.cfg to fix icinga config after gerrit:351167 / T148506. fixes Icinga config error. then puppet adds it back [production]
18:03 <andrewbogott> restarting nova-fullstack tests but saving instance 2d60e8c5-fb2a-4681-ac0a-ae2162bb13fb for future research [production]
17:03 <mutante> phab2001 - start/stop phd service - that fixed "systemd state" icinga check, even though phd does not run just like before [production]
16:53 <bblack> reverting inter-caching routing from codfw-switchover period: https://wikitech.wikimedia.org/wiki/Switch_Datacenter#Switchback [production]
16:52 <bblack@neodymium> conftool action : set/pooled=yes; selector: dc=eqiad,cluster=cache_upload,name=cp107[1234].eqiad.wmnet [production]
16:19 <mobrovac@naos> Finished deploy [citoid/deploy@747777f]: Remove mwDeprecated - T93514 (duration: 02m 19s) [production]
16:17 <mobrovac@naos> Started deploy [citoid/deploy@747777f]: Remove mwDeprecated - T93514 [production]
15:46 <jynus> shutting down db1063 for maintenance T164107 [production]
15:13 <bblack> restarting varnish backend on cp2002 (mailbox issues) [production]
12:58 <Amir1> cleaning ores_classification rows for half an hour or so (T159753) [production]
11:31 <jynus> running alter table on categorylinks on db1054, 68, 62 T164185 [production]
11:25 <jynus> running alter table on enwiki.categorylinks on db1052 T164185 [production]
03:46 <tstarling@naos> Synchronized wmf-config: https://gerrit.wikimedia.org/r/#/c/347537/ (duration: 01m 01s) [production]
03:44 <tstarling@naos> Synchronized wmf-config/etcd.php: https://gerrit.wikimedia.org/r/#/c/347537/ (duration: 02m 39s) [production]
2017-04-30 §
16:35 <urandom> T160759: Restoring default tombstone_threshold on restbase1009 [production]
16:29 <ppchelko@naos> Finished deploy [restbase/deploy@4f96ae3]: Blacklist a zhwiki page that's causing issues (duration: 07m 27s) [production]
16:21 <ppchelko@naos> Started deploy [restbase/deploy@4f96ae3]: Blacklist a zhwiki page that's causing issues [production]
15:31 <elukey> set tombstone_failure_threshold=1000 to restbase1009-a with P5165 on restbase1009-a - T160759 [production]
15:24 <elukey> set tombstone_failure_threshold=10000 to restbase1009-a with P5165 on restbase1009-a - T160759 [production]
07:45 <elukey> deleted /srv/cassandra-a/commitlog/CommitLog-5-1490738321543.log from restbase1009-a (empty commit log file created before OOM - backup in /home/elukey) [production]
2017-04-29 §
10:50 <elukey> set sysctl -w net.netfilter.nf_conntrack_tcp_timeout_time_wait=65 to kafka[1018,1020,1022].eqiad.wmnet (was 120 - maybe related to T136094 ?) [production]
10:39 <elukey> start ferm on kafka1020/18 (nodes were previously down for maintenance, not sure why ferm wasn't started) [production]
09:59 <reedy@naos> Synchronized wmf-config/CommonSettings.php: Revert pdf processor firejails T164045 (duration: 02m 41s) [production]
2017-04-28 §
21:24 <Dereckson> End of live debug on mwdebug1001, restored previous state with a local scap pull [production]
21:00 <ejegg> updated payments-wiki from 1620b8233321099262ff4333a2269f0563107e66 to 4c5630283c57efbc454cc70d47218f7f22ea252a [production]
20:23 <Dereckson> Live debug on mwdebug1001 for T164059 [production]
19:30 <jynus> shutting down db1063 - I see high temperatures reported, and going up T164107 [production]
19:08 <urandom> T163936: reenabling puppet on restbase-dev1001 [production]
18:14 <urandom> T163936: disabling puppet on restbase-dev1001 (t-shooting c-m-c) [production]
17:08 <jynus> restarting replication on all nodes on s7-eqiad T164092 [production]