production SAL

2017-05-01 §
23:41	<mutante>	netmon1002 - signed puppet cert, initial puppet run, accept salt-key,.. (T159756)	[production]
23:15	<mutante>	netmon1002 - boot into PXE, initial OS install (T159756)	[production]
23:06	<bd808>	Ran puppet cert clean striker-deploy03.striker.eqiad.wmflabs on labcontrol1001	[production]
19:43	<ejegg>	updated payments-wiki from 4c5630283c57efbc454cc70d47218f7f22ea252a to 57451dee67e498d445a6f9bc10d40acf3df65f38	[production]
19:10	<mobrovac@naos>	Finished deploy [mobileapps/deploy@b5afcb8]: Forced deploy to bring the targets to the current version (duration: 02m 08s)	[production]
19:08	<mobrovac@naos>	Started deploy [mobileapps/deploy@b5afcb8]: Forced deploy to bring the targets to the current version	[production]
18:46	<mutante>	temp. re-enabling puppet on restbase1018 and running it once to fix icinga config syntax error. then disabling it again. restbase service stopped before and after. this box has a broken disk.	[production]
18:35	<mutante>	brought mc1018 back up, ran puppet on it and then on Icinga. parent was adjusted from asw-d-eqiad to asw2-2-eqiad. reduced icinga config errors by 50% :p (1 of 2 left, restbase1018)	[production]
18:28	<mutante>	powercycling mc1018	[production]
18:19	<mutante>	manually removed asw-d-eqiad remnants from /etc/icinga/puppet_hosts.cfg to fix icinga config after gerrit:351167 / T148506. fixes Icinga config error. then puppet adds it back	[production]
18:03	<andrewbogott>	restarting nova-fullstack tests but saving instance 2d60e8c5-fb2a-4681-ac0a-ae2162bb13fb for future research	[production]
17:03	<mutante>	phab2001 - start/stop phd service - that fixed "systemd state" icinga check, even though phd does not run just like before	[production]
16:53	<bblack>	reverting inter-caching routing from codfw-switchover period: https://wikitech.wikimedia.org/wiki/Switch_Datacenter#Switchback	[production]
16:52	<bblack@neodymium>	conftool action : set/pooled=yes; selector: dc=eqiad,cluster=cache_upload,name=cp107[1234].eqiad.wmnet	[production]
16:19	<mobrovac@naos>	Finished deploy [citoid/deploy@747777f]: Remove mwDeprecated - T93514 (duration: 02m 19s)	[production]
16:17	<mobrovac@naos>	Started deploy [citoid/deploy@747777f]: Remove mwDeprecated - T93514	[production]
15:46	<jynus>	shutting down db1063 for maintenance T164107	[production]
15:13	<bblack>	restarting varnish backend on cp2002 (mailbox issues)	[production]
12:58	<Amir1>	cleaning ores_classification rows half an hour or so (T159753)	[production]
11:31	<jynus>	running alter table on categorylinks on db1054, 68, 62 T164185	[production]
11:25	<jynus>	running alter table on enwiki.categorylinks on db1052 T164185	[production]
03:46	<tstarling@naos>	Synchronized wmf-config: https://gerrit.wikimedia.org/r/#/c/347537/ (duration: 01m 01s)	[production]
03:44	<tstarling@naos>	Synchronized wmf-config/etcd.php: https://gerrit.wikimedia.org/r/#/c/347537/ (duration: 02m 39s)	[production]
2017-04-30 §
16:35	<urandom>	T160759: Restoring default tombstone_threshold on restbase1009	[production]
16:29	<ppchelko@naos>	Finished deploy [restbase/deploy@4f96ae3]: Blacklist a zhwiki page that's causing issues (duration: 07m 27s)	[production]
16:21	<ppchelko@naos>	Started deploy [restbase/deploy@4f96ae3]: Blacklist a zhwiki page that's causing issues	[production]
15:31	<elukey>	set tombstone_failure_threshold=1000 to restbase1009-a with P5165 on restbase1009-a - T160759	[production]
15:24	<elukey>	set tombstone_failure_threshold=10000 to restbase1009-a with P5165 on restbase1009-a - T160759	[production]
07:45	<elukey>	deleted /srv/cassandra-a/commitlog/CommitLog-5-1490738321543.log from restbase1009-a (empty commit log file created before OOM - backup in /home/elukey)	[production]
2017-04-29 §
10:50	<elukey>	set sysctl -w net.netfilter.nf_conntrack_tcp_timeout_time_wait=65 to kafka[1018,1020,1022].eqiad.wmnet (was 120 - maybe related to T136094 ?)	[production]
10:39	<elukey>	start ferm on kafka1020/18 (nodes were previously down for maintenance, not sure why ferm wasn't started)	[production]
09:59	<reedy@naos>	Synchronized wmf-config/CommonSettings.php: Revert pdf processor firejails T164045 (duration: 02m 41s)	[production]
2017-04-28 §
21:24	<Dereckson>	End of live debug on mwdebug1001, restored previous state with a local scap pull	[production]
21:00	<ejegg>	updated payments-wiki from 1620b8233321099262ff4333a2269f0563107e66 to 4c5630283c57efbc454cc70d47218f7f22ea252a	[production]
20:23	<Dereckson>	Live debug on mwdebug1001 for T164059	[production]
19:30	<jynus>	shutting down db1063 - I see high temperatures reported, and going up T164107	[production]
19:08	<urandom>	T163936: reenabling puppet on restbase-dev1001	[production]
18:14	<urandom>	T163936: disabling puppet on restbase-dev1001 (t-shooting c-m-c)	[production]
17:08	<jynus>	restarting replication on all nodes on s7-eqiad T164092	[production]
16:38	<jynus>	stopping replication on all nodes on s7-eqiad in case db1062 boots up in a corrupted state	[production]
16:36	<jynus>	restarting db1062 once more T164092	[production]
15:56	<godog>	poweroff prometheus1004 for ram upgrade - T163385	[production]
15:40	<jynus>	deploying new events_coredb_slave.sql on codfw T160984	[production]
15:21	<godog>	poweroff prometheus1003 for ram upgrade - T163385	[production]
14:55	<gehel>	shutting down elastic2020 for mainboard replacement - T149006	[production]
14:32	<marostegui@naos>	Synchronized wmf-config/db-eqiad.php: Change db1063 IP and rack - T163895 (duration: 00m 48s)	[production]
14:31	<marostegui@naos>	Synchronized wmf-config/db-codfw.php: Change db1063 IP and rack - T163895 (duration: 00m 50s)	[production]
14:04	<marostegui>	Stop and shutdown db1063 - T163895	[production]
14:04	<marostegui@naos>	Synchronized wmf-config/db-eqiad.php: Change db1062 rack location - T163895 (duration: 00m 52s)	[production]
13:59	<moritzm>	installing ghostscript security updates	[production]