2012-07-02
15:04 <mutante> authdns-update to switch jobs.wm redirect to wikimedia-lb to fix SSL cert mismatch (RT-3071) [production]
14:55 <mark> Reboot of cr1-sdtpa did not fix the RE packet loss issue... therefore unlikely to be leap second related [production]
14:41 <mark> Rebooting cr1-sdtpa [production]
14:37 <mark> Shutdown PyBal BGP sessions on cr1-sdtpa [production]
14:34 <mark> Shutdown BGP session to 2828 on cr1-sdtpa [production]
13:36 <hashar> db12 suffering some 1400sec (and growing) replag. mysqldump in progress on that host. [production]
12:35 <mutante> installing upgrades on fenari (linux-firmware linux-libc-dev..) [production]
12:27 <mutante> rebooting gallium one more time to install kernel [production]
12:26 <mutante> upgrading kernel on gallium [production]
12:23 <hashar> synchronized live-1.5/CREDITS [production]
11:31 <mark> Now we have packet loss within pmtpa/sdtpa... reverting change [production]
10:57 <mark> Problems on one of two pmtpa-eqiad waves; raised OSPF metric to 60 to failover traffic to the other link [production]
10:50 <Tim> fixing leap second issue on bastion1 by rebooting it [production]
10:47 <Tim> fixed leap second issue on bastion-restricted [production]
09:57 <Tim> fixing leap second issue on virt1,virt2,virt3,virt4,virt5 [production]
09:53 <Tim> fixing leap second issue on aluminium,gallium,manganese [production]
09:47 <Tim> fixing leap second issue on formey,grosley,hooper,sanger,sockpuppet [production]
09:43 <Tim> on fenari: fixed leap second issue with the mozilla method [production]
09:39 <apergos> rebooting gallium, it's pretty unhappy (maybe related to leap second issue) [production]
08:14 <hashar> srv190, srv266, srv281: timeouts on sync-file [production]
08:14 <hashar> synchronized wmf-config/InitialiseSettings.php 'Bug 37457 - fix import sources for viwikibooks' [production]
08:11 <hashar> Stopped Jenkins on gallium. It is not doing anything anyway. Asked to reboot box {{rt|3208}} [production]
02:53 <LocalisationUpdate> completed (1.20wmf5) at Mon Jul 2 02:53:51 UTC 2012 [production]
02:28 <LocalisationUpdate> completed (1.20wmf6) at Mon Jul 2 02:28:48 UTC 2012 [production]
01:48 <Tim> kill -CONT on populateRevisionSha1.php processes [production]
00:47 <Tim> on nfs1: trying leap second fix suggested at https://bugzilla.mozilla.org/show_bug.cgi?id=769972#c5 [production]
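(Editor's note on the "mozilla method" used in the leap second entries above: as far as can be inferred from the linked bug, the widely circulated workaround was to write the current time back to the kernel, e.g. stop ntpd and run roughly `date -s "$(date)"`, which forces the timer subsystem to re-arm after the leap second insertion. A minimal C sketch of that idea, offered as an illustration rather than the exact procedure used on these hosts:)

  /* Hedged illustration of the 2012 leap-second workaround: re-set the
   * system clock to its current value so the kernel re-arms its timer
   * state. Requires root; not the literal commands from the linked bug. */
  #include <stdio.h>
  #include <sys/time.h>

  int main(void)
  {
      struct timeval tv;

      if (gettimeofday(&tv, NULL) != 0) {   /* read current wall-clock time */
          perror("gettimeofday");
          return 1;
      }
      if (settimeofday(&tv, NULL) != 0) {   /* write it straight back */
          perror("settimeofday");
          return 1;
      }
      puts("system clock re-set");
      return 0;
  }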
00:26 <tstarling> synchronized wmf-config/db.php 'reduce db32 read load to zero due to persistent lag' [production]
00:12 <Tim> switched enwiki back to r/w [production]
00:12 <tstarling> synchronized wmf-config/db.php [production]
00:06 <Tim> on hume: stopped all populateRevisionSha1.php processes with kill -STOP [production]
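(Editor's note on the kill -STOP here and the kill -CONT at 01:48 above: the maintenance scripts were suspended rather than terminated, so they could be resumed once enwiki left read-only. A hypothetical C sketch of that pause/resume pattern; the PID argument is illustrative, the actual processes were located by name on hume:)

  /* Hypothetical sketch: SIGSTOP suspends the target process without
   * killing it, SIGCONT lets it continue later. */
  #include <signal.h>
  #include <stdio.h>
  #include <stdlib.h>
  #include <string.h>
  #include <sys/types.h>

  int main(int argc, char **argv)
  {
      if (argc != 3) {
          fprintf(stderr, "usage: %s <pid> stop|cont\n", argv[0]);
          return 1;
      }
      pid_t pid = (pid_t)atoi(argv[1]);
      int sig = (strcmp(argv[2], "stop") == 0) ? SIGSTOP : SIGCONT;

      if (kill(pid, sig) != 0) {            /* deliver the chosen signal */
          perror("kill");
          return 1;
      }
      return 0;
  }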
00:03 <reedy> synchronized wmf-config/db.php 's1/enwiki into readonly' [production]
2012-07-01
19:12 <reedy> synchronized php-1.20wmf6/extensions/WikimediaMaintenance/ 'Update to master for hashar' [production]
17:55 <aaron> synchronized php-1.20wmf5/includes/WikiPage.php 'more logging' [production]
17:45 <aaron> synchronized php-1.20wmf5/includes/WikiPage.php 'more logging' [production]
17:43 <aaron> synchronized php-1.20wmf5/includes/WikiPage.php 'more logging' [production]
17:32 <aaron> synchronized php-1.20wmf5/includes/WikiPage.php [production]
17:30 <aaron> synchronized php-1.20wmf5/includes/WikiPage.php [production]
16:53 <aaron> synchronized php-1.20wmf5/includes/WikiPage.php [production]
16:48 <aaron> synchronized php-1.20wmf5/includes/WikiPage.php 'logging' [production]
12:54 <notpeter> also going to reboot all pmtpa search nodes. not in prod, but are still freaking out from leap second bug. [production]
05:33 <aaron> synchronized php-1.20wmf5/includes/WikiPage.php 'logging' [production]
04:25 <LocalisationUpdate> completed (1.20wmf5) at Sun Jul 1 04:25:25 UTC 2012 [production]
04:06 <Ryan_Lane> virt1000 is back up, rebooting virt0 [production]
04:02 <Ryan_Lane> rebooting virt1000 [production]
03:16 <LocalisationUpdate> completed (1.20wmf6) at Sun Jul 1 03:16:39 UTC 2012 [production]
01:43 <notpeter> that worked. restarting all remaining search nodes. [production]
01:39 <notpeter> problem with lucene persisting through service restart, but not node restart. restarting en pool nodes. [production]
01:20 <paravoid> restarting opendj (nfs1/nfs2), load spike, possibly related to leap second [production]
00:51 <notpeter> search1004 dead. powercycling. [production]
00:50 <notpeter> based on ganglia evidence, lucene seems to have been affected by leap second bug. restarting each instance, one minute wait in between [production]