production SAL


2011-04-26 §
08:21	<Andrew>	sync-common-all worked. scap still broken	[production]
08:21	<andrew>	ran sync-common-all	[production]
08:21	<Andrew>	trying sync-common-all	[production]
08:19	<Andrew>	syncs are broken, log littered with XXX: [sudo] password for andrew:	[production]
08:12	<Andrew>	re-scapping, typo in extension-list	[production]
08:12	<andrew>	synchronizing Wikimedia installation... Revision: 86895:	[production]
08:11	<Andrew>	Scapping to enable DisableAccount extension	[production]
08:11	<andrew>	synchronizing Wikimedia installation... Revision: 86895:	[production]
08:02	<andrew>	synchronizing Wikimedia installation... Revision: 86895:	[production]
08:02	<Andrew>	running scap to deploy the code itself	[production]
08:01	<Andrew>	deploying DisableAccount extension to checkuserwiki, stewardwiki, arbcom_enwiki since the special page was removed without consulting Philippe	[production]
02:15	<robh>	synchronized php-1.17/wmf-config/InitialiseSettings.php 'adding settings for checkuser and steward wikis'	[production]
2011-04-25 §
23:33	<Ryan_Lane>	added python-mwclient to lucid repo	[production]
21:36	<RobH>	storage2 still offline, won't boot into os, but is remotely accessible	[production]
21:20	<RobH>	trying to fix storage2	[production]
20:16	<notpeter>	actually adding everyone on ops to watchmouse service... didn't know this had not already been done.	[production]
20:02	<RobH>	updated csw1 to remove labels and move to default vlan ports 11/12, 11/14, 11/19, & 11/21. old connection ports for dataset2, tridge, ms1, and ms5	[production]
19:53	<RobH>	the datacenter is looking awesome.	[production]
19:45	<RobH>	ms1 moved from temp network to permanent home, no downtime, responding fine	[production]
19:42	<RobH>	ms5 connection moved, no downtime, responds fine, less than 4 seconds	[production]
19:40	<RobH>	updated csw1-sdtpa 15/1,15/2 from vlan 105 to vlan 2, 15/3 and 15/4 from vlan 105 to 101	[production]
18:52	<RobH>	snapshot4 relocated to new home, ready for os install	[production]
18:42	<RobH>	db19 and db20 back online (not in services as they have other issues)	[production]
18:39	<RobH>	db19 and db20 powering back up	[production]
18:25	<RobH>	virt4 experienced an accidental reboot when rebalancing power in the rack, my fault, not the hardware	[production]
18:12	<RobH>	rack b2 power rebalanced	[production]
18:01	<RobH>	db19 set to slave, depooled in db.php, no other services evident, shutting down (mysql stopped cleanly)	[production]
18:00	<RobH>	db20 shutdown	[production]
18:00	<RobH>	didn't log that i set up ports 11/38-40 for db19, db20, and snapshot4 on csw1-sdtpa. tested out fine and all my major configuration changes on network should be complete	[production]
17:56	<RobH>	ok, db20 and db19 are coming offline to relocate their rack location due to power distro issues	[production]
15:47	<RobH>	delay, not coming down yet, need more cables	[production]
15:46	<RobH>	db19 is coming down as well, it is depooled anyhow	[production]
15:46	<RobH>	db20 is coming down, ganglia aggregation for those hosts may be delayed until it is back online.	[production]
15:21	<RobH>	relocating snapshot4 into rack c2, it will be offline during this process	[production]
15:20	<RobH>	db43-db47 network setup, sites not down, yay me	[production]
15:10	<RobH>	being on csw1 makes robh nervous.	[production]
15:09	<RobH>	labeling and setting up ports on 11/33 through 11/37 on csw1-sdtpa for db43 through db47	[production]
14:47	<RobH>	fixed storage2 serial console (set it to higher rate, magically works, or it just fears me) and also confirmed its remote power control is functioning	[production]
14:42	<RobH>	stealing dataset1's known good scs connection to test storage2. dataset1 service will remain unaffected.	[production]
2011-04-24 §
21:30	<Ryan_Lane>	restarting apache on mobile1	[production]
15:35	<RobH>	swapping bad disk in db30, hotswap, should be fine	[production]
14:36	<RobH>	swapping out the management switch in c1-sdtpa. msw-c1-sdtpa will be offline, so the mgmt interfaces of servers in that rack will be offline. all normal services will remain unaffected.	[production]
2011-04-23 §
22:31	<RobH>	required even.	[production]
22:31	<RobH>	no drives display error leds, futher investigation requried	[production]
22:27	<RobH>	ms2 is having bad drive investigated. if we do this right, it won't go down. if we don't, it will. is a slave es server.	[production]
22:00	<RobH>	singer returned to operation, blog, techblog, survey, and secure returned to normal operation	[production]
21:52	<RobH>	singer is once again coming back down for drive replacement. This will take offline blog.wikimedia.org, techblog.wikimedia.org, survey.wikimedia.org, and secure.wikipedia.org. Service will be returned as soon as possible.	[production]
21:19	<RobH>	singer back online, for awhile, will come back down for further repair shortly.	[production]
21:05	<RobH>	singer going down, blogs will be offline, so will secure, system will return to service as soon as possible	[production]
21:00	<RobH>	preparing to fix the dead drive in singer, this will offline secure, blog, techblog, and survey during the drive replacement process	[production]