2011-04-25 §
18:42 <RobH> db19 and db20 back online (not in services as they have other issues) [production]
18:39 <RobH> db19 and db20 powering back up [production]
18:25 <RobH> virt4 experienced an accidental reboot when rebalancing power in the rack, my fault, not the hardware [production]
18:12 <RobH> rack b2 power rebalanced [production]
18:01 <RobH> db19 set to slave, depooled in db.php, no other services evident, shutting down (mysql stopped cleanly) [production]
18:00 <RobH> db20 shutdown [production]
18:00 <RobH> didn't log that i setup ports 11/38-40 for db19, db20, and snapshot4 on csw1-sdtpa. tested out fine and all my major configuration changes on network should be complete [production]
17:56 <RobH> ok, db20 and db19 are coming offline to relocate their rack location due to power distro issues [production]
15:47 <RobH> delay, not coming down yet, need more cables [production]
15:46 <RobH> db19 is coming down as well, it is depooled anyhow [production]
15:46 <RobH> db20 is coming down, ganglia aggregation for those hosts may be delayed until it is back online. [production]
15:21 <RobH> relocating snapshot4 into rack c2, it will be offline during this process [production]
15:20 <RobH> db43-db47 network setup, sites not down, yay me [production]
15:10 <RobH> being on csw1 makes robh nervous. [production]
15:09 <RobH> labeling and setting up ports on 11/33 through 11/37 on csw1-sdtpa for db43 through db47 [production]
14:47 <RobH> fixed storage2 serial console (set it to higher rate, magically works, or it just fears me) and also confirmed its remote power control is functioning [production]
14:42 <RobH> stealing dataset1's known good scs connection to test storage2. dataset1 service will remain unaffected. [production]
2011-04-24 §
21:30 <Ryan_Lane> restarting apache on mobile1 [production]
15:35 <RobH> swapping bad disk in db30, hotswap, should be fine [production]
14:36 <RobH> swapping out the management switch in c1-sdtpa. msw-c1-sdtpa will be offline, so the mgmt interfaces of servers in that rack will be offline. all normal services will remain unaffected. [production]
2011-04-23 §
22:31 <RobH> required even. [production]
22:31 <RobH> no drives display error leds, further investigation requried [production]
22:27 <RobH> ms2 is having bad drive investigated. if we do this right, it wont go down. if we don't it will. is a slave es server. [production]
22:00 <RobH> singer returned to operation, blog, techblog, survey, and secure returned to normal operation [production]
21:52 <RobH> singer is once again coming back down for drive replacement. This will take offline blog.wikimedia.org, techblog.wikimedia.org, survey.wikimedia.org, and secure.wikipedia.org. Service will be returned as soon as possible. [production]
21:19 <RobH> singer back online, for awhile, will come back down for further repair shortly. [production]
21:05 <RobH> singer going down, blogs will be offline, so will secure, system will return to service as soon as possible [production]
21:00 <RobH> preparing to fix the dead drive in singer, this will offline secure, blog, techblog, and survey during the drive replacement process [production]
19:50 <mark> Upgrading mr1-pmtpa to junos 10.4R3.4 [production]
17:49 <RobH> migrating searchidx1 & search1-search10 to new ports in same rack. moving one at a time and ensuring link lights between moves. (already tested with search10) [production]
14:11 <RobH> db19 is back online, seems to not have any mysql setup done. [production]
14:02 <RobH> restarting db19 [production]
14:02 <RobH> arcconf checks out all drives on db19 are indeed working as rich found earlier [production]
12:47 <mark> Added (x121Address=1) condition to the LDAP query of the ldap_aliases router on mchenry's exim [production]
00:32 <hcatlin> Mobile: Deploying fix to an issue that kept the standard-style Main_Page from displaying on mobile [production]
00:25 <Ryan_Lane> restarting memcached on all of the mobile servers [production]
00:23 <Ryan_Lane> repooling mobile3, since mobile will die without it (fun!!) [production]
00:17 <Ryan_Lane> depooling mobile3 [production]
00:13 <Ryan_Lane> restarting apache on mobile3 [production]
00:10 <Ryan_Lane> puppet was broken on mobile1, reinstalled it [production]
2011-04-22 §
23:56 <domas> detached gdb from srv193 apache, apparently it was used for something [production]
23:14 <notpeter> restarting nagios (again) [production]
22:43 <notpeter> restarting nagios [production]
19:23 <apergos> shot all stopped rsyncs on ms5 (that were copying from ms4 about two weeks ago), changed all perms on the directories they had reached so thumbs can be served/read from them.. oh. not me, someone else must have done it, I'm not here :-P [production]
19:02 <RobH> ms4 shutting down for memory troubleshooting [production]
18:52 <RobH> ms4 troubleshooting, disregard bounces [production]
18:51 <notpeter> restarting nagios [production]
12:41 <hcatlin> Restarting mobile cluster with April code update. [production]
00:49 <notpeter> restarting nagios. hopefully now with more sms! [production]
2011-04-21 §
23:32 <midom> synchronized php-1.17/includes/ImagePage.php [production]