2011-04-26
08:21 <Andrew> sync-common-all worked. scap still broken [production]
08:21 <andrew> ran sync-common-all [production]
08:21 <Andrew> trying sync-common-all [production]
08:19 <Andrew> syncs are broken, log littered with XXX: [sudo] password for andrew: [production]
08:12 <Andrew> re-scapping, typo in extension-list [production]
08:12 <andrew> synchronizing Wikimedia installation... Revision: 86895: [production]
08:11 <Andrew> Scapping to enable DisableAccount extension [production]
08:11 <andrew> synchronizing Wikimedia installation... Revision: 86895: [production]
08:02 <andrew> synchronizing Wikimedia installation... Revision: 86895: [production]
08:02 <Andrew> running scap to deploy the code itself [production]
08:01 <Andrew> deploying DisableAccount extension to checkuserwiki, stewardwiki, arbcom_enwiki since the special page was removed without consulting Philippe [production]
02:15 <robh> synchronized php-1.17/wmf-config/InitialiseSettings.php 'adding settings for checkuser and steward wikis' [production]
2011-04-25
23:33 <Ryan_Lane> added python-mwclient to lucid repo [production]
21:36 <RobH> storage2 still offline, won't boot into os, but is remotely accessible [production]
21:20 <RobH> trying to fix storage2 [production]
20:16 <notpeter> actually adding everyone on ops to watchmouse service... didn't know this had not already been done. [production]
20:02 <RobH> updated csw1 to remove labels and move to default vlan ports 11/12, 11/14, 11/19, & 11/21. old connection ports for dataset2, tridge, ms1, and ms5 [production]
19:53 <RobH> the datacenter is looking awesome. [production]
19:45 <RobH> ms1 moved from temp network to permanent home, no downtime, responding fine [production]
19:42 <RobH> ms5 connection moved, no downtime, responds fine, less than 4 seconds [production]
19:40 <RobH> updated csw1-sdtpa 15/1,15/2 from vlan 105 to vlan 2, 15/3 and 15/4 from vlan 105 to 101 [production]
18:52 <RobH> snapshot4 relocated to new home, ready for os install [production]
18:42 <RobH> db19 and db20 back online (not in services as they have other issues) [production]
18:39 <RobH> db19 and db20 powering back up [production]
18:25 <RobH> virt4 experienced an accidental reboot when rebalancing power in the rack, my fault, not the hardware [production]
18:12 <RobH> rack b2 power rebalanced [production]
18:01 <RobH> db19 set to slave, depooled in db.php, no other services evident, shutting down (mysql stopped cleanly) [production]
18:00 <RobH> db20 shutdown [production]
18:00 <RobH> didn't log that i set up ports 11/38-40 for db19, db20, and snapshot4 on csw1-sdtpa. tested out fine and all my major configuration changes on network should be complete [production]
17:56 <RobH> ok, db20 and db19 are coming offline to relocate their rack location due to power distro issues [production]
15:47 <RobH> delay, not coming down yet, need more cables [production]
15:46 <RobH> db19 is coming down as well, it is depooled anyhow [production]
15:46 <RobH> db20 is coming down, ganglia aggregation for those hosts may be delayed until it is back online. [production]
15:21 <RobH> relocating snapshot4 into rack c2, it will be offline during this process [production]
15:20 <RobH> db43-db47 network setup, sites not down, yay me [production]
15:10 <RobH> being on csw1 makes robh nervous. [production]
15:09 <RobH> labeling and setting up ports on 11/33 through 11/37 on csw1-sdtpa for db43 through db47 [production]
14:47 <RobH> fixed storage2 serial console (set it to higher rate, magically works, or it just fears me) and also confirmed its remote power control is functioning [production]
14:42 <RobH> stealing dataset1's known good scs connection to test storage2. dataset1 service will remain unaffected. [production]
2011-04-24
21:30 <Ryan_Lane> restarting apache on mobile1 [production]
15:35 <RobH> swapping bad disk in db30, hotswap, should be fine [production]
14:36 <RobH> swapping out the management switch in c1-sdtpa. msw-c1-sdtpa will be offline, so the mgmt interfaces of servers in that rack will be offline. all normal services will remain unaffected. [production]
2011-04-23
22:31 <RobH> required even. [production]
22:31 <RobH> no drives display error leds, further investigation requried [production]
22:27 <RobH> ms2 is having a bad drive investigated. if we do this right, it won't go down. if we don't, it will. it is a slave es server. [production]
22:00 <RobH> singer returned to operation, blog, techblog, survey, and secure returned to normal operation [production]
21:52 <RobH> singer is once again coming back down for drive replacement. This will take offline blog.wikimedia.org, techblog.wikimedia.org, survey.wikimedia.org, and secure.wikipedia.org. Service will be returned as soon as possible. [production]
21:19 <RobH> singer back online for a while, will come back down for further repair shortly. [production]
21:05 <RobH> singer going down, blogs will be offline, so will secure, system will return to service as soon as possible [production]
21:00 <RobH> preparing to fix the dead drive in singer, this will offline secure, blog, techblog, and survey during the drive replacement process [production]