2011-04-25
23:33 <Ryan_Lane> added python-mwclient to lucid repo [production]
21:36 <RobH> storage2 still offline, won't boot into os, but is remotely accessible [production]
21:20 <RobH> trying to fix storage2 [production]
20:16 <notpeter> actually adding everyone on ops to watchmouse service... didn't know this had not already been done. [production]
20:02 <RobH> updated csw1 to remove labels and move ports 11/12, 11/14, 11/19, & 11/21 to the default vlan; these were the old connection ports for dataset2, tridge, ms1, and ms5 [production]
19:53 <RobH> the datacenter is looking awesome. [production]
19:45 <RobH> ms1 moved from temp network to permanent home, no downtime, responding fine [production]
19:42 <RobH> ms5 connection moved, no downtime, responds fine, less than 4 seconds [production]
19:40 <RobH> updated csw1-sdtpa 15/1,15/2 from vlan 105 to vlan 2, 15/3 and 15/4 from vlan 105 to 101 [production]
18:52 <RobH> snapshot4 relocated to new home, ready for os install [production]
18:42 <RobH> db19 and db20 back online (not in services as they have other issues) [production]
18:39 <RobH> db19 and db20 powering back up [production]
18:25 <RobH> virt4 experienced an accidental reboot while rebalancing power in the rack, my fault, not the hardware [production]
18:12 <RobH> rack b2 power rebalanced [production]
18:01 <RobH> db19 set to slave, depooled in db.php, no other services evident, shutting down (mysql stopped cleanly) [production]
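The depool-before-shutdown step above (remove the replica from the pool so no traffic reaches it, then stop the service) can be sketched conceptually as follows. The pool layout and host names here are hypothetical illustrations, not the actual db.php schema.

```python
# Conceptual sketch of depooling a database replica before maintenance.
# The pool structure and host names are assumptions, not MediaWiki's db.php format.

def depool(pool: dict, host: str) -> dict:
    """Return a new pool with `host` removed so no queries route to it."""
    if host not in pool["replicas"]:
        raise KeyError(f"{host} is not in the replica pool")
    replicas = {h: w for h, w in pool["replicas"].items() if h != host}
    if not replicas:
        raise RuntimeError("refusing to depool the last replica")
    return {**pool, "replicas": replicas}

pool = {"master": "db12", "replicas": {"db19": 100, "db20": 100, "db26": 200}}
pool = depool(pool, "db19")  # db19 no longer receives read traffic
```

Returning a new pool rather than mutating in place mirrors the config-file workflow: the change is prepared, then deployed atomically.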
18:00 <RobH> db20 shut down [production]
18:00 <RobH> didn't log that I set up ports 11/38-40 for db19, db20, and snapshot4 on csw1-sdtpa. tested out fine and all my major configuration changes on network should be complete [production]
17:56 <RobH> ok, db20 and db19 are coming offline to relocate to a new rack location due to power distribution issues [production]
15:47 <RobH> delay, not coming down yet, need more cables [production]
15:46 <RobH> db19 is coming down as well, it is depooled anyhow [production]
15:46 <RobH> db20 is coming down, ganglia aggregation for those hosts may be delayed until it is back online. [production]
15:21 <RobH> relocating snapshot4 into rack c2, it will be offline during this process [production]
15:20 <RobH> db43-db47 network setup, sites not down, yay me [production]
15:10 <RobH> being on csw1 makes robh nervous. [production]
15:09 <RobH> labeling and setting up ports on 11/33 through 11/37 on csw1-sdtpa for db43 through db47 [production]
14:47 <RobH> fixed storage2 serial console (set it to higher rate, magically works, or it just fears me) and also confirmed its remote power control is functioning [production]
14:42 <RobH> stealing dataset1's known good scs connection to test storage2. dataset1 service will remain unaffected. [production]
2011-04-23
22:31 <RobH> no drives display error leds, further investigation required [production]
22:27 <RobH> ms2 is having its bad drive investigated. if we do this right, it won't go down. if we don't, it will. it is a slave es server. [production]
22:00 <RobH> singer returned to operation; blog, techblog, survey, and secure back to normal operation [production]
21:52 <RobH> singer is once again coming back down for drive replacement. This will take offline blog.wikimedia.org, techblog.wikimedia.org, survey.wikimedia.org, and secure.wikipedia.org. Service will be returned as soon as possible. [production]
21:19 <RobH> singer back online, for a while; will come back down for further repair shortly. [production]
21:05 <RobH> singer going down, blogs will be offline, so will secure, system will return to service as soon as possible [production]
21:00 <RobH> preparing to fix the dead drive in singer; this will take secure, blog, techblog, and survey offline during the drive replacement process [production]
19:50 <mark> Upgrading mr1-pmtpa to junos 10.4R3.4 [production]
17:49 <RobH> migrating searchidx1 & search1-search10 to new ports in same rack. moving one at a time and ensuring link lights between moves. (already tested with search10) [production]
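The one-at-a-time migration with a verification check between moves is a standard rolling procedure: move a host, confirm it is healthy, and only then touch the next one, so a bad move affects at most one host. A minimal sketch, where `move` and `check_link` are hypothetical stand-ins for the physical recabling and for watching the link lights:

```python
# Rolling migration sketch: move one host, verify its link, only then continue.
# `move` and `check_link` are hypothetical stand-ins for the manual steps.

def migrate_one_at_a_time(hosts, move, check_link):
    """Move each host in turn, aborting immediately if a link check fails."""
    done = []
    for host in hosts:
        move(host)
        if not check_link(host):
            raise RuntimeError(f"link check failed on {host}; stopping migration")
        done.append(host)
    return done

hosts = ["searchidx1"] + [f"search{i}" for i in range(1, 11)]
moved = migrate_one_at_a_time(hosts, move=lambda h: None, check_link=lambda h: True)
```

The early abort is the point of the pattern: if a check fails, every host not yet moved is still on its known-good port.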
14:11 <RobH> db19 is back online, seems not to have any mysql setup done. [production]
14:02 <RobH> restarting db19 [production]
14:02 <RobH> arcconf confirms all drives on db19 are indeed working, as rich found earlier [production]
12:47 <mark> Added (x121Address=1) condition to the LDAP query of the ldap_aliases router on mchenry's exim [production]
00:32 <hcatlin> Mobile: Deploying fix to an issue that kept the standard-style Main_Page from displaying on mobile [production]
00:25 <Ryan_Lane> restarting memcached on all of the mobile servers [production]
00:23 <Ryan_Lane> repooling mobile3, since mobile will die without it (fun!!) [production]
00:17 <Ryan_Lane> depooling mobile3 [production]
00:13 <Ryan_Lane> restarting apache on mobile3 [production]
00:10 <Ryan_Lane> puppet was broken on mobile1, reinstalled it [production]