2011-04-26
19:50 <RoanKattouw> Running sync-common-all to deploy UploadWizard changes [production]
17:52 <pdhanda> Running maintenance/populateParentId.php on all wikis [production]
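populateParentId.php is a standard MediaWiki maintenance script (it backfills rev_parent_id for existing revisions). A minimal sketch of what "running it on all wikis" amounts to, assuming a plain list of wiki IDs and the standard --wiki maintenance option; the dblist path and the exact production wrapper are assumptions, not taken from this log:

    #!/bin/bash
    # Hedged sketch: run the maintenance script once per wiki in the farm.
    # The dblist location is an assumption about the deployment layout.
    DBLIST=/home/wikipedia/common/all.dblist

    while read -r wiki; do
        echo "=== $wiki ==="
        php maintenance/populateParentId.php --wiki="$wiki"
    done < "$DBLIST"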
08:21 <Andrew> sync-common-all worked. scap still broken [production]
08:21 <andrew> ran sync-common-all [production]
08:21 <Andrew> trying sync-common-all [production]
08:19 <Andrew> syncs are broken, log littered with XXX: [sudo] password for andrew: [production]
08:12 <Andrew> re-scapping, typo in extension-list [production]
08:12 <andrew> synchronizing Wikimedia installation... Revision: 86895: [production]
08:11 <Andrew> Scapping to enable DisableAccount extension [production]
08:11 <andrew> synchronizing Wikimedia installation... Revision: 86895: [production]
08:02 <andrew> synchronizing Wikimedia installation... Revision: 86895: [production]
08:02 <Andrew> running scap to deploy the code itself [production]
08:01 <Andrew> deploying DisableAccount extension to checkuserwiki, stewardwiki, arbcom_enwiki since the special page was removed without consulting Philippe [production]
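The 08:01-08:21 entries above trace the usual pattern for enabling a new extension on the cluster: register it in extension-list, enable it per wiki in wmf-config, run scap, and fall back to sync-common-all when scap misbehaves. A rough sketch of that flow; the paths, the extension-list line format, and the fallback step are inferred from the log rather than quoted from it:

    # Hedged sketch of the deploy flow described above; paths and file
    # formats are assumptions based on the log entries, not exact commands.
    cd /home/wikipedia/common

    # 1. Register the extension's setup file in extension-list (the file
    #    whose typo broke the first scap run).
    echo '$IP/extensions/DisableAccount/DisableAccount.php' >> wmf-config/extension-list

    # 2. Enable the extension per wiki by hand in wmf-config, then push
    #    code and rebuilt localisation caches to the whole cluster.
    scap

    # 3. If scap fails partway, sync-common-all still pushes the shared files.
    sync-common-all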
02:15 <robh> synchronized php-1.17/wmf-config/InitialiseSettings.php 'adding settings for checkuser and steward wikis' [production]
2011-04-25
23:33 <Ryan_Lane> added python-mwclient to lucid repo [production]
21:36 <RobH> storage2 still offline, won't boot into os, but is remotely accessible [production]
21:20 <RobH> trying to fix storage2 [production]
20:16 <notpeter> actually adding everyone on ops to watchmouse service... didn't know this had not already been done. [production]
20:02 <RobH> updated csw1 to remove labels and move to default vlan ports 11/12, 11/14, 11/19, & 11/21. old connection ports for dataset2, tridge, ms1, and ms5 [production]
19:53 <RobH> the datacenter is looking awesome. [production]
19:45 <RobH> ms1 moved from temp network to permanent home, no downtime, responding fine [production]
19:42 <RobH> ms5 connection moved, no downtime, responds fine, less than 4 seconds [production]
19:40 <RobH> updated csw1-sdtpa 15/1, 15/2 from vlan 105 to vlan 2, 15/3 and 15/4 from vlan 105 to 101 [production]
18:52 <RobH> snapshot4 relocated to new home, ready for os install [production]
18:42 <RobH> db19 and db20 back online (not in services as they have other issues) [production]
18:39 <RobH> db19 and db20 powering back up [production]
18:25 <RobH> virt4 experienced an accidental reboot when rebalancing power in the rack, my fault, not the hardware [production]
18:12 <RobH> rack b2 power rebalanced [production]
18:01 <RobH> db19 set to slave, depooled in db.php, no other services evident, shutting down (mysql stopped cleanly) [production]
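Depooling a slave before powering it down, as in the 18:01 db19 entry above, generally means removing the host from the load arrays in wmf-config/db.php, pushing that file, and only then stopping MySQL. A minimal sketch of that sequence; the sync-file invocation, paths, and log message are assumptions rather than the exact commands used:

    # Hedged sketch of the depool-then-shutdown sequence for db19; helper
    # names, paths, and the log message are assumptions.
    cd /home/wikipedia/common

    # 1. Edit wmf-config/db.php by hand to drop db19 from its section's
    #    load array, then push just that file to the apaches.
    sync-file wmf-config/db.php 'depool db19 for rack move'

    # 2. On db19 itself: stop MySQL cleanly before cutting power.
    mysqladmin shutdown

    # 3. Power the host down once MySQL has stopped.
    sudo shutdown -h now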
18:00 <RobH> db20 shutdown [production]
18:00 <RobH> didn't log that I set up ports 11/38-40 for db19, db20, and snapshot4 on csw1-sdtpa. tested out fine and all my major configuration changes on the network should be complete [production]
17:56 <RobH> ok, db20 and db19 are coming offline to relocate their rack location due to power distro issues [production]
15:47 <RobH> delay, not coming down yet, need more cables [production]
15:46 <RobH> db19 is coming down as well, it is depooled anyhow [production]
15:46 <RobH> db20 is coming down, ganglia aggregation for those hosts may be delayed until it is back online. [production]
15:21 <RobH> relocating snapshot4 into rack c2, it will be offline during this process [production]
15:20 <RobH> db43-db47 network setup, sites not down, yay me [production]
15:10 <RobH> being on csw1 makes robh nervous. [production]
15:09 <RobH> labeling and setting up ports on 11/33 through 11/37 on csw1-sdtpa for db43 through db47 [production]
14:47 <RobH> fixed storage2 serial console (set it to a higher rate, magically works, or it just fears me) and also confirmed its remote power control is functioning [production]
14:42 <RobH> stealing dataset1's known good scs connection to test storage2. dataset1 service will remain unaffected. [production]
2011-04-23
22:31 <RobH> required even. [production]
22:31 <RobH> no drives display error leds, further investigation required [production]
22:27 <RobH> ms2 is having a bad drive investigated. if we do this right, it won't go down. if we don't, it will. it is a slave es server. [production]
22:00 <RobH> singer returned to operation, blog, techblog, survey, and secure returned to normal operation [production]
21:52 <RobH> singer is once again coming back down for drive replacement. This will take offline blog.wikimedia.org, techblog.wikimedia.org, survey.wikimedia.org, and secure.wikipedia.org. Service will be returned as soon as possible. [production]
21:19 <RobH> singer back online, for a while, will come back down for further repair shortly. [production]