2009-01-08
§
|
22:08 |
<brion> |
putting db12 back in service, caught up |
[production] |
21:42 |
<RobH> |
changed the ip address for the management interfaces on sq31-sq50 |
[production] |
21:30 |
<RobH> |
updated dns with the squids and srv management info for pmtpa |
[production] |
21:16 |
<brion> |
taking load off db12 while it updates |
[production] |
21:15 |
<brion> |
killing stuck query threads on db12 (lagged 13k seconds) |
[production] |
20:23 |
<RobH> |
updated dns removing a large number of decommissioned servers from records. |
[production] |
20:08 |
<RobH> |
pushed updates to dns for management ip allocations, changed management ips of search8-search12 |
[production] |
19:43 |
<RobH> |
changed the management ip addresses of db5-db10 to fit into current ip scheme |
[production] |
18:20 |
<RobH> |
updated dns for the management name resolution of db11-db30 |
[production] |
18:11 |
<RobH> |
ms5 has lom access enabled and is ready for testing. (Only one ethernet connection in lieu of the typical 3 on the thumper/thors) |
[production] |
15:50 |
<RobH> |
srv118 reinstalled |
[production] |
15:46 |
<RobH> |
srv136 is borked. Even after reinstall, it will run for a few minutes, then lock hard. Going to RMA it. |
[production] |
15:38 |
<RobH> |
reinstalled srv136 and srv118 cuz they were pissing me off (a valid reinstallation reason if there ever was one.) |
[production] |
15:09 |
<RobH> |
and srv118 back down, thing is borked. |
[production] |
15:06 |
<RobH> |
srv118 back online and serving requests. |
[production] |
15:01 |
<RobH> |
pushed db13 back into cluster, same with db14, from yesterday's work |
[production] |
14:26 |
<RobH> |
srv101 back online and in lvs |
[production] |
14:15 |
<RobH> |
reinstalled srv101, installing wikimedia-task-app packages now |
[production] |
06:37 |
<JeLuF> |
rebooted db18. Mysqld was stuck but couldn't be killed. |
[production] |
04:08 |
<Tim> |
migrated all locked wikis from $wgReadOnly(File) to permissions-based locking, so that stewards can edit the alternate project links, and so that various MediaWiki components don't break on page view |
[production] |
03:57 |
<river> |
set up ms3/ms4 with solaris 10 update 6 |
[production] |
2009-01-07
§
|
22:50 |
<RobH> |
db13 and db14 are replicating but not in the cluster (not sure if they are caught up) |
[production] |
22:35 |
<RobH> |
updated power strip information for ps1-a1-sdtpa and balanced load |
[production] |
22:35 |
<RobH> |
reseated mrj cable for csw1-sdtpa_1/13 |
[production] |
21:36 |
<RobH> |
started up db13 and db14 |
[production] |
21:19 |
<RobH> |
updating firmware on db13-db14 |
[production] |
21:15 |
<RobH> |
shut down db13 and db14 to fix lom lockup issue. |
[production] |
20:52 |
<RobH> |
depooled db13 and db14 in db.php to reboot them and fix the SP lockup issue. |
[production] |
20:49 |
<RobH> |
updating firmware on db16. |
[production] |
20:43 |
<RobH> |
started mysql back up on db15 |
[production] |
20:42 |
<RobH> |
cold reset of db16 to resolve lom issue. will update firmware upon boot. |
[production] |
20:39 |
<RobH> |
swapped hostnames on ms3 and ms4, updated racktables and dns to reflect change |
[production] |
20:24 |
<brion> |
disabled wikidiff2 since it's not installed, and this apparently is nicely broken |
[production] |
20:21 |
<RobH> |
db15 now responsive to lom and ready to be re-integrated into the cluster |
[production] |
20:12 |
<RobH> |
db15 cold reset fixes the LOM non-responsive issue. Upgrading its firmware to prevent future issues. |
[production] |
20:06 |
<brion> |
removed stray whitespace from wikitech config file which was breaking rss feeds |
[production] |