2009-02-17

23:58 <Rob> srv217-srv223 installed and online as Apache servers. Updated dsh groups and Nagios, as well as PyBal. [production]
23:24 <Rob> Installed the OS on srv217-srv223; moving on to package installation. [production]
21:12 <Rob> Reinstalling srv209, which thought it was srv208. Silly server. srv208 has not been installed; gave it to Tomasz to check against the setup checklist. [production]
21:05 <Rob> Actually, srv209 was installed as srv208 because of a bad DHCP entry. Fixing. [production]
21:04 <Rob> Pulling srv208 and srv209 for quick reboots; their DRAC IPs are wrong. [production]
21:04 <Rob> Racked srv217-srv223 (also racked srv224/srv225, but no power yet). [production]
18:30 <brion> Starting a batch run of update-special-pages-small just to ensure it actually works. [production]
18:25 <brion> Fixed the hardcoded /usr/local path for PHP and the use of the obsolete /etc/cluster in update-special-pages and update-special-pages-small; removing misleading log files ([[bugzilla:17534]]). [production]
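The path fix above points at a small portability pattern: resolve the PHP interpreter from PATH instead of hardcoding /usr/local. A hedged sketch; the actual script lines are not in the log, and the fallback path is an assumption:

```shell
# Sketch only: pick up php from PATH rather than a hardcoded
# /usr/local path. The /usr/bin/php fallback is an assumption,
# not taken from the real update-special-pages script.
PHP="$(command -v php || echo /usr/bin/php)"
echo "using interpreter: $PHP"
```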
01:49 <Tim> Deleting all enotif jobs from the job queue; there is still a huge backlog. [production]
2009-02-16

16:46 <mark> Did an emergency rollback from Squid 2.7.6 to Squid 2.6.21 because of incompatible HTTP Host: header handling. [production]
16:21 <Rob> Stopped upgrades; sq36 completed before the stop. [production]
16:17 <Rob> Performing upgrades on sq35-sq38 (not depooling in PyBal; letting PyBal handle that automatically). [production]
16:16 <Rob> Performed dist-upgrade on sq31-sq34. [production]
15:35 <Rob> Depooled sq31-sq34 for upgrade. [production]
08:12 <Tim> Patched in r47309, an Article.php tweak. [production]
05:00 <Tim> Made runJobs.php log to UDP instead of via stdout and NFS. [production]
04:53 <Tim> Fixed incorrect host keys in /etc/ssh/ssh_known_hosts for srv38, srv39 and srv77. [production]
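Stale host keys like the ones above are typically fixed by stripping the old entries and re-scanning the hosts. A hedged sketch, run against a local copy rather than the real /etc/ssh/ssh_known_hosts; the sample key lines are made up:

```shell
# Build a toy known_hosts file with one stale entry (made-up keys).
printf '%s\n' 'srv38 ssh-rsa OLDKEY' 'srv40 ssh-rsa GOODKEY' > ./ssh_known_hosts
# Drop the entries for the hosts whose keys changed...
grep -Ev '^(srv38|srv39|srv77)[ ,]' ./ssh_known_hosts > ./ssh_known_hosts.clean
# ...then re-collect their current keys (needs network access to the hosts):
# ssh-keyscan srv38 srv39 srv77 >> ./ssh_known_hosts.clean
cat ./ssh_known_hosts.clean
```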
04:13 <Tim> Removing all refreshLinks2 jobs from the job queue; duplicate removal is broken, so to clear the backlog it's better to just run maintenance/refreshLinks.php. [production]
2009-02-15

21:59 <mark> Experimentally blocked non-GET/HEAD HTTP methods on the sq3 frontend Squid. [production]
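In Squid, this kind of method filtering is an ACL on the http_access chain. A hedged squid.conf sketch: the ACL name is invented, and the exact rules used on sq3 are not in the log:

```
# squid.conf sketch (not the actual sq3 config):
# define the allowed methods, then deny everything else.
acl safe_methods method GET HEAD
http_access deny !safe_methods
```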
16:15 <mark> Upgraded PyBal on lvs2; others will follow. [production]
13:11 <domas> db23 has multiple MCEs logged for the same DIMM: http://p.defau.lt/?IarKD4gbFhe5RmaV0RB_Xg [production]
12:38 <domas> In wikistats, placed files older than 10 days into ./archive/yyyy/mm/ - maybe it will make flack crash less :)) [production]
11:56 <mark> Doing Squid memory-leak searching on sq1 with valgrind, pooled with weight 1 in LVS. [production]
03:09 <Andrew> CentralNotice is still not working properly, and when we tried to set it to testwiki-only, it never came up. Left it on testwiki only for the time being, until somebody who knows CentralNotice can take a look at it. [production]
02:21 <Tim> Fixed permissions on the rest of the logs in /home/wikipedia/logs/norotate (fixes CentralNotice). [production]
2009-02-14

19:19 <Az1568_> Re-enabled CentralNotice on testwiki to try to find the problem (we've had this before, but fixed it somehow... possibly with a regen? See the November 16th log.) [production]
18:34 <domas> Filed a bug at https://bugs.launchpad.net/ubuntu/+source/apparmor/+bug/329489 - could use some Canonical escalation too. [production]
18:26 <domas> The same issue affected srv47; it's related to switching locking to fcntl(), which drives AppArmor crazy. [production]
17:47 <domas> srv178's kernel leaked a few gigs of memory. Blame: AppArmor. [production]
14:34 <domas> srv215 is very much dead; it doesn't show signs of life even after serveractionhardreset. [production]
14:28 <domas> Correction: srv208.mgmt is pointing to an uninstalled box. [production]
14:27 <domas> The DRAC serial console on all new boxes is ttyS1, which is not in securetty. [production]
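Since login consults /etc/securetty to decide which terminals permit root logins, the fix for the DRAC console above is to list ttyS1 there. A hedged sketch, shown against a local copy so it is safe to run anywhere:

```shell
# Work on a copy; the real file is /etc/securetty (root-owned).
cp /etc/securetty ./securetty 2>/dev/null || touch ./securetty
# Append ttyS1 only if it is not already present.
grep -qx 'ttyS1' ./securetty || echo 'ttyS1' >> ./securetty
grep -x 'ttyS1' ./securetty
```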
14:24 <domas> srv209.mgmt is actually srv208's SP, and srv208.mgmt is pointing to a dead box. [production]
14:15 <domas> srv209, srv215 down? [production]
13:43 <domas> Installing php5-apc-3.0.19-1wm2 (no more futexes) on all Ubuntu appservers. [production]
11:01 <Andrew> test [production]
2009-02-13

22:10 <mark> esams Squid upgrade complete. [production]
21:05 <RobH> Deployed srv207-srv216 in the Apache cluster. [production]
20:34 <RobH> Added the new servers to Nagios and restarted it. [production]
20:15 <RobH> Set up all node groups, Ganglia, Apache, and so on for srv199-srv206 and added them into rotation. [production]
19:38 <mark> Upgrading esams Squids to 2.7.6. [production]
18:36 <mark> Upgraded Squid on sq1 to 2.7.6 and rebooted the box. [production]
18:03 <mark> Memory leak issues on the upload frontend Squids, which started in November. [production]
18:01 <RobH> sq13 back online; seems there is a memory leak. Go mark for finding it =] [production]
17:54 <RobH> lomaria install done for domas. [production]
17:49 <RobH> Rebooting sq13 due to it failing out in Ganglia; OOM error evident. [production]
17:48 <RobH> Reinstalling lomaria per domas's request. [production]
17:37 <RobH> sq8 was out of memory and locked up; rebooted it, cleaned the cache, and bringing it back online. [production]
17:34 <RobH> srv38 and srv39 back in rotation. [production]