2010-06-04
§
|
22:52 |
<tomaszf> |
starting webstats with new binary |
[production] |
22:50 |
<tomaszf> |
stopping webstats in prep for update to track mobile stats |
[production] |
19:30 |
<atglenn> |
moved bad snapshots (apr 11 through may 6 2010) to /mnt/dumps/public/bad so public index shows only good dumps and so there will be no prefetch against them |
[production] |
18:47 |
<Fred> |
moved mobile2 to squid vlan / re-ip'ed / dns changed. mobile1 => 115 mobile2 => 116 |
[production] |
18:35 |
<catrope> |
synchronized php-1.5/wmf-config/CommonSettings.php 'Bump style version appendix. Gotta kill this thing some time' |
[production] |
18:35 |
<catrope> |
synchronized php-1.5/extensions/UsabilityInitiative/Vector/Vector.combined.min.js 'r67355' |
[production] |
18:34 |
<catrope> |
synchronized php-1.5/extensions/UsabilityInitiative/js/plugins.combined.min.js 'r67355' |
[production] |
12:11 |
<tstarling> |
synchronized php-1.5/wmf-config/InitialiseSettings.php 'WikimediaMobile' |
[production] |
11:37 |
<Tim> |
mobile down for 15 minutes, possibly apache threads exhausted, restarting apache |
[production] |
09:56 |
<catrope> |
synchronized php-1.5/extensions/ContactPage/SpecialContact.php 'r67333' |
[production] |
09:56 |
<domas> |
deployments manage to kill apache processes sometimes |
[production] |
09:50 |
<tstarling> |
synchronizing Wikimedia installation... Revision: 66620 |
[production] |
09:50 |
<Tim> |
pushing out WikimediaMobile (r67331) in preparation for deployment on testwiki |
[production] |
08:44 |
<domas> |
decreased keepalivetimeout and timeout on mobile1 |
[production] |
08:35 |
<Tim> |
on mobile1: reduced max passenger pool size to 200, Domas and I think it's about right, shouldn't exceed allowable memory, should give us close to 100% CPU. |
[production] |
08:26 |
<Tim> |
on mobile1: domas fixed file limit, now 50k |
[production] |
08:10 |
<Tim> |
increasing MaxClients on mobile1 to 1500 |
[production] |
05:01 |
<Fred> |
Added apache2.conf, memcached.conf to puppet recipe for mobile. |
[production] |
03:43 |
<jeluf> |
synchronized php-1.5/wmf-config/InitialiseSettings.php '23784 - Modify add/remove rights for bureaucrats on officewiki' |
[production] |
02:46 |
<Tim> |
mobile1: increased ServerLimit to 1500 and reduced MaxClients to 500 |
[production] |
02:35 |
<Tim> |
on mobile1: increased memcached memory limit from 64M to 5000M |
[production] |
02:15 |
<Tim> |
switched mobile1 over from apache2-mpm-worker to apache2-mpm-prefork (via puppet) |
[production] |
01:03 |
<Tim> |
set ganglia host_dmax to 1 day |
[production] |
2010-06-03
§
|
21:57 |
<Fred> |
mobile1 re-imaged and puppetized. Changed subnet for mobile1. Changed DNS for mobile1. m pointing to newly imaged mobile1 (until transition is completed) |
[production] |
20:55 |
<jeluf> |
synchronized php-1.5/wmf-config/InitialiseSettings.php '23689 - Enable Collection extension on Thai Wikipedia' |
[production] |
20:22 |
<AaronSchulz> |
deployed r67296 FlaggedRevs_alpha |
[production] |
20:21 |
<aaron> |
synchronizing Wikimedia installation... Revision: 66620 |
[production] |
19:39 |
<mark> |
Moved mobile1 switchport from vlan 101 to 100 |
[production] |
19:36 |
<mark> |
Reverted DNS change of mobile1, back to .157 |
[production] |
17:21 |
<Fred> |
mobile1 going to be unreachable while re-ip'ing |
[production] |
14:05 |
<midom> |
synchronized php-1.5/wmf-config/InitialiseSettings.php 'timezone change for bat-smg' |
[production] |
11:53 |
<mark> |
Made m.wikipedia.org CNAME m.wikipedia.org, m.wikipedia.org A to mobile1/2 in RR |
[production] |
10:57 |
<hcatlin> |
mobile2 has been rebuilt and is featuring the new apache/mobile stack taking 40% of all mobile traffic. pls help monitor on ganglia. |
[production] |
09:04 |
<Tim> |
cleaning COSS on sq45, resynced its configuration, will start squid when done |
[production] |
08:58 |
<Tim> |
kernel reports degraded RAID on sq33, sq34, sq35, sq37, sq38, sq40 |
[production] |
08:39 |
<Tim> |
checked all serial consoles, all nonresponsive, rebooted all |
[production] |
08:23 |
<Tim> |
sq33, sq34, sq35, sq37, sq38, sq40, sq45 have been down for 16-28 days, apparently for no good reason, can't find any log or DT entries. Will try restarts. |
[production] |
07:56 |
<Tim> |
added new squids to nagios |
[production] |
06:36 |
<Tim> |
cleaning cache directories on sq56 to avoid resurrection of expired content |
[production] |
06:35 |
<Tim> |
adding monitoring for rather important service IPs: upload.esams and text.esams |
[production] |
06:22 |
<Tim> |
sq56 not responding to ping or serial console (for 4 days), nothing in racadm getsel, rebooting |
[production] |
06:07 |
<tstarling> |
synchronized php-1.5/wmf-config/InitialiseSettings.php 'disabling ClickTracking due to CR r58099' |
[production] |
05:24 |
<Tim> |
started apache on srv216, was stopped for some reason |
[production] |
03:57 |
<Fred> |
shutting down mailman on list for a few minutes while exim and spamd catch up |
[production] |