2010-07-15
|
19:23 |
<mark> |
Replaced all occurrences of 'rr.wikimedia.org' with 'text.wikimedia.org' in DNS |
[production] |
19:14 |
<mark> |
Updated IP of deprecated record rr.esams.wikimedia.org |
[production] |
19:10 |
<mark> |
Started PyBal on amslvs1 with a new config; it automatically picked up the traffic for both text.esams (91.198.174.232) and bits.esams (91.198.174.233) |
[production] |
19:07 |
<mark> |
Stopped PyBal on amslvs1, BGP and OSPF did an automatic failover of bits.esams (91.198.174.233) to amslvs3 |
[production] |
18:59 |
<mark> |
Removed IP 91.198.174.2 (old text squids service ip) from amslvs1. Anyone still using the old IP after weeks will now be unable to reach our sites. |
[production] |
18:56 |
<mark> |
Depooled knsq1-knsq7 in PyBal |
[production] |
17:38 |
<Fred> |
fixed nfs mounts on Bayes. |
[production] |
15:35 |
<apergos> |
chowned /mnt/upload6/private/ExtensionDistributor/mw-snapshot/trunk/extensions tree to extdist. ExtensionDistributor apparently working now |
[production] |
15:01 |
<apergos> |
running svn cleanup on /mnt/upload6/private/ExtensionDistributor/mw-snapshot/trunk/extensions as extdist user |
[production] |
12:34 |
<tstarling> |
synchronizing Wikimedia installation... Revision: 69381 |
[production] |
12:18 |
<Tim> |
svn up/scap to r69380 |
[production] |
05:13 |
<jeluf> |
synchronized php-1.5/wmf-config/InitialiseSettings.php '24321 - ml.wikiquote.org lost its project namespace' |
[production] |
2010-07-14
|
23:44 |
<Fred> |
re-added cron job to periodically save rrds on our ganglia server. (cron job seems to have vanished for some reason) |
[production] |
17:59 |
<catrope> |
synchronized php-1.5/wmf-config/InitialiseSettings.php 'Favicon for wikimaniateamwiki per Guillaume' |
[production] |
16:06 |
<Fred> |
restarted apache on mobile1 (had begun to return 500) |
[production] |
14:07 |
<mark> |
Fixed memcached on srv110 |
[production] |
12:19 |
<mark> |
Fixed ganglia and puppet on stafford |
[production] |
11:54 |
<mark> |
Migrated DNS monitoring to puppet |
[production] |
10:31 |
<mark> |
Migrated ZFS RAID nagios check to puppet |
[production] |
10:14 |
<mark> |
Migrated monitoring of lucene to puppet |
[production] |
09:37 |
<mark> |
Migrated monitoring of image scalers to puppet |
[production] |
08:49 |
<Tim> |
using stafford for some pbuilder experimentation |
[production] |
2010-07-12
|
16:54 |
<Fred> |
changed LONGQUERIES check threshold |
[production] |
16:08 |
<Fred> |
restarting morebots since it had died. |
[production] |
16:08 |
<Fred> |
restarting Nagios since it was down. |
[production] |
14:29 |
<mark> |
Added "cfg_file=/etc/nagios/puppet_hosts.cfg" to nagios.cfg |
[production] |
13:25 |
<JeLuF> |
added disk space monitoring for apaches |
[production] |
12:51 |
<jeluf> |
synchronized php-1.5/wmf-config/InitialiseSettings.php '24306 - Create namespaces for Lithuanian Wiktionary' |
[production] |
12:48 |
<jeluf> |
synchronized php-1.5/wmf-config/InitialiseSettings.php '24321 - ml.wikiquote.org lost its project namespace' |
[production] |
12:46 |
<jeluf> |
synchronized php-1.5/wmf-config/InitialiseSettings.php '24321 - ml.wikiquote.org lost its project namespace' |
[production] |
12:41 |
<jeluf> |
synchronized php-1.5/wmf-config/InitialiseSettings.php '24344 - Namespace changes - si.wiktionary' |
[production] |
11:45 |
<JeLuF> |
fixed broken ganglia-metrics installation on srv146 (chown gmetric /var/log/gmetricd/gmetricd.log) |
[production] |
11:41 |
<JeLuF> |
added DPKG status monitoring for all app servers to nagios. Reports all packages that are not in state 'rc' or 'ii'. |
[production] |
10:43 |
<JeLuF> |
lots of false alerts from nagios due to missing SSL setup for NRPE. Working on it. |
[production] |
09:53 |
<JeLuF> |
changed puppet config to install nrpe on all app servers |
[production] |
09:28 |
<JeLuF> |
replacing opsview-nrpe agents with nagios-nrpe agents (image_scalers, some other apaches). Most apaches already use nagios-nrpe |
[production] |
07:40 |
<Tim> |
set up NRPE disk space monitoring on ms4, discovered that /mnt2 is full |
[production] |
04:54 |
<Tim> |
updated NFS host/service groups to monitor the actual NFS servers, not a random collection of miscellaneous ex-NFS servers |
[production] |
04:46 |
<Tim> |
installed NRPE on nfs1 and nfs2 |
[production] |
04:08 |
<Tim> |
adding rendering, m, bits.esams, recursor0, recursor1, recursor0.esams to nagios |
[production] |
04:02 |
<Tim> |
added forward DNS entry for recursor0.esams, modified reverse DNS entry resolver0.esams -> recursor0.esams |
[production] |
03:55 |
<Tim> |
fixed reverse DNS entries for recursor0 and recursor1, which were set incorrectly to non-existent hostnames "resolver0" and "recursor1" |
[production] |