2010-07-15 §
19:10 <mark> Started PyBal on amslvs1 with a new config; it automatically picked up the traffic for both text.esams (91.198.174.232) and bits.esams (91.198.174.233) [production]
19:07 <mark> Stopped PyBal on amslvs1, BGP and OSPF did an automatic failover of bits.esams (91.198.174.233) to amslvs3 [production]
18:59 <mark> Removed IP 91.198.174.2 (old text squids service ip) from amslvs1. Anyone still using the old IP after weeks will now be unable to reach our sites. [production]
18:56 <mark> Depooled knsq1-knsq7 in PyBal [production]
17:38 <Fred> fixed nfs mounts on Bayes. [production]
15:35 <apergos> chowned /mnt/upload6/private/ExtensionDistributor/mw-snapshot/trunk/extensions tree to extdist. ExtensionDistributor apparently working now [production]
15:01 <apergos> running svn cleanup on /mnt/upload6/private/ExtensionDistributor/mw-snapshot/trunk/extensions as extdist user [production]
12:34 <tstarling> synchronizing Wikimedia installation... Revision: 69381 [production]
12:18 <Tim> svn up/scap to r69380 [production]
05:13 <jeluf> synchronized php-1.5/wmf-config/InitialiseSettings.php '24321 - ml.wikiquote.org lost its project namespace' [production]
2010-07-14 §
23:44 <Fred> re-added cron job to periodically save rrds on our ganglia server. (cron job seems to have vanished for some reason) [production]
17:59 <catrope> synchronized php-1.5/wmf-config/InitialiseSettings.php 'Favicon for wikimaniateamwiki per Guillaume' [production]
16:06 <Fred> restarted apache on mobile1 (had begun to return 500) [production]
14:07 <mark> Fixed memcached on srv110 [production]
12:19 <mark> Fixed ganglia and puppet on stafford [production]
11:54 <mark> Migrated DNS monitoring to puppet [production]
10:31 <mark> Migrated ZFS RAID nagios check to puppet [production]
10:14 <mark> Migrated monitoring of lucene to puppet [production]
09:37 <mark> Migrated monitoring of image scalers to puppet [production]
08:49 <Tim> using stafford for some pbuilder experimentation [production]
2010-07-13 §
22:02 <mark> Migrated monitoring of application servers to Puppet [production]
20:29 <mark> Fixed puppet on ms4 [production]
20:16 <mark> Hacked up nagios conf.php to not create host entries for most servers (now in puppet), except special cases [production]
19:58 <mark> Hacked up nagios conf.php to not create host entries [production]
16:51 <mark> Migrated Squid Nagios monitoring to puppet, commented some functionality in nagios conf.php [production]
15:51 <mark> Split puppet nagios config over multiple files [production]
2010-07-12 §
16:54 <Fred> changed LONGQUERIES check threshold [production]
16:08 <Fred> restarting morebots since it had died. [production]
16:08 <Fred> restarting Nagios since it was down. [production]
14:29 <mark> Added "cfg_file=/etc/nagios/puppet_hosts.cfg" to nagios.cfg [production]
13:25 <JeLuF> added disk space monitoring for apaches [production]
12:51 <jeluf> synchronized php-1.5/wmf-config/InitialiseSettings.php '24306 - Create namespaces for Lithuanian Wiktionary' [production]
12:48 <jeluf> synchronized php-1.5/wmf-config/InitialiseSettings.php '24321 - ml.wikiquote.org lost its project namespace' [production]
12:46 <jeluf> synchronized php-1.5/wmf-config/InitialiseSettings.php '24321 - ml.wikiquote.org lost its project namespace' [production]
12:41 <jeluf> synchronized php-1.5/wmf-config/InitialiseSettings.php '24344 - Namespace changes - si.wiktionary' [production]
11:45 <JeLuF> fixed broken ganglia-metrics installation on srv146 (chown gmetric /var/log/gmetricd/gmetricd.log) [production]
11:41 <JeLuF> added DPKG status monitoring for all app servers to nagios. Reports all packages that are not in state 'rc' or 'ii'. [production]
10:43 <JeLuF> lots of false alerts from nagios due to missing SSL setup for NRPE. Working on it. [production]
09:53 <JeLuF> changed puppet config to install nrpe on all app servers [production]
09:28 <JeLuF> replacing opsview-nrpe agents by nagios-nrpe agents (image_scalers, some other apaches). Most apaches already use nagios-nrpe [production]
07:40 <Tim> set up NRPE disk space monitoring on ms4, discovered that /mnt2 is full [production]
04:54 <Tim> updated NFS host/service groups to monitor the actual NFS servers, not a random collection of miscellaneous ex-NFS servers [production]
04:46 <Tim> installed NRPE on nfs1 and nfs2 [production]
04:08 <Tim> adding rendering, m, bits.esams, recursor0, recursor1, recursor0.esams to nagios [production]
04:02 <Tim> added forward DNS entry for recursor0.esams, modified reverse DNS entry resolver0.esams -> recursor0.esams [production]
03:55 <Tim> fixed reverse DNS entries for recursor0 and recursor1, were set incorrectly to non-existent hostnames "resolver0" and "recursor1" [production]
03:36 <Tim> renamed db6.mgmt to locke.mgmt [production]
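Note on the 11:41 DPKG status entry above: the rule it describes (report every package whose state is neither 'ii' nor 'rc') can be sketched roughly as below. This is an illustrative sketch only, not the check that was actually deployed. It assumes the input is text in the column layout printed by `dpkg -l`; the function name `unhealthy_packages` and its `ok_states` parameter are hypothetical.

```python
import re

# Two- or three-letter dpkg state abbreviation, e.g. "ii", "rc", "iU".
_STATE = re.compile(r"[A-Za-z]{2,3}")


def unhealthy_packages(dpkg_listing, ok_states=("ii", "rc")):
    """Return (state, package) pairs whose state is not in ok_states.

    dpkg_listing is assumed to be text in `dpkg -l` column layout:
    state abbreviation first, package name second.
    """
    flagged = []
    for line in dpkg_listing.splitlines():
        fields = line.split()
        # Skip blank lines and the header block that `dpkg -l` prints.
        if len(fields) < 2 or not _STATE.fullmatch(fields[0]):
            continue
        state, package = fields[0], fields[1]
        if state not in ok_states:
            flagged.append((state, package))
    return flagged
```

A monitoring wrapper could feed this the captured output of `dpkg -l` and raise an alert whenever the returned list is non-empty.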
2010-07-10 §
14:14 <rainman-sr> search7 disk was full, deleting some old unnecessary indexes [production]
12:50 <Fred> applied security updates on all machines running Karmic or Lucid (per USN-959-1) [production]
2010-07-09 §
18:07 <domas> forgot to log, rebooted locke, put startup stuff to rc.local, maybe Tim changed it afterwards, hehe. beer is good too. [production]