2012-02-13
|
02:32 |
<Tim> |
on kaulen: increased MaxClients to 500 to better deal with the connection flood |
[production] |
02:23 |
<Tim> |
bugzilla is mostly working now, although it's very slow. The DDoS requests are blocked after connection setup using <Location> |
[production] |
02:21 |
<Tim> |
on kaulen: restored MaxClients |
[production] |
02:17 |
<LocalisationUpdate> |
completed (1.18) at Mon Feb 13 02:17:50 UTC 2012 |
[production] |
01:46 |
<Tim> |
temporarily moved bugzilla to port 444 until the connection flood (~1k req/s) subsides |
[production] |
01:15 |
<Tim> |
started apache with MaxClients=30 |
[production] |
00:59 |
<Tim> |
after kaulen came back up, it was immediately overloaded with jsonrpc.cgi. Stopped apache. |
[production] |
00:54 |
<Tim> |
kaulen is not responding on ssh, web down, rebooting |
[production] |
2012-02-12
|
12:09 |
<mark> |
Killed lsearchd processes on search8, restarted |
[production] |
12:07 |
<mark> |
Rebalanced mw API app servers from load 120 to 150 in pybal list |
[production] |
10:08 |
<mark> |
Increased MaxClients to 100 on API apaches in Puppet |
[production] |
09:45 |
<mark> |
Restricted the API squids to opensearch API requests only |
[production] |
09:43 |
<mark> |
Restricted the API backend apaches to opensearch API requests only; other API requests now hit the main mediawiki cluster |
[production] |
08:44 |
<mark> |
maximum_forwards change deployed to all squids |
[production] |
08:42 |
<mark> |
Set maximum_forwards 2 in squid.conf, deployed to the API squids only so far, rest is pending |
[production] |
07:52 |
<binasher> |
restarted lsearchd on search{3,4,9} |
[production] |
02:19 |
<LocalisationUpdate> |
completed (1.18) at Sun Feb 12 02:19:17 UTC 2012 |
[production] |
2012-02-11
|
20:31 |
<apergos> |
restarted lighttpd on dataset2 |
[production] |
17:28 |
<RobH> |
manual test of each affected service complete, db9 fully online. |
[production] |
17:26 |
<RobH> |
db9 moved, all systems online |
[production] |
17:08 |
<RobH> |
db9 shutting down to move racks; services offline during the move include: blogs, bugzilla, racktables, rt, survey, etherpad, observium |
[production] |
02:18 |
<LocalisationUpdate> |
completed (1.18) at Sat Feb 11 02:18:36 UTC 2012 |
[production] |
00:17 |
<reedy> |
synchronizing Wikimedia installation... : |
[production] |
2012-02-10
|
22:17 |
<LeslieCarr> |
fixing the labs apache2 puppet groups |
[production] |
21:48 |
<RobH> |
memory in cp1017 wasn't properly seated as far as I can tell; if it doesn't mess up again it should be ok. |
[production] |
21:41 |
<RobH> |
cp1017 being tested for bad memory |
[production] |
21:36 |
<RobH> |
powercycling msw-a2-eqiad resolves all mgmt issues in rack |
[production] |
21:34 |
<RobH> |
powercycling msw-a1-eqiad. |
[production] |
21:29 |
<RobH> |
db1001 rebooting, locked up |
[production] |
20:53 |
<RobH> |
updating dns for new db hosts |
[production] |
19:59 |
<Reedy> |
Checking out 1.19wmf1 to /tmp on fenari |
[production] |
19:12 |
<RobH> |
oxygen setup and installed per rt2343, still needs puppet runs and full deployment per rt 2430 |
[production] |
17:58 |
<RobH> |
updating dns for oxygen internal ip |
[production] |
17:21 |
<mutante> |
labs logging is broken |
[production] |
17:14 |
<RobH> |
oxygen offline for hard disk upgrade to replace locke |
[production] |
16:50 |
<mutante> |
running sync-apache, trying to redirect office.wm to https |
[production] |
16:07 |
<mark> |
Rebalanced appserver load balancing by giving the new mw* pmtpa app servers weight 150 in the pybal server list |
[production] |
15:17 |
<mark> |
Turned on KeepAlive on apaches for better miss service times from eqiad |
[production] |
13:42 |
<mark> |
Configured cp1001 and cp1020 to contact backend servers directly instead of via pmtpa squids |
[production] |
12:02 |
<mark> |
Decommissioning sq38, sq46 and sq47 in squid configurator |
[production] |
11:50 |
<mark> |
Making cp1001-1005 API squids |
[production] |
05:08 |
<maplebed> |
deployed squid config to uploads to send 100% of thumbnail traffic to swift |
[production] |
02:49 |
<maplebed> |
deploying fix for & bug with swift (files with an & in the name wouldn't load properly) |
[production] |
02:18 |
<LocalisationUpdate> |
completed (1.18) at Fri Feb 10 02:18:37 UTC 2012 |
[production] |
00:22 |
<LeslieCarr> |
increased nagios max concurrent checks on spence and lowered the interval between processing them |
[production] |
00:20 |
<maplebed> |
deployed squid config to upload squids rolling thumbnails back to 75% handled by swift to test the & bug |
[production] |