2014-09-04
§
|
13:54 |
<_joe_> |
stopped puppet on all appservers except mw03, testing an apache change |
[releng] |
05:28 |
<legoktm> |
stopping jobrunner on deployment-jobrunner01 |
[releng] |
05:22 |
<legoktm> |
restarted jobrunner on deployment-jobrunner01 |
[releng] |
05:14 |
<bd808> |
Bad jobs in job queue filled up /var on jobrunner01 and killed jobrunner script. Leaving down for now until I find out how to delete the bad jobs. |
[releng] |
01:41 |
<bd808> |
Killed old jobs-loop.sh processes on deployment-jobrunner01 |
[releng] |
01:24 |
<bd808> |
Many jobrunner errors like "wikiversions-labs.cdb has no version entry for `amwiki`" with various wiki names |
[releng] |
01:23 |
<bd808|AWAY> |
Started jobrunner service manually on jobrunner01. |
[releng] |
00:44 |
<bd808> |
Puppet run on deployment-jobrunner01 failing with what seem to be DNS issues (getaddrinfo: Name or service not known when Trebuchet is running) |
[releng] |
00:35 |
<bd808> |
Puppet run on deployment-jobrunner01 failing with what seem to be DNS issues (getaddrinfo: Name or service not known) |
[releng] |
2014-08-27
§
|
23:03 |
<hashar> |
Blacklisting the security audit IP again on deployment-cache bits01, mobile03 and text02 |
[releng] |
22:53 |
<hashar> |
removed the blackhole ip route from deployment-cache-text02 and deployment-cache-mobile03 |
[releng] |
22:48 |
<hashar> |
the IP belongs to a known security audit. See Chris Steipp. |
[releng] |
22:46 |
<hashar> |
blackholed an IP address on deployment-cache-text02 and deployment-cache-mobile03; it was causing hundreds of requests per second and overloaded the beta cluster. Use route -n to find the IP |
[releng] |
22:37 |
<hashar> |
restarting udp2log-mw on deployment-bastion. It has kept crashing since fairly recently |
[releng] |
22:26 |
<bd808> |
when restarting varnish on deployment-cache-text02, don't forget that there are 2 varnish services (varnish and varnish-frontend) |
[releng] |
22:19 |
<bd808> |
restarted varnish (again) on deployment-cache-text02 |
[releng] |
22:10 |
<bd808> |
restarted varnish on deployment-cache-text02 |
[releng] |
16:22 |
<bd808> |
killing `apt-get update` process running on deployment-bastion since Jun13 |
[releng] |
14:59 |
<bd808> |
Resolved puppet git merge conflict on deployment-salt |
[releng] |
14:49 |
<bd808> |
Moved hhvm core dumps to /data/project/hhvm-cores |
[releng] |
14:42 |
<bd808> |
Root drive full on deployment-mediawiki02; hhvm core files are the culprit |
[releng] |
2014-08-25
§
|
23:47 |
<ori> |
stopping hhvm/apache on deployment-mediawiki02 to replace debug build of hhvm with release build |
[releng] |
21:44 |
<bd808> |
Deployed scap 116027f (Make sync-common update l10n cdb files by default) |
[releng] |
18:30 |
<ori> |
deployment-mediawiki02: cleared /tmp; running puppet |
[releng] |
15:05 |
<hashar> |
mediawiki02 rm /tmp/hhvm*.core. Filed as {{bug|69979}} |
[releng] |
15:01 |
<hashar> |
mediawiki02 rm /tmp/mw-cache-master/conf* |
[releng] |
15:01 |
<hashar> |
mediawiki02 has mw conf caches under /tmp/mw-cache-master/ and since that partition is full, the conf caches end up as empty files |
[releng] |
15:00 |
<hashar> |
mediawiki02 rm /var/log/upstart/hhvm* |
[releng] |
14:53 |
<hashar> |
mediawiki02: removed /var/lib/puppet/state/agent_catalog_run.lock |
[releng] |
14:46 |
<hashar> |
restarting udp2log-mw service on -bastion. It is stalled for some reason |
[releng] |
14:42 |
<hashar> |
on mediawiki02, clearing out some /var/log/upstart/hhvm.* log files, see {{bug|69976}} |
[releng] |
14:34 |
<hashar> |
mediawiki02 / partition is 100% full |
[releng] |