2016-11-27
|
21:47 |
<legoktm> |
created wmf/1.29.0-wmf.3 branch pointing at master for mediawiki/extensions/ElectronPdfService to workaround T151725 |
[production] |
09:35 |
<elukey> |
removed all unused files in /tmp on stat1002 after a follow-up with the owner |
[production] |
06:20 |
<ori@tin> |
Synchronized php-1.29.0-wmf.3/api.php: Bandaid: make API reqs fail fast if User-Agent ~= Parsoid and Host ~= eu.wikipedia.org (duration: 00m 50s) |
[production] |
05:36 |
<ori> |
Commented out live-hack on mw1290; if we see memory growth now, Parsoid would be strongly implicated. |
[production] |
05:33 |
<ori> |
With Parsoid requests hacked to fail fast, mw1290 is not showing the kind of aggressive growth in memory usage we're seeing on other API servers |
[production] |
05:30 |
<godog> |
roll restarting hhvm across api_cluster when hhvm uses more than 40% of memory |
[production] |
05:29 |
<bd808> |
Fixed long-standing pagination bug and upgraded PHP libraries |
[tools.sal] |
05:21 |
<ori> |
Live-hacked api.php on mw1290 to die if request user-agent contains 'Parsoid'; restarted HHVM. |
[production] |
05:17 |
<godog> |
roll restarting hhvm across api_cluster when hhvm uses more than 40% of memory |
[production] |
04:57 |
<godog> |
roll-restart hhvm on api_appcluster on machines with hhvm leaking memory |
[production] |
03:22 |
<godog> |
roll-restart hhvm across api_appserver |
[production] |
02:41 |
<godog> |
dumping hhvm backtraces and roll-restart on affected api machines |
[production] |
02:00 |
<l10nupdate@tin> |
LocalisationUpdate failed: git pull of core failed |
[production] |
2016-11-26
|
16:15 |
<Reedy> |
killed /srv/jenkins-workspace/workspace/mediawiki-core-*/src and /srv/jenkins-workspace/workspace/mwext-*/src from integration slaves to get rid of borked MW dirs |
[releng] |
15:51 |
<Reedy> |
deleted /srv/jenkins-workspace/workspace/mediawiki-core-code-coverage/src on integration-slave-trusty-1006 to force a reclone |
[releng] |
15:35 |
<elukey> |
deleted tmp files on stat1002's /tmp partition because of disk space consumption. Will follow up with the owner. |
[production] |
14:14 |
<Reedy> |
moved old /srv/mediawiki-staging/php-master to /tmp/php-master, recloned MW Core, copied in LocalSettings, skins, vendor and extensions. T151676. scap sync-dir running |
[releng] |
13:36 |
<Krenair> |
ran refreshLinks on angwiki for T151584, it ran into issues with the EventBus extension at the links tables step |
[production] |
13:05 |
<Reedy> |
marked deployment-tin as offline due to T151670 |
[releng] |
12:29 |
<volans> |
manually fixed the checkout of mediawiki core on stat1002 and stat1003 that was causing Puppet to fail |
[production] |
02:22 |
<l10nupdate@tin> |
ResourceLoader cache refresh completed at Sat Nov 26 02:22:26 UTC 2016 (duration 4m 18s) |
[production] |
02:18 |
<l10nupdate@tin> |
scap sync-l10n completed (1.29.0-wmf.3) (duration: 06m 28s) |
[production] |
2016-11-25
|
20:09 |
<Krinkle> |
mwscript deleteEqualMessages.php --wiki angwiki (T45917) |
[production] |
17:15 |
<jynus> |
drop database vewikimedia (deleted wiki) from sanitarium and its slaves |
[production] |
14:22 |
<Reedy> |
delete oathauth row on wikitech for user Liuxinyu970226 per T144805 |
[production] |
14:16 |
<Reedy> |
delete oathauth row on wikitech for user Shoichi per T144805 |
[production] |
11:05 |
<ema> |
uploaded libvmod-{netmapper,tbf,vslp} to carbon main component (T150660) |
[production] |
10:20 |
<_joe_> |
upgrading HHVM across codfw |
[production] |
09:23 |
<_joe_> |
upgraded hhvm on the debug hosts |
[production] |
09:16 |
<elukey> |
resumed oozie bundles and camus crontab after maintenance |
[analytics] |
08:58 |
<_joe_> |
uploading hhvm_3.12.7+dfsg-1+wmf4 to apt |
[production] |
08:53 |
<volans> |
restarting zotero on sca1003, almost out of RAM, puppet failing |
[production] |
08:52 |
<elukey> |
restarting Yarn and HDFS masters on analytics100[12] (Hadoop cluster) to complete the openjdk update |
[production] |
08:49 |
<elukey> |
stopping oozie and camus as prep-step for Yarn/HDFS master failover (remaining hosts with old openjdk) |
[analytics] |
07:51 |
<marostegui> |
Stopping replication db1052 for maintenance - T151607 |
[production] |
02:22 |
<l10nupdate@tin> |
ResourceLoader cache refresh completed at Fri Nov 25 02:22:40 UTC 2016 (duration 4m 20s) |
[production] |
02:18 |
<l10nupdate@tin> |
scap sync-l10n completed (1.29.0-wmf.3) (duration: 06m 48s) |
[production] |
2016-11-24
|
20:49 |
<hashar> |
make contint1001 Jenkins slave build only jobs with a label matching the node https://integration.wikimedia.org/ci/computer/contint1001/configure T86659 |
[releng] |
17:25 |
<_joe_> |
turned off additional workers for htmlcacheupdate on commonswiki as the queue has reduced to acceptable sizes (T151196) |
[production] |
16:54 |
<mafk> |
Removed 2 users from tools.stewardbots. Users with access: 12. |
[tools.stewardbots] |
15:46 |
<elukey> |
removing https://gerrit.wikimedia.org/r/#/c/322268/ from the list of cherry picks on puppet master since it is not the right way to go |
[releng] |
15:03 |
<ema> |
uploaded varnish 4.1.3-1wm4 to carbon main component, replacing version 3.0.6plus-wm9 (T150660) |
[production] |
14:47 |
<ema> |
uploaded varnishkafka 1.0.12-1 to carbon main component, replacing version 1.0.7-1 (T150660) |
[production] |
13:31 |
<akosiaris> |
balance the load between thumbor1001 and thumbor1002 evenly |
[production] |
13:31 |
<akosiaris@puppetmaster1001> |
conftool action : set/weight=10; selector: thumbor1001.eqiad.wmnet (tags: ['dc=eqiad', 'cluster=thumbor', 'service=thumbor']) |
[production] |
13:20 |
<akosiaris@puppetmaster1001> |
conftool action : set/weight=5; selector: thumbor1001.eqiad.wmnet (tags: ['dc=eqiad', 'cluster=thumbor', 'service=thumbor']) |
[production] |