2016-12-22
|
22:11 |
<thcipriani> |
disable l10nupdate cron for deployment freeze |
[production] |
21:25 |
<ebernhardson> |
restarting elasticsearch (again) on relforge100[12] to test ltr plugin |
[production] |
21:10 |
<RoanKattouw> |
Got lots of errors like 20:49:56 Unable to find remote tracking branch/tag for /srv/mediawiki-staging/php-1.29.0-wmf.6/extensions/ParserFunctions during this scap |
[production] |
21:03 |
<catrope@tin> |
Finished scap: Sync Idf4618977f172 in the OAuth extension- (duration: 24m 16s) |
[production] |
20:39 |
<catrope@tin> |
Started scap: Sync Idf4618977f172 in the OAuth extension- |
[production] |
19:59 |
<ebernhardson> |
restarting elasticsearch (again) on relforge100[12] to test ltr plugin |
[production] |
19:22 |
<gehel> |
restart wdqs-blazegraph and wdqs-updater on wdqs1001.eqiad.wmnet (suspicious load) |
[production] |
19:18 |
<ebernhardson> |
restarting elasticsearch on relforge100[12] to test ltr plugin |
[production] |
19:18 |
<jynus> |
stopping replication on dbstore2001(s2) and db2035 for enwiktionary.templatelinks reimport |
[production] |
19:04 |
<godog> |
roll restart swift proxy on ms-fe1* to drain thumbor traffic |
[production] |
15:39 |
<jynus> |
restart dbstore2001 to change buffer pool size, testing gerrit:328671 |
[production] |
14:51 |
<elukey> |
restarting the yarn node manager java daemons on all the Hadoop worker nodes due to a suspected memory leak |
[production] |
14:14 |
<elukey> |
the previous entry is missing: "on analytics1032" |
[production] |
14:13 |
<elukey> |
manually starting the yarn nodemanager after OOM |
[production] |
13:41 |
<jynus> |
stopping db1035 (depooled) replication to perform maintenance to avoid disk alerts in the next 2 weeks |
[production] |
10:02 |
<moritzm> |
installing c-ares security updates on trusty systems (jessie already fixed for quite a while) |
[production] |
10:02 |
<moritzm> |
installing c-ares security updates |
[production] |
09:02 |
<moritzm> |
installing tomcat security updates |
[production] |
08:45 |
<moritzm> |
installing libav security updates on trusty systems |
[production] |
08:18 |
<moritzm> |
installing Django security updates |
[production] |
07:26 |
<elukey> |
created /var/log/squid3/access.log.1.gz on aluminum to fix cronspam - T132324 |
[production] |
02:26 |
<l10nupdate@tin> |
ResourceLoader cache refresh completed at Thu Dec 22 02:26:23 UTC 2016 (duration 4m 49s) |
[production] |
02:21 |
<l10nupdate@tin> |
scap sync-l10n completed (1.29.0-wmf.6) (duration: 07m 54s) |
[production] |
2016-12-21
|
23:48 |
<mutante> |
europium - jessie reinstall done - powered down until reclaim (T153918) |
[production] |
23:31 |
<mutante> |
europium - re-installing with jessie (T82239) |
[production] |
19:15 |
<mutante> |
public1-b-eqiad and public1-c-eqiad are configured to use install1001 as DHCP, all others still use carbon as DHCP | all subnets now use install1001 as TFTP |
[production] |
19:13 |
<mutante> |
carbon - re-enabled puppet and DHCP |
[production] |
18:13 |
<mutante> |
carbon - temp stopping dhcp server |
[production] |
15:22 |
<gehel> |
truncating /var/log/elasticsearch/relforge-eqiad_feature.log on relforge100[12] |
[production] |
15:04 |
<elukey> |
removed mongodb* packages from stat1003 after https://gerrit.wikimedia.org/r/328519 |
[production] |
14:54 |
<moritzm> |
installing ghostscript security updates on trusty hosts |
[production] |
14:29 |
<moritzm> |
installing imagemagick security updates |
[production] |
13:09 |
<moritzm> |
install hdf5 security updates |
[production] |
13:03 |
<mobrovac@tin> |
Finished deploy [parsoid/deploy@dab1f27]: Bug fix for mwApiServer T153797 (duration: 05m 32s) |
[production] |
12:57 |
<mobrovac@tin> |
Starting deploy [parsoid/deploy@dab1f27]: Bug fix for mwApiServer T153797 |
[production] |
12:46 |
<moritzm> |
install openjdk-6 security update on labsdb1006 |
[production] |
10:43 |
<jynus> |
dropping non-wiki databases from labsdb1001 |
[production] |
10:01 |
<moritzm> |
installing libgme security updates |
[production] |
09:58 |
<jynus> |
extending db1035 /srv partition |
[production] |
08:42 |
<elukey> |
restarted hhvm/jobrunner (and killed ffmpeg processes) on mw116[89] |
[production] |
08:01 |
<marostegui> |
Running optimize table on db1044 for the pagelinks tables as we urgently need some space back on that host - T153826 |
[production] |
07:20 |
<marostegui> |
Running optimize table on db1045 for the revision tables as we urgently need some space back on that host - https://phabricator.wikimedia.org/T153739 |
[production] |
04:36 |
<Niharika> |
commtech Added samwilson as project admin |
[production] |
03:03 |
<dzahn@puppetmaster1001> |
conftool action : set/pooled=yes; selector: name=mw1169.eqiad.wmnet |
[production] |
02:51 |
<mutante> |
relforge1001 has a huge /var/log/elasticsearch/relforge-eqiad_feature.log that wrote GBs just today but then stopped |
[production] |
02:23 |
<mutante> |
mw1169 - reinstall done - sign new puppet cert, initial run... |
[production] |
02:20 |
<l10nupdate@tin> |
scap sync-l10n completed (1.29.0-wmf.6) (duration: 07m 40s) |
[production] |
02:12 |
<mutante> |
mw1169 - delete salt key, revoke puppet cert |
[production] |
02:06 |
<mutante> |
reinstalling mw1169 (carbon DHCP, install1001 TFTP) |
[production] |
02:02 |
<mutante> |
re-enabling DHCP and puppet |
[production] |