2016-12-22
§
|
14:14 |
<elukey> |
the previous entry is missing: "on analytics1032" |
[production] |
14:13 |
<elukey> |
manually starting the yarn nodemanager after OOM |
[production] |
13:41 |
<jynus> |
stopping db1035 (depooled) replication to perform maintenance to avoid disk alerts in the next 2 weeks |
[production] |
10:02 |
<moritzm> |
installing c-ares security updates on trusty systems (jessie already fixed for quite a while) |
[production] |
10:02 |
<moritzm> |
installing c-ares security updates |
[production] |
09:02 |
<moritzm> |
installing tomcat security updates |
[production] |
08:45 |
<moritzm> |
installing libav security updates on trusty systems |
[production] |
08:18 |
<moritzm> |
installing Django security updates |
[production] |
07:26 |
<elukey> |
created /var/log/squid3/access.log.1.gz on aluminum to fix cronspam - T132324 |
[production] |
02:26 |
<l10nupdate@tin> |
ResourceLoader cache refresh completed at Thu Dec 22 02:26:23 UTC 2016 (duration 4m 49s) |
[production] |
02:21 |
<l10nupdate@tin> |
scap sync-l10n completed (1.29.0-wmf.6) (duration: 07m 54s) |
[production] |
2016-12-21
§
|
23:48 |
<mutante> |
europium - jessie reinstall done - powered down until reclaim (T153918) |
[production] |
23:31 |
<mutante> |
europium - re-installing with jessie (T82239) |
[production] |
19:15 |
<mutante> |
public1-b-eqiad and public1-c-eqiad are configured to use install1001 as DHCP, all others still use carbon as DHCP; all subnets now use install1001 as TFTP |
[production] |
19:13 |
<mutante> |
carbon - re-enabled puppet and DHCP |
[production] |
18:13 |
<mutante> |
carbon - temp stopping dhcp server |
[production] |
15:22 |
<gehel> |
truncating /var/log/elasticsearch/relforge-eqiad_feature.log on relforge100[12] |
[production] |
15:04 |
<elukey> |
removed mongodb* packages from stat1003 after https://gerrit.wikimedia.org/r/328519 |
[production] |
14:54 |
<moritzm> |
installing ghostscript security updates on trusty hosts |
[production] |
14:29 |
<moritzm> |
installing imagemagick security updates |
[production] |
13:09 |
<moritzm> |
install hdf5 security updates |
[production] |
13:03 |
<mobrovac@tin> |
Finished deploy [parsoid/deploy@dab1f27]: Bug fix for mwApiServer T153797 (duration: 05m 32s) |
[production] |
12:57 |
<mobrovac@tin> |
Starting deploy [parsoid/deploy@dab1f27]: Bug fix for mwApiServer T153797 |
[production] |
12:46 |
<moritzm> |
install openjdk-6 security update on labsdb1006 |
[production] |
10:43 |
<jynus> |
dropping non-wiki databases from labsdb1001 |
[production] |
10:01 |
<moritzm> |
installing libgme security updates |
[production] |
09:58 |
<jynus> |
extending db1035 /srv partition |
[production] |
08:42 |
<elukey> |
restarted hhvm/jobrunner (and killed ffmpeg processes) on mw116[89] |
[production] |
08:01 |
<marostegui> |
Running optimize table on db1044 for the pagelinks tables as we urgently need some space back on that host - T153826 |
[production] |
07:20 |
<marostegui> |
Running optimize table on db1045 for the revision tables as we urgently need some space back on that host - https://phabricator.wikimedia.org/T153739 |
[production] |
04:36 |
<Niharika> |
commtech Added samwilson as project admin |
[production] |
03:03 |
<dzahn@puppetmaster1001> |
conftool action : set/pooled=yes; selector: name=mw1169.eqiad.wmnet |
[production] |
02:51 |
<mutante> |
relforge1001 has a huge /var/log/elasticsearch/relforge-eqiad_feature.log that wrote GBs just today but then stopped |
[production] |
02:23 |
<mutante> |
mw1169 - reinstall done - sign new puppet cert, initial run... |
[production] |
02:20 |
<l10nupdate@tin> |
scap sync-l10n completed (1.29.0-wmf.6) (duration: 07m 40s) |
[production] |
02:12 |
<mutante> |
mw1169 - delete salt key, revoke puppet cert |
[production] |
02:06 |
<mutante> |
reinstalling mw1169 (carbon DHCP, install1001 TFTP) |
[production] |
02:02 |
<mutante> |
re-enabling DHCP and puppet |
[production] |
01:49 |
<mutante> |
carbon - temp stop DHCP service to test install from install1001 |
[production] |
01:47 |
<dzahn@puppetmaster1001> |
conftool action : set/pooled=no; selector: name=mw1169.eqiad.wmnet |
[production] |
01:47 |
<mutante> |
mw1169 - schedule 2 hours downtime - boot for reinstall shortly |
[production] |
01:11 |
<dzahn@puppetmaster1001> |
conftool action : set/pooled=yes; selector: name=mw1168.eqiad.wmnet |
[production] |
01:09 |
<mutante> |
mw1168 - remove old salt key, accept new salt key, start minion |
[production] |