2016-07-25
18:31 <ottomata> upgrading kafka to 0.9 in main-codfw, first kafka2001 then 2002 [production]
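(A minimal sketch of the rolling-upgrade pattern this implies, one broker at a time; package, service, and ZooKeeper details are assumptions, not the exact commands used.)
  # On kafka2001 first, then kafka2002 once replication has caught up:
  sudo service kafka stop
  sudo apt-get install kafka            # pull in the 0.9 broker package (name assumed)
  sudo service kafka start
  # Wait until no partitions are under-replicated before touching the next broker:
  kafka-topics.sh --zookeeper localhost:2181 --describe --under-replicated-partitions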
18:15 <mutante> ytterbium - revoke puppet cert, delete salt-key, remove from icinga [production]
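(Roughly the commands behind a decommission like this; the exact invocations and the Icinga step are assumptions.)
  # On the puppetmaster: revoke and clean the host's certificate
  sudo puppet cert revoke ytterbium.wikimedia.org
  sudo puppet cert clean ytterbium.wikimedia.org
  # On the salt master: drop the minion key
  sudo salt-key -d ytterbium.wikimedia.org
  # Icinga config is generated from puppet data, so the host drops out of monitoring on a later run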
16:16 <urandom> T134016: Restarting Cassandra to apply stream timeout (restbase1013-b.eqiad.wmnet) [production]
16:10 <urandom> T134016: Restarting Cassandra to apply stream timeout (restbase1013-a.eqiad.wmnet) [production]
16:06 <urandom> T140825, T134016: Restarting Cassandra to apply stream timeout and disable trickle_fsync (restbase1012-c.eqiad.wmnet) [production]
16:02 <urandom> T140825, T134016: Restarting Cassandra to apply stream timeout and disable trickle_fsync (restbase1012-b.eqiad.wmnet) [production]
15:54 <urandom> T140825, T134016: Restarting Cassandra to apply stream timeout and disable trickle_fsync (restbase1012-a.eqiad.wmnet) [production]
15:53 <urandom> T140825: Setting vm.dirty_background_bytes=24M on restbase1012.eqiad.wmnet [production]
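(vm.dirty_background_bytes takes a raw byte count, so 24M presumably means the value below; the exact number applied is an assumption.)
  # 24M = 24 * 1024 * 1024 = 25165824 bytes; the sysctl takes no unit suffixes:
  sudo sysctl -w vm.dirty_background_bytes=25165824
  # Persisting it would use the same key, e.g. in a sysctl.d snippet (path hypothetical):
  echo 'vm.dirty_background_bytes = 25165824' | sudo tee /etc/sysctl.d/70-cassandra.conf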
15:43 <urandom> T140825, T134016: Restarting Cassandra to apply stream timeout and 8MB trickle_fsync (restbase1008-c.eqiad.wmnet) [production]
15:39 <urandom> T140825, T134016: Restarting Cassandra to apply stream timeout and 8MB trickle_fsync (restbase1008-b.eqiad.wmnet) [production]
15:34 <urandom> T140825, T134016: Restarting Cassandra to apply stream timeout and 8MB trickle_fsync (restbase1008-a.eqiad.wmnet) [production]
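(The cassandra.yaml options these restarts apply, as named in Cassandra 2.x; apart from the 8MB interval and the trickle_fsync on/off split logged above, the values shown are assumptions.)
  # /etc/cassandra/cassandra.yaml:
  #   streaming_socket_timeout_in_ms: 3600000   # the stream timeout; value assumed
  #   trickle_fsync: true                       # restbase1008 keeps it on, with...
  #   trickle_fsync_interval_in_kb: 8192        # ...the 8MB interval logged above
  #   (restbase1012 instead sets trickle_fsync: false)
  # These only take effect on restart, hence one instance at a time:
  sudo service cassandra restart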
15:28 <elukey> Standardized the jmxtrans GC metric names to automatically pick up variations in settings. This introduces metric name changes in Hadoop, Zookeeper, and Kafka. (https://gerrit.wikimedia.org/r/#/c/299118/) [production]
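(What "pick up variations automatically" likely means here: the GC MBean names depend on the collector in use, and a wildcard query keys metrics by whichever one the JVM runs. A sketch of the idea, not the merged change.)
  # GC MBeans differ per collector, e.g.
  #   java.lang:type=GarbageCollector,name=PS Scavenge
  #   java.lang:type=GarbageCollector,name=G1 Young Generation
  # A jmxtrans query on java.lang:type=GarbageCollector,name=* with
  # typeNames: ["name"] derives the metric name from the collector, so
  # per-service hardcoding goes away; existing metric names change as a result.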
12:53 <moritzm> installing squid security updates [production]
10:10 <_joe_> remove spurious puppet facts [production]
10:04 <moritzm> installing Django security updates [production]
09:18 <godog> swift eqiad-prod: ms-be102[3456] weight 1500 [production]
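(Presumably applied with swift-ring-builder per ring; the builder file, device id, and rebalance step are assumptions.)
  # Raise each new backend's device weight, then rebalance the ring:
  swift-ring-builder object.builder set_weight d100 1500   # hypothetical device id
  swift-ring-builder object.builder rebalance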
03:26 <hashar> scandium: migrating zuul-merger repos from lead to gerrit.wikimedia.org: find /srv/ssd/zuul/git -path '*/.git/config' -print -execdir sed -i -e 's/lead.wikimedia.org/gerrit.wikimedia.org/' config \; [production]
02:28 <l10nupdate@tin> ResourceLoader cache refresh completed at Mon Jul 25 02:28:21 UTC 2016 (duration 5m 52s) [production]
02:22 <mwdeploy@tin> scap sync-l10n completed (1.28.0-wmf.11) (duration: 09m 09s) [production]
02:03 <ostriches> gerrit: reindexing lucene now that we have new data. searches/dashboards may look a tad weird for a bit [production]
01:53 <hashar> starting Zuul [production]
01:51 <mutante> restarted grrrit-wm [production]
01:39 <ostriches> lead: turning puppet back on, here we go [production]
01:38 <jynus> m2 replication on db2011 stopped, master binlog pos: db1020-bin.000968:1013334195 [production]
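(Recording the master's binlog position before stopping the replica keeps the cutover reversible; a sketch, with client invocation details assumed.)
  # On the replica (db2011):
  sudo mysql -e 'STOP SLAVE'
  # On the master (db1020), the position logged above comes from:
  sudo mysql -e 'SHOW MASTER STATUS'   # File: db1020-bin.000968, Position: 1013334195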
01:37 <hashar> scandium: restarted zuul-merger [production]
01:36 <ostriches> ytterbium: Stopped puppet, stopped gerrit process. [production]
01:34 <mutante> switched gerrit-new to gerrit in DNS [production]
01:30 <ostriches> lead: stopped puppet for a few minutes [production]
01:17 <hashar> scandium: migrating zuul-merger repos to lead: find /srv/ssd/zuul/git -path '*/.git/config' -print -execdir sed -i -e 's/ytterbium.wikimedia.org/lead.wikimedia.org/' config \; [production]
01:10 <hashar> stopping CI [production]
01:09 <jynus> reviewdb backup finished, available on db1020:/srv/tmp/2016-07-25_00-54-31/ [production]
01:02 <ostriches> rsyncing latest git data from ytterbium to lead [production]
00:57 <mutante> manually deleted reviewer-counts cron from gerrit2 user; it runs as root, and puppet does not remove crons unless ensure=>absent [production]
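(Puppet only removes a cron resource that is explicitly declared absent; a hypothetical resource for this case, plus the manual cleanup it otherwise takes.)
  # In puppet, removal has to be declared, e.g.:
  #   cron { 'reviewer-counts': ensure => absent, user => 'root' }
  # Until then the stale entry stays, hence the one-off deletion (crontab owner per the log):
  sudo crontab -u root -l | grep -v reviewer-counts | sudo crontab -u root -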
00:55 <jynus> starting hot backup of db1020's reviewdb [production]
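(Hot InnoDB backups are typically taken with xtrabackup/innobackupex; the tool and flags here are assumptions, while the output path matches the 01:09 entry above.)
  # Non-blocking backup of the reviewdb schema on db1020:
  sudo innobackupex --databases=reviewdb /srv/tmp/
  # innobackupex writes into a timestamped directory, e.g. /srv/tmp/2016-07-25_00-54-31/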
2016-07-23
15:38 <godog> stop swift in esams test cluster, lots of logging from there [production]
15:37 <godog> lithium: sudo lvextend --size +10G -r /dev/mapper/lithium--vg-syslog [production]
04:58 <ori> Gerrit is back up after service restart; was unavailable between ~04:29 and 04:57 UTC [production]
04:56 <ori> Restarting Gerrit on ytterbium [production]
04:48 <ori> Users report Gerrit is down; on ytterbium java is occupying two cores at 100% [production]
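(The standard way to pin down which threads are burning those two cores; the process match and service user are assumptions.)
  # Per-thread CPU for the Gerrit JVM, then a matching stack dump:
  top -H -p "$(pgrep -f gerrit)"
  sudo -u gerrit2 jstack "$(pgrep -f gerrit)" > /tmp/gerrit-threads.txt
  # Convert a hot thread id to hex and find its nid= entry in the dump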
03:48 <chasemp> gnt-instance reboot seaborgium.wikimedia.org [production]
02:26 <l10nupdate@tin> ResourceLoader cache refresh completed at Sat Jul 23 02:26:49 UTC 2016 (duration 5m 41s) [production]
02:21 <mwdeploy@tin> scap sync-l10n completed (1.28.0-wmf.11) (duration: 08m 24s) [production]
01:02 <tgr@tin> Synchronized php-1.28.0-wmf.11/extensions/CentralAuth/includes/CentralAuthPlugin.php: T141160 (duration: 00m 29s) [production]
01:01 <tgr@tin> Synchronized php-1.28.0-wmf.11/extensions/CentralAuth/includes/CentralAuthHooks.php: T141160 (duration: 00m 27s) [production]
01:00 <tgr@tin> Synchronized php-1.28.0-wmf.11/extensions/CentralAuth/includes/CentralAuthPrimaryAuthenticationProvider.php: T141160 (duration: 00m 28s) [production]
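(Each "Synchronized" line above is the output of a per-file sync from the deploy host; a sketch of the command shape, since the exact scap invocation is an assumption.)
  # From tin, push one changed file to the cluster with a log message:
  scap sync-file php-1.28.0-wmf.11/extensions/CentralAuth/includes/CentralAuthHooks.php 'T141160'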
00:37 <tgr> doing an emergency deploy of https://gerrit.wikimedia.org/r/#/c/300679 for T141160; the bug leaves dozens of new users per hour unattached on loginwiki, which probably has weird consequences [production]