2017-05-03
22:59 <RainbowSprinkles> gerrit: Quick restart to pick up logging config change [production]
22:47 <ejegg> updated fundraising tools from 20afe9d20a44a0dbee58f4f82e65ba5689c60de1 to f2522cdabf1741a60b7b60ac8f7ead7afd50b054 [production]
22:23 <ejegg> updated fundraising tools from a1e9342e093a85032255fc1d9904db7df13680b7 to 20afe9d20a44a0dbee58f4f82e65ba5689c60de1 [production]
21:06 <demon@naos> Synchronized README: No-op, forcing co-master sync (duration: 02m 28s) [production]
20:35 <mutante> mw1167 - same as mw1166 (jobrunners) - there was an hhvm[12547] "Fatal error: unknown exception" followed by a slow MySQL query (SELECT MASTER_TID_WAIT...); systemctl restart hhvm recovers it [production]
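The recovery used on mw1166/mw1167 amounts to restarting the HHVM unit once the fatal error shows up. A minimal shell sketch, assuming journald collects the hhvm unit's output:

  # look for the fatal error in the unit's journal
  journalctl -u hhvm --since "1 hour ago" | grep -i "fatal error"
  # restart the wedged service, as done on mw1166/mw1167
  sudo systemctl restart hhvm
  # verify the service came back up
  systemctl status hhvm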
20:30 <mutante> mw1166 - restart hhvm service (Fatal error: request has exceeded memory limit) [production]
20:13 <urandom> T160759: restoring default tombstone thresholds, restbase101{3,4,6} [production]
19:57 <mutante> mw1287 - also restarting hhvm (with systemctl restart) [production]
19:56 <mutante> mw1287 - restarted crashed apache (proxy_fcgi:error) [production]
19:48 <demon@naos> Finished scap: Cleaning up some unused branches, no-op (duration: 15m 13s) [production]
19:33 <demon@naos> Started scap: Cleaning up some unused branches, no-op [production]
19:32 <demon@naos> Pruned MediaWiki: 1.29.0-wmf.18 (duration: 00m 19s) [production]
19:30 <demon@naos> Pruned MediaWiki: 1.29.0-wmf.20 [keeping static files] (duration: 00m 44s) [production]
19:27 <ppchelko@naos> Finished deploy [restbase/deploy@76d909f]: Blacklist a title to fix cassandra OOMs T160759 attempt #2 - checks timeout (duration: 01m 39s) [production]
19:26 <ppchelko@naos> Started deploy [restbase/deploy@76d909f]: Blacklist a title to fix cassandra OOMs T160759 attempt #2 - checks timeout [production]
19:25 <ppchelko@naos> Finished deploy [restbase/deploy@76d909f]: Blacklist a title to fix cassandra OOMs T160759 (duration: 07m 39s) [production]
19:18 <ppchelko@naos> Started deploy [restbase/deploy@76d909f]: Blacklist a title to fix cassandra OOMs T160759 [production]
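The restbase deploys above go through scap3 on the deployment host; the "checks timeout" note suggests the per-target post-deploy checks were the sticking point on the first attempt. A minimal sketch, where the conventional /srv/deployment checkout path is an assumption:

  # from the deployment server, inside the repo checkout (path is an assumption)
  cd /srv/deployment/restbase/deploy
  scap deploy 'Blacklist a title to fix cassandra OOMs T160759'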
18:48 <papaul> db2084 - signing puppet certs, salt-key, initial run [production]
18:48 <urandom> T160759: reducing tombstone threshold to 1000, restbase1014 [production]
18:46 <urandom> T160759: reducing tombstone threshold to 1000, restbase1016 [production]
18:39 <urandom> T160759: reducing tombstone threshold to 1000, restbase1013 [production]
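The tombstone threshold changes above map to Cassandra's cassandra.yaml settings; lowering the failure threshold makes oversized tombstone scans abort early instead of exhausting the heap. A minimal sketch, where the config path and the exact setting touched are assumptions for this cluster:

  # in /etc/cassandra/cassandra.yaml (path is an assumption):
  #   tombstone_failure_threshold: 1000   # down from the 100000 default
  # restart the affected instance so the change takes effect, e.g.:
  sudo systemctl restart cassandra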
18:35 <urandom> restarting restbase1016-c [production]
18:34 <urandom> restarting restbase1013-b [production]
18:00 <bblack> restart cp2005 backend (lag) [production]
17:33 <moritzm> uploaded openjdk-8 u131 to apt.wikimedia.org [production]
17:14 <jynus@naos> Synchronized wmf-config/InitialiseSettings.php: Disable Cognate - it is causing an outage on x1 (duration: 01m 06s) [production]
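Disabling an extension like this is the usual wmf-config feature-flag flip plus a single-file sync from the deployment host. A minimal sketch; the wmgUseCognate key name is an assumption:

  # in wmf-config/InitialiseSettings.php, turn the flag off, roughly:
  #   'wmgUseCognate' => [ 'default' => false ],   # key name is an assumption
  # then sync just that file:
  scap sync-file wmf-config/InitialiseSettings.php 'Disable Cognate - it is causing an outage on x1'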
16:30 <jynus@naos> Synchronized wmf-config/db-eqiad.php: Fine-tune per-server load to reduce db connection errors (duration: 01m 27s) [production]
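Per-server load tuning is done by editing the replica weight map in wmf-config/db-eqiad.php and syncing it. A minimal sketch, where the hostname and weights are illustrative assumptions:

  # in wmf-config/db-eqiad.php, lower the weight of an overloaded replica, roughly:
  #   'db1052' => 300,  ->  'db1052' => 150,   # hostname and weights are assumptions
  # then:
  scap sync-file wmf-config/db-eqiad.php 'Fine-tune per-server load to reduce db connection errors'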
16:17 <mutante> install2002 / db2084 - reverting live hack, re-enabling puppet. db2084 doesn't even talk to DHCP; all other new db servers are fine, just this one out of 22 is not. Seems to be an actually broken NIC - the cable was swapped and the switch config was checked too [production]
16:08 <mutante> install2002 - temporarily stopping puppet to debug the db2084 DHCP issue [production]
15:13 <catrope@naos> Synchronized php-1.29.0-wmf.21/includes/logging/LogPager.php: Replace FORCE INDEX(ls_field_val) with IGNORE INDEX(ls_log_id) (https://gerrit.wikimedia.org/r/#/c/351653/ for T17441) (duration: 01m 14s) [production]
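The index-hint change swaps pinning the optimizer to one index for merely excluding the one it was picking badly, leaving it free to choose among the rest. A minimal sketch of the difference; the query below is a simplified assumption, not the actual LogPager code:

  # before: FORCE INDEX makes the optimizer use ls_field_val no matter what
  mysql -e "SELECT ls_log_id FROM log_search FORCE INDEX (ls_field_val) WHERE ls_field = 'target_author_ip'"
  # after: IGNORE INDEX only rules out ls_log_id, so the optimizer can pick any other plan
  mysql -e "SELECT ls_log_id FROM log_search IGNORE INDEX (ls_log_id) WHERE ls_field = 'target_author_ip'"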
15:09 <RoanKattouw> Live-hacked (cherry-picked) https://gerrit.wikimedia.org/r/#/c/351653/ onto naos and synced to mwdebug1002 for testing [production]
14:54 <gehel> restart of elasticsearch on relforge [production]
14:43 <END> (PASS) - Rolling restart of parsoid in codfw and eqiad - t09_restart_parsoid (switchdc/oblivian@neodymium) [production]
14:27 <START> - Rolling restart of parsoid in codfw and eqiad - t09_restart_parsoid (switchdc/oblivian@neodymium) [production]
14:26 <END> (PASS) - Update Tendril tree to start from the core DB masters in eqiad - t09_tendril (switchdc/oblivian@neodymium) [production]
14:25 <START> - Update Tendril tree to start from the core DB masters in eqiad - t09_tendril (switchdc/oblivian@neodymium) [production]
14:25 <godog> start swiftrepl on ms-fe1005 [production]
14:24 <END> (PASS) - Start MediaWiki jobrunners, videoscalers and maintenance in eqiad - t09_start_maintenance (switchdc/oblivian@neodymium) [production]
14:22 <START> - Start MediaWiki jobrunners, videoscalers and maintenance in eqiad - t09_start_maintenance (switchdc/oblivian@neodymium) [production]
14:21 <END> (PASS) - Restore the TTL of all the MediaWiki read-write discovery records and cleanup confd stale files - t09_restore_ttl (switchdc/oblivian@neodymium) [production]
14:21 <START> - Restore the TTL of all the MediaWiki read-write discovery records and cleanup confd stale files - t09_restore_ttl (switchdc/oblivian@neodymium) [production]
14:20 <END> (PASS) - Set MediaWiki in read-write mode in eqiad (db-eqiad config already merged and git pulled) - t08_stop_mediawiki_readonly (switchdc/oblivian@neodymium) [production]
14:20 <MediaWiki> read-only period ends at: 2017-05-03 14:20:28.286697 (switchdc/oblivian@neodymium) [production]
14:20 <root@naos> Synchronized wmf-config/db-eqiad.php: Set MediaWiki in read-write mode in datacenter eqiad (duration: 00m 32s) [production]
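The read-write flip itself is a one-line config change followed by a file sync; a minimal sketch, with the readOnlyBySection key name being an assumption about how the read-only flag is expressed:

  # in wmf-config/db-eqiad.php, clear the read-only entries, roughly:
  #   'readOnlyBySection' => [ ...maintenance message... ]  ->  removed  # key name is an assumption
  # then:
  scap sync-file wmf-config/db-eqiad.php 'Set MediaWiki in read-write mode in datacenter eqiad'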
14:19 <START> - Set MediaWiki in read-write mode in eqiad (db-eqiad config already merged and git pulled) - t08_stop_mediawiki_readonly (switchdc/oblivian@neodymium) [production]