2017-05-03
|
14:01 |
<START> |
- Stop MediaWiki jobrunners, videoscalers and cronjobs in codfw - t01_stop_maintenance (switchdc/oblivian@neodymium) |
[production] |
14:00 |
<godog> |
stop swiftrepl on ms-fe1005 |
[production] |
13:59 |
<END> |
(PASS) - Reduce the TTL of all the MediaWiki read-write discovery records - t00_reduce_ttl (switchdc/oblivian@neodymium) |
[production] |
13:59 |
<START> |
- Reduce the TTL of all the MediaWiki read-write discovery records - t00_reduce_ttl (switchdc/oblivian@neodymium) |
[production] |
13:59 |
<END> |
(PASS) - Disabling puppet on selected hosts in codfw and eqiad - t00_disable_puppet (switchdc/oblivian@neodymium) |
[production] |
13:58 |
<START> |
- Disabling puppet on selected hosts in codfw and eqiad - t00_disable_puppet (switchdc/oblivian@neodymium) |
[production] |
13:16 |
<hashar> |
Restarting Jenkins |
[production] |
13:06 |
<marostegui> |
db1028: Increased /srv/ by 20G to clear the warning |
[production] |
11:59 |
<moritzm> |
rebooted kubernetes1002, not 1003 |
[production] |
11:59 |
<moritzm> |
rebooting kubernetes1003 for update to Linux 4.9 |
[production] |
11:39 |
<moritzm> |
rebooting kubernetes1001 for update to Linux 4.9 |
[production] |
11:37 |
<oblivian@naos> |
Synchronized wmf-config: Changing the read-only reason for the DC switchover (T164177) (duration: 01m 20s) |
[production] |
11:25 |
<moritzm> |
uploaded nodepool 0.1.1+wmf7 to apt.wikimedia.org |
[production] |
11:23 |
<hashar> |
Upgrading Jenkins 2.46.1 -> 2.46.2 - T144106 |
[production] |
11:16 |
<jynus> |
restarting replication on s*, and x1 eqiad -> codfw |
[production] |
11:02 |
<hashar> |
Restarting Nodepool |
[production] |
10:58 |
<moritzm> |
upgrading nodepool on labnodepool1001 to a package including https://gerrit.wikimedia.org/r/351608 |
[production] |
10:18 |
<END> |
(PASS) - Switch MediaWiki master datacenter and read-write discovery records from eqiad to codfw - t05_switch_datacenter (switchdc/oblivian@neodymium) |
[production] |
10:17 |
<START> |
- Switch MediaWiki master datacenter and read-write discovery records from eqiad to codfw - t05_switch_datacenter (switchdc/oblivian@neodymium) |
[production] |
10:14 |
<END> |
(PASS) - Set MediaWiki in read-write mode in codfw (db-codfw config already merged and git pulled) - t08_stop_mediawiki_readonly (switchdc/oblivian@neodymium) |
[production] |
10:14 |
<START> |
- Set MediaWiki in read-write mode in codfw (db-codfw config already merged and git pulled) - t08_stop_mediawiki_readonly (switchdc/oblivian@neodymium) |
[production] |
10:14 |
<END> |
(PASS) - Set MediaWiki in read-only mode in eqiad (db-eqiad config already merged and git pulled) - t02_start_mediawiki_readonly (switchdc/oblivian@neodymium) |
[production] |
10:13 |
<START> |
- Set MediaWiki in read-only mode in eqiad (db-eqiad config already merged and git pulled) - t02_start_mediawiki_readonly (switchdc/oblivian@neodymium) |
[production] |
10:13 |
<_joe_> |
testing reverted steps of switchdc, non-dry-run --dc-from eqiad --dc-to codfw (should be noop) |
[production] |
10:05 |
<moritzm> |
installing icu security updates on trusty (jessie already fixed) |
[production] |
09:50 |
<marostegui> |
Restart db1097 to change its binlog to STATEMENT - T155099 |
[production] |
09:19 |
<elukey> |
reboot mc[1019-1036].eqiad.wmnet for kernel upgrades |
[production] |
09:18 |
<moritzm> |
rebooting restbase1018 for update to Linux 4.9 |
[production] |
09:05 |
<godog> |
rebuild mismounted FSes on ms-be1035 - T163673 |
[production] |
08:53 |
<_joe_> |
rebooting restbase1018 T163280 |
[production] |
08:24 |
<_joe_> |
deactivating restbase1018-vg for RAID failover and rebuild T163280 |
[production] |
08:01 |
<hashar> |
Rolling back Jenkins 2.46.2 -> 2.46.1 - T144106 |
[production] |
07:53 |
<hashar> |
Upgrading Jenkins 2.46.1 -> 2.46.2 - T144106 |
[production] |
07:42 |
<_joe_> |
rebuilding RAIDs on restbase1018 T163280 |
[production] |
07:35 |
<hashar> |
Restarting Nodepool to catch up with python-jenkins 0.4.14 |
[production] |
07:35 |
<moritzm> |
updated python-jenkins on labnodepool1001 to 0.4.14 (needed by latest Jenkins LTS) |
[production] |
02:48 |
<l10nupdate@naos> |
ResourceLoader cache refresh completed at Wed May 3 02:48:33 UTC 2017 (duration 5m 21s) |
[production] |
02:43 |
<l10nupdate@naos> |
scap sync-l10n completed (1.29.0-wmf.21) (duration: 14m 02s) |
[production] |
01:41 |
<mutante> |
kubernetes - puppet fails because "E: Unable to locate package cni" |
[production] |
2017-05-02
|
23:42 |
<TimStarling> |
EtcdConfig changes all reverted |
[production] |
23:17 |
<tstarling@puppetmaster1001> |
conftool action : set/@read-only.yaml; selector: name=ReadOnly,scope=eqiad |
[production] |
23:07 |
<TimStarling> |
scap pull on mw2017 and mwdebug1001 for etcd testing |
[production] |
23:00 |
<TimStarling> |
locking scap on naos for deployment of EtcdConfig https://gerrit.wikimedia.org/r/#/c/351132/ |
[production] |
22:57 |
<_joe_> |
upgrading python-conftool across the fleet |
[production] |
22:38 |
<mutante> |
gerrit (cobalt/gerrit2001) - deployed firewall change to allow ssh between gerrit servers for clustering, new iptables rules exist now (T152525) |
[production] |
21:52 |
<jynus> |
running previously failed alter tables on s3-eqiad T163912 |
[production] |
21:33 |
<jynus> |
creating missing math table on bdwikimedia (s3) |
[production] |
20:04 |
<hashar> |
Restarting Jenkins for plugin rollback |
[production] |
17:51 |
<bblack> |
codfw->eqiad switchback: end-user edge traffic back to normal @ eqiad ( https://gerrit.wikimedia.org/r/#/c/351330/ ) - 10 minute TTL for bulk traffic pattern shift starts now. |
[production] |
17:50 |
<mobrovac@naos> |
Finished deploy [restbase/deploy@6adb0f2]: Include displaytitle and page_id in the summary output and bump the content type version - T163729 T164079 (duration: 06m 04s) |
[production] |