2019-08-13
|
15:39 |
<bblack> |
puppet re-enabled on lvs1014, lvs1016, icinga1001 |
[production] |
15:35 |
<XioNoX> |
depool eqsin for cr2-eqsin upgrade |
[production] |
15:32 |
<bblack> |
disabled puppet on lvs1014, lvs1016, icinga1001 ahead of deploying https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/528885/ - T229621 |
[production] |
15:32 |
<gehel@cumin2001> |
START - Cookbook sre.elasticsearch.rolling-reboot |
[production] |
15:30 |
<XioNoX> |
rollback ospf + bgp changes on cr2-eqord |
[production] |
15:19 |
<XioNoX> |
restart cr2-eqord - T227886 |
[production] |
15:12 |
<XioNoX> |
disable all peering and transit on cr2-eqord |
[production] |
15:01 |
<XioNoX> |
increase ospf cost of cr2-eqord<->cr2-eqiad link (+1000) |
[production] |
14:57 |
<ema> |
cp5002: reboot for kernel upgrade |
[production] |
14:42 |
<gehel@cumin2001> |
END (FAIL) - Cookbook sre.elasticsearch.rolling-reboot (exit_code=99) |
[production] |
14:42 |
<gehel@cumin2001> |
START - Cookbook sre.elasticsearch.rolling-reboot |
[production] |
14:31 |
<gehel@cumin2001> |
END (FAIL) - Cookbook sre.elasticsearch.rolling-reboot (exit_code=99) |
[production] |
14:31 |
<gehel@cumin2001> |
START - Cookbook sre.elasticsearch.rolling-reboot |
[production] |
14:29 |
<XioNoX> |
rollback: disable all peering and transit on cr2-eqdfw |
[production] |
14:18 |
<XioNoX> |
reboot cr2-eqdfw for software upgrade - T227886 |
[production] |
14:14 |
<XioNoX> |
disable all peering and transit on cr2-eqdfw |
[production] |
14:04 |
<volans@cumin2001> |
END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) |
[production] |
14:04 |
<volans@cumin2001> |
START - Cookbook sre.hosts.decommission |
[production] |
13:20 |
<jbond42> |
rolling update of postgresql-9.6 |
[production] |
13:07 |
<jijiki> |
rolling restart hhvm on api servers in eqiad |
[production] |
12:57 |
<jijiki> |
Restart hhvm on mw1235 |
[production] |
12:17 |
<fsero@puppetmaster1001> |
conftool action : set/pooled=true; selector: dnsdisc=sessionstore|citoid|cxserver|eventgate-analytics|eventgate-main|termbox|blubberoid|mathoid|zotero,name=eqiad |
[production] |
12:08 |
<_joe_> |
restarted php-fpm on mw1221 |
[production] |
12:03 |
<fsero@> |
helmfile [EQIAD] Ran 'apply' command on namespace 'sessionstore' for release 'production' . |
[production] |
12:00 |
<fsero@> |
helmfile [EQIAD] Ran 'apply' command on namespace 'cxserver' for release 'production' . |
[production] |
11:56 |
<fsero@> |
helmfile [EQIAD] Ran 'apply' command on namespace 'blubberoid' for release 'production' . |
[production] |
11:56 |
<fsero@> |
helmfile [EQIAD] Ran 'apply' command on namespace 'blubberoid' for release 'production' . |
[production] |
11:49 |
<fsero@> |
helmfile [EQIAD] Ran 'apply' command on namespace 'blubberoid' for release 'production' . |
[production] |
11:44 |
<fsero> |
recreating cxserver blubber and sessionstore namespace - T228836 |
[production] |
11:39 |
<fsero@> |
helmfile [EQIAD] Ran 'apply' command on namespace 'mathoid' for release 'production' . |
[production] |
11:35 |
<gehel> |
restart wdqs-blazegraph on wdqs2001 |
[production] |
11:34 |
<gehel> |
restart wdqs-updater on wdqs2001 |
[production] |
11:30 |
<fsero@> |
helmfile [EQIAD] Ran 'apply' command on namespace 'eventgate-main' for release 'main' . |
[production] |
11:29 |
<fsero@> |
helmfile [EQIAD] Ran 'apply' command on namespace 'eventgate-analytics' for release 'analytics' . |
[production] |
11:25 |
<fsero@> |
helmfile [EQIAD] Ran 'apply' command on namespace 'citoid' for release 'production' . |
[production] |
11:21 |
<fsero> |
recreating citoid eventgate-analytics eventgate-main mathoid namespace - T228836 |
[production] |
11:20 |
<fsero@> |
helmfile [EQIAD] Ran 'apply' command on namespace 'termbox' for release 'production' . |
[production] |
11:18 |
<raynor> |
EU SWAT finished |
[production] |
11:15 |
<pmiazga@deploy1001> |
Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:529925|Undeploy editor gender surveys (T227793)]] (duration: 00m 48s) |
[production] |
11:13 |
<fsero> |
recreating termbox namespace - T228836 |
[production] |
11:06 |
<oblivian@> |
helmfile [EQIAD] Ran 'apply' command on namespace 'zotero' for release 'production' . |
[production] |
11:04 |
<fsero> |
resetting net.netfilter.nf_conntrack_tcp_timeout_time_wait to 65 in kubernetes2006 |
[production] |
10:59 |
<_joe_> |
[eqiad] downtiming zotero on icinga for 10 minutes while recreating the deployment with helmfile |
[production] |
10:57 |
<oblivian@cumin1001> |
END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) |
[production] |
10:57 |
<oblivian@cumin1001> |
START - Cookbook sre.hosts.downtime |
[production] |
10:56 |
<oblivian@cumin1001> |
END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) |
[production] |
10:56 |
<oblivian@cumin1001> |
START - Cookbook sre.hosts.downtime |
[production] |
10:49 |
<oblivian@> |
helmfile [EQIAD] Ran 'apply' command on namespace 'kube-system' for release 'coredns' . |
[production] |
10:44 |
<oblivian@> |
helmfile [EQIAD] Ran 'apply' command on namespace 'kube-system' for release 'coredns' . |
[production] |
10:39 |
<oblivian@> |
helmfile [EQIAD] Ran 'apply' command on namespace 'kube-system' for release 'rbac-deploy-clusterrole' . |
[production] |