2020-01-15
13:53 <akosiaris> update calico policy on eqiad/codfw/staging. Add new urldownloaders. T224551 [production]
13:52 <akosiaris@deploy1001> helmfile [STAGING] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller'. [production]
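
A minimal sketch of the helmfile invocation behind a sync like the one above, run on the deployment host; the working directory and release selector here are illustrative assumptions, not taken from the log:

    # hypothetical checkout of the admin helmfiles
    cd /srv/deployment-charts/helmfile.d/admin
    # sync only the calico-policy-controller release in the staging environment
    helmfile -e staging --selector name=calico-policy-controller sync
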
13:02 <_joe_> restarting gerrit [production]
12:50 <XioNoX> reject RPKI invalids in eqsin - T220669 [production]
12:38 <vgutierrez> Pooling ulsfo for ncredir service - T242321 [production]
12:27 <awight> EU SWAT done [production]
12:24 <awight@deploy1001> Synchronized php-1.35.0-wmf.14/extensions/Cite: SWAT: [[gerrit:564002|Don't fail with a LogicException during section preview (T242434)]] (duration: 01m 10s) [production]
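
A SWAT sync like the one above is typically run on the deploy host with scap; a hedged sketch, assuming the change was already merged and pulled into the staging checkout on deploy1001:

    cd /srv/mediawiki-staging
    # sync a single directory to the app servers, with the log message SAL records
    scap sync-file php-1.35.0-wmf.14/extensions/Cite \
        "SWAT: [[gerrit:564002|Don't fail with a LogicException during section preview (T242434)]]"
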
12:22 <vgutierrez> upgrading ats on cp4026, cp4032, cp5006 and cp5012 - T242778 T242620 [production]
12:06 <XioNoX> reject RPKI invalids in ulsfo - T220669 [production]
11:58 <marostegui@cumin1001> dbctl commit (dc=all): 'Fully repool db1112', diff saved to https://phabricator.wikimedia.org/P10161 and previous config saved to /var/cache/conftool/dbconfig/20200115-115826-marostegui.json [production]
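
The dbctl entries throughout this log follow the same pattern: adjust an instance's pooled state or weight, then commit, which pushes the new config and saves a diff plus the previous state. A rough sketch of a "fully repool", assuming dbctl's documented syntax; exact flags may differ:

    # bring the instance back to full weight (percentage is illustrative)
    dbctl instance db1112 pool -p 100
    # commit to all datacenters; dbctl records the diff and the previous config
    dbctl config commit -m 'Fully repool db1112'
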
11:36 <elukey> restart all varnishkafka daemons on cp4031 [production]
11:09 <legoktm> added SonarQubeBot to "Non-Interactive Users" group on Gerrit [production]
10:38 <moritzm> installing openssl1.0 updates on stretch (update to 1.0.2u) [production]
10:08 <ema> cache: rolling varnish-frontend-restart to add CAP_KILL to varnish-frontend.service T242411 [production]
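
Granting CAP_KILL to varnish-frontend.service would plausibly be done through a systemd override; a sketch under that assumption (the AmbientCapabilities directive chosen here is a guess, not confirmed by the log):

    # open a drop-in override for the unit
    sudo systemctl edit varnish-frontend.service
    # add to the override:
    #   [Service]
    #   AmbientCapabilities=CAP_KILL
    sudo systemctl daemon-reload
    # the rolling varnish-frontend-restart then picks up the new capability
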
09:55 <vgutierrez> repooling cp5012 [production]
09:46 <vgutierrez> depooling cp5012 for some ats parent select tests [production]
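
Depooling and repooling a cache host is driven by conftool; a hedged sketch of both directions, the object selector being an assumption about cp5012's FQDN:

    # mark the host depooled so the load balancers stop sending it traffic
    confctl select 'name=cp5012.eqsin.wmnet' set/pooled=no
    # ... run the tests, then repool ...
    confctl select 'name=cp5012.eqsin.wmnet' set/pooled=yes
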
09:42 <XioNoX> enable ping offload in esams - T190090 [production]
09:32 <marostegui> Deploy schema change on x1 eqiad hosts - T242749 [production]
09:19 <elukey> roll-restart druid brokers on druid100[4-6] - locked up after segments deletion [production]
09:11 <marostegui> Deploy schema change on x1 codfw - T242749 [production]
08:51 <marostegui@cumin1001> dbctl commit (dc=all): 'Fully repool db1098:3316 and db1098:3317', diff saved to https://phabricator.wikimedia.org/P10160 and previous config saved to /var/cache/conftool/dbconfig/20200115-085145-marostegui.json [production]
08:44 <elukey@cumin1001> END (PASS) - Cookbook sre.aqs.roll-restart (exit_code=0) [production]
08:40 <godog> roll restart ores in codfw/eqiad to apply logging pipeline changes [production]
08:40 <elukey@cumin1001> START - Cookbook sre.aqs.roll-restart [production]
08:40 <elukey@cumin1001> END (FAIL) - Cookbook sre.aqs.roll-restart (exit_code=99) [production]
08:40 <elukey@cumin1001> START - Cookbook sre.aqs.roll-restart [production]
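
The START/END pairs above are logged automatically by the cookbook runner on the cumin host; the FAIL with exit_code=99 followed by an immediate rerun ending in PASS suggests the first run aborted and was simply retried. A sketch of the invocation, assuming the standard runner:

    # roll-restart the AQS cluster; START/END lines are logged to SAL automatically
    sudo cookbook sre.aqs.roll-restart
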
08:23 <godog> roll restart ores in codfw/eqiad to apply logging pipeline changes [production]
08:13 <godog> testing ores logging to pipeline on ores2001 [production]
07:02 <marostegui@cumin1001> dbctl commit (dc=all): 'Slowly repool db1098:3316 and db1098:3317', diff saved to https://phabricator.wikimedia.org/P10159 and previous config saved to /var/cache/conftool/dbconfig/20200115-070201-marostegui.json [production]
06:53 <marostegui@cumin1001> dbctl commit (dc=all): 'Slowly repool db1098:3316 and db1098:3317', diff saved to https://phabricator.wikimedia.org/P10158 and previous config saved to /var/cache/conftool/dbconfig/20200115-065353-marostegui.json [production]
06:53 <marostegui@cumin1001> dbctl commit (dc=all): 'Fully repool db1080', diff saved to https://phabricator.wikimedia.org/P10157 and previous config saved to /var/cache/conftool/dbconfig/20200115-065305-marostegui.json [production]
06:46 <marostegui@cumin1001> dbctl commit (dc=all): 'Slowly repool db1080', diff saved to https://phabricator.wikimedia.org/P10156 and previous config saved to /var/cache/conftool/dbconfig/20200115-064606-marostegui.json [production]
06:45 <marostegui@cumin1001> dbctl commit (dc=all): 'Slowly repool db1098:3316 and db1098:3317', diff saved to https://phabricator.wikimedia.org/P10155 and previous config saved to /var/cache/conftool/dbconfig/20200115-064535-marostegui.json [production]
06:25 <marostegui> Upgrade db1098:3316 and db1098:3317 [production]
06:23 <mholloway-shell@deploy1001> Synchronized wmf-config/InitialiseSettings.php: MachineVision: Make testcommonswiki behavior consistent with commonswiki (duration: 01m 16s) [production]
06:20 <marostegui@cumin1001> dbctl commit (dc=all): 'Depool db1098:3316 db1098:3317 for upgrade', diff saved to https://phabricator.wikimedia.org/P10152 and previous config saved to /var/cache/conftool/dbconfig/20200115-062028-marostegui.json [production]
06:19 <marostegui@cumin1001> dbctl commit (dc=all): 'Slowly repool db1080', diff saved to https://phabricator.wikimedia.org/P10151 and previous config saved to /var/cache/conftool/dbconfig/20200115-061859-marostegui.json [production]
06:16 <marostegui> Remove revision partitions from db2088:3311 - T239453 [production]
06:10 <marostegui@cumin1001> dbctl commit (dc=all): 'Repool db1103:3312 - T239453', diff saved to https://phabricator.wikimedia.org/P10150 and previous config saved to /var/cache/conftool/dbconfig/20200115-061052-marostegui.json [production]
06:03 <marostegui@cumin1001> dbctl commit (dc=all): 'Slowly repool db1080', diff saved to https://phabricator.wikimedia.org/P10148 and previous config saved to /var/cache/conftool/dbconfig/20200115-060347-marostegui.json [production]
06:00 <mholloway-shell@deploy1001> Finished deploy [mobileapps/deploy@3c5f615]: Update mobileapps to 7f507ae (duration: 05m 56s) [production]
05:54 <mholloway-shell@deploy1001> Started deploy [mobileapps/deploy@3c5f615]: Update mobileapps to 7f507ae [production]
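
Services deployed with scap3, such as mobileapps, are pushed with 'scap deploy' from the service's deploy repository on the deploy host; a minimal sketch, assuming the checkout lives under /srv/deployment:

    cd /srv/deployment/mobileapps/deploy
    git pull && git submodule update --init
    # scap emits the Started/Finished deploy lines seen above
    scap deploy 'Update mobileapps to 7f507ae'
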
01:32 <mutante> lvs1015 powercycled; it had crashed with nothing on console and lots of unknowns in Icinga [production]
01:17 <mutante> dbproxy1017 and dbproxy1021 were showing "haproxy failover" Icinga alerts. Ran the check described at https://wikitech.wikimedia.org/wiki/HAProxy#Failover; on both it claimed that db1133 was DOWN, but checking db1133 itself showed it was up and working normally. For that case the docs say to 'systemctl reload haproxy'; did so on both and things recovered. [production]
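
A hedged sketch of that failover check and the recovery step; the stats socket path is an assumption and varies by configuration:

    # ask haproxy which backends it currently believes are down
    echo 'show stat' | sudo socat stdio /run/haproxy/haproxy.sock | grep -i down
    # if the backend is actually healthy, a reload clears the stale state
    sudo systemctl reload haproxy
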
01:13 <mutante> dbproxy1017 - systemctl reload haproxy [production]
00:22 <bstorm_> restarted maintain-dbusers on labstore1004 after recovering the m5 DB's connection issue [production]
00:12 <bstorm_> set max_connections to 600 temporarily while troubleshooting on m5 (db1133) [production]
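
Raising max_connections at runtime does not persist across a server restart, which fits a temporary troubleshooting change like the one above; a sketch, assuming direct root access to the m5 primary:

    # runtime-only change; reverts to the configured value when mysqld restarts
    sudo mysql -e "SET GLOBAL max_connections = 600;"
    # confirm the new value took effect
    sudo mysql -e "SHOW GLOBAL VARIABLES LIKE 'max_connections';"
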