2020-01-15

08:23 <godog> roll restart ores in codfw/eqiad to apply logging pipeline changes [production]
08:13 <godog> testing ores logging to pipeline on ores2001 [production]
07:02 <marostegui@cumin1001> dbctl commit (dc=all): 'Slowly repool db1098:3316 and db1098:3317', diff saved to https://phabricator.wikimedia.org/P10159 and previous config saved to /var/cache/conftool/dbconfig/20200115-070201-marostegui.json [production]
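
The "slowly repool" entries above reflect dbctl's staged, weight-based repooling. A minimal sketch of the workflow, assuming the dbctl CLI as documented on Wikitech; the instance, weights, and commit messages are illustrative:

    # depool an instance ahead of maintenance, then commit the change to etcd
    dbctl instance db1098:3316 depool
    dbctl config commit -m 'Depool db1098:3316 for upgrade'

    # "slowly repool": bring the weight back in steps so the host warms up gradually
    dbctl instance db1098:3316 pool -p 25
    dbctl config commit -m 'Slowly repool db1098:3316'
    # repeat with -p 50, -p 75 and finally -p 100 ("fully repool")
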
06:53 <marostegui@cumin1001> dbctl commit (dc=all): 'Slowly repool db1098:3316 and db1098:3317', diff saved to https://phabricator.wikimedia.org/P10158 and previous config saved to /var/cache/conftool/dbconfig/20200115-065353-marostegui.json [production]
06:53 <marostegui@cumin1001> dbctl commit (dc=all): 'Fully repool db1080', diff saved to https://phabricator.wikimedia.org/P10157 and previous config saved to /var/cache/conftool/dbconfig/20200115-065305-marostegui.json [production]
06:46 <marostegui@cumin1001> dbctl commit (dc=all): 'Slowly repool db1080', diff saved to https://phabricator.wikimedia.org/P10156 and previous config saved to /var/cache/conftool/dbconfig/20200115-064606-marostegui.json [production]
06:45 <marostegui@cumin1001> dbctl commit (dc=all): 'Slowly repool db1098:3316 and db1098:3317', diff saved to https://phabricator.wikimedia.org/P10155 and previous config saved to /var/cache/conftool/dbconfig/20200115-064535-marostegui.json [production]
06:25 <marostegui> Upgrade db1098:3316 and db1098:3317 [production]
06:23 <mholloway-shell@deploy1001> Synchronized wmf-config/InitialiseSettings.php: MachineVision: Make testcommonswiki behavior consistent with commonswiki (duration: 01m 16s) [production]
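
"Synchronized wmf-config/..." lines are emitted by scap when a single config file is pushed from the deployment host. A minimal sketch of the command behind such an entry, assuming scap's sync-file subcommand; the invocation itself is not taken from the log:

    # push one config file to all app servers; the trailing message becomes
    # the "Synchronized ..." SAL entry above
    scap sync-file wmf-config/InitialiseSettings.php 'MachineVision: Make testcommonswiki behavior consistent with commonswiki'
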
06:20 <marostegui@cumin1001> dbctl commit (dc=all): 'Depool db1098:3316 db1098:3317 for upgrade', diff saved to https://phabricator.wikimedia.org/P10152 and previous config saved to /var/cache/conftool/dbconfig/20200115-062028-marostegui.json [production]
06:19 <marostegui@cumin1001> dbctl commit (dc=all): 'Slowly repool db1080', diff saved to https://phabricator.wikimedia.org/P10151 and previous config saved to /var/cache/conftool/dbconfig/20200115-061859-marostegui.json [production]
06:16 <marostegui> Remove revision partitions from db2088:3311 - T239453 [production]
06:10 <marostegui@cumin1001> dbctl commit (dc=all): 'Repool db1103:3312 - T239453', diff saved to https://phabricator.wikimedia.org/P10150 and previous config saved to /var/cache/conftool/dbconfig/20200115-061052-marostegui.json [production]
06:03 <marostegui@cumin1001> dbctl commit (dc=all): 'Slowly repool db1080', diff saved to https://phabricator.wikimedia.org/P10148 and previous config saved to /var/cache/conftool/dbconfig/20200115-060347-marostegui.json [production]
06:00 <mholloway-shell@deploy1001> Finished deploy [mobileapps/deploy@3c5f615]: Update mobileapps to 7f507ae (duration: 05m 56s) [production]
05:54 <mholloway-shell@deploy1001> Started deploy [mobileapps/deploy@3c5f615]: Update mobileapps to 7f507ae [production]
01:32 <mutante> lvs1015 powercycling, crashed, nothing on console, lots of unknowns in icinga [production]
01:17 <mutante> dbproxy1017 and dbproxy1021 were showing "haproxy failover" icinga alerts. Ran the check described at https://wikitech.wikimedia.org/wiki/HAProxy#Failover; both claimed db1133 was DOWN, but checking db1133 itself showed it was up and working normally. For that case the docs say to 'systemctl reload haproxy'; done on both and things recovered. [production]
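
The failover check referenced above inspects HAProxy's view of its backends. A minimal sketch, assuming HAProxy's standard admin socket; the socket path depends on the local haproxy.cfg:

    # field 18 of the CSV stats output is the server status (UP/DOWN)
    echo "show stat" | sudo socat stdio /run/haproxy/haproxy.sock | cut -d, -f1,2,18
    # if the reported state is stale while the backend is actually healthy,
    # a reload re-runs the health checks without dropping live connections
    sudo systemctl reload haproxy
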
01:13 <mutante> dbproxy1017 - systemctl reload haproxy [production]
00:22 <bstorm_> restarted maintain-dbusers on labstore1004 after recovering the m5 DB's connection issue [production]
00:12 <bstorm_> set max_connections to 600 temporarily while troubleshooting on m5 (db1133) [production]
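
Raising max_connections at runtime is a standard MariaDB troubleshooting step. A minimal sketch of what "set max_connections to 600 temporarily" plausibly looked like; the exact statements are an assumption:

    # runtime change only; it does not survive a server restart
    sudo mysql -e "SET GLOBAL max_connections = 600;"
    # watch how close the server is running to the limit
    sudo mysql -e "SHOW GLOBAL STATUS LIKE 'Threads_connected';"
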
2020-01-14

20:11 <milimetric@deploy1001> Finished deploy [analytics/aqs/deploy@1cf0530]: Increment service-runner to latest version (duration: 04m 48s) [production]
20:07 <milimetric@deploy1001> Started deploy [analytics/aqs/deploy@1cf0530]: Increment service-runner to latest version [production]
19:22 <urbanecm@deploy1001> Synchronized wmf-config/CommonSettings.php: SWAT: e400916: [wikitech] Restore contentadmin ability to manage abuse filters (duration: 01m 05s) [production]
18:11 <vgutierrez> repooling cp5012 [production]
18:06 <vgutierrez> depool cp5012 for some ats parent select debugging [production]
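
The cp-host pool/depool entries above toggle a cache server's state in the load balancer. A minimal sketch, assuming the conftool wrapper scripts present on Wikimedia production hosts:

    # on the cache host itself: drain traffic before debugging, restore afterwards
    sudo depool   # marks the host as pooled=no in etcd via conftool
    # ... debugging ...
    sudo pool     # returns the host to service
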
17:43 <vgutierrez> repooling cp4027 [production]
17:39 <vgutierrez> depooling cp4027 for some ats-tls parent balancing tests [production]
17:21 <_joe_> upload docker-report 0.0.2 to {buster,stretch}-wikimedia T242604 [production]
16:53 <liw@deploy1001> rebuilt and synchronized wikiversions files: group0 to 1.35.0-wmf.15 [production]
16:46 <marostegui@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) [production]
16:44 <liw> branch is cut for 1.35.0-wmf.15; the train window is closed, but I'll continue the train since the next time slot seems to have nothing scheduled [production]
16:44 <marostegui@cumin1001> START - Cookbook sre.hosts.downtime [production]
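
The START/END pairs above come from Spicerack cookbooks, which log to SAL automatically when they run. A hypothetical invocation of the downtime cookbook; the flags and target are illustrative, not taken from the log:

    # silence Icinga alerts for a host while it is being worked on
    sudo cookbook sre.hosts.downtime --hours 2 -r 'Copy data from db1080 to db1107' 'db1107.eqiad.wmnet'
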
16:41 <marostegui> Re-enable puppet on install1002 and install2002 - T242481 [production]
16:31 <liw@deploy1001> Finished scap: testwiki to php-1.35.0-wmf.15 and rebuild l10n cache (try 2) (duration: 43m 29s) [production]
16:26 <marostegui> Temporarily disable puppet on install1002 and install2002 - T242481 [production]
16:08 <volans@deploy1001> Finished deploy [debmonitor/deploy@e72911c]: Release v0.2.4 (duration: 01m 09s) [production]
16:07 <volans@deploy1001> Started deploy [debmonitor/deploy@e72911c]: Release v0.2.4 [production]
15:47 <liw@deploy1001> Started scap: testwiki to php-1.35.0-wmf.15 and rebuild l10n cache (try 2) [production]
15:02 <marostegui@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) [production]
15:02 <marostegui> Copy data from db1080 to db1107 T242702 [production]
15:02 <marostegui@cumin1001> dbctl commit (dc=all): 'Depool db1080 for transfer', diff saved to https://phabricator.wikimedia.org/P10144 and previous config saved to /var/cache/conftool/dbconfig/20200114-150223-marostegui.json [production]
15:00 <marostegui@cumin1001> START - Cookbook sre.hosts.downtime [production]
14:51 <liw@deploy1001> scap failed: CalledProcessError Command '/usr/local/bin/mwscript rebuildLocalisationCache.php --wiki="testwiki" --outdir="/tmp/scap_l10n_44869219" --threads=30 --lang en --quiet' returned non-zero exit status 1 (duration: 03m 55s) [production]
14:47 <liw@deploy1001> Started scap: testwiki to php-1.35.0-wmf.15 and rebuild l10n cache [production]
14:43 <marostegui@cumin1001> dbctl commit (dc=all): 'Slowly repool db1080', diff saved to https://phabricator.wikimedia.org/P10143 and previous config saved to /var/cache/conftool/dbconfig/20200114-144341-marostegui.json [production]
14:26 <marostegui> Move db1114 under db1080 [production]
14:24 <marostegui> Stop db1080 and db1107 replication in sync [production]
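
Stopping two replicas "in sync" means halting them at the same binlog position so their data matches exactly. A minimal sketch of one common approach, assuming both hosts replicate from the same master; the file name and position are placeholders:

    # on db1080: stop replication and note where it stopped
    sudo mysql -e "STOP SLAVE; SHOW SLAVE STATUS\G" | grep -E 'Relay_Master_Log_File|Exec_Master_Log_Pos'
    # on db1107: stop, then replay exactly up to the same coordinates
    sudo mysql -e "STOP SLAVE; START SLAVE UNTIL MASTER_LOG_FILE='<binlog-file>', MASTER_LOG_POS=<position>;"
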
14:21 <XioNoX> push firewall policies to pfw3-eqiad - T242681 [production]
14:15 <XioNoX> push firewall policies to pfw3-codfw - T242681 [production]