2019-07-17
09:33 |
<gehel@cumin1001> |
START - Cookbook sre.wdqs.data-transfer |
[production] |
09:33 |
<gehel@cumin1001> |
END (ERROR) - Cookbook sre.wdqs.data-transfer (exit_code=97) |
[production] |
09:23 |
<moritzm> |
rebooting grafana1001 to pick up MDS-enabled qemu |
[production] |
09:21 |
<ema> |
cp-ats: upgrade fifo-log-demux to 0.3 T227668 |
[production] |
09:21 |
<marostegui@deploy1001> |
Synchronized wmf-config/db-codfw.php: Depool and clarify db2045 status T227862 (duration: 00m 55s) |
[production] |
09:19 |
<jmm@cumin2001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) |
[production] |
09:19 |
<jmm@cumin2001> |
START - Cookbook sre.hosts.downtime |
[production] |
09:15 |
<gehel@cumin1001> |
START - Cookbook sre.wdqs.data-transfer |
[production] |
09:07 |
<ema> |
upload fifo-log-demux 0.3 to stretch-wikimedia T227668 |
[production] |
08:51 |
<jmm@cumin2001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) |
[production] |
08:51 |
<jmm@cumin2001> |
START - Cookbook sre.hosts.downtime |
[production] |
08:36 |
<jijiki> |
Disable puppet on thumbor* in eqiad, depool and pool back to apply 523728 - T224572 |
[production] |
08:17 |
<jijiki> |
Pool mw1239 - T227867 |
[production] |
07:48 |
<godog> |
swift eqiad-prod: put back ms-be1043 sdk1 - T218544 |
[production] |
07:46 |
<ema> |
cp-esams: varnish frontend rolling restarts for 5.1.3-1wm11 upgrades T227672 |
[production] |
07:33 |
<moritzm> |
reimaging sarin for some tests |
[production] |
06:59 |
<elukey> |
apply mcrouter async replication to mw2224 - T225642 |
[production] |
06:25 |
<elukey> |
reboot analytics1072 in an attempt to clear the megacli config (and add a new disk) |
[production] |
06:20 |
<elukey> |
sudo -i /usr/local/sbin/restart-php7.2-fpm on mwdebug* to reset opcache |
[production] |
05:26 |
<marostegui> |
Stop MySQL on db1065 for decommissioning - T227560 |
[production] |
05:24 |
<marostegui> |
Remove db1065 from tendril and zarcillo - T227560 |
[production] |
03:46 |
<tstarling@deploy1001> |
Synchronized php-1.34.0-wmf.14/extensions/CentralAuth/includes/specials/SpecialMultiLock.php: T227772 (duration: 00m 54s) |
[production] |
03:42 |
<tstarling@deploy1001> |
Synchronized php-1.34.0-wmf.13/extensions/CentralAuth/includes/specials/SpecialMultiLock.php: T227772 (duration: 00m 56s) |
[production] |
03:00 |
<tstarling@deploy1001> |
Synchronized php-1.34.0-wmf.13/includes/Permissions/PermissionManager.php: (no justification provided) (duration: 00m 54s) |
[production] |
02:58 |
<tstarling@deploy1001> |
Synchronized php-1.34.0-wmf.14/includes/Permissions/PermissionManager.php: (no justification provided) (duration: 00m 57s) |
[production] |
00:50 |
<mutante> |
wikitech-static: commented out cert renewal cron job out of caution - still needs fixing, to be continued tomorrow |
[production] |
00:12 |
<mutante> |
wikitech-static - adding (undocumented!) option webroot-map to certbot config to use webroot authenticator with different document roots per domain while using the config file and not cli params (T214640) |
[production] |
00:01 |
<mutante> |
wikitech-static certbot --dry-run renew (T214640) |
[production] |
00:01 |
<mutante> |
wikitech-static changing certbot renewalparams: authenticator = webroot (changed from standalone), install = apache (unchanged) (T214640) |
[production] |
2019-07-16
23:53 |
<RoanKattouw> |
Deployed patch for T207094 |
[production] |
23:27 |
<catrope@deploy1001> |
Synchronized php-1.34.0-wmf.14/skins/MinervaNeue/: Do not load main menu icons in critical path (T227929) (duration: 00m 55s) |
[production] |
23:26 |
<catrope@deploy1001> |
Synchronized php-1.34.0-wmf.13/skins/MinervaNeue/: Do not load main menu icons in critical path (T227929) (duration: 00m 56s) |
[production] |
23:26 |
<mutante> |
wikitech-static - current status with method 'standalone' is that it's broken on cert renewal and gets fixed by restarting apache, which makes no sense since the previous fixes were the straight opposite and the ticket claims the fix was moving back from apache to standalone (T214640) |
[production] |
23:26 |
<fsero> |
repool ms-fe2005 T228196 |
[production] |
23:23 |
<mutante> |
wikitech-static - testing cert renewal with dry-run option - getting some temp icinga alerts is now expected again because renewal method was changed back from 'apache' to 'standalone' (not by me -> T204840#5243222; I previously did the opposite change in T214640#4907685 to fix it) and that takes down apache during the renewal (T214640) |
[production] |
23:20 |
<mutante> |
wikitech-static - testing cert renewal with dry-run option - getting some temp icinga alerts is now expected again because renewal method was changed back from 'apache' to 'standalone' (not by me) and that takes down apache during the renewal |
[production] |
23:17 |
<catrope@deploy1001> |
Synchronized php-1.34.0-wmf.14/extensions/GrowthExperiments/: Don't use timestamp in help panel questions in Flow (T212433) (duration: 00m 56s) |
[production] |
23:09 |
<mutante> |
wikitech-static got ssl config files in sync with the repo, the difference was really just that space on one line each though (T225258) |
[production] |
22:35 |
<fsero> |
uploading only blobs on docker-registry-codfw from a backup on ms-fe2005 T228196 |
[production] |
22:29 |
<mutante> |
wikitech-static: the diff between the ssl config files in the repo and on the server was just a space at the end of the ServerAdmin line (T225258) |
[production] |
22:28 |
<fsero> |
depooling ms-fe2005 for swift upload for registry T228196 |
[production] |
22:26 |
<mutante> |
wikitech-static: ran certbot with --dry-run renew to confirm cert renewal works and it was just fine; 2 minutes later apache errors appeared, which were fixed by restarting apache2 (T214640) |
[production] |
22:24 |
<mutante> |
wikitech-static restarted apache |
[production] |
22:11 |
<mutante> |
wikitech-static: turn /etc/apache2/sites-available/wikitech-static.wikimedia.org-ssl.conf and status.wikimedia.org-ssl.conf into symlinks to /wikitech-static/apache/ to match config for http vhosts (T225258) |
[production] |
22:06 |
<mutante> |
wikitech-static: move /etc/apache2/sites-available/000-default.conf and default-ssl.conf out of directory and reload apache to confirm they are not used and get us in sync with the repo contents again (T225258) |
[production] |
21:17 |
<bd808@deploy1001> |
Finished deploy [striker/deploy@247a8a6]: Fixes for ssh key management, git repo creation, and Django upgrade (T221657, T227508) (duration: 01m 08s) |
[production] |
21:15 |
<bd808@deploy1001> |
Started deploy [striker/deploy@247a8a6]: Fixes for ssh key management, git repo creation, and Django upgrade (T221657, T227508) |
[production] |
20:55 |
<SMalyshev> |
repooled wdqs2004 and wdqs2001 - reload done |
[production] |
20:26 |
<mutante> |
ganeti1001 - gnt-instance remove netmon1003.wikimedia.org (T220355) |
[production] |
19:59 |
<XioNoX> |
update ACLs on pfw3-eqiad/codfw - T228205 |
[production] |