2019-07-16
|
23:53 |
<RoanKattouw> |
Deployed patch for T207094 |
[production] |
23:27 |
<catrope@deploy1001> |
Synchronized php-1.34.0-wmf.14/skins/MinervaNeue/: Do not load main menu icons in critical path (T227929) (duration: 00m 55s) |
[production] |
23:26 |
<catrope@deploy1001> |
Synchronized php-1.34.0-wmf.13/skins/MinervaNeue/: Do not load main menu icons in critical path (T227929) (duration: 00m 56s) |
[production] |
23:26 |
<mutante> |
wikitech-static - current status with method 'standalone' is that it's broken on cert renewal and gets fixed by restarting apache, which makes no sense since the previous fixes were the straight opposite and the ticket claims the fix was moving back from apache to standalone (T214640) |
[production] |
23:26 |
<fsero> |
repool ms-fe2005 T228196 |
[production] |
23:23 |
<mutante> |
wikitech-static - testing cert renewal with dry-run option - getting some temp icinga alerts is now expected again because renewal method was changed back from 'apache' to 'standalone' (not by me -> T204840#5243222 i previously did the opposite change in T214640#4907685 to fix it) and that takes down apache during the renewal (T214640) |
[production] |
23:20 |
<mutante> |
wikitech-static - testing cert renewal with dry-run option - getting some temp icinga alerts is now expected again because renewal method was changed back from 'apache' to 'standalone' (not by me) and that takes down apache during the renewal |
[production] |
23:17 |
<catrope@deploy1001> |
Synchronized php-1.34.0-wmf.14/extensions/GrowthExperiments/: Don't use timestamp in help panel questions in Flow (T212433) (duration: 00m 56s) |
[production] |
23:09 |
<mutante> |
wikitech-static got ssl config files in sync with the repo, the difference was really just that space on one line each though (T225258) |
[production] |
22:35 |
<fsero> |
uploading only blobs on docker-registry-codfw from a backup on ms-fe2005 T228196 |
[production] |
22:29 |
<mutante> |
wikitech-static the diff between the ssl config files in the repo and on the server was just a space at the end of the ServerAdmin line .... T225258 |
[production] |
22:28 |
<fsero> |
depooling ms-fe2005 for swift upload for registry T228196 |
[production] |
22:26 |
<mutante> |
wikitech-static ran certbot with --dry-run renew to confirm cert renewal works and it was just fine .. 2 minutes later apache errors which were fixed by restarting apache2 (T214640) |
[production] |
22:24 |
<mutante> |
wikitech-static restarted apache |
[production] |
22:11 |
<mutante> |
wikitech-static: turn /etc/apache2/sites-available/wikitech-static.wikimedia.org-ssl.conf and status.wikimedia.org-ssl.conf into symlinks to /wikitech-static/apache/ to match config for http vhosts (T225258) |
[production] |
22:06 |
<mutante> |
wikitech-static: move /etc/apache2/sites-available/000-default.conf and default-ssl.conf out of directory and reload apache to confirm they are not used and get us in sync with the repo contents again (T225258) |
[production] |
21:17 |
<bd808@deploy1001> |
Finished deploy [striker/deploy@247a8a6]: Fixes for ssh key management, git repo creation, and Django upgrade (T221657, T227508) (duration: 01m 08s) |
[production] |
21:15 |
<bd808@deploy1001> |
Started deploy [striker/deploy@247a8a6]: Fixes for ssh key management, git repo creation, and Django upgrade (T221657, T227508) |
[production] |
20:55 |
<SMalyshev> |
repooled wdqs2004 and wdqs2001 - reload done |
[production] |
20:26 |
<mutante> |
ganeti1001 - gnt-instance remove netmon1003.wikimedia.org (T220355) |
[production] |
19:59 |
<XioNoX> |
update ACLs on pfw3-eqiad/codfw - T228205 |
[production] |
19:52 |
<gehel@cumin1001> |
END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) |
[production] |
19:51 |
<fsero> |
republishing base images for wikimedia-(stretch,jessie and buster) T228196 |
[production] |
18:58 |
<gehel@cumin1001> |
START - Cookbook sre.wdqs.data-transfer |
[production] |
18:58 |
<gehel@cumin1001> |
END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99) |
[production] |
18:58 |
<gehel@cumin1001> |
START - Cookbook sre.wdqs.data-transfer |
[production] |
18:54 |
<gehel> |
data copy from wdqs2004 to wdqs2001 - T228122 |
[production] |
18:46 |
<otto@deploy1001> |
Synchronized wmf-config/InitialiseSettings.php: retry - Produce revision-create stream to eventgate-main - T211248 (duration: 00m 54s) |
[production] |
18:22 |
<otto@deploy1001> |
Synchronized wmf-config/InitialiseSettings.php: Produce revision-create stream to eventgate-main - T211248 (duration: 00m 54s) |
[production] |
18:08 |
<jforrester@deploy1001> |
Synchronized wmf-config/CommonSettings.php: Update ExtensionDistributor config to point to REL1_33 as the released version (duration: 00m 54s) |
[production] |
18:05 |
<fsero> |
republishing base images for nodejs-slim due to registry T228196 |
[production] |
18:02 |
<andrewbogott> |
rebooting cloudcontrol2003-dev, cloudweb2001-dev, cloudcontrol1004 for T225713 |
[production] |
17:39 |
<otto@deploy1001> |
Synchronized wmf-config/InitialiseSettings.php: Produce centralnotice.campaign-* streams to eventgate-main - T211248 (duration: 00m 55s) |
[production] |
17:23 |
<bsitzmann@deploy1001> |
Finished deploy [mobileapps/deploy@cb6e7bc]: Update mobileapps to 334a4c4 (T227907) (duration: 04m 51s) |
[production] |
17:19 |
<bsitzmann@deploy1001> |
Started deploy [mobileapps/deploy@cb6e7bc]: Update mobileapps to 334a4c4 (T227907) |
[production] |
16:55 |
<mutante> |
netmon1003: shutdown -h now | ganeti1001: gnt-instance shutdown netmon1003.wikimedia.org - removed from icinga T198939 T220355 |
[production] |
16:36 |
<jiji@deploy1001> |
Finished deploy [cpjobqueue/deploy@5d8128e]: Migrating videoscaling jobs to PHP7 - T219150 (duration: 00m 50s) |
[production] |
16:35 |
<jiji@deploy1001> |
Started deploy [cpjobqueue/deploy@5d8128e]: Migrating videoscaling jobs to PHP7 - T219150 |
[production] |
16:28 |
<dcausse> |
reindexing wikidata (elastic@eqiad) T227136 |
[production] |
15:57 |
<tarrow@> |
helmfile [STAGING] Ran 'apply' command on namespace 'termbox' for release 'staging'. |
[production] |
15:37 |
<elukey> |
reboot analytics1072 as attempt to force the raid controller to set a drive failed - T226467 |
[production] |
15:12 |
<elukey> |
start mariadb on db1107 and re-enable mysql consumers on eventlog1002 and replication on db1108 |
[production] |
14:53 |
<elukey> |
stop mariadb on db1107 to allow maintenance |
[production] |
14:53 |
<elukey> |
stop eventlogging mysql consumers on eventlog1002 and eventlogging_sync on db1108 to allow db1107 maintenance |
[production] |
14:52 |
<jbond42> |
will restart redis on oresdb at 16:00 UTC - T228045 |
[production] |
14:51 |
<jbond42> |
enable puppet across the fleet |
[production] |
14:50 |
<jbond@cumin1001> |
conftool action : set/pooled=yes; selector: name=dns1001.wikimedia.org |
[production] |
14:43 |
<jbond@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) |
[production] |
14:43 |
<jbond@cumin1001> |
START - Cookbook sre.hosts.downtime |
[production] |