2019-12-05
22:03 |
<mutante> |
phabricator - git-ssh.wikimedia.org has been fixed and is up again (T238956) |
[production] |
22:01 |
<mutante> |
phab1001 - restarting ssh-phab to listen on additional LVS IP |
[production] |
22:00 |
<krinkle@deploy1001> |
Synchronized php-1.35.0-wmf.8/includes/libs/rdbms/database/: T233342 (duration: 01m 02s) |
[production] |
21:55 |
<twentyafterfour> |
stopping phd on phab1003 and starting on phab1001 |
[production] |
21:50 |
<mutante> |
phab1003 - remove IPv6 service IP for git-ssh from lo:LVS |
[production] |
21:34 |
<mutante> |
puppetmaster2001: deleting /var/run/confd-template/.git-ssh*.err to fix confd template compilation alerts |
[production] |
21:33 |
<mutante> |
puppetmaster1001: deleting /var/run/confd-template/.git-ssh*.err to fix confd template compilation alerts |
[production] |
21:19 |
<mutante> |
phab1001 - systemctl restart ssh-phab (to make it listen on IPv6, race between puppet adding the IP and starting the service) |
[production] |
21:09 |
<bblack> |
ns0.wikimedia.org: restore routing to authdns1001 |
[production] |
21:03 |
<dzahn@cumin1001> |
conftool action : set/pooled=yes; selector: name=phab1001-vcs.eqiad.wmnet |
[production] |
21:00 |
<mutante> |
phab1001 - reload apache2, removed /ws/ rewrite for wstunnel for aphlict |
[production] |
21:00 |
<bblack@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) |
[production] |
20:58 |
<bblack@cumin1001> |
START - Cookbook sre.hosts.downtime |
[production] |
20:56 |
<bblack> |
cr[12]-eqiad: delete leftover static route of ns2->authdns1001 from esams work, which was blinding icinga to the real ns2 :P |
[production] |
20:49 |
<mholloway-shell@deploy1001> |
helmfile [CODFW] Ran 'apply' command on namespace 'wikifeeds' for release 'production' . |
[production] |
20:48 |
<twentyafterfour> |
successfully migrated to phab1001 with no apparent user impact! |
[production] |
20:47 |
<mholloway-shell@deploy1001> |
helmfile [EQIAD] Ran 'apply' command on namespace 'wikifeeds' for release 'production' . |
[production] |
20:46 |
<mholloway-shell@deploy1001> |
helmfile [STAGING] Ran 'apply' command on namespace 'wikifeeds' for release 'staging' . |
[production] |
20:43 |
<bblack> |
ns0.wikimedia.org: re-routing auth traffic from authdns1001 (reimaging) to dns1001 |
[production] |
20:41 |
<mutante> |
running puppet on all cp* for phab change |
[production] |
20:36 |
<volker-e@deploy1001> |
Finished deploy [design/style-guide@437023f]: Deploy design/style-guide: (duration: 00m 08s) |
[production] |
20:36 |
<volker-e@deploy1001> |
Started deploy [design/style-guide@437023f]: Deploy design/style-guide: |
[production] |
20:29 |
<twentyafterfour> |
migrating back to phab1001, minimal downtime expected |
[production] |
20:12 |
<mutante> |
phab1001 - rebooting to hopefully clear "microcode vuln" icinga alert |
[production] |
20:11 |
<onimisionipe> |
ban cloudelastic1002 from shard allocation - T230088 |
[production] |
20:10 |
<bblack> |
ns1.wikimedia.org: restoring normal routing to the newly-reimaged authdns2001 |
[production] |
19:56 |
<bblack@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) |
[production] |
19:53 |
<bblack@cumin1001> |
START - Cookbook sre.hosts.downtime |
[production] |
19:47 |
<urbanecm@deploy1001> |
Synchronized php-1.35.0-wmf.8/extensions/Linter/extension.json: SWAT: afcfdce: Revert "Revert "Implement ParserLogLinterData hook"" (3/3, T238456) (duration: 01m 00s) |
[production] |
19:46 |
<urbanecm@deploy1001> |
Synchronized php-1.35.0-wmf.8/extensions/Linter/includes/ApiRecordLint.php: SWAT: afcfdce: Revert "Revert "Implement ParserLogLinterData hook"" (2/3, T238456) (duration: 01m 09s) |
[production] |
19:44 |
<urbanecm@deploy1001> |
Synchronized php-1.35.0-wmf.8/extensions/Linter/includes/Hooks.php: SWAT: afcfdce: Revert "Revert "Implement ParserLogLinterData hook"" (1/3, T238456) (duration: 01m 11s) |
[production] |
19:41 |
<urbanecm@deploy1001> |
Synchronized php-1.35.0-wmf.5/extensions/Linter/includes/ApiRecordLint.php: SWAT: 7b7f326: Implement ParserLogLinterData hook (3/3, T238456) (duration: 01m 04s) |
[production] |
19:39 |
<urbanecm@deploy1001> |
Synchronized php-1.35.0-wmf.5/extensions/Linter/extension.json: SWAT: 7b7f326: Implement ParserLogLinterData hook (2/3, T238456) (duration: 01m 05s) |
[production] |
19:37 |
<urbanecm@deploy1001> |
Synchronized php-1.35.0-wmf.5/extensions/Linter/includes/Hooks.php: SWAT: 7b7f326: Implement ParserLogLinterData hook (1/3, T238456) (duration: 01m 09s) |
[production] |
19:35 |
<mutante> |
Icinga: delete all downtimes for mw2259. Scheduling Icinga downtimes is tricky business. If you add some for a hardware failure and they are too short, you cause Icinga spam; if they are too long and the dcops operator is amazingly fast, like Papaul, then your server is back in production but not monitored, and you have to click a million times in the web UI to remove them. |
[production] |
19:34 |
<bblack> |
ns1.wikimedia.org: re-route authdns traffic from authdns2001 (to be reimaged) -> dns2001 temporarily - T239667 |
[production] |
19:28 |
<urbanecm@deploy1001> |
Synchronized php-1.35.0-wmf.8/extensions/Linter: SWAT: e0a2059: Revert "Implement ParserLogLinterData hook" (duration: 01m 01s) |
[production] |
19:19 |
<urbanecm@deploy1001> |
Synchronized php-1.35.0-wmf.5/extensions/Linter/: SWAT: b376528: Revert "Implement ParserLogLinterData hook" (duration: 01m 01s) |
[production] |
19:15 |
<urbanecm@deploy1001> |
scap failed: average error rate on 3/11 canaries increased by 10x (rerun with --force to override this check, see https://logstash.wikimedia.org/goto/db09a36be5ed3e81155041f7d46ad040 for details) |
[production] |
19:14 |
<urbanecm@deploy1001> |
Synchronized php-1.35.0-wmf.8/extensions/Linter: SWAT: 839c383: Implement ParserLogLinterData hook (T238456) (duration: 01m 02s) |
[production] |
18:40 |
<dzahn@cumin1001> |
conftool action : set/pooled=yes; selector: name=mw2259.codfw.wmnet |
[production] |
18:25 |
<kevinbazira@deploy1001> |
Finished deploy [ores/deploy@6dd1fef]: T238839 (duration: 17m 20s) |
[production] |
18:08 |
<kevinbazira@deploy1001> |
Started deploy [ores/deploy@6dd1fef]: T238839 |
[production] |
17:38 |
<jiji@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) |
[production] |
17:36 |
<jiji@cumin1001> |
START - Cookbook sre.hosts.downtime |
[production] |
17:31 |
<ebernhardson@deploy1001> |
Finished deploy [wikimedia/discovery/analytics@c29a758]: deploy repo to search-airflow dsh group (duration: 00m 13s) |
[production] |
17:30 |
<ebernhardson@deploy1001> |
Started deploy [wikimedia/discovery/analytics@c29a758]: deploy repo to search-airflow dsh group |
[production] |
17:23 |
<cdanis> |
✔️ cdanis@install1002.wikimedia.org ~ 🕧☕ sudo -E reprepro -C main include stretch-wikimedia prometheus-atlas-exporter_1.0+git20191204.ffafab7-1_amd64.changes |
[production] |
17:18 |
<effie> |
reimage mw2260, yes again |
[production] |
16:47 |
<ebernhardson@deploy1001> |
Finished deploy [wikimedia/discovery/analytics@87b25f2]: initial airflow dags/plugins (duration: 00m 06s) |
[production] |