2019-12-05
20:46 |
<mholloway-shell@deploy1001> |
helmfile [STAGING] Ran 'apply' command on namespace 'wikifeeds' for release 'staging'. |
[production] |
20:43 |
<bblack> |
ns0.wikimedia.org: re-routing auth traffic from authdns1001 (reimaging) to dns1001 |
[production] |
20:41 |
<mutante> |
running puppet on all cp* for phab change |
[production] |
20:36 |
<volker-e@deploy1001> |
Finished deploy [design/style-guide@437023f]: Deploy design/style-guide: (duration: 00m 08s) |
[production] |
20:36 |
<volker-e@deploy1001> |
Started deploy [design/style-guide@437023f]: Deploy design/style-guide: |
[production] |
20:29 |
<twentyafterfour> |
migrating back to phab1001, minimal downtime expected |
[production] |
20:12 |
<mutante> |
phab1001 - rebooting to hopefully clear "microcode vuln" icinga alert |
[production] |
20:11 |
<onimisionipe> |
ban cloudelastic1002 from shard allocation - T230088 |
[production] |
20:10 |
<bblack> |
ns1.wikimedia.org: restoring normal routing to the newly-reimaged authdns2001 |
[production] |
19:56 |
<bblack@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) |
[production] |
19:53 |
<bblack@cumin1001> |
START - Cookbook sre.hosts.downtime |
[production] |
19:47 |
<urbanecm@deploy1001> |
Synchronized php-1.35.0-wmf.8/extensions/Linter/extension.json: SWAT: afcfdce: Revert "Revert "Implement ParserLogLinterData hook"" (3/3, T238456) (duration: 01m 00s) |
[production] |
19:46 |
<urbanecm@deploy1001> |
Synchronized php-1.35.0-wmf.8/extensions/Linter/includes/ApiRecordLint.php: SWAT: afcfdce: Revert "Revert "Implement ParserLogLinterData hook"" (2/3, T238456) (duration: 01m 09s) |
[production] |
19:44 |
<urbanecm@deploy1001> |
Synchronized php-1.35.0-wmf.8/extensions/Linter/includes/Hooks.php: SWAT: afcfdce: Revert "Revert "Implement ParserLogLinterData hook"" (1/3, T238456) (duration: 01m 11s) |
[production] |
19:41 |
<urbanecm@deploy1001> |
Synchronized php-1.35.0-wmf.5/extensions/Linter/includes/ApiRecordLint.php: SWAT: 7b7f326: Implement ParserLogLinterData hook (3/3, T238456) (duration: 01m 04s) |
[production] |
19:39 |
<urbanecm@deploy1001> |
Synchronized php-1.35.0-wmf.5/extensions/Linter/extension.json: SWAT: 7b7f326: Implement ParserLogLinterData hook (2/3, T238456) (duration: 01m 05s) |
[production] |
19:37 |
<urbanecm@deploy1001> |
Synchronized php-1.35.0-wmf.5/extensions/Linter/includes/Hooks.php: SWAT: 7b7f326: Implement ParserLogLinterData hook (1/3, T238456) (duration: 01m 09s) |
[production] |
19:35 |
<mutante> |
Icinga: delete all downtimes for mw2259. Scheduling Icinga downtimes is tricky business. If you add some for hardware failure and they are too short, you cause Icinga spam; if they are too long and the dcops operator is amazingly fast like Papaul, then your server is back in production but not monitored, and you have to click a million times in the web UI to remove them to avoid that. |
[production] |
19:34 |
<bblack> |
ns1.wikimedia.org: re-route authdns traffic from authdns2001 (to be reimaged) -> dns2001 temporarily - T239667 |
[production] |
19:28 |
<urbanecm@deploy1001> |
Synchronized php-1.35.0-wmf.8/extensions/Linter: SWAT: e0a2059: Revert "Implement ParserLogLinterData hook" (duration: 01m 01s) |
[production] |
19:19 |
<urbanecm@deploy1001> |
Synchronized php-1.35.0-wmf.5/extensions/Linter/: SWAT: b376528: Revert "Implement ParserLogLinterData hook" (duration: 01m 01s) |
[production] |
19:15 |
<urbanecm@deploy1001> |
scap failed: average error rate on 3/11 canaries increased by 10x (rerun with --force to override this check, see https://logstash.wikimedia.org/goto/db09a36be5ed3e81155041f7d46ad040 for details) |
[production] |
19:14 |
<urbanecm@deploy1001> |
Synchronized php-1.35.0-wmf.8/extensions/Linter: SWAT: 839c383: Implement ParserLogLinterData hook (T238456) (duration: 01m 02s) |
[production] |
18:40 |
<dzahn@cumin1001> |
conftool action : set/pooled=yes; selector: name=mw2259.codfw.wmnet |
[production] |
18:25 |
<kevinbazira@deploy1001> |
Finished deploy [ores/deploy@6dd1fef]: T238839 (duration: 17m 20s) |
[production] |
18:08 |
<kevinbazira@deploy1001> |
Started deploy [ores/deploy@6dd1fef]: T238839 |
[production] |
17:38 |
<jiji@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) |
[production] |
17:36 |
<jiji@cumin1001> |
START - Cookbook sre.hosts.downtime |
[production] |
17:31 |
<ebernhardson@deploy1001> |
Finished deploy [wikimedia/discovery/analytics@c29a758]: deploy repo to search-airflow dsh group (duration: 00m 13s) |
[production] |
17:30 |
<ebernhardson@deploy1001> |
Started deploy [wikimedia/discovery/analytics@c29a758]: deploy repo to search-airflow dsh group |
[production] |
17:23 |
<cdanis> |
✔️ cdanis@install1002.wikimedia.org ~ 🕧☕ sudo -E reprepro -C main include stretch-wikimedia prometheus-atlas-exporter_1.0+git20191204.ffafab7-1_amd64.changes |
[production] |
17:18 |
<effie> |
reimage mw2260, yes again |
[production] |
16:47 |
<ebernhardson@deploy1001> |
Finished deploy [wikimedia/discovery/analytics@87b25f2]: initial airflow dags/plugins (duration: 00m 06s) |
[production] |
16:47 |
<ebernhardson@deploy1001> |
Started deploy [wikimedia/discovery/analytics@87b25f2]: initial airflow dags/plugins |
[production] |
16:40 |
<brion> |
running `requeueTranscodes.php --error --throttle` on mwmaint1002 to clean up T239831-related broken video transcodes. will raise usage on video scalers for a while. |
[production] |
16:33 |
<elukey> |
execute clear bfd session address fe80::5e5e:ab00:d3d:85ce on cr3-knams |
[production] |
16:32 |
<elukey> |
execute clear bfd session address fe80::7a4f:9b00:d4e:8004 on cr1-eqiad |
[production] |
16:20 |
<elukey> |
execute clear bfd session address 208.80.154.208 on cr2-eqord |
[production] |
16:20 |
<elukey> |
elukey@cr2-eqord> clear bfd session 208.80.154.208 |
[production] |
15:50 |
<anomie@deploy1001> |
Finished scap: Backporting fix for T239428 (duration: 33m 20s) |
[production] |
15:49 |
<ejegg> |
re-enabled creating CiviMail activities when sending Thank You emails |
[production] |
15:44 |
<jynus> |
restart backup1001, overloaded T234900 |
[production] |
15:43 |
<akosiaris@deploy1001> |
helmfile [EQIAD] Ran 'apply' command on namespace 'blubberoid' for release 'production'. |
[production] |
15:43 |
<moritzm> |
upgrading the reimaged video scalers back to the row-mt enabled ffmpeg T239831 |
[production] |
15:41 |
<ejegg> |
updated Fundraising CiviCRM from 4a72ad4e63 to 30cdc5fa59 |
[production] |
15:17 |
<anomie@deploy1001> |
Started scap: Backporting fix for T239428 |
[production] |
15:16 |
<onimisionipe> |
run osm-import on maps1004 - T239728 |
[production] |
14:52 |
<cdanis@deploy1001> |
Synchronized src/Noc/WmfClusters.php: c0fe7c410 clarify loads output (earlier push was 7963fdcd2 sort clusters naturally) (duration: 00m 59s) |
[production] |
14:52 |
<onimisionipe> |
disable puppet on maps100[1-3].eqiad.wmnet - T239728 |
[production] |
14:51 |
<onimisionipe> |
disable tilerator on maps100[1-3].eqiad.wmnet - T239728 |
[production] |