2020-03-25
10:39 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'Slowly repool db1137', diff saved to https://phabricator.wikimedia.org/P10760 and previous config saved to /var/cache/conftool/dbconfig/20200325-103938-marostegui.json |
[production] |
10:37 |
<XioNoX> |
change aggregate policy for 2620:0:862::/48 on cr3-knams - T236785 |
[production] |
10:19 |
<XioNoX> |
change aggregate policy for v4 prefixes on cr2-eqdfw - T236785 |
[production] |
10:04 |
<oblivian@deploy1001> |
helmfile [EQIAD] Ran 'apply' command on namespace 'eventgate-main' for release 'canary' . |
[production] |
10:04 |
<oblivian@deploy1001> |
helmfile [EQIAD] Ran 'apply' command on namespace 'eventgate-main' for release 'production' . |
[production] |
09:56 |
<XioNoX> |
change aggregate policy for 2620:0:860::/46 on cr2-eqdfw - T236785 |
[production] |
09:54 |
<vgutierrez> |
Enable inbound TLSv1.3 on upload@eqsin - T170567 |
[production] |
09:27 |
<jmm@cumin2001> |
END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) |
[production] |
09:23 |
<vgutierrez> |
upgrade ATS to 8.0.6-1wm3 on upload@eqsin - T170567 |
[production] |
09:14 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'Slowly repool db1137', diff saved to https://phabricator.wikimedia.org/P10759 and previous config saved to /var/cache/conftool/dbconfig/20200325-091421-marostegui.json |
[production] |
09:02 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'Slowly repool db1137', diff saved to https://phabricator.wikimedia.org/P10758 and previous config saved to /var/cache/conftool/dbconfig/20200325-090227-marostegui.json |
[production] |
08:55 |
<marostegui@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) |
[production] |
08:53 |
<marostegui@cumin1001> |
START - Cookbook sre.hosts.downtime |
[production] |
08:38 |
<marostegui> |
Reimage db1137 |
[production] |
08:18 |
<marostegui> |
Reboot db1117 for full-upgrade |
[production] |
08:15 |
<oblivian@deploy1001> |
helmfile [CODFW] Ran 'apply' command on namespace 'eventgate-main' for release 'canary' . |
[production] |
08:15 |
<oblivian@deploy1001> |
helmfile [CODFW] Ran 'apply' command on namespace 'eventgate-main' for release 'production' . |
[production] |
08:14 |
<_joe_> |
upgrading all eventgate-main to envoy 1.13.1 T246868 |
[production] |
08:12 |
<marostegui> |
Stop all mysql daemons on db1117 |
[production] |
07:50 |
<oblivian@deploy1001> |
helmfile [STAGING] Ran 'apply' command on namespace 'eventgate-main' for release 'canary' . |
[production] |
07:50 |
<oblivian@deploy1001> |
helmfile [STAGING] Ran 'apply' command on namespace 'eventgate-main' for release 'production' . |
[production] |
07:42 |
<XioNoX> |
reboot scs-eqsin for CPU usage |
[production] |
07:20 |
<jmm@cumin2001> |
START - Cookbook sre.ganeti.makevm |
[production] |
07:09 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'Depool db1137 for upgrade', diff saved to https://phabricator.wikimedia.org/P10757 and previous config saved to /var/cache/conftool/dbconfig/20200325-070946-marostegui.json |
[production] |
06:57 |
<marostegui> |
Deploy schema change on db2129 (s6 codfw master) |
[production] |
06:15 |
<marostegui> |
Rename tables on db1133 (m5 master) nova_api database - T248313 |
[production] |
06:13 |
<marostegui> |
Remove grants 'nova'@'208.80.154.23' on nova.* - T248313 |
[production] |
2020-03-24
20:53 |
<cdanis> |
repool eqsin |
[production] |
20:52 |
<jforrester@deploy1001> |
Synchronized wmf-config/CommonSettings.php: Don't hard-set wgTmhUseBetaFeatures to true, let it vary by wiki (duration: 01m 07s) |
[production] |
20:50 |
<jforrester@deploy1001> |
Synchronized wmf-config/InitialiseSettings.php: Touch and secondary sync of IS for cache-busting (duration: 01m 07s) |
[production] |
20:49 |
<jforrester@deploy1001> |
Synchronized wmf-config/InitialiseSettings.php: Set wgTmhUseBetaFeatures to vary by wiki (duration: 01m 06s) |
[production] |
20:35 |
<twentyafterfour@deploy1001> |
rebuilt and synchronized wikiversions files: Attempt #2: group0 wikis to 1.35.0-wmf.25 refs T233873 |
[production] |
20:32 |
<twentyafterfour@deploy1001> |
Synchronized wmf-config: Now touch and sync again because of settings cache race condition. refs T248409 (duration: 00m 59s) |
[production] |
20:31 |
<cdanis> |
rebooting cr2-eqsin T248394 |
[production] |
20:30 |
<twentyafterfour@deploy1001> |
Synchronized wmf-config: Now sync InitialiseSettings* refs T248409 (duration: 00m 59s) |
[production] |
20:28 |
<twentyafterfour@deploy1001> |
Synchronized wmf-config/CommonSettings.php: sync CommonSettings before InitialiseSettings refs T248409 (duration: 00m 58s) |
[production] |
20:27 |
<volans> |
force rebooting analytics1044 from console, host down and unreachable (ping, ssh, console) |
[production] |
20:26 |
<cdanis> |
commit flow-table-size on cr2-eqsin T248394 |
[production] |
20:19 |
<cdanis> |
eqsin depooled for router maintenance at 16:15 |
[production] |
19:29 |
<twentyafterfour@deploy1001> |
scap failed: average error rate on 4/11 canaries increased by 10x (rerun with --force to override this check, see https://logstash.wikimedia.org/goto/db09a36be5ed3e81155041f7d46ad040 for details) |
[production] |
19:29 |
<twentyafterfour> |
rolling back to wmf.24 due to high error rate refs T233873 |
[production] |
19:28 |
<twentyafterfour@deploy1001> |
scap failed: average error rate on 7/11 canaries increased by 10x (rerun with --force to override this check, see https://logstash.wikimedia.org/goto/db09a36be5ed3e81155041f7d46ad040 for details) |
[production] |
18:49 |
<gehel> |
repooling wdqs1006, caught up on lag |
[production] |
17:12 |
<hashar@deploy1001> |
Finished scap: testwiki to 1.35.0-wmf.25 and rebuild l10n cache # T233873 (duration: 77m 52s) |
[production] |
17:10 |
<ebernhardson> |
update cloudelastic-chi replica counts from 2 to 1 T231517 |
[production] |
16:41 |
<moritzm> |
installing linux-perf updates on stretch |
[production] |
16:31 |
<moritzm> |
installing linux-perf-4.19 updates on buster |
[production] |
15:58 |
<mutante> |
installing OS on otrs1001.eqiad.wmnet (T248028) |
[production] |
15:54 |
<hashar@deploy1001> |
Started scap: testwiki to 1.35.0-wmf.25 and rebuild l10n cache # T233873 |
[production] |
15:35 |
<hnowlan@deploy1001> |
helmfile [STAGING] Ran 'sync' command on namespace 'changeprop' for release 'staging' . |
[production] |