2020-03-25 §
11:21 <dzahn@cumin1001> START - Cookbook sre.hosts.downtime [production]
11:20 <dzahn@cumin1001> conftool action : set/pooled=no; selector: name=mw123[2-5].eqiad.wmnet [production]
11:20 <dzahn@cumin1001> conftool action : set/pooled=no; selector: name=mw125[0-3].eqiad.wmnet [production]
11:19 <urbanecm@deploy1001> Synchronized wmf-config/CommonSettings.php: SWAT: 59412db: Add gwtoolset to available rights to allow granting to global groups (duration: 01m 07s) [production]
11:12 <urbanecm@deploy1001> Synchronized wmf-config/CommonSettings.php: SWAT: 7b8d7c5: TwoColConflict: Limited default deployment CommonSettings.php (T244863) (duration: 01m 06s) [production]
11:10 <urbanecm@deploy1001> Synchronized wmf-config/InitialiseSettings.php: SWAT: 81cda0f: TwoColConflict: Limited default deployment InitialiseSettings.php (T244863; take II) (duration: 01m 06s) [production]
11:08 <urbanecm@deploy1001> Synchronized wmf-config/InitialiseSettings.php: SWAT: 81cda0f: TwoColConflict: Limited default deployment InitialiseSettings.php (T244863) (duration: 01m 17s) [production]
11:08 <jynus@cumin1001> dbctl commit (dc=all): 'Reduce db1091 load, increase main traffic on all other s4 instances', diff saved to https://phabricator.wikimedia.org/P10762 and previous config saved to /var/cache/conftool/dbconfig/20200325-110821-jynus.json [production]
10:55 <marostegui@cumin1001> dbctl commit (dc=all): 'Fully repool db1137', diff saved to https://phabricator.wikimedia.org/P10761 and previous config saved to /var/cache/conftool/dbconfig/20200325-105503-marostegui.json [production]
10:39 <marostegui@cumin1001> dbctl commit (dc=all): 'Slowly repool db1137', diff saved to https://phabricator.wikimedia.org/P10760 and previous config saved to /var/cache/conftool/dbconfig/20200325-103938-marostegui.json [production]
10:37 <XioNoX> change aggregate policy for 2620:0:862::/48 on cr3-knams - T236785 [production]
10:19 <XioNoX> change aggregate policy for v4 prefixes on cr2-eqdfw - T236785 [production]
10:04 <oblivian@deploy1001> helmfile [EQIAD] Ran 'apply' command on namespace 'eventgate-main' for release 'canary' . [production]
10:04 <oblivian@deploy1001> helmfile [EQIAD] Ran 'apply' command on namespace 'eventgate-main' for release 'production' . [production]
09:56 <XioNoX> change aggregate policy for 2620:0:860::/46 on cr2-eqdfw - T236785 [production]
09:54 <vgutierrez> Enable inbound TLSv1.3 on upload@eqsin - T170567 [production]
09:27 <jmm@cumin2001> END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) [production]
09:23 <vgutierrez> upgrade ATS to 8.0.6-1wm3 on upload@eqsin - T170567 [production]
09:14 <marostegui@cumin1001> dbctl commit (dc=all): 'Slowly repool db1137', diff saved to https://phabricator.wikimedia.org/P10759 and previous config saved to /var/cache/conftool/dbconfig/20200325-091421-marostegui.json [production]
09:02 <marostegui@cumin1001> dbctl commit (dc=all): 'Slowly repool db1137', diff saved to https://phabricator.wikimedia.org/P10758 and previous config saved to /var/cache/conftool/dbconfig/20200325-090227-marostegui.json [production]
08:55 <marostegui@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) [production]
08:53 <marostegui@cumin1001> START - Cookbook sre.hosts.downtime [production]
08:38 <marostegui> Reimage db1137 [production]
08:18 <marostegui> Reboot db1117 for full-upgrade [production]
08:15 <oblivian@deploy1001> helmfile [CODFW] Ran 'apply' command on namespace 'eventgate-main' for release 'canary' . [production]
08:15 <oblivian@deploy1001> helmfile [CODFW] Ran 'apply' command on namespace 'eventgate-main' for release 'production' . [production]
08:14 <_joe_> upgrading all eventgate-main to envoy 1.13.1 T246868 [production]
08:12 <marostegui> Stop all mysql daemons on db1117 [production]
07:50 <oblivian@deploy1001> helmfile [STAGING] Ran 'apply' command on namespace 'eventgate-main' for release 'canary' . [production]
07:50 <oblivian@deploy1001> helmfile [STAGING] Ran 'apply' command on namespace 'eventgate-main' for release 'production' . [production]
07:42 <XioNoX> reboot scs-eqsin for CPU usage [production]
07:20 <jmm@cumin2001> START - Cookbook sre.ganeti.makevm [production]
07:09 <marostegui@cumin1001> dbctl commit (dc=all): 'Depool db1137 for upgrade', diff saved to https://phabricator.wikimedia.org/P10757 and previous config saved to /var/cache/conftool/dbconfig/20200325-070946-marostegui.json [production]
06:57 <marostegui> Deploy schema change on db2129 (s6 codfw master) [production]
06:15 <marostegui> Rename tables on db1133 (m5 master) nova_api database - T248313 [production]
06:13 <marostegui> Remove grants 'nova'@'208.80.154.23' on nova.* - T248313 [production]
2020-03-24 §
20:53 <cdanis> repool eqsin [production]
20:52 <jforrester@deploy1001> Synchronized wmf-config/CommonSettings.php: Don't hard-set wgTmhUseBetaFeatures to true, let it vary by wiki (duration: 01m 07s) [production]
20:50 <jforrester@deploy1001> Synchronized wmf-config/InitialiseSettings.php: Touch and secondary sync of IS for cache-busting (duration: 01m 07s) [production]
20:49 <jforrester@deploy1001> Synchronized wmf-config/InitialiseSettings.php: Set wgTmhUseBetaFeatures to vary by wiki (duration: 01m 06s) [production]
20:35 <twentyafterfour@deploy1001> rebuilt and synchronized wikiversions files: Attempt #2: group0 wikis to 1.35.0-wmf.25 refs T233873 [production]
20:32 <twentyafterfour@deploy1001> Synchronized wmf-config: Now touch and sync again because of settings cache race condition. refs T248409 (duration: 00m 59s) [production]
20:31 <cdanis> rebooting cr2-eqsin T248394 [production]
20:30 <twentyafterfour@deploy1001> Synchronized wmf-config: Now sync InitializeSettings* refs T248409 (duration: 00m 59s) [production]
20:28 <twentyafterfour@deploy1001> Synchronized wmf-config/CommonSettings.php: sync CommonSettings before InitialiseSettings refs T248409 (duration: 00m 58s) [production]
20:27 <volans> force rebooting analytics1044 from console, host down and unreachable (ping, ssh, console) [production]
20:26 <cdanis> commit flow-table-size on cr2-eqsin T248394 [production]
20:19 <cdanis> eqsin depooled for router maintenance at 16:15 [production]
19:29 <twentyafterfour@deploy1001> scap failed: average error rate on 4/11 canaries increased by 10x (rerun with --force to override this check, see https://logstash.wikimedia.org/goto/db09a36be5ed3e81155041f7d46ad040 for details) [production]
19:29 <twentyafterfour> rolling back to wmf.24 due to high error rate refs T233873 [production]