production SAL

2020-03-25
11:21	<dzahn@cumin1001>	START - Cookbook sre.hosts.downtime	[production]
11:20	<dzahn@cumin1001>	conftool action : set/pooled=no; selector: name=mw123[2-5].eqiad.wmnet	[production]
11:20	<dzahn@cumin1001>	conftool action : set/pooled=no; selector: name=mw125[0-3].eqiad.wmnet	[production]
11:19	<urbanecm@deploy1001>	Synchronized wmf-config/CommonSettings.php: SWAT: 59412db: Add gwtoolset to available rights to allow granting to global groups (duration: 01m 07s)	[production]
11:12	<urbanecm@deploy1001>	Synchronized wmf-config/CommonSettings.php: SWAT: 7b8d7c5: TwoColConflict: Limited default deployment CommonSettings.php (T244863) (duration: 01m 06s)	[production]
11:10	<urbanecm@deploy1001>	Synchronized wmf-config/InitialiseSettings.php: SWAT: 81cda0f: TwoColConflict: Limited default deployment InitialiseSettings.php (T244863; take II) (duration: 01m 06s)	[production]
11:08	<urbanecm@deploy1001>	Synchronized wmf-config/InitialiseSettings.php: SWAT: 81cda0f: TwoColConflict: Limited default deployment InitialiseSettings.php (T244863) (duration: 01m 17s)	[production]
11:08	<jynus@cumin1001>	dbctl commit (dc=all): 'Reduce db1091 load, increase main traffic on all other s4 instances', diff saved to https://phabricator.wikimedia.org/P10762 and previous config saved to /var/cache/conftool/dbconfig/20200325-110821-jynus.json	[production]
10:55	<marostegui@cumin1001>	dbctl commit (dc=all): 'Fully repool db1137', diff saved to https://phabricator.wikimedia.org/P10761 and previous config saved to /var/cache/conftool/dbconfig/20200325-105503-marostegui.json	[production]
10:39	<marostegui@cumin1001>	dbctl commit (dc=all): 'Slowly repool db1137', diff saved to https://phabricator.wikimedia.org/P10760 and previous config saved to /var/cache/conftool/dbconfig/20200325-103938-marostegui.json	[production]
10:37	<XioNoX>	change aggregate policy for 2620:0:862::/48 on cr3-knams - T236785	[production]
10:19	<XioNoX>	change aggregate policy for v4 prefixes on cr2-eqdfw - T236785	[production]
10:04	<oblivian@deploy1001>	helmfile [EQIAD] Ran 'apply' command on namespace 'eventgate-main' for release 'canary' .	[production]
10:04	<oblivian@deploy1001>	helmfile [EQIAD] Ran 'apply' command on namespace 'eventgate-main' for release 'production' .	[production]
09:56	<XioNoX>	change aggregate policy for 2620:0:860::/46 on cr2-eqdfw - T236785	[production]
09:54	<vgutierrez>	Enable inbound TLSv1.3 on upload@eqsin - T170567	[production]
09:27	<jmm@cumin2001>	END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)	[production]
09:23	<vgutierrez>	upgrade ATS to 8.0.6-1wm3 on upload@eqsin - T170567	[production]
09:14	<marostegui@cumin1001>	dbctl commit (dc=all): 'Slowly repool db1137', diff saved to https://phabricator.wikimedia.org/P10759 and previous config saved to /var/cache/conftool/dbconfig/20200325-091421-marostegui.json	[production]
09:02	<marostegui@cumin1001>	dbctl commit (dc=all): 'Slowly repool db1137', diff saved to https://phabricator.wikimedia.org/P10758 and previous config saved to /var/cache/conftool/dbconfig/20200325-090227-marostegui.json	[production]
08:55	<marostegui@cumin1001>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)	[production]
08:53	<marostegui@cumin1001>	START - Cookbook sre.hosts.downtime	[production]
08:38	<marostegui>	Reimage db1137	[production]
08:18	<marostegui>	Reboot db1117 for full-upgrade	[production]
08:15	<oblivian@deploy1001>	helmfile [CODFW] Ran 'apply' command on namespace 'eventgate-main' for release 'canary' .	[production]
08:15	<oblivian@deploy1001>	helmfile [CODFW] Ran 'apply' command on namespace 'eventgate-main' for release 'production' .	[production]
08:14	<_joe_>	upgrading all eventgate-main to envoy 1.13.1 T246868	[production]
08:12	<marostegui>	Stop all mysql daemons on db1117	[production]
07:50	<oblivian@deploy1001>	helmfile [STAGING] Ran 'apply' command on namespace 'eventgate-main' for release 'canary' .	[production]
07:50	<oblivian@deploy1001>	helmfile [STAGING] Ran 'apply' command on namespace 'eventgate-main' for release 'production' .	[production]
07:42	<XioNoX>	reboot scs-eqsin for CPU usage	[production]
07:20	<jmm@cumin2001>	START - Cookbook sre.ganeti.makevm	[production]
07:09	<marostegui@cumin1001>	dbctl commit (dc=all): 'Depool db1137 for upgrade', diff saved to https://phabricator.wikimedia.org/P10757 and previous config saved to /var/cache/conftool/dbconfig/20200325-070946-marostegui.json	[production]
06:57	<marostegui>	Deploy schema change on db2129 (s6 codfw master)	[production]
06:15	<marostegui>	Rename tables on db1133 (m5 master) nova_api database - T248313	[production]
06:13	<marostegui>	Remove grants 'nova'@'208.80.154.23' on nova.* - T248313	[production]
2020-03-24
20:53	<cdanis>	repool eqsin	[production]
20:52	<jforrester@deploy1001>	Synchronized wmf-config/CommonSettings.php: Don't hard-set wgTmhUseBetaFeatures to true, let it vary by wiki (duration: 01m 07s)	[production]
20:50	<jforrester@deploy1001>	Synchronized wmf-config/InitialiseSettings.php: Touch and secondary sync of IS for cache-busting (duration: 01m 07s)	[production]
20:49	<jforrester@deploy1001>	Synchronized wmf-config/InitialiseSettings.php: Set wgTmhUseBetaFeatures to vary by wiki (duration: 01m 06s)	[production]
20:35	<twentyafterfour@deploy1001>	rebuilt and synchronized wikiversions files: Attempt #2: group0 wikis to 1.35.0-wmf.25 refs T233873	[production]
20:32	<twentyafterfour@deploy1001>	Synchronized wmf-config: Now touch and sync again because of settings cache race condition. refs T248409 (duration: 00m 59s)	[production]
20:31	<cdanis>	rebooting cr2-eqsin T248394	[production]
20:30	<twentyafterfour@deploy1001>	Synchronized wmf-config: Now sync InitializeSettings* refs T248409 (duration: 00m 59s)	[production]
20:28	<twentyafterfour@deploy1001>	Synchronized wmf-config/CommonSettings.php: sync CommonSettings before InitialiseSettings refs T248409 (duration: 00m 58s)	[production]
20:27	<volans>	force rebooting analytics1044 from console, host down and unreachable (ping, ssh, console)	[production]
20:26	<cdanis>	commit flow-table-size on cr2-eqsin T248394	[production]
20:19	<cdanis>	eqsin depooled for router maintenance at 16:15	[production]
19:29	<twentyafterfour@deploy1001>	scap failed: average error rate on 4/11 canaries increased by 10x (rerun with --force to override this check, see https://logstash.wikimedia.org/goto/db09a36be5ed3e81155041f7d46ad040 for details)	[production]
19:29	<twentyafterfour>	rolling back to wmf.24 due to high error rate refs T233873	[production]