2020-03-25
09:02 <marostegui@cumin1001> dbctl commit (dc=all): 'Slowly repool db1137', diff saved to https://phabricator.wikimedia.org/P10758 and previous config saved to /var/cache/conftool/dbconfig/20200325-090227-marostegui.json [production]
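The 09:02 repool above and the 07:09 depool below are the two halves of the usual dbctl cycle; a minimal sketch, with the pooling percentage illustrative:

    # Take the replica out of rotation, then commit the config change
    dbctl instance db1137 depool
    dbctl config commit -m 'Depool db1137 for upgrade'
    # After maintenance, repool gradually at increasing percentages
    dbctl instance db1137 pool -p 25
    dbctl config commit -m 'Slowly repool db1137'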
08:55 <marostegui@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) [production]
08:53 <marostegui@cumin1001> START - Cookbook sre.hosts.downtime [production]
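The START/END pair above is the sre.hosts.downtime cookbook, which schedules Icinga downtime ahead of maintenance; a typical invocation from a cumin host, with the duration and reason illustrative:

    sudo cookbook sre.hosts.downtime --hours 4 -r 'db1137 reimage' 'db1137.eqiad.wmnet'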
08:38 <marostegui> Reimage db1137 [production]
08:18 <marostegui> Reboot db1117 for full-upgrade [production]
08:15 <oblivian@deploy1001> helmfile [CODFW] Ran 'apply' command on namespace 'eventgate-main' for release 'canary' . [production]
08:15 <oblivian@deploy1001> helmfile [CODFW] Ran 'apply' command on namespace 'eventgate-main' for release 'production' . [production]
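The helmfile entries above (and the STAGING ones at 07:50) follow the standard deployment-charts flow on the deploy host; a sketch, with the path assumed from the usual repository layout:

    cd /srv/deployment-charts/helmfile.d/services/eventgate-main
    helmfile -e codfw -i apply    # shows the diff and prompts before applying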
08:14 <_joe_> upgrading all eventgate-main to envoy 1.13.1 T246868 [production]
08:12 <marostegui> Stop all mysql daemons on db1117 [production]
07:50 <oblivian@deploy1001> helmfile [STAGING] Ran 'apply' command on namespace 'eventgate-main' for release 'canary' . [production]
07:50 <oblivian@deploy1001> helmfile [STAGING] Ran 'apply' command on namespace 'eventgate-main' for release 'production' . [production]
07:42 <XioNoX> reboot scs-eqsin for CPU usage [production]
07:20 <jmm@cumin2001> START - Cookbook sre.ganeti.makevm [production]
07:09 <marostegui@cumin1001> dbctl commit (dc=all): 'Depool db1137 for upgrade', diff saved to https://phabricator.wikimedia.org/P10757 and previous config saved to /var/cache/conftool/dbconfig/20200325-070946-marostegui.json [production]
06:57 <marostegui> Deploy schema change on db2129 (s6 codfw master) [production]
06:15 <marostegui> Rename tables on db1133 (m5 master) nova_api database - T248313 [production]
06:13 <marostegui> Remove grants 'nova'@'208.80.154.23' on nova.* - T248313 [production]
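The 06:13 grant removal above boils down to a REVOKE on the m5 master named in the 06:15 entry; a hedged sketch, since the exact grants present determine the statement:

    mysql -h db1133.eqiad.wmnet -e "REVOKE ALL PRIVILEGES ON nova.* FROM 'nova'@'208.80.154.23';"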
2020-03-24
20:53 <cdanis> repool eqsin [production]
20:52 <jforrester@deploy1001> Synchronized wmf-config/CommonSettings.php: Don't hard-set wgTmhUseBetaFeatures to true, let it vary by wiki (duration: 01m 07s) [production]
20:50 <jforrester@deploy1001> Synchronized wmf-config/InitialiseSettings.php: Touch and secondary sync of IS for cache-busting (duration: 01m 07s) [production]
20:49 <jforrester@deploy1001> Synchronized wmf-config/InitialiseSettings.php: Set wgTmhUseBetaFeatures to vary by wiki (duration: 01m 06s) [production]
20:35 <twentyafterfour@deploy1001> rebuilt and synchronized wikiversions files: Attempt #2: group0 wikis to 1.35.0-wmf.25 refs T233873 [production]
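The four entries above are standard scap subcommands: single-file syncs of wmf-config plus a wikiversions rebuild; roughly, assuming the 2020-era scap CLI:

    scap sync-file wmf-config/InitialiseSettings.php 'Set wgTmhUseBetaFeatures to vary by wiki'
    scap sync-wikiversions 'group0 wikis to 1.35.0-wmf.25 refs T233873'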
20:32 <twentyafterfour@deploy1001> Synchronized wmf-config: Now touch and sync again because of settings cache race condition. refs T248409 (duration: 00m 59s) [production]
20:31 <cdanis> rebooting cr2-eqsin T248394 [production]
20:30 <twentyafterfour@deploy1001> Synchronized wmf-config: Now sync InitialiseSettings* refs T248409 (duration: 00m 59s) [production]
20:28 <twentyafterfour@deploy1001> Synchronized wmf-config/CommonSettings.php: sync CommonSettings before InitialiseSettings refs T248409 (duration: 00m 58s) [production]
20:27 <volans> force rebooting analytics1044 from console, host down and unreachable (ping, ssh, console) [production]
20:26 <cdanis> commit flow-table-size on cr2-eqsin T248394 [production]
20:19 <cdanis> eqsin depooled for router maintenance at 16:15 [production]
19:29 <twentyafterfour@deploy1001> scap failed: average error rate on 4/11 canaries increased by 10x (rerun with --force to override this check, see https://logstash.wikimedia.org/goto/db09a36be5ed3e81155041f7d46ad040 for details) [production]
19:29 <twentyafterfour> rolling back to wmf.24 due to high error rate refs T233873 [production]
19:28 <twentyafterfour@deploy1001> scap failed: average error rate on 7/11 canaries increased by 10x (rerun with --force to override this check, see https://logstash.wikimedia.org/goto/db09a36be5ed3e81155041f7d46ad040 for details) [production]
18:49 <gehel> repooling wdqs1006, caught up on lag [production]
17:12 <hashar@deploy1001> Finished scap: testwiki to 1.35.0-wmf.25 and rebuild l10n cache # T233873 (duration: 77m 52s) [production]
17:10 <ebernhardson> update cloudelastic-chi replica counts from 2 to 1 T231517 [production]
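Replica-count changes like the one above are live settings updates through the Elasticsearch index-settings API; a minimal sketch, with the endpoint and index pattern illustrative:

    curl -XPUT 'http://localhost:9200/_all/_settings' \
      -H 'Content-Type: application/json' \
      -d '{"index": {"number_of_replicas": 1}}'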
16:41 <moritzm> installing linux-perf updates on stretch [production]
16:31 <moritzm> installing linux-perf-4.19 updates on buster [production]
15:58 <mutante> installing OS on otrs1001.eqiad.wmnet (T248028) [production]
15:54 <hashar@deploy1001> Started scap: testwiki to 1.35.0-wmf.25 and rebuild l10n cache # T233873 [production]
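The Started/Finished pair (15:54 here, 17:12 above) is a full scap run that syncs the new branch everywhere and rebuilds the l10n cache; in the scap CLI of that era, roughly:

    scap sync 'testwiki to 1.35.0-wmf.25 and rebuild l10n cache # T233873'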
15:35 <hnowlan@deploy1001> helmfile [STAGING] Ran 'sync' command on namespace 'changeprop' for release 'staging' . [production]
15:31 <hashar@deploy1001> Pruned MediaWiki: 1.35.0-wmf.22 (duration: 02m 02s) [production]
15:29 <hashar@deploy1001> Pruned MediaWiki: 1.35.0-wmf.21 (duration: 24m 00s) [production]
15:17 <hashar> Cleaning old MediaWiki deployments # T233873 [production]
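The two 'Pruned MediaWiki' entries above come from scap's cleanup of old train branches; presumably one invocation per version:

    scap clean 1.35.0-wmf.22
    scap clean 1.35.0-wmf.21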
15:03 <hashar> Applied patches to 1.35.0-wmf.25 # T233873 [production]
14:59 <hashar> scap prep 1.35.0-wmf.25 # T233873 [production]
14:55 <gehel> depooling wdqs1006 to catch up on lag [production]
14:28 <marostegui> Deploy schema change on db2117 (s6) [production]
14:26 <hashar> Branching wmf/1.35.0-wmf.25 # T233873 [production]
13:22 <moritzm> installing glib2.0 updates from Stretch point release [production]
13:04 <moritzm> installing mariadb-10.1 updates from Stretch point release (client/tools/libraries as packaged by Debian, different from wmf-mariadb) [production]