production SAL

2020-06-04
08:59	<akosiaris@cumin1001>	START - Cookbook sre.hosts.downtime	[production]
08:58	<akosiaris@cumin1001>	START - Cookbook sre.hosts.downtime	[production]
08:50	<marostegui>	Repool labsdb1009 after running maintain-views T252219	[production]
08:42	<moritzm>	restarting archiva to pick up Java security updates	[production]
08:15	<marostegui@cumin1001>	dbctl commit (dc=all): 'Depool db1107 to clone db1091 on s1 T253217', diff saved to https://phabricator.wikimedia.org/P11392 and previous config saved to /var/cache/conftool/dbconfig/20200604-081545-marostegui.json	[production]
08:14	<marostegui>	Run sudo /usr/local/sbin/maintain-views --all-databases --replace-all on labsdb1009 - T252219	[production]
07:49	<marostegui@cumin1001>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)	[production]
07:45	<marostegui>	Depool labsdb1009 - T252219	[production]
07:45	<marostegui@cumin1001>	START - Cookbook sre.hosts.downtime	[production]
07:33	<oblivian@puppetmaster1001>	conftool action : set/weight=10; selector: dc=eqiad,cluster=labweb,service=labweb-ssl	[production]
07:32	<oblivian@puppetmaster1001>	conftool action : set/pooled=yes:weight=10; selector: dc=eqiad,cluster=cloudceph,service=cloudceph	[production]
06:52	<mutante>	mwmaint1002 started mediawiki_job_cirrus_build_completion_indices_eqiad.service	[production]
06:06	<oblivian@puppetmaster1001>	conftool action : set/weight=10; selector: name=logstash200.*	[production]
06:05	<oblivian@puppetmaster1001>	conftool action : set/weight=10; selector: name=logstash100.*	[production]
06:04	<oblivian@puppetmaster1001>	conftool action : set/weight=10; selector: cluster=eventschemas,service=eventschemas	[production]
06:02	<oblivian@puppetmaster1001>	conftool action : set/weight=10; selector: dc=codfw,cluster=elasticsearch,service=elasticsearch.*	[production]
06:01	<oblivian@puppetmaster1001>	conftool action : set/weight=10; selector: dc=codfw,cluster=elasticsearch,service=elasticsearch	[production]
05:59	<_joe_>	fixing weights of cp2040 T245594	[production]
05:31	<elukey@cumin1001>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)	[production]
05:28	<elukey@cumin1001>	START - Cookbook sre.hosts.downtime	[production]
00:36	<reedy@deploy1001>	Synchronized php-1.35.0-wmf.35/includes/specials/SpecialUserrights.php: T254417 T251534 (duration: 01m 06s)	[production]
2020-06-03
23:08	<reedy@deploy1001>	Synchronized wmf-config/CommonSettings-labs.php: T249834 (duration: 01m 06s)	[production]
23:06	<reedy@deploy1001>	Synchronized wmf-config/InitialiseSettings-labs.php: T249834 (duration: 01m 06s)	[production]
22:22	<ryankemper@cumin2001>	END (PASS) - Cookbook sre.elasticsearch.rolling-upgrade (exit_code=0)	[production]
21:54	<jforrester@deploy1001>	rebuilt and synchronized wikiversions files: Re-rolling group1 to 1.35.0-wmf.35 for T253023	[production]
21:49	<jforrester@deploy1001>	Synchronized php-1.35.0-wmf.35/extensions/EventStreamConfig/includes/ApiStreamConfigs.php: T254390 ApiStreamConfigs: If the 'constraints' parameter is unset, don't explode (duration: 01m 06s)	[production]
21:43	<cstone>	civicrm revision changed from 63508b01b9 to 11b0e7c7e5	[production]
21:16	<ryankemper@cumin2001>	START - Cookbook sre.elasticsearch.rolling-upgrade	[production]
21:15	<ryankemper>	The previously run `_cluster/reroute?retry_failed=true` command worked as intended, the two shards in question have recovered and we're back to green cluster status. We're now in a known state and ready to proceed with the eqiad rolling upgrade	[production]
21:13	<ryankemper>	Ran `curl -X POST "https://localhost:9243/_cluster/reroute?pretty&retry_failed=true&explain=true" -H 'Content-Type: application/json' -d '{}' --insecure` via the ssh tunnel `ssh bast4002.wikimedia.org -L 9243:search.svc.eqiad.wmnet:9243 -L 9443:search.svc.eqiad.wmnet:9443 -L 9643:search.svc.eqiad.wmnet:9643`, two unassigned shards are now initializing	[production]
21:05	<ryankemper>	Elasticsearch Eqiad was in yellow cluster status before starting the above cookbook run (therefore the run was a no-op until I Ctrl+C'd), going to try unsticking the two unassigned shards via `/_cluster/reroute?retry_failed=true`	[production]
21:03	<ryankemper@cumin2001>	END (ERROR) - Cookbook sre.elasticsearch.rolling-upgrade (exit_code=97)	[production]
20:58	<ryankemper@cumin2001>	START - Cookbook sre.elasticsearch.rolling-upgrade	[production]
20:52	<ryankemper@cumin2001>	END (PASS) - Cookbook sre.elasticsearch.rolling-upgrade (exit_code=0)	[production]
20:49	<eileen>	civicrm revision changed from eb156dffa4 to 63508b01b9, config revision is 95dcdb0a8a	[production]
20:47	<ryankemper@cumin2001>	START - Cookbook sre.elasticsearch.rolling-upgrade	[production]
20:19	<gehel>	elasticsearch cluster restart stopped	[production]
20:18	<ryankemper@cumin2001>	END (ERROR) - Cookbook sre.elasticsearch.rolling-upgrade (exit_code=97)	[production]
19:35	<ppchelko@deploy1001>	helmfile [EQIAD] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' .	[production]
19:35	<ppchelko@deploy1001>	helmfile [EQIAD] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .	[production]
19:33	<ppchelko@deploy1001>	helmfile [CODFW] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .	[production]
19:32	<ppchelko@deploy1001>	helmfile [CODFW] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' .	[production]
19:30	<ryankemper@cumin2001>	START - Cookbook sre.elasticsearch.rolling-upgrade	[production]
19:29	<ppchelko@deploy1001>	helmfile [STAGING] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .	[production]
19:29	<ppchelko@deploy1001>	helmfile [STAGING] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' .	[production]
19:20	<jforrester@deploy1001>	rebuilt and synchronized wikiversions files: Revert group1 wikis to wmf.34 T253023	[production]
19:16	<hnowlan@deploy1001>	helmfile [CODFW] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .	[production]
19:15	<hnowlan@deploy1001>	helmfile [EQIAD] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .	[production]
19:14	<jforrester@deploy1001>	Synchronized php: group1 wikis to 1.35.0-wmf.35 (duration: 01m 05s)	[production]
19:13	<jforrester@deploy1001>	rebuilt and synchronized wikiversions files: group1 wikis to 1.35.0-wmf.35	[production]
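
The 2020-06-03 entries from 21:05 to 21:15 describe retrying failed shard allocations through an ssh tunnel and then confirming the cluster was back to green. A minimal sketch of that sequence follows. It assumes the tunnel from the 21:13 entry is already open, so that a local port such as 9243 forwards to the search cluster. The function names `reroute_url`, `health_url` and `retry_failed_shards` are hypothetical, and the follow-up `_cluster/health` request is an assumption about how the status could be checked; the log does not say how it was verified.

```shell
#!/bin/sh
# Sketch only: assumes the ssh tunnel from the 21:13 entry is already open,
# so that localhost:<port> forwards to the Elasticsearch cluster.

# Hypothetical helper: build the reroute URL for a forwarded local port.
reroute_url() {
    printf 'https://localhost:%s/_cluster/reroute?pretty&retry_failed=true&explain=true\n' "$1"
}

# Hypothetical helper: build the cluster health URL for the same port.
health_url() {
    printf 'https://localhost:%s/_cluster/health?pretty\n' "$1"
}

# Retry failed shard allocations, then print cluster health.
# --insecure mirrors the logged curl command.
retry_failed_shards() {
    port="$1"
    curl --insecure -X POST "$(reroute_url "$port")" \
        -H 'Content-Type: application/json' -d '{}' &&
    curl --insecure "$(health_url "$port")"
}
```

With the tunnel open, `retry_failed_shards 9243` would issue the same reroute request as the 21:13 entry and then print the cluster health, where a `status` of `green` would match what the 21:15 entry reports.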