production SAL

2019-09-10
15:37	<marostegui>	Start pre-switchover for m1 steps T231403	[production]
15:35	<hashar@deploy1001>	Synchronized php-1.34.0-wmf.22/includes/libs/http/MultiHttpClient.php: Revert "Improve MultiHttpClient connection concurrency and reuse" - T232487 (duration: 00m 55s)	[production]
15:33	<reedy@deploy1001>	Synchronized php-1.34.0-wmf.22/includes/libs/http/MultiHttpClient.php: T232487 (duration: 00m 55s)	[production]
15:13	<hashar@deploy1001>	rebuilt and synchronized wikiversions files: Revert group0 to 1.34.0-wmf.22 # T220747	[production]
14:48	<hashar@deploy1001>	scap failed: average error rate on 3/11 canaries increased by 10x (rerun with --force to override this check, see https://logstash.wikimedia.org/goto/db09a36be5ed3e81155041f7d46ad040 for details)	[production]
14:45	<akosiaris>	repool cp1075 ats-be, releases cert updated	[production]
14:44	<akosiaris@puppetmaster1001>	conftool action : set/pooled=yes; selector: name=cp1075.eqiad.wmnet,dc=eqiad,cluster=cache_text,service=ats-be	[production]
14:44	<XioNoX>	depool ulsfo for DC UPS power maintenance (see maint-announce)	[production]
14:36	<@>	helmfile [EQIAD] Ran 'apply' command on namespace 'eventgate-main' for release 'main' .	[production]
14:32	<hashar@deploy1001>	Finished scap: testwiki to php-1.34.0-wmf.22 and rebuild l10n cache # T220747 (duration: 34m 03s)	[production]
14:31	<@>	helmfile [CODFW] Ran 'apply' command on namespace 'eventgate-main' for release 'main' .	[production]
14:29	<@>	helmfile [STAGING] Ran 'apply' command on namespace 'eventgate-main' for release 'main' .	[production]
14:26	<@>	helmfile [EQIAD] Ran 'apply' command on namespace 'eventgate-analytics' for release 'analytics' .	[production]
14:20	<@>	helmfile [CODFW] Ran 'apply' command on namespace 'eventgate-analytics' for release 'analytics' .	[production]
14:18	<@>	helmfile [STAGING] Ran 'apply' command on namespace 'eventgate-analytics' for release 'analytics' .	[production]
14:18	<ottomata>	increasing max_body_size to 10mb for all eventgate services - T232362	[production]
14:14	<akosiaris>	depool cp1075 ats-be to test helmfile sync	[production]
14:14	<akosiaris@puppetmaster1001>	conftool action : set/pooled=no; selector: name=cp1075.eqiad.wmnet,dc=eqiad,cluster=cache_text,service=ats-be	[production]
13:58	<hashar@deploy1001>	Started scap: testwiki to php-1.34.0-wmf.22 and rebuild l10n cache # T220747	[production]
13:56	<hashar>	Applied security patches to 1.34.0-wmf.22 # T220747	[production]
13:53	<hashar>	scap prep 1.34.0-wmf.22 # T220747	[production]
13:34	<elukey>	reboot stat1005 to clear inconsistent process state after tensorflow tests	[production]
13:23	<hashar>	./make-wmf-branch -n 1.34.0-wmf.22 -o master -c extensions/CharInsert # T220747	[production]
13:12	<thcipriani>	restarting gerrit	[production]
13:11	<hashar>	Gerrit experiencing difficulty due to ongoing wmf branch cut - T231872	[production]
13:01	<moritzm>	copied prometheus-jmx-exporter to buster-wikimedia (from stretch-wikimedia, just a package with some jars)	[production]
12:40	<cmjohnson1>	the new pdus are racked in b6	[production]
12:14	<cmjohnson1>	removing power from ps1-b6 side B...mgmt should not be affected	[production]
11:20	<cmjohnson1>	swapping the PDU in rack B6 eqiad T227541	[production]
11:09	<Urbanecm>	EU SWAT done	[production]
11:08	<urbanecm@deploy1001>	Synchronized wmf-config/InitialiseSettings.php: SWAT: c780fa4: Bump MobileWebUIActionsTracking sampling rate to 10 percent (T220016) (duration: 00m 55s)	[production]
11:07	<ema@puppetmaster1001>	conftool action : set/weight=100; selector: service=ats-be,dc=eqiad,name=cp1075.eqiad.wmnet	[production]
11:06	<ema>	cp1075: set weight in etcd back to 100	[production]
11:06	<urbanecm@deploy1001>	Synchronized wmf-config/InitialiseSettings.php: SWAT: 6afe963: Set items term store on write both for all of Wikidata (T225055) (duration: 00m 55s)	[production]
10:51	<akosiaris@>	helmfile [CODFW] Ran 'apply' command on namespace 'kube-system' for release 'calico-policy-controller' .	[production]
10:45	<akosiaris@>	helmfile [CODFW] Ran 'apply' command on namespace 'kube-system' for release 'coredns' .	[production]
10:45	<akosiaris@>	helmfile [CODFW] Ran 'apply' command on namespace 'kube-system' for release 'rbac-deploy-clusterrole' .	[production]
10:34	<akosiaris@>	helmfile [STAGING] Ran 'apply' command on namespace 'kube-system' for release 'calico-policy-controller' .	[production]
10:34	<akosiaris@>	helmfile [STAGING] Ran 'apply' command on namespace 'kube-system' for release 'coredns' .	[production]
10:34	<akosiaris@>	helmfile [STAGING] Ran 'apply' command on namespace 'kube-system' for release 'rbac-deploy-clusterrole' .	[production]
10:34	<@>	helmfile [STAGING] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller' .	[production]
10:34	<@>	helmfile [STAGING] Ran 'sync' command on namespace 'kube-system' for release 'coredns' .	[production]
10:34	<@>	helmfile [STAGING] Ran 'sync' command on namespace 'kube-system' for release 'rbac-deploy-clusterrole' .	[production]
10:32	<vgutierrez>	repool cp5001 with ats-tls collecting memory usage details every hour - T232298	[production]
09:56	<elukey>	restart archiva on archiva1001 - UI not working (probably due to connections to maven central being stuck)	[production]
09:50	<moritzm>	installing ghostscript security updates on jessie	[production]
09:37	<moritzm>	added jbond as chanserv ops for #wikimedia-operations	[production]
08:08	<jmm@cumin2001>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)	[production]
08:06	<jmm@cumin2001>	START - Cookbook sre.hosts.downtime	[production]
07:42	<moritzm>	reimaging mw2231 after hardware maintenance T231192	[production]