production SAL

2022-12-13
15:19	<hnowlan@deploy1002>	helmfile [staging] DONE helmfile.d/services/api-gateway: sync	[production]
15:19	<hnowlan@deploy1002>	helmfile [staging] START helmfile.d/services/api-gateway: sync	[production]
15:18	<derick@deploy1002>	Started scap: Backport for [[gerrit:867237|RangeChronologicalPager: Restore the compatibility with derived classes (T228431 T325034)]]	[production]
15:02	<derick@deploy1002>	Finished scap: Backport for [[gerrit:867274|Log linter data while parsing full pages (T246403)]] (duration: 10m 28s)	[production]
14:53	<derick@deploy1002>	derick and arlolra: Backport for [[gerrit:867274|Log linter data while parsing full pages (T246403)]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet	[production]
14:52	<derick@deploy1002>	Started scap: Backport for [[gerrit:867274|Log linter data while parsing full pages (T246403)]]	[production]
14:50	<derick@deploy1002>	backport aborted: (duration: 07m 09s)	[production]
14:41	<derick@deploy1002>	Finished scap: Backport for [[gerrit:866627|hewiki: set VisualEditor to direct mode (T320529)]] (duration: 14m 34s)	[production]
14:32	<btullis@cumin1001>	START - Cookbook sre.hosts.reimage for host kafka-stretch2002.codfw.wmnet with OS bullseye	[production]
14:31	<bking@cumin2002>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on wcqs1003.eqiad.wmnet with reason: hardware diagnostics	[production]
14:31	<bking@cumin2002>	START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on wcqs1003.eqiad.wmnet with reason: hardware diagnostics	[production]
14:28	<derick@deploy1002>	derick and daniel: Backport for [[gerrit:866627|hewiki: set VisualEditor to direct mode (T320529)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet	[production]
14:26	<derick@deploy1002>	Started scap: Backport for [[gerrit:866627|hewiki: set VisualEditor to direct mode (T320529)]]	[production]
14:22	<moritzm>	added smunene to pwstore	[production]
14:17	<derick@deploy1002>	Backport cancelled.	[production]
14:04	<btullis@cumin1001>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on kafka-stretch2001.codfw.wmnet with reason: Accessing BIOS on kafka-stretch2001	[production]
14:03	<btullis@cumin1001>	START - Cookbook sre.hosts.downtime for 3:00:00 on kafka-stretch2001.codfw.wmnet with reason: Accessing BIOS on kafka-stretch2001	[production]
13:59	<jayme@deploy1002>	helmfile [eqiad] DONE helmfile.d/services/sessionstore: apply	[production]
13:59	<jayme@deploy1002>	helmfile [eqiad] START helmfile.d/services/sessionstore: apply	[production]
13:57	<jayme@deploy1002>	helmfile [codfw] DONE helmfile.d/services/sessionstore: apply	[production]
13:57	<jayme@deploy1002>	helmfile [codfw] START helmfile.d/services/sessionstore: apply	[production]
13:57	<jayme@deploy1002>	helmfile [staging] DONE helmfile.d/services/sessionstore: apply	[production]
13:49	<jayme@deploy1002>	helmfile [staging] START helmfile.d/services/sessionstore: apply	[production]
13:49	<jayme@deploy1002>	helmfile [codfw] DONE helmfile.d/services/sessionstore: apply	[production]
13:48	<jayme@deploy1002>	helmfile [codfw] START helmfile.d/services/sessionstore: apply	[production]
13:28	<btullis@cumin1001>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on kafka-stretch2002.codfw.wmnet with reason: Accessing BIOS on kafka-stretch2002	[production]
13:28	<btullis@cumin1001>	START - Cookbook sre.hosts.downtime for 3:00:00 on kafka-stretch2002.codfw.wmnet with reason: Accessing BIOS on kafka-stretch2002	[production]
12:31	<claime>	sessionstore outage being monitored	[production]
12:23	<claime>	sessionstore outage, login functions severely impacted	[production]
12:07	<hashar>	Gerrit now has CI job results represented in the Checks tab which should be a little nicer. The old HTML result table is gone and replaced by little bubbles representing the state of the builds for the latest patchset. Ref: https://lists.wikimedia.org/hyperkitty/list/wikitech-l@lists.wikimedia.org/thread/3ULF5NPVC4MSVABZBSXAMDODLZUKFXHS/	[production]
12:00	<jynus@cumin1001>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2100.codfw.wmnet with reason: restart	[production]
12:00	<jynus@cumin1001>	START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2100.codfw.wmnet with reason: restart	[production]
11:57	<hashar>	Restarted Gerrit on gerrit1001	[production]
11:55	<hashar@deploy1002>	Finished deploy [gerrit/gerrit@9ef1a16]: Replace CI result table by Checks API plugin - T214068 (duration: 00m 09s)	[production]
11:55	<hashar@deploy1002>	Started deploy [gerrit/gerrit@9ef1a16]: Replace CI result table by Checks API plugin - T214068	[production]
11:54	<hashar>	Restarted Gerrit on gerrit2002 (replica)	[production]
11:52	<hashar@deploy1002>	Finished deploy [gerrit/gerrit@9ef1a16]: Replace CI result table by Checks API plugin - T214068 (duration: 00m 11s)	[production]
11:52	<hashar@deploy1002>	Started deploy [gerrit/gerrit@9ef1a16]: Replace CI result table by Checks API plugin - T214068	[production]
11:42	<jmm@cumin2002>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on idp-test1002.wikimedia.org with reason: Various tests which may cause temporary breakage on idp-test.w.o	[production]
11:42	<jmm@cumin2002>	START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on idp-test1002.wikimedia.org with reason: Various tests which may cause temporary breakage on idp-test.w.o	[production]
11:22	<moritzm>	installing paramiko security updates	[production]
11:17	<claime>	Puppet re-enabled on cp::text nodes - T290536	[production]
10:58	<jgiannelos@deploy1002>	Finished deploy [kartotherian/deploy@27ac6d3] (codfw): Increase codfw mirrored traffic to 100% (duration: 01m 40s)	[production]
10:57	<jgiannelos@deploy1002>	Started deploy [kartotherian/deploy@27ac6d3] (codfw): Increase codfw mirrored traffic to 100%	[production]
10:54	<dcausse@deploy1002>	Finished deploy [wikimedia/discovery/analytics@e988b5e]: Relax sla for the weekly es transfer and subgraph_and_query_metrics (duration: 02m 25s)	[production]
10:51	<dcausse@deploy1002>	Started deploy [wikimedia/discovery/analytics@e988b5e]: Relax sla for the weekly es transfer and subgraph_and_query_metrics	[production]
10:36	<vgutierrez>	clean up stale prometheus target files in prometheus5001	[production]
10:22	<claime>	puppet run on cp4037 - T290536	[production]
10:21	<claime>	puppet disabled on cp hosts for T290536	[production]
10:01	<oblivian@deploy1002>	helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply	[production]