production SAL

2021-08-12
16:15	<mbsantos@deploy1002>	Started deploy [tilerator/deploy@b88cf50]: maps2009:	[production]
16:14	<elukey@deploy1002>	helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.	[production]
16:14	<elukey@deploy1002>	helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.	[production]
16:14	<mbsantos@deploy1002>	Finished deploy [tilerator/deploy@b88cf50]: maps2010: (duration: 00m 23s)	[production]
16:14	<mbsantos@deploy1002>	Started deploy [tilerator/deploy@b88cf50]: maps2010:	[production]
16:14	<elukey@deploy1002>	helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.	[production]
16:14	<elukey@deploy1002>	helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.	[production]
16:13	<mbsantos@deploy1002>	Finished deploy [tilerator/deploy@b88cf50]: Deploy tilerator 1.1.7-beta.5 (duration: 02m 30s)	[production]
16:10	<mbsantos@deploy1002>	Started deploy [tilerator/deploy@b88cf50]: Deploy tilerator 1.1.7-beta.5	[production]
15:50	<papaul>	powerdown ms-be2060 for relocation	[production]
15:49	<mutante>	netbox - deleted 2620:0:863:1:198:35:26:6/64 (along with 198.35.26.6) due to the previous error when running makevm cookbook (T288630)	[production]
15:47	<mutante>	netbox - deleted 198.35.26.6 (doh4002)	[production]
15:44	<pt1979@cumin2002>	END (PASS) - Cookbook sre.dns.netbox (exit_code=0)	[production]
15:37	<dzahn@cumin1001>	END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host doh4002.wikimedia.org	[production]
15:36	<pt1979@cumin2002>	START - Cookbook sre.dns.netbox	[production]
15:35	<dzahn@cumin1001>	START - Cookbook sre.ganeti.makevm for new host doh4002.wikimedia.org	[production]
15:33	<moritzm>	importing openjdk-8 8u302-b08-1+deb11u1 to apt.wikimedia.org/component/jdk8 T287960	[production]
15:10	<btullis@cumin1001>	END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts druid1002.eqiad.wmnet	[production]
15:07	<filippo@cumin1001>	END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on thanos-fe1003.eqiad.wmnet with reason: REIMAGE	[production]
15:04	<filippo@cumin1001>	START - Cookbook sre.hosts.downtime for 2:00:00 on thanos-fe1003.eqiad.wmnet with reason: REIMAGE	[production]
15:00	<btullis@cumin1001>	START - Cookbook sre.hosts.decommission for hosts druid1002.eqiad.wmnet	[production]
14:48	<papaul>	reset to factory ps-test-d8-codfw	[production]
14:35	<filippo@cumin1001>	END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on thanos-fe1002.eqiad.wmnet with reason: REIMAGE	[production]
14:33	<filippo@cumin1001>	START - Cookbook sre.hosts.downtime for 2:00:00 on thanos-fe1002.eqiad.wmnet with reason: REIMAGE	[production]
14:33	<papaul>	reset to factory ps2-test-d8-codfw	[production]
14:25	<hnowlan>	reenabling puppet on P:cassandra	[production]
13:57	<hnowlan>	disabling puppet on P:cassandra to test removal of cassandra-metrics-agent	[production]
13:50	<effie>	disable puppet on mediawiki hosts to merge 705852	[production]
13:39	<hnowlan@cumin1001>	END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:sessionstore: Restarting to pick up Java security updates - hnowlan@cumin1001	[production]
13:31	<btullis@cumin1001>	END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts druid1003.eqiad.wmnet	[production]
13:20	<btullis@cumin1001>	START - Cookbook sre.hosts.decommission for hosts druid1003.eqiad.wmnet	[production]
13:03	<hnowlan@cumin1001>	START - Cookbook sre.cassandra.roll-restart for nodes matching A:sessionstore: Restarting to pick up Java security updates - hnowlan@cumin1001	[production]
12:43	<godog>	upgrade NIC firmware on thanos-be2* / thanos-fe2* - T286722	[production]
12:28	<hnowlan@cumin1001>	END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:restbase-codfw: Restarting to pick up Java security updates - hnowlan@cumin1001	[production]
12:23	<filippo@cumin1001>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on thanos-fe1001.eqiad.wmnet with reason: REIMAGE	[production]
12:18	<filippo@cumin1001>	START - Cookbook sre.hosts.downtime for 2:00:00 on thanos-fe1001.eqiad.wmnet with reason: REIMAGE	[production]
12:09	<godog>	upgrade NIC firmware on thanos-be1* - T286722	[production]
12:08	<godog>	upgrade NIC firmware on thanos-fe100[34] - T286722	[production]
12:04	<godog>	upgrade NIC firmware on thanos-fe100[12] - T286722	[production]
11:56	<moritzm>	installing openexr security updates	[production]
11:47	<moritzm>	installing bluez security updates on buster	[production]
10:22	<jmm@cumin2002>	END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Holger Knust out of all services on: 1743 hosts	[production]
10:22	<jmm@cumin2002>	START - Cookbook sre.idm.logout Logging Holger Knust out of all services on: 1743 hosts	[production]
10:18	<marostegui@cumin1001>	dbctl commit (dc=all): 'Pool db2107 into API', diff saved to https://phabricator.wikimedia.org/P17016 and previous config saved to /var/cache/conftool/dbconfig/20210812-101840-marostegui.json	[production]
10:18	<mvolz@deploy1002>	helmfile [eqiad] Ran 'sync' command on namespace 'citoid' for release 'production' .	[production]
10:13	<mvolz@deploy1002>	helmfile [codfw] Ran 'sync' command on namespace 'citoid' for release 'production' .	[production]
10:08	<mvolz@deploy1002>	helmfile [staging] Ran 'sync' command on namespace 'citoid' for release 'staging' .	[production]
09:49	<hnowlan@cumin1001>	START - Cookbook sre.cassandra.roll-restart for nodes matching A:restbase-codfw: Restarting to pick up Java security updates - hnowlan@cumin1001	[production]
09:38	<mwdebug-deploy@deploy1002>	helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .	[production]
09:36	<mwdebug-deploy@deploy1002>	helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .	[production]