production SAL

2021-03-10
04:58	<ryankemper>	T266470 The above two actions mean that we're ready to generate the new certificate files. Proceeding: `sudo cergen -c 'wdqs.*' --generate --base-path /srv/private/modules/secret/secrets/certificates /srv/private/modules/secret/secrets/certificates/certificate.manifests.d` on `ryankemper@puppetmaster1001:/srv/private`	[production]
04:57	<ryankemper>	T266470 `sudo rm -fv certificates/wdqs.discovery.wmnet/wdqs.discovery.wmnet.crt.pem certificates/wdqs.discovery.wmnet/wdqs.discovery.wmnet.csr.pem certificates/wdqs.discovery.wmnet/wdqs.discovery.wmnet.keystore.jks certificates/wdqs.discovery.wmnet/wdqs.discovery.wmnet.keystore.p12 certificates/wdqs.discovery.wmnet/truststore.jks` (full paths not provided to fit the IRC line)	[production]
04:56	<ryankemper>	T266470 In the `/srv/private` repo, `/srv/private/modules/secret/secrets/certificates/certificate.manifests.d/wdqs.certs.yaml` has been edited to add the relevant `alt_names`	[production]
04:55	<ryankemper>	T266470 Certificate revoked: `ryankemper@puppetmaster1001:/srv/private$ sudo puppet cert clean wdqs.discovery.wmnet`	[production]
04:53	<ryankemper>	T266470 `ryankemper@cumin1001:~$ sudo -E cumin 'A:wdqs-all' 'sudo disable-puppet "revoking old cert and generating new one with new alt_names - T266470"'`	[production]
04:52	<ryankemper>	T266470 Temporarily disabling puppet on all `wdqs*` hosts in preparation for `wdqs.discovery.wmnet` certificate revocation	[production]
01:08	<krinkle@deploy1002>	Synchronized php-1.36.0-wmf.34/extensions/NavigationTiming/modules/ext.navigationTiming.js: T276826 Ibd9ddf14d64 (duration: 01m 14s)	[production]
00:02	<robh@cumin1001>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-backup1002.eqiad.wmnet with reason: REIMAGE	[production]
00:00	<robh@cumin1001>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-backup1001.eqiad.wmnet with reason: REIMAGE	[production]
2021-03-09
23:59	<robh@cumin1001>	START - Cookbook sre.hosts.downtime for 2:00:00 on ms-backup1002.eqiad.wmnet with reason: REIMAGE	[production]
23:58	<robh@cumin1001>	START - Cookbook sre.hosts.downtime for 2:00:00 on ms-backup1001.eqiad.wmnet with reason: REIMAGE	[production]
22:04	<mutante>	phab1001 - manually running phab public task dump script after making changes to redirect stdout	[production]
20:42	<elukey>	reimaged an-worker1091 to buster	[production]
20:41	<bstorm>	depooled labsdb1009 T276980	[production]
20:25	<elukey@cumin1001>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1091.eqiad.wmnet with reason: REIMAGE	[production]
20:25	<bstorm>	downtimed labsdb1009 so it doesn't keep paging T276980	[production]
20:23	<elukey@cumin1001>	START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1091.eqiad.wmnet with reason: REIMAGE	[production]
20:09	<brennen>	train status: 1.36.0-wmf.32 (T274938) on group0 at 20:06:32 UTC; logs initially quiet.	[production]
20:06	<brennen@deploy1002>	rebuilt and synchronized wikiversions files: group0 wikis to 1.36.0-wmf.34	[production]
19:05	<brennen@deploy1002>	Pruned MediaWiki: 1.36.0-wmf.31 (duration: 03m 34s)	[production]
19:04	<pt1979@cumin2001>	END (PASS) - Cookbook sre.dns.netbox (exit_code=0)	[production]
18:59	<pt1979@cumin2001>	START - Cookbook sre.dns.netbox	[production]
18:54	<brennen@deploy1002>	Finished scap: testwikis wikis to 1.36.0-wmf.34 (duration: 47m 25s)	[production]
18:52	<elukey@cumin1001>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1087.eqiad.wmnet with reason: REIMAGE	[production]
18:49	<elukey@cumin1001>	START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1087.eqiad.wmnet with reason: REIMAGE	[production]
18:47	<dcausse>	re-pool wdqs1004	[production]
18:37	<mbsantos@deploy1002>	helmfile [eqiad] Ran 'sync' command on namespace 'mobileapps' for release 'production' .	[production]
18:35	<mbsantos@deploy1002>	helmfile [codfw] Ran 'sync' command on namespace 'mobileapps' for release 'production' .	[production]
18:34	<pt1979@cumin2001>	END (PASS) - Cookbook sre.dns.netbox (exit_code=0)	[production]
18:29	<pt1979@cumin2001>	START - Cookbook sre.dns.netbox	[production]
18:26	<elukey>	reimage an-worker1087 to buster	[production]
18:16	<mbsantos@deploy1002>	helmfile [codfw] Ran 'sync' command on namespace 'proton' for release 'production' .	[production]
18:13	<mbsantos@deploy1002>	helmfile [staging] Ran 'sync' command on namespace 'proton' for release 'production' .	[production]
18:12	<brennen@deploy1002>	Started scap: testwikis wikis to 1.36.0-wmf.34	[production]
18:10	<mbsantos@deploy1002>	helmfile [eqiad] Ran 'sync' command on namespace 'mobileapps' for release 'production' .	[production]
18:05	<mbsantos@deploy1002>	helmfile [codfw] Ran 'sync' command on namespace 'mobileapps' for release 'production' .	[production]
18:03	<mbsantos@deploy1002>	helmfile [staging] Ran 'sync' command on namespace 'mobileapps' for release 'staging' .	[production]
18:02	<marxarelli>	deleting shut down memc* deployment-prep instances to free up quota for replacement db instances (T276968)	[production]
18:02	<elukey@cumin1001>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1085.eqiad.wmnet with reason: REIMAGE	[production]
18:00	<elukey@cumin1001>	START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1085.eqiad.wmnet with reason: REIMAGE	[production]
17:50	<papaul>	rebooting db2073 for firmware upgrade	[production]
17:01	<elukey@cumin1001>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on analytics1077.eqiad.wmnet with reason: REIMAGE	[production]
17:00	<urbanecm@deploy1002>	Synchronized wmf-config/InitialiseSettings.php: 3119d7a703a38b328fa634db64b2929d54829884: sqwiki: Fix deployment of Growth features (duration: 01m 00s)	[production]
16:59	<elukey@cumin1001>	START - Cookbook sre.hosts.downtime for 2:00:00 on analytics1077.eqiad.wmnet with reason: REIMAGE	[production]
16:46	<pt1979@cumin2001>	END (PASS) - Cookbook sre.dns.netbox (exit_code=0)	[production]
16:41	<pt1979@cumin2001>	START - Cookbook sre.dns.netbox	[production]
16:40	<elukey>	reimage analytics1077 to buster	[production]
16:33	<aborrero@cumin1001>	END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudvirt1027.eqiad.wmnet	[production]
16:32	<jayme@deploy1002>	helmfile [staging-codfw] DONE helmfile.d/admin 'sync'.	[production]