production SAL

2021-09-17
19:00	<hnowlan@cumin1001>	END (PASS) - Cookbook sre.postgresql.postgres-init (exit_code=0)	[production]
17:02	<cmjohnson@cumin1001>	END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on cloudcephosd1022.eqiad.wmnet with reason: REIMAGE	[production]
17:02	<hnowlan@cumin1001>	START - Cookbook sre.postgresql.postgres-init	[production]
17:00	<cmjohnson@cumin1001>	START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1022.eqiad.wmnet with reason: REIMAGE	[production]
16:48	<hnowlan@cumin1001>	END (PASS) - Cookbook sre.postgresql.postgres-init (exit_code=0)	[production]
16:27	<cmjohnson@cumin1001>	END (PASS) - Cookbook sre.dns.netbox (exit_code=0)	[production]
16:25	<cmjohnson@cumin1001>	START - Cookbook sre.dns.netbox	[production]
16:11	<cmjohnson@cumin1001>	END (PASS) - Cookbook sre.dns.netbox (exit_code=0)	[production]
16:04	<cmjohnson@cumin1001>	START - Cookbook sre.dns.netbox	[production]
14:49	<hnowlan@cumin1001>	START - Cookbook sre.postgresql.postgres-init	[production]
14:29	<hnowlan@cumin1001>	END (PASS) - Cookbook sre.postgresql.postgres-init (exit_code=0)	[production]
13:06	<moritzm>	installing 4.9.272 kernels on stretch hosts (no reboots yet)	[production]
11:28	<hnowlan@cumin1001>	START - Cookbook sre.postgresql.postgres-init	[production]
11:14	<mwdebug-deploy@deploy1002>	helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .	[production]
11:09	<mwdebug-deploy@deploy1002>	helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .	[production]
09:37	<milimetric@deploy1002>	Finished deploy [analytics/refinery@37e904a] (thin): Only syncing sanitize allowlist, deploying THIN for consistency (duration: 00m 07s)	[production]
09:37	<milimetric@deploy1002>	Started deploy [analytics/refinery@37e904a] (thin): Only syncing sanitize allowlist, deploying THIN for consistency	[production]
09:36	<milimetric@deploy1002>	Finished deploy [analytics/refinery@37e904a]: Only syncing sanitize allowlist (duration: 17m 43s)	[production]
09:19	<milimetric@deploy1002>	Started deploy [analytics/refinery@37e904a]: Only syncing sanitize allowlist	[production]
08:00	<jayme>	restarting php-fpm on wtp1037 and wtp1030	[production]
02:28	<ryankemper>	T290330 [Remove WDQS codfw ~hourly restarts] Successfully rolled out to rest of fleet `sudo cumin 'C:query_service::crontasks' 'sudo run-puppet-agent --force && sudo systemctl reset-failed wdqs-restart-hourly-w-random-delay.timer'`	[production]
02:22	<ryankemper>	T290330 [Remove WDQS codfw ~hourly restarts] `wdqs2001` and `wdqs2004` look fine after running `sudo systemctl reset-failed wdqs-restart-hourly-w-random-delay.timer` to clean up dangling timer	[production]
01:55	<ryankemper>	T290330 [Remove WDQS codfw ~hourly restarts] Testing on arbitrary codfw host: `ryankemper@wdqs2001:~$ sudo run-puppet-agent`	[production]
01:47	<ryankemper>	T290330 [Remove WDQS codfw ~hourly restarts] `sudo cumin 'C:query_service::crontasks' 'sudo disable-puppet "Stop doing wdqs codfw ~hourly restarts - T290330"'`	[production]
00:04	<legoktm@deploy1002>	helmfile [eqiad] Ran 'sync' command on namespace 'shellbox-media' for release 'main' .	[production]
00:01	<legoktm@deploy1002>	helmfile [codfw] Ran 'sync' command on namespace 'shellbox-media' for release 'main' .	[production]
2021-09-16
23:58	<legoktm@deploy1002>	helmfile [staging] Ran 'sync' command on namespace 'shellbox-media' for release 'main' .	[production]
23:51	<ryankemper>	T273673 All looks good, re-enabling puppet and running on rest of fleet: `sudo cumin 'R:Class = elasticsearch::log::hot_threads' 'sudo run-puppet-agent --force'`	[production]
23:44	<ryankemper>	T273673 The associated crons are gone and I see the new systemd timers for both gc-cleanup and the hot threads logger	[production]
23:39	<ryankemper>	T273673 Testing elasticsearch cron->systemd timer-job changes on canary instance `ryankemper@elastic1064:~$ sudo run-puppet-agent --force`	[production]
23:37	<ryankemper>	T273673 Disabling puppet on elasticsearch hosts `sudo cumin 'R:Class = elasticsearch::log::hot_threads' 'sudo disable-puppet "https://gerrit.wikimedia.org/r/c/operations/puppet/+/721413 - T273673"'`	[production]
23:21	<legoktm@deploy1002>	helmfile [eqiad] DONE helmfile.d/admin 'apply'.	[production]
23:21	<legoktm@deploy1002>	helmfile [eqiad] START helmfile.d/admin 'apply'.	[production]
23:19	<legoktm@deploy1002>	helmfile [codfw] DONE helmfile.d/admin 'apply'.	[production]
23:18	<legoktm@deploy1002>	helmfile [codfw] START helmfile.d/admin 'apply'.	[production]
23:18	<legoktm@deploy1002>	helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.	[production]
23:17	<legoktm@deploy1002>	helmfile [staging-eqiad] START helmfile.d/admin 'apply'.	[production]
23:17	<legoktm@deploy1002>	helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.	[production]
23:16	<legoktm@deploy1002>	helmfile [staging-codfw] START helmfile.d/admin 'apply'.	[production]
22:45	<mwdebug-deploy@deploy1002>	helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .	[production]
22:40	<mwdebug-deploy@deploy1002>	helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .	[production]
22:38	<legoktm@deploy1002>	Finished scap: i18n for restoring deprecated token APIs (duration: 15m 30s)	[production]
22:30	<mwdebug-deploy@deploy1002>	helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .	[production]
22:25	<mwdebug-deploy@deploy1002>	helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .	[production]
22:23	<legoktm@deploy1002>	Started scap: i18n for restoring deprecated token APIs	[production]
22:21	<legoktm@deploy1002>	Synchronized php-1.37.0-wmf.23/includes/api/: Restore deprecated token APIs (3/3) (duration: 00m 56s)	[production]
22:19	<legoktm@deploy1002>	Synchronized php-1.37.0-wmf.23/autoload.php: Restore deprecated token APIs (2/3) (duration: 00m 56s)	[production]
22:16	<legoktm@deploy1002>	Synchronized php-1.37.0-wmf.23/includes/api/ApiTokens.php: Restore deprecated token APIs (1/3) (duration: 00m 56s)	[production]
21:22	<robh@cumin1001>	END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on ganeti4004.ulsfo.wmnet with reason: REIMAGE	[production]
21:19	<robh@cumin1001>	START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti4004.ulsfo.wmnet with reason: REIMAGE	[production]