2021-09-17
19:00 <hnowlan@cumin1001> END (PASS) - Cookbook sre.postgresql.postgres-init (exit_code=0) [production]
17:02 <cmjohnson@cumin1001> END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on cloudcephosd1022.eqiad.wmnet with reason: REIMAGE [production]
17:02 <hnowlan@cumin1001> START - Cookbook sre.postgresql.postgres-init [production]
17:00 <cmjohnson@cumin1001> START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1022.eqiad.wmnet with reason: REIMAGE [production]
16:48 <hnowlan@cumin1001> END (PASS) - Cookbook sre.postgresql.postgres-init (exit_code=0) [production]
16:27 <cmjohnson@cumin1001> END (PASS) - Cookbook sre.dns.netbox (exit_code=0) [production]
16:25 <cmjohnson@cumin1001> START - Cookbook sre.dns.netbox [production]
16:11 <cmjohnson@cumin1001> END (PASS) - Cookbook sre.dns.netbox (exit_code=0) [production]
16:04 <cmjohnson@cumin1001> START - Cookbook sre.dns.netbox [production]
14:49 <hnowlan@cumin1001> START - Cookbook sre.postgresql.postgres-init [production]
14:29 <hnowlan@cumin1001> END (PASS) - Cookbook sre.postgresql.postgres-init (exit_code=0) [production]
13:06 <moritzm> installing 4.9.272 kernels on stretch hosts (no reboots yet) [production]
11:28 <hnowlan@cumin1001> START - Cookbook sre.postgresql.postgres-init [production]
11:14 <mwdebug-deploy@deploy1002> helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' . [production]
11:09 <mwdebug-deploy@deploy1002> helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' . [production]
09:37 <milimetric@deploy1002> Finished deploy [analytics/refinery@37e904a] (thin): Only syncing sanitize allowlist, deploying THIN for consistency (duration: 00m 07s) [production]
09:37 <milimetric@deploy1002> Started deploy [analytics/refinery@37e904a] (thin): Only syncing sanitize allowlist, deploying THIN for consistency [production]
09:36 <milimetric@deploy1002> Finished deploy [analytics/refinery@37e904a]: Only syncing sanitize allowlist (duration: 17m 43s) [production]
09:19 <milimetric@deploy1002> Started deploy [analytics/refinery@37e904a]: Only syncing sanitize allowlist [production]
08:00 <jayme> restarting php-fpm on wtp1037 and wtp1030 [production]
02:28 <ryankemper> T290330 [Remove WDQS codfw ~hourly restarts] Successfully rolled out to rest of fleet `sudo cumin 'C:query_service::crontasks' 'sudo run-puppet-agent --force && sudo systemctl reset-failed wdqs-restart-hourly-w-random-delay.timer'` [production]
02:22 <ryankemper> T290330 [Remove WDQS codfw ~hourly restarts] `wdqs2001` and `wdqs2004` look fine after running `sudo systemctl reset-failed wdqs-restart-hourly-w-random-delay.timer` to clean up dangling timer [production]
01:55 <ryankemper> T290330 [Remove WDQS codfw ~hourly restarts] Testing on arbitrary codfw host: `ryankemper@wdqs2001:~$ sudo run-puppet-agent` [production]
01:47 <ryankemper> T290330 [Remove WDQS codfw ~hourly restarts] `sudo cumin 'C:query_service::crontasks' 'sudo disable-puppet "Stop doing wdqs codfw ~hourly restarts - T290330"'` [production]
00:04 <legoktm@deploy1002> helmfile [eqiad] Ran 'sync' command on namespace 'shellbox-media' for release 'main' . [production]
00:01 <legoktm@deploy1002> helmfile [codfw] Ran 'sync' command on namespace 'shellbox-media' for release 'main' . [production]
2021-09-16
23:58 <legoktm@deploy1002> helmfile [staging] Ran 'sync' command on namespace 'shellbox-media' for release 'main' . [production]
23:51 <ryankemper> T273673 All looks good, re-enabling puppet and running on rest of fleet: `sudo cumin 'R:Class = elasticsearch::log::hot_threads' 'sudo run-puppet-agent --force'` [production]
23:44 <ryankemper> T273673 The associated crons are gone and I see the new systemd timers for both gc-cleanup and the hot threads logger [production]
23:39 <ryankemper> T273673 Testing elasticsearch cron->systemd timer-job changes on canary instance `ryankemper@elastic1064:~$ sudo run-puppet-agent --force` [production]
23:37 <ryankemper> T273673 Disabling puppet on elasticsearch hosts `sudo cumin 'R:Class = elasticsearch::log::hot_threads' 'sudo disable-puppet "https://gerrit.wikimedia.org/r/c/operations/puppet/+/721413 - T273673"'` [production]
23:21 <legoktm@deploy1002> helmfile [eqiad] DONE helmfile.d/admin 'apply'. [production]
23:21 <legoktm@deploy1002> helmfile [eqiad] START helmfile.d/admin 'apply'. [production]
23:19 <legoktm@deploy1002> helmfile [codfw] DONE helmfile.d/admin 'apply'. [production]
23:18 <legoktm@deploy1002> helmfile [codfw] START helmfile.d/admin 'apply'. [production]
23:18 <legoktm@deploy1002> helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'. [production]
23:17 <legoktm@deploy1002> helmfile [staging-eqiad] START helmfile.d/admin 'apply'. [production]
23:17 <legoktm@deploy1002> helmfile [staging-codfw] DONE helmfile.d/admin 'apply'. [production]
23:16 <legoktm@deploy1002> helmfile [staging-codfw] START helmfile.d/admin 'apply'. [production]
22:45 <mwdebug-deploy@deploy1002> helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' . [production]
22:40 <mwdebug-deploy@deploy1002> helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' . [production]
22:38 <legoktm@deploy1002> Finished scap: i18n for restoring deprecated token APIs (duration: 15m 30s) [production]
22:30 <mwdebug-deploy@deploy1002> helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' . [production]
22:25 <mwdebug-deploy@deploy1002> helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' . [production]
22:23 <legoktm@deploy1002> Started scap: i18n for restoring deprecated token APIs [production]
22:21 <legoktm@deploy1002> Synchronized php-1.37.0-wmf.23/includes/api/: Restore deprecated token APIs (3/3) (duration: 00m 56s) [production]
22:19 <legoktm@deploy1002> Synchronized php-1.37.0-wmf.23/autoload.php: Restore deprecated token APIs (2/3) (duration: 00m 56s) [production]
22:16 <legoktm@deploy1002> Synchronized php-1.37.0-wmf.23/includes/api/ApiTokens.php: Restore deprecated token APIs (1/3) (duration: 00m 56s) [production]
21:22 <robh@cumin1001> END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on ganeti4004.ulsfo.wmnet with reason: REIMAGE [production]
21:19 <robh@cumin1001> START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti4004.ulsfo.wmnet with reason: REIMAGE [production]