2021-09-17
|
21:28 |
<legoktm@cumin1001> |
END (PASS) - Cookbook sre.dns.netbox (exit_code=0) |
[production] |
21:19 |
<legoktm@cumin1001> |
START - Cookbook sre.dns.netbox |
[production] |
19:00 |
<hnowlan@cumin1001> |
END (PASS) - Cookbook sre.postgresql.postgres-init (exit_code=0) |
[production] |
17:02 |
<cmjohnson@cumin1001> |
END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on cloudcephosd1022.eqiad.wmnet with reason: REIMAGE |
[production] |
17:02 |
<hnowlan@cumin1001> |
START - Cookbook sre.postgresql.postgres-init |
[production] |
17:00 |
<cmjohnson@cumin1001> |
START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1022.eqiad.wmnet with reason: REIMAGE |
[production] |
16:48 |
<hnowlan@cumin1001> |
END (PASS) - Cookbook sre.postgresql.postgres-init (exit_code=0) |
[production] |
16:27 |
<cmjohnson@cumin1001> |
END (PASS) - Cookbook sre.dns.netbox (exit_code=0) |
[production] |
16:25 |
<cmjohnson@cumin1001> |
START - Cookbook sre.dns.netbox |
[production] |
16:11 |
<cmjohnson@cumin1001> |
END (PASS) - Cookbook sre.dns.netbox (exit_code=0) |
[production] |
16:04 |
<cmjohnson@cumin1001> |
START - Cookbook sre.dns.netbox |
[production] |
14:49 |
<hnowlan@cumin1001> |
START - Cookbook sre.postgresql.postgres-init |
[production] |
14:29 |
<hnowlan@cumin1001> |
END (PASS) - Cookbook sre.postgresql.postgres-init (exit_code=0) |
[production] |
13:06 |
<moritzm> |
installing 4.9.272 kernels on stretch hosts (no reboots yet) |
[production] |
11:28 |
<hnowlan@cumin1001> |
START - Cookbook sre.postgresql.postgres-init |
[production] |
11:14 |
<mwdebug-deploy@deploy1002> |
helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' . |
[production] |
11:09 |
<mwdebug-deploy@deploy1002> |
helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' . |
[production] |
09:37 |
<milimetric@deploy1002> |
Finished deploy [analytics/refinery@37e904a] (thin): Only syncing sanitize allowlist, deploying THIN for consistency (duration: 00m 07s) |
[production] |
09:37 |
<milimetric@deploy1002> |
Started deploy [analytics/refinery@37e904a] (thin): Only syncing sanitize allowlist, deploying THIN for consistency |
[production] |
09:36 |
<milimetric@deploy1002> |
Finished deploy [analytics/refinery@37e904a]: Only syncing sanitize allowlist (duration: 17m 43s) |
[production] |
09:19 |
<milimetric@deploy1002> |
Started deploy [analytics/refinery@37e904a]: Only syncing sanitize allowlist |
[production] |
08:00 |
<jayme> |
restarting php-fpm on wtp1037 and wtp1030 |
[production] |
02:28 |
<ryankemper> |
T290330 [Remove WDQS codfw ~hourly restarts] Successfully rolled out to rest of fleet `sudo cumin 'C:query_service::crontasks' 'sudo run-puppet-agent --force && sudo systemctl reset-failed wdqs-restart-hourly-w-random-delay.timer'` |
[production] |
02:22 |
<ryankemper> |
T290330 [Remove WDQS codfw ~hourly restarts] `wdqs2001` and `wdqs2004` look fine after running `sudo systemctl reset-failed wdqs-restart-hourly-w-random-delay.timer` to clean up dangling timer |
[production] |
01:55 |
<ryankemper> |
T290330 [Remove WDQS codfw ~hourly restarts] Testing on arbitrary codfw host: `ryankemper@wdqs2001:~$ sudo run-puppet-agent` |
[production] |
01:47 |
<ryankemper> |
T290330 [Remove WDQS codfw ~hourly restarts] `sudo cumin 'C:query_service::crontasks' 'sudo disable-puppet "Stop doing wdqs codfw ~hourly restarts - T290330"'` |
[production] |
00:04 |
<legoktm@deploy1002> |
helmfile [eqiad] Ran 'sync' command on namespace 'shellbox-media' for release 'main' . |
[production] |
00:01 |
<legoktm@deploy1002> |
helmfile [codfw] Ran 'sync' command on namespace 'shellbox-media' for release 'main' . |
[production] |
2021-09-16
|
23:58 |
<legoktm@deploy1002> |
helmfile [staging] Ran 'sync' command on namespace 'shellbox-media' for release 'main' . |
[production] |
23:51 |
<ryankemper> |
T273673 All looks good, re-enabling puppet and running on rest of fleet: `sudo cumin 'R:Class = elasticsearch::log::hot_threads' 'sudo run-puppet-agent --force'` |
[production] |
23:44 |
<ryankemper> |
T273673 The associated crons are gone and I see the new systemd timers for both gc-cleanup and the hot threads logger |
[production] |
23:39 |
<ryankemper> |
T273673 Testing elasticsearch cron->systemd timer-job changes on canary instance `ryankemper@elastic1064:~$ sudo run-puppet-agent --force` |
[production] |
23:37 |
<ryankemper> |
T273673 Disabling puppet on elasticsearch hosts `sudo cumin 'R:Class = elasticsearch::log::hot_threads' 'sudo disable-puppet "https://gerrit.wikimedia.org/r/c/operations/puppet/+/721413 - T273673"'` |
[production] |
23:21 |
<legoktm@deploy1002> |
helmfile [eqiad] DONE helmfile.d/admin 'apply'. |
[production] |
23:21 |
<legoktm@deploy1002> |
helmfile [eqiad] START helmfile.d/admin 'apply'. |
[production] |
23:19 |
<legoktm@deploy1002> |
helmfile [codfw] DONE helmfile.d/admin 'apply'. |
[production] |
23:18 |
<legoktm@deploy1002> |
helmfile [codfw] START helmfile.d/admin 'apply'. |
[production] |
23:18 |
<legoktm@deploy1002> |
helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'. |
[production] |
23:17 |
<legoktm@deploy1002> |
helmfile [staging-eqiad] START helmfile.d/admin 'apply'. |
[production] |
23:17 |
<legoktm@deploy1002> |
helmfile [staging-codfw] DONE helmfile.d/admin 'apply'. |
[production] |
23:16 |
<legoktm@deploy1002> |
helmfile [staging-codfw] START helmfile.d/admin 'apply'. |
[production] |
22:45 |
<mwdebug-deploy@deploy1002> |
helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' . |
[production] |
22:40 |
<mwdebug-deploy@deploy1002> |
helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' . |
[production] |