2021-09-20
07:43 |
<oblivian@deploy1002> |
helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' . |
[production] |
07:43 |
<mwdebug-deploy@deploy1002> |
helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' . |
[production] |
07:35 |
<marostegui> |
Stop db1168 and db2129 in sync T167973 |
[production] |
07:34 |
<mwdebug-deploy@deploy1002> |
helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' . |
[production] |
07:34 |
<urbanecm@deploy1002> |
Synchronized wmf-config/throttle.php: af9d6e4e29e5f53ad8cf5aa2c235d54500c433bd: Revert "Add throttle rule for Czech wiki course" (duration: 00m 56s) |
[production] |
07:32 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'Depool db1168 T167973', diff saved to https://phabricator.wikimedia.org/P17299 and previous config saved to /var/cache/conftool/dbconfig/20210920-073256-marostegui.json |
[production] |
07:32 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'Repool db1096:3316 T167973', diff saved to https://phabricator.wikimedia.org/P17298 and previous config saved to /var/cache/conftool/dbconfig/20210920-073206-marostegui.json |
[production] |
07:31 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'Depool db1096:3316 T167973', diff saved to https://phabricator.wikimedia.org/P17297 and previous config saved to /var/cache/conftool/dbconfig/20210920-073141-marostegui.json |
[production] |
07:31 |
<moritzm> |
uploaded PHP 7.2.34-18+0~20210223.60+debian10~1.gbpb21322+wmf2 to apt.wikimedia.org (component/php7.2 for buster-wikimedia) T291052 |
[production] |
07:29 |
<mwdebug-deploy@deploy1002> |
helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' . |
[production] |
07:28 |
<urbanecm@deploy1002> |
Synchronized wmf-config/InitialiseSettings.php: 8c1d665b5e83f6b1dd1cc4a9c367cb6881473bba: enwiki: Bump Growth features to 25% (mentorship limited to 20% of those users) (T290927) (duration: 00m 57s) |
[production] |
07:20 |
<urbanecm> |
Revert undeployed config patch (https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/721959); not even pulled to deployment, so assuming it never hit prod (T289771) |
[production] |
06:00 |
<marostegui> |
Upgrade db2071, db2072, db2094 |
[production] |
2021-09-17
21:28 |
<legoktm@cumin1001> |
END (PASS) - Cookbook sre.dns.netbox (exit_code=0) |
[production] |
21:19 |
<legoktm@cumin1001> |
START - Cookbook sre.dns.netbox |
[production] |
19:00 |
<hnowlan@cumin1001> |
END (PASS) - Cookbook sre.postgresql.postgres-init (exit_code=0) |
[production] |
17:02 |
<cmjohnson@cumin1001> |
END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on cloudcephosd1022.eqiad.wmnet with reason: REIMAGE |
[production] |
17:02 |
<hnowlan@cumin1001> |
START - Cookbook sre.postgresql.postgres-init |
[production] |
17:00 |
<cmjohnson@cumin1001> |
START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1022.eqiad.wmnet with reason: REIMAGE |
[production] |
16:48 |
<hnowlan@cumin1001> |
END (PASS) - Cookbook sre.postgresql.postgres-init (exit_code=0) |
[production] |
16:27 |
<cmjohnson@cumin1001> |
END (PASS) - Cookbook sre.dns.netbox (exit_code=0) |
[production] |
16:25 |
<cmjohnson@cumin1001> |
START - Cookbook sre.dns.netbox |
[production] |
16:11 |
<cmjohnson@cumin1001> |
END (PASS) - Cookbook sre.dns.netbox (exit_code=0) |
[production] |
16:04 |
<cmjohnson@cumin1001> |
START - Cookbook sre.dns.netbox |
[production] |
14:49 |
<hnowlan@cumin1001> |
START - Cookbook sre.postgresql.postgres-init |
[production] |
14:29 |
<hnowlan@cumin1001> |
END (PASS) - Cookbook sre.postgresql.postgres-init (exit_code=0) |
[production] |
13:06 |
<moritzm> |
installing 4.9.272 kernels on stretch hosts (no reboots yet) |
[production] |
11:28 |
<hnowlan@cumin1001> |
START - Cookbook sre.postgresql.postgres-init |
[production] |
11:14 |
<mwdebug-deploy@deploy1002> |
helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' . |
[production] |
11:09 |
<mwdebug-deploy@deploy1002> |
helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' . |
[production] |
09:37 |
<milimetric@deploy1002> |
Finished deploy [analytics/refinery@37e904a] (thin): Only syncing sanitize allowlist, deploying THIN for consistency (duration: 00m 07s) |
[production] |
09:37 |
<milimetric@deploy1002> |
Started deploy [analytics/refinery@37e904a] (thin): Only syncing sanitize allowlist, deploying THIN for consistency |
[production] |
09:36 |
<milimetric@deploy1002> |
Finished deploy [analytics/refinery@37e904a]: Only syncing sanitize allowlist (duration: 17m 43s) |
[production] |
09:19 |
<milimetric@deploy1002> |
Started deploy [analytics/refinery@37e904a]: Only syncing sanitize allowlist |
[production] |
08:00 |
<jayme> |
restarting php-fpm on wtp1037 and wtp1030 |
[production] |
02:28 |
<ryankemper> |
T290330 [Remove WDQS codfw ~hourly restarts] Successfully rolled out to rest of fleet `sudo cumin 'C:query_service::crontasks' 'sudo run-puppet-agent --force && sudo systemctl reset-failed wdqs-restart-hourly-w-random-delay.timer'` |
[production] |
02:22 |
<ryankemper> |
T290330 [Remove WDQS codfw ~hourly restarts] `wdqs2001` and `wdqs2004` look fine after running `sudo systemctl reset-failed wdqs-restart-hourly-w-random-delay.timer` to clean up dangling timer |
[production] |
01:55 |
<ryankemper> |
T290330 [Remove WDQS codfw ~hourly restarts] Testing on arbitrary codfw host: `ryankemper@wdqs2001:~$ sudo run-puppet-agent` |
[production] |
01:47 |
<ryankemper> |
T290330 [Remove WDQS codfw ~hourly restarts] `sudo cumin 'C:query_service::crontasks' 'sudo disable-puppet "Stop doing wdqs codfw ~hourly restarts - T290330"'` |
[production] |
00:04 |
<legoktm@deploy1002> |
helmfile [eqiad] Ran 'sync' command on namespace 'shellbox-media' for release 'main' . |
[production] |
00:01 |
<legoktm@deploy1002> |
helmfile [codfw] Ran 'sync' command on namespace 'shellbox-media' for release 'main' . |
[production] |
2021-09-16
23:58 |
<legoktm@deploy1002> |
helmfile [staging] Ran 'sync' command on namespace 'shellbox-media' for release 'main' . |
[production] |
23:51 |
<ryankemper> |
T273673 All looks good, re-enabling puppet and running on rest of fleet: `sudo cumin 'R:Class = elasticsearch::log::hot_threads' 'sudo run-puppet-agent --force'` |
[production] |
23:44 |
<ryankemper> |
T273673 The associated crons are gone and I see the new systemd timers for both gc-cleanup and the hot threads logger |
[production] |
23:39 |
<ryankemper> |
T273673 Testing elasticsearch cron->systemd timer-job changes on canary instance `ryankemper@elastic1064:~$ sudo run-puppet-agent --force` |
[production] |
23:37 |
<ryankemper> |
T273673 Disabling puppet on elasticsearch hosts `sudo cumin 'R:Class = elasticsearch::log::hot_threads' 'sudo disable-puppet "https://gerrit.wikimedia.org/r/c/operations/puppet/+/721413 - T273673"'` |
[production] |
23:21 |
<legoktm@deploy1002> |
helmfile [eqiad] DONE helmfile.d/admin 'apply'. |
[production] |
23:21 |
<legoktm@deploy1002> |
helmfile [eqiad] START helmfile.d/admin 'apply'. |
[production] |