2021-09-20 §
07:32 <marostegui@cumin1001> dbctl commit (dc=all): 'Repool db1096:3316 T167973', diff saved to https://phabricator.wikimedia.org/P17298 and previous config saved to /var/cache/conftool/dbconfig/20210920-073206-marostegui.json [production]
07:31 <marostegui@cumin1001> dbctl commit (dc=all): 'Depool db1096:3316 T167973', diff saved to https://phabricator.wikimedia.org/P17297 and previous config saved to /var/cache/conftool/dbconfig/20210920-073141-marostegui.json [production]
07:31 <moritzm> uploaded PHP 7.2.34-18+0~20210223.60+debian10~1.gbpb21322+wmf2 to apt.wikimedia.org (component/php7.2 for buster-wikimedia) T291052 [production]
07:29 <mwdebug-deploy@deploy1002> helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' . [production]
07:28 <urbanecm@deploy1002> Synchronized wmf-config/InitialiseSettings.php: 8c1d665b5e83f6b1dd1cc4a9c367cb6881473bba: enwiki: Bump Growth features to 25% (mentorship limited to 20% of those users) (T290927) (duration: 00m 57s) [production]
07:20 <urbanecm> Revert undeployed config patch (https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/721959); not even pulled to deployment, so assuming it never hit prod (T289771) [production]
06:00 <marostegui> Upgrade db2071, db2072, db2094 [production]
2021-09-18 §
01:47 <ladsgroup@deploy1002> Synchronized php-1.37.0-wmf.23/includes/libs/rdbms/database/Database.php: (no justification provided) (duration: 00m 57s) [production]
01:01 <ladsgroup@deploy1002> Synchronized php-1.37.0-wmf.23/includes/libs/rdbms/database/Database.php: (no justification provided) (duration: 01m 03s) [production]
2021-09-17 §
21:28 <legoktm@cumin1001> END (PASS) - Cookbook sre.dns.netbox (exit_code=0) [production]
21:19 <legoktm@cumin1001> START - Cookbook sre.dns.netbox [production]
19:00 <hnowlan@cumin1001> END (PASS) - Cookbook sre.postgresql.postgres-init (exit_code=0) [production]
17:02 <cmjohnson@cumin1001> END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on cloudcephosd1022.eqiad.wmnet with reason: REIMAGE [production]
17:02 <hnowlan@cumin1001> START - Cookbook sre.postgresql.postgres-init [production]
17:00 <cmjohnson@cumin1001> START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1022.eqiad.wmnet with reason: REIMAGE [production]
16:48 <hnowlan@cumin1001> END (PASS) - Cookbook sre.postgresql.postgres-init (exit_code=0) [production]
16:27 <cmjohnson@cumin1001> END (PASS) - Cookbook sre.dns.netbox (exit_code=0) [production]
16:25 <cmjohnson@cumin1001> START - Cookbook sre.dns.netbox [production]
16:11 <cmjohnson@cumin1001> END (PASS) - Cookbook sre.dns.netbox (exit_code=0) [production]
16:04 <cmjohnson@cumin1001> START - Cookbook sre.dns.netbox [production]
14:49 <hnowlan@cumin1001> START - Cookbook sre.postgresql.postgres-init [production]
14:29 <hnowlan@cumin1001> END (PASS) - Cookbook sre.postgresql.postgres-init (exit_code=0) [production]
13:06 <moritzm> installing 4.9.272 kernels on stretch hosts (no reboots yet) [production]
11:28 <hnowlan@cumin1001> START - Cookbook sre.postgresql.postgres-init [production]
11:14 <mwdebug-deploy@deploy1002> helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' . [production]
11:09 <mwdebug-deploy@deploy1002> helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' . [production]
09:37 <milimetric@deploy1002> Finished deploy [analytics/refinery@37e904a] (thin): Only syncing sanitize allowlist, deploying THIN for consistency (duration: 00m 07s) [production]
09:37 <milimetric@deploy1002> Started deploy [analytics/refinery@37e904a] (thin): Only syncing sanitize allowlist, deploying THIN for consistency [production]
09:36 <milimetric@deploy1002> Finished deploy [analytics/refinery@37e904a]: Only syncing sanitize allowlist (duration: 17m 43s) [production]
09:19 <milimetric@deploy1002> Started deploy [analytics/refinery@37e904a]: Only syncing sanitize allowlist [production]
08:00 <jayme> restarting php-fpm on wtp1037 and wtp1030 [production]
02:28 <ryankemper> T290330 [Remove WDQS codfw ~hourly restarts] Successfully rolled out to rest of fleet `sudo cumin 'C:query_service::crontasks' 'sudo run-puppet-agent --force && sudo systemctl reset-failed wdqs-restart-hourly-w-random-delay.timer'` [production]
02:22 <ryankemper> T290330 [Remove WDQS codfw ~hourly restarts] `wdqs2001` and `wdqs2004` look fine after running `sudo systemctl reset-failed wdqs-restart-hourly-w-random-delay.timer` to clean up dangling timer [production]
01:55 <ryankemper> T290330 [Remove WDQS codfw ~hourly restarts] Testing on arbitrary codfw host: `ryankemper@wdqs2001:~$ sudo run-puppet-agent` [production]
01:47 <ryankemper> T290330 [Remove WDQS codfw ~hourly restarts] `sudo cumin 'C:query_service::crontasks' 'sudo disable-puppet "Stop doing wdqs codfw ~hourly restarts - T290330"'` [production]
00:04 <legoktm@deploy1002> helmfile [eqiad] Ran 'sync' command on namespace 'shellbox-media' for release 'main' . [production]
00:01 <legoktm@deploy1002> helmfile [codfw] Ran 'sync' command on namespace 'shellbox-media' for release 'main' . [production]
2021-09-16 §
23:58 <legoktm@deploy1002> helmfile [staging] Ran 'sync' command on namespace 'shellbox-media' for release 'main' . [production]
23:51 <ryankemper> T273673 All looks good, re-enabling puppet and running on rest of fleet: `sudo cumin 'R:Class = elasticsearch::log::hot_threads' 'sudo run-puppet-agent --force'` [production]
23:44 <ryankemper> T273673 The associated crons are gone and I see the new systemd timers for both gc-cleanup and the hot threads logger [production]
23:39 <ryankemper> T273673 Testing elasticsearch cron->systemd timer-job changes on canary instance `ryankemper@elastic1064:~$ sudo run-puppet-agent --force` [production]
23:37 <ryankemper> T273673 Disabling puppet on elasticsearch hosts `sudo cumin 'R:Class = elasticsearch::log::hot_threads' 'sudo disable-puppet "https://gerrit.wikimedia.org/r/c/operations/puppet/+/721413 - T273673"'` [production]
23:21 <legoktm@deploy1002> helmfile [eqiad] DONE helmfile.d/admin 'apply'. [production]
23:21 <legoktm@deploy1002> helmfile [eqiad] START helmfile.d/admin 'apply'. [production]
23:19 <legoktm@deploy1002> helmfile [codfw] DONE helmfile.d/admin 'apply'. [production]
23:18 <legoktm@deploy1002> helmfile [codfw] START helmfile.d/admin 'apply'. [production]
23:18 <legoktm@deploy1002> helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'. [production]
23:17 <legoktm@deploy1002> helmfile [staging-eqiad] START helmfile.d/admin 'apply'. [production]
23:17 <legoktm@deploy1002> helmfile [staging-codfw] DONE helmfile.d/admin 'apply'. [production]
23:16 <legoktm@deploy1002> helmfile [staging-codfw] START helmfile.d/admin 'apply'. [production]