1-50 of 10000 results (28ms)
2021-10-09 §
05:01 <jiji@deploy1002> helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' . [production]
04:28 <jiji@deploy1002> helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' . [production]
01:32 <ryankemper@cumin1001> END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) restart without plugin upgrade (1 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic restart - ryankemper@cumin1001 - T292814 [production]
00:46 <mutante> ms-be2045 - started systemd-timedated which had been killed by something [production]
00:28 <ryankemper@cumin1001> START - Cookbook sre.elasticsearch.rolling-operation restart without plugin upgrade (1 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic restart - ryankemper@cumin1001 - T292814 [production]
00:24 <ryankemper@cumin1001> END (FAIL) - Cookbook sre.elasticsearch.force-unfreeze (exit_code=99) [production]
00:23 <ryankemper@cumin1001> START - Cookbook sre.elasticsearch.force-unfreeze [production]
00:13 <ryankemper> T292814 Write queue stuck at 133 events in partition 1 of topic `codfw.cpjobqueue.partitioned.mediawiki.job.cirrusSearchElasticaWrite`, will try again at another time [production]
00:12 <ryankemper@cumin1001> END (ERROR) - Cookbook sre.elasticsearch.rolling-operation (exit_code=97) restart without plugin upgrade (1 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic restart - ryankemper@cumin1001 - T292814 [production]
2021-10-08 §
23:16 <legoktm> sudo cumin -b 10 C:mediawiki::packages 'apt-get purge lilypond-data -y' [production]
23:10 <ryankemper@cumin1001> START - Cookbook sre.elasticsearch.rolling-operation restart without plugin upgrade (1 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic restart - ryankemper@cumin1001 - T292814 [production]
21:38 <mutante> mwmaint2002 - disable-puppet, stop bacula-fd, recovery in progress [production]
21:34 <mutante> disabling puppet on bacula - going through a restore https://wikitech.wikimedia.org/wiki/Bacula#Restore_from_a_non-existent_host_(missing_private_key) [production]
21:30 <legoktm> running puppet across C:mediawiki::packages to uninstall lilypond and ploticus: legoktm@cumin1001:~$ sudo cumin -b 4 C:mediawiki::packages 'run-puppet-agent' [production]
20:12 <cmjohnson@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1018.eqiad.wmnet with reason: REIMAGE [production]
20:10 <cmjohnson@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubestage1004.eqiad.wmnet with reason: REIMAGE [production]
20:08 <cmjohnson@cumin1001> START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1018.eqiad.wmnet with reason: REIMAGE [production]
20:08 <cmjohnson@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubestage1003.eqiad.wmnet with reason: REIMAGE [production]
20:06 <cmjohnson@cumin1001> START - Cookbook sre.hosts.downtime for 2:00:00 on kubestage1004.eqiad.wmnet with reason: REIMAGE [production]
20:05 <cmjohnson@cumin1001> START - Cookbook sre.hosts.downtime for 2:00:00 on kubestage1003.eqiad.wmnet with reason: REIMAGE [production]
19:46 <cmjohnson@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1020.eqiad.wmnet with reason: REIMAGE [production]
19:45 <cmjohnson@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1019.eqiad.wmnet with reason: REIMAGE [production]
19:43 <cmjohnson@cumin1001> START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1020.eqiad.wmnet with reason: REIMAGE [production]
19:42 <cmjohnson@cumin1001> START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1019.eqiad.wmnet with reason: REIMAGE [production]
19:42 <cmjohnson@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1018.eqiad.wmnet with reason: REIMAGE [production]
19:39 <cmjohnson@cumin1001> START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1018.eqiad.wmnet with reason: REIMAGE [production]
18:15 <cstone> civicrm revision changed from 5cb7d487cb to 598b59b0ee [production]
16:19 <urbanecm> [urbanecm@mwmaint1002 ~]$ mwscript extensions/GrowthExperiments/maintenance/updateMenteeData.php --wiki=enwiki --force # to measure performance on a large wiki [production]
15:48 <elukey@deploy1002> helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'. [production]
15:48 <elukey@deploy1002> helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'. [production]
15:29 <jelto> enable puppet on gitlab1001 again for T283076 [production]
14:05 <jiji@deploy1002> helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' . [production]
14:01 <jiji@deploy1002> helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' . [production]
09:49 <Amir1> wikiadmin@10.64.16.85(wikidatawiki)> delete from wb_changes_subscription where cs_subscriber_id in ('testcommonswiki', 'mowiki'); [production]
09:39 <Emperor> installing stress on ms-be2045 given recent h/w issues T290881 [production]
08:20 <mwdebug-deploy@deploy1002> helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' . [production]
08:12 <mwdebug-deploy@deploy1002> helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' . [production]
08:04 <urbanecm> [urbanecm@mwmaint1002 ~]$ mwscript extensions/GrowthExperiments/maintenance/updateMenteeData.php --wiki=frwiki --force [production]
07:43 <Emperor> reboot ms-be2045 T290881 [production]
07:41 <gehel> manually resuming the data reloads on wdqs1009 and wdqs2008 [production]
06:42 <ayounsi@cumin1001> END (PASS) - Cookbook sre.network.cf (exit_code=0) [production]
06:42 <ayounsi@cumin1001> START - Cookbook sre.network.cf [production]
06:28 <ayounsi@cumin2002> END (PASS) - Cookbook sre.network.cf (exit_code=0) [production]
06:28 <ayounsi@cumin2002> START - Cookbook sre.network.cf [production]
05:35 <ryankemper@cumin1001> END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) restart without plugin upgrade (1 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic restart - ryankemper@cumin1001 - T292814 [production]
04:56 <ryankemper> [WDQS Deploy] Deploy complete. Successful test query placed on query.wikidata.org, there's no relevant criticals in Icinga, and Grafana looks good [production]
04:32 <ryankemper> T292814 Beginning rolling restart of `cloudelastic`: `sudo -i cookbook sre.elasticsearch.rolling-operation cloudelastic "cloudelastic restart" --nodes-per-run 1 --start-datetime 2021-10-08T03:53:49 --task-id T292814` on `ryankemper@cumin1001` tmux `elastic` [production]
04:31 <ryankemper@cumin1001> START - Cookbook sre.elasticsearch.rolling-operation restart without plugin upgrade (1 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic restart - ryankemper@cumin1001 - T292814 [production]
04:29 <ryankemper> [WDQS Deploy] Restarting `wdqs-categories` across lvs-managed hosts, one node at a time: `sudo -E cumin -b 1 'A:wdqs-all and not A:wdqs-test' 'depool && sleep 45 && systemctl restart wdqs-categories && sleep 45 && pool'` [production]
04:28 <ryankemper> [WDQS Deploy] Restarted `wdqs-categories` across both test hosts simultaneously: `sudo -E cumin 'A:wdqs-test' 'systemctl restart wdqs-categories'` [production]