2021-10-11
09:05 <volans@cumin2002> END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest1002.eqiad.wmnet [production]
09:01 <godog> bounce swift-object-replicator on ms-be2036 [production]
08:52 <godog> bounce statsite on graphite1004 to apply unit config changes [production]
08:48 <volans@cumin1001> END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest1001.eqiad.wmnet [production]
08:41 <volans@cumin2002> START - Cookbook sre.hosts.reimage for host sretest1002.eqiad.wmnet [production]
08:38 <moritzm> updated buster d-i image for Bullseye 11.1 point release T292844 [production]
08:38 <moritzm> updated buster d-i image for Buster 10.11 point release T292838 [production]
08:26 <godog> swift eqiad-prod: final weight to ms-be10[64-67] - T290546 [production]
08:25 <moritzm> updated buster d-i image for Buster 10.11 point release T292838 [production]
08:24 <volans@cumin1001> START - Cookbook sre.hosts.reimage for host sretest1001.eqiad.wmnet [production]
08:06 <godog> bounce uwsgi on graphite hosts to bump request size limit - T292877 [production]
07:58 <volans> migrating physical hosts DHCP to the new reimage process - T269855 [production]
07:57 <elukey> start kafka topics rebalancing for main-codfw (long running maintenance) - T288825 [production]
07:37 <joal> rerun refine_event for `event`.`mediawiki_content_translation_event` year=2021/month=10/day=10/hour=16 [analytics]
02:08 <Krinkle> Browsers fail to connect with https://upload.wikimedia.beta.wmflabs.org/ (Certificate expired two days ago) [releng]
2021-10-10
18:07 <joal> Rerun webrequest-load-wf-text-2021-10-10-10 - failed due to network issue [analytics]
11:20 <wm-bot> <lucaswerkmeister> deployed bf2834c472 (improve error handling) [tools.lexeme-forms]
2021-10-09
05:01 <jiji@deploy1002> helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' . [production]
04:28 <jiji@deploy1002> helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' . [production]
01:32 <ryankemper@cumin1001> END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) restart without plugin upgrade (1 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic restart - ryankemper@cumin1001 - T292814 [production]
00:46 <mutante> ms-be2045 - started systemd-timedated which had been killed by something [production]
00:28 <ryankemper@cumin1001> START - Cookbook sre.elasticsearch.rolling-operation restart without plugin upgrade (1 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic restart - ryankemper@cumin1001 - T292814 [production]
00:24 <ryankemper@cumin1001> END (FAIL) - Cookbook sre.elasticsearch.force-unfreeze (exit_code=99) [production]
00:23 <ryankemper@cumin1001> START - Cookbook sre.elasticsearch.force-unfreeze [production]
00:18 <DeusExMachina> Deleted mars-01 per T292884 [wikisp]
00:13 <ryankemper> T292814 Write queue stuck at 133 events in partition 1 of topic `codfw.cpjobqueue.partitioned.mediawiki.job.cirrusSearchElasticaWrite`, will try again at another time [production]
00:12 <ryankemper@cumin1001> END (ERROR) - Cookbook sre.elasticsearch.rolling-operation (exit_code=97) restart without plugin upgrade (1 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic restart - ryankemper@cumin1001 - T292814 [production]
2021-10-08
23:16 <legoktm> sudo cumin -b 10 C:mediawiki::packages 'apt-get purge lilypond-data -y' [production]
23:10 <ryankemper@cumin1001> START - Cookbook sre.elasticsearch.rolling-operation restart without plugin upgrade (1 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic restart - ryankemper@cumin1001 - T292814 [production]
23:00 <wm-bot> <root> root forced the deletion of job 913897 [tools.gerrit-reviewer-bot]
22:55 <wm-bot> <root> Restarting in the hope of fixing LE cert issues [tools.gerrit-reviewer-bot]
21:38 <mutante> mwmaint2002 - disable-puppet, stop bacula-fd, recovery in progress [production]
21:34 <mutante> disabling puppet on bacula - going through a restore https://wikitech.wikimedia.org/wiki/Bacula#Restore_from_a_non-existent_host_(missing_private_key) [production]
21:30 <legoktm> running puppet across C:mediawiki::packages to uninstall lilypond and ploticus: legoktm@cumin1001:~$ sudo cumin -b 4 C:mediawiki::packages 'run-puppet-agent' [production]
21:17 <hashar> Purging Docker images on all CI agents [releng]
20:12 <cmjohnson@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1018.eqiad.wmnet with reason: REIMAGE [production]
20:10 <cmjohnson@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubestage1004.eqiad.wmnet with reason: REIMAGE [production]
20:08 <cmjohnson@cumin1001> START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1018.eqiad.wmnet with reason: REIMAGE [production]
20:08 <cmjohnson@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubestage1003.eqiad.wmnet with reason: REIMAGE [production]
20:06 <cmjohnson@cumin1001> START - Cookbook sre.hosts.downtime for 2:00:00 on kubestage1004.eqiad.wmnet with reason: REIMAGE [production]
20:05 <cmjohnson@cumin1001> START - Cookbook sre.hosts.downtime for 2:00:00 on kubestage1003.eqiad.wmnet with reason: REIMAGE [production]
20:00 <hashar> Updating Jenkins jobs for Quibble 1.1.1 # https://gerrit.wikimedia.org/r/c/integration/config/+/728620 [releng]
19:46 <cmjohnson@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1020.eqiad.wmnet with reason: REIMAGE [production]
19:45 <cmjohnson@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1019.eqiad.wmnet with reason: REIMAGE [production]
19:43 <cmjohnson@cumin1001> START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1020.eqiad.wmnet with reason: REIMAGE [production]
19:42 <cmjohnson@cumin1001> START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1019.eqiad.wmnet with reason: REIMAGE [production]
19:42 <cmjohnson@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1018.eqiad.wmnet with reason: REIMAGE [production]
19:39 <cmjohnson@cumin1001> START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1018.eqiad.wmnet with reason: REIMAGE [production]
19:23 <hashar> Building CI Docker images for Quibble 1.1.1 [releng]
19:08 <hashar> Tag quibble 1.1.1 @ b54af2aa60 [releng]