2020-12-16 §
11:19 <jbond@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pki2001.codfw.wmnet with reason: REIMAGE [production]
11:19 <jiji@deploy1001> Synchronized wmf-config/ProductionServices.php: Swap mc1019 with mc1031 for Redis lock manager - T265643 (duration: 01m 17s) [production]
11:17 <jiji@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc2022.codfw.wmnet with reason: REIMAGE [production]
11:15 <jiji@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1022.eqiad.wmnet with reason: REIMAGE [production]
11:15 <jiji@cumin1001> START - Cookbook sre.hosts.downtime for 2:00:00 on mc2022.codfw.wmnet with reason: REIMAGE [production]
11:14 <jbond@cumin1001> START - Cookbook sre.hosts.downtime for 2:00:00 on pki2001.codfw.wmnet with reason: REIMAGE [production]
11:13 <jiji@cumin1001> START - Cookbook sre.hosts.downtime for 2:00:00 on mc1022.eqiad.wmnet with reason: REIMAGE [production]
11:10 <jynus> stopping and restarting dbstore1004 to mitigate (short term) T270112 [production]
10:37 <jbond@cumin1001> END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) [production]
10:37 <jbond@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest1001.eqiad.wmnet with reason: REIMAGE [production]
10:35 <jbond42> reboot rpki2001 [production]
10:35 <jbond@cumin1001> START - Cookbook sre.hosts.reboot-single [production]
10:35 <jbond@cumin1001> START - Cookbook sre.hosts.downtime for 2:00:00 on sretest1001.eqiad.wmnet with reason: REIMAGE [production]
10:34 <jbond@cumin1001> END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) [production]
10:30 <jbond42> reboot rpki1001 [production]
10:30 <jbond@cumin1001> START - Cookbook sre.hosts.reboot-single [production]
10:05 <gehel@cumin1001> END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) [production]
10:02 <jbond@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest1001.eqiad.wmnet with reason: REIMAGE [production]
10:00 <jbond@cumin1001> START - Cookbook sre.hosts.downtime for 2:00:00 on sretest1001.eqiad.wmnet with reason: REIMAGE [production]
09:49 <godog> swift eqiad-prod: add weight to ms-be106[0-3] - T268435 [production]
09:32 <_joe_> reset-failed for docker report jobs on deneb, failed because of a registry gateway timeout [production]
09:29 <elukey> force execution of cumin-check-aliases.service on cumin[12]001 hosts to clear alarms [production]
08:35 <gehel@cumin1001> START - Cookbook sre.wdqs.data-transfer [production]
08:23 <vgutierrez> acme-chief and acme-chief-api restarts for openssl upgrades (CVE-2020-1971) [production]
07:55 <gehel> depool wdqs1005 (catching up on lag) [production]
07:20 <marostegui> Stop mysql on db2142 to clone db1151 - T269324 [production]
2020-12-15 §
23:47 <dduvall@deploy1001> helmfile [codfw] Ran 'sync' command on namespace 'blubberoid' for release 'production' . [production]
23:45 <dduvall@deploy1001> helmfile [eqiad] Ran 'sync' command on namespace 'blubberoid' for release 'production' . [production]
23:34 <dduvall@deploy1001> helmfile [staging] Ran 'sync' command on namespace 'blubberoid' for release 'staging' . [production]
22:10 <mholloway-shell@deploy1001> Synchronized wmf-config/InitialiseSettings.php: WikimediaEvents: Promote SessionTick to group1 T248987 (duration: 01m 04s) [production]
20:29 <marxarelli> group0 to 1.36.0-wmf.22 complete. no new errors or concerning rates (refs T267415) [production]
20:26 <tzatziki> reset email for User:Cnk1220 [production]
20:06 <dduvall@deploy1001> rebuilt and synchronized wikiversions files: group0 wikis to 1.36.0-wmf.22 [production]
19:32 <joal@deploy1001> Finished deploy [analytics/refinery@2202db5] (thin): Regular analytics weekly train - THIN [analytics/refinery@2202db5] (duration: 00m 08s) [production]
19:32 <joal@deploy1001> Started deploy [analytics/refinery@2202db5] (thin): Regular analytics weekly train - THIN [analytics/refinery@2202db5] [production]
19:31 <joal@deploy1001> Finished deploy [analytics/refinery@2202db5]: Regular analytics weekly train [analytics/refinery@2202db5] (duration: 16m 36s) [production]
19:14 <joal@deploy1001> Started deploy [analytics/refinery@2202db5]: Regular analytics weekly train [analytics/refinery@2202db5] [production]
18:48 <dduvall@deploy1001> Pruned MediaWiki: 1.36.0-wmf.20 (duration: 04m 19s) [production]
18:41 <dduvall@deploy1001> Finished scap: testwikis wikis to 1.36.0-wmf.22 (duration: 46m 41s) [production]
17:55 <dduvall@deploy1001> Started scap: testwikis wikis to 1.36.0-wmf.22 [production]
16:47 <ottomata> bumped eventgate-main memory limits from 300M to 600M - T249745 [production]
16:47 <otto@deploy1001> helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' . [production]
16:47 <otto@deploy1001> helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-main' for release 'production' . [production]
16:45 <hnowlan@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1265.eqiad.wmnet with reason: REIMAGE [production]
16:44 <otto@deploy1001> helmfile [codfw] Ran 'sync' command on namespace 'eventgate-main' for release 'production' . [production]
16:44 <otto@deploy1001> helmfile [codfw] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' . [production]
16:43 <hnowlan@cumin1001> START - Cookbook sre.hosts.downtime for 2:00:00 on mw1265.eqiad.wmnet with reason: REIMAGE [production]
16:41 <otto@deploy1001> helmfile [staging] Ran 'sync' command on namespace 'eventgate-main' for release 'production' . [production]
16:31 <Amir1> end of rebuilding sites table across wikis (T269443 T269435 T269430 T268461 T268415) [production]
16:18 <hnowlan> reimaging mw1265 to buster [production]