2021-03-02
11:12 <hnowlan@deploy1002> helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' . [production]
11:11 <hnowlan@deploy1002> helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'production' . [production]
10:43 <arturo> moved cloudvirt1013 cloudvirt1032 cloudvirt1037 back into the 'ceph' host aggregate [admin]
10:37 <jiji@cumin1001> END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc1028.eqiad.wmnet [production]
10:31 <jiji@cumin1001> START - Cookbook sre.hosts.reboot-single for host mc1028.eqiad.wmnet [production]
10:29 <effie> upgrade memcached on mc2024, mc1028 [production]
10:21 <elukey> roll restart druid historicals on druid public to pick up new cache settings (enable segment caching) [analytics]
10:21 <elukey@cumin1001> END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0) for hosts an-worker1119.eqiad.wmnet [production]
10:18 <elukey@cumin1001> START - Cookbook sre.hadoop.init-hadoop-workers for hosts an-worker1119.eqiad.wmnet [production]
10:14 <elukey> roll restart druid brokers on druid public to pick up new cache settings (no segment caching, only query caching) [analytics]
10:13 <arturo> moved cloudvirt1023 to 'maintenance' host aggregate. Drain it with `wmcs-drain-hypervisor` to reboot it for T275753 [admin]
10:12 <elukey@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1119.eqiad.wmnet with reason: REIMAGE [production]
10:09 <elukey@cumin1001> START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1119.eqiad.wmnet with reason: REIMAGE [production]
10:05 <volans@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest1002.eqiad.wmnet with reason: REIMAGE [production]
10:03 <volans@cumin1001> START - Cookbook sre.hosts.downtime for 2:00:00 on sretest1002.eqiad.wmnet with reason: REIMAGE [production]
09:54 <elukey@cumin1001> END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0) for hosts an-worker[1130-1131].eqiad.wmnet [production]
09:52 <elukey@cumin1001> START - Cookbook sre.hadoop.init-hadoop-workers for hosts an-worker[1130-1131].eqiad.wmnet [production]
09:46 <liw@deploy1002> Finished scap: testwikis wikis to 1.36.0-wmf.33 (duration: 36m 20s) [production]
09:43 <elukey@cumin1001> END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0) for hosts an-worker[1124-1128].eqiad.wmnet [production]
09:41 <elukey@cumin1001> START - Cookbook sre.hadoop.init-hadoop-workers for hosts an-worker[1124-1128].eqiad.wmnet [production]
09:39 <elukey@cumin1001> END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0) for hosts an-worker[1120-1123].eqiad.wmnet [production]
09:37 <elukey@cumin1001> START - Cookbook sre.hadoop.init-hadoop-workers for hosts an-worker[1120-1123].eqiad.wmnet [production]
09:36 <elukey@cumin1001> END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0) for hosts an-worker1119.eqiad.wmnet [production]
09:33 <elukey@cumin1001> START - Cookbook sre.hadoop.init-hadoop-workers for hosts an-worker1119.eqiad.wmnet [production]
09:12 <liw@deploy1002> Started scap: testwikis wikis to 1.36.0-wmf.33 [production]
08:58 <elukey@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1131.eqiad.wmnet with reason: REIMAGE [production]
08:56 <elukey@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1130.eqiad.wmnet with reason: REIMAGE [production]
08:56 <elukey@cumin1001> START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1131.eqiad.wmnet with reason: REIMAGE [production]
08:54 <elukey@cumin1001> START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1130.eqiad.wmnet with reason: REIMAGE [production]
08:54 <elukey@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1128.eqiad.wmnet with reason: REIMAGE [production]
08:53 <vgutierrez> rolling restart of ats-tls on ulsfo [production]
08:52 <elukey@cumin1001> START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1128.eqiad.wmnet with reason: REIMAGE [production]
08:39 <kharlan@deploy1002> helmfile [staging] Ran 'sync' command on namespace 'linkrecommendation' for release 'staging' . [production]
08:30 <elukey@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1127.eqiad.wmnet with reason: REIMAGE [production]
08:28 <elukey@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1126.eqiad.wmnet with reason: REIMAGE [production]
08:27 <elukey@cumin1001> START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1127.eqiad.wmnet with reason: REIMAGE [production]
08:25 <elukey@cumin1001> START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1126.eqiad.wmnet with reason: REIMAGE [production]
08:25 <elukey@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1125.eqiad.wmnet with reason: REIMAGE [production]
08:23 <elukey@cumin1001> START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1125.eqiad.wmnet with reason: REIMAGE [production]
08:01 <elukey> manual start of performance-asotranking on stat1007 (requested by Gilles) - T276121 [analytics]
08:00 <elukey@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1124.eqiad.wmnet with reason: REIMAGE [production]
07:59 <liw> 1.36.0-wmf.33 was branched at 800e1f8cea169fc9c6e72ac1dc197591a06701bd for T274937 [production]
07:58 <elukey@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1123.eqiad.wmnet with reason: REIMAGE [production]
07:58 <elukey@cumin1001> START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1124.eqiad.wmnet with reason: REIMAGE [production]
07:56 <elukey@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1122.eqiad.wmnet with reason: REIMAGE [production]
07:56 <elukey@cumin1001> START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1123.eqiad.wmnet with reason: REIMAGE [production]
07:54 <godog> swift eqiad-prod: add weight to ms-be106[0-3] - T268435 [production]
07:54 <elukey@cumin1001> START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1122.eqiad.wmnet with reason: REIMAGE [production]
07:28 <elukey@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1121.eqiad.wmnet with reason: REIMAGE [production]
07:27 <ryankemper> Pooled `elastic106[0,4]` (Noticed I never re-pooled these hosts after resolving an incident last week) [production]