2021-01-26
08:44 <jmm@cumin2001> END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host flerovium.eqiad.wmnet [production]
08:39 <jmm@cumin2001> START - Cookbook sre.hosts.reboot-single for host flerovium.eqiad.wmnet [production]
08:38 <elukey@cumin1001> END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0) for hosts an-worker[1119,1131].eqiad.wmnet [production]
08:37 <jmm@cumin2001> END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host furud.codfw.wmnet [production]
08:36 <elukey@cumin1001> START - Cookbook sre.hadoop.init-hadoop-workers for hosts an-worker[1119,1131].eqiad.wmnet [production]
08:33 <jmm@cumin2001> START - Cookbook sre.hosts.reboot-single for host furud.codfw.wmnet [production]
08:30 <godog> swift start decom for ms-be20[17,19,21,23,24,25,26,27] - T272837 [production]
08:28 <elukey@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1119.eqiad.wmnet with reason: REIMAGE [production]
08:26 <elukey@cumin1001> END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on an-worker1131.eqiad.wmnet with reason: REIMAGE [production]
08:26 <elukey@cumin1001> START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1131.eqiad.wmnet with reason: REIMAGE [production]
08:26 <elukey@cumin1001> START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1119.eqiad.wmnet with reason: REIMAGE [production]
08:19 <marostegui@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1160.eqiad.wmnet with reason: REIMAGE [production]
08:18 <moritzm> upgrading OpenJDK on aqs and Hadoop systems [production]
08:17 <marostegui@cumin1001> START - Cookbook sre.hosts.downtime for 2:00:00 on db1160.eqiad.wmnet with reason: REIMAGE [production]
07:04 <marostegui@cumin1001> dbctl commit (dc=all): 'Depool db1081 (s4 old master) - T271427', diff saved to https://phabricator.wikimedia.org/P13955 and previous config saved to /var/cache/conftool/dbconfig/20210126-070443-marostegui.json [production]
07:01 <marostegui@cumin1001> dbctl commit (dc=all): 'Promote db1138 to s4 master and remove read-only from s4 T271427', diff saved to https://phabricator.wikimedia.org/P13954 and previous config saved to /var/cache/conftool/dbconfig/20210126-070152-marostegui.json [production]
07:00 <marostegui@cumin1001> dbctl commit (dc=all): 'Set s4 as read-only for maintenance T271427', diff saved to https://phabricator.wikimedia.org/P13953 and previous config saved to /var/cache/conftool/dbconfig/20210126-070037-marostegui.json [production]
07:00 <marostegui> Starting s4 eqiad failover from db1081 to db1138 - T271427 [production]
06:55 <ryankemper> Restarted `wdqs-blazegraph` on `wdqs1005` - its blazegraph was deadlocked (based on the presence of null values for the blazegraph metrics for that host) [production]
05:43 <marostegui@cumin1001> dbctl commit (dc=all): 'Set candidate master to weight 0 before the failover T271427', diff saved to https://phabricator.wikimedia.org/P13952 and previous config saved to /var/cache/conftool/dbconfig/20210126-054337-marostegui.json [production]
00:48 <dzahn@cumin1001> conftool action : set/pooled=yes; selector: name=mw2331.codfw.wmnet [production]
00:47 <dzahn@cumin1001> conftool action : set/pooled=yes; selector: name=mw2318.codfw.wmnet [production]
00:47 <dzahn@cumin1001> conftool action : set/pooled=yes; selector: name=mw2319.codfw.wmnet [production]
00:46 <dzahn@cumin1001> conftool action : set/pooled=yes; selector: name=mw2320.codfw.wmnet [production]
00:44 <dzahn@cumin1001> conftool action : set/pooled=no; selector: name=mw2331.codfw.wmnet [production]
00:43 <dzahn@cumin1001> conftool action : set/pooled=no; selector: name=mw2318.codfw.wmnet [production]
00:43 <dzahn@cumin1001> conftool action : set/pooled=no; selector: name=mw2319.codfw.wmnet [production]
00:42 <dzahn@cumin1001> conftool action : set/pooled=no; selector: name=mw2320.codfw.wmnet [production]
00:34 <legoktm@deploy1001> Synchronized wmf-config/CommonSettings.php: Invalidate configuration cache when logos.php is touched too (duration: 00m 56s) [production]
00:32 <legoktm@deploy1001> Synchronized wmf-config/logos.php: Add script to mostly automate logo management (duration: 00m 55s) [production]
00:16 <legoktm@deploy1001> Synchronized wmf-config/InitialiseSettings.php: Split $wmgSiteLogo{1,1_5,2}x to a separate logos.php (1/2) (duration: 01m 00s) [production]
00:14 <legoktm@deploy1001> Synchronized wmf-config/logos.php: Split $wmgSiteLogo{1,1_5,2}x to a separate logos.php (1/2) (duration: 00m 56s) [production]
00:08 <legoktm@deploy1001> Synchronized wmf-config/InitialiseSettings.php: T272920: arbcom_enwiki: Change favicon to a renamed copy of arbcom_ruwiki.ico (2/2) (duration: 00m 58s) [production]
00:07 <legoktm@deploy1001> Synchronized static/favicon/arbcom_enwiki.ico: T272920: arbcom_enwiki: Change favicon to a renamed copy of arbcom_ruwiki.ico (1/2) (duration: 01m 00s) [production]
2021-01-25
23:09 <dzahn@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2318.codfw.wmnet with reason: REIMAGE [production]
23:07 <dzahn@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2319.codfw.wmnet with reason: REIMAGE [production]
23:06 <dzahn@cumin1001> START - Cookbook sre.hosts.downtime for 2:00:00 on mw2318.codfw.wmnet with reason: REIMAGE [production]
23:05 <dzahn@cumin1001> START - Cookbook sre.hosts.downtime for 2:00:00 on mw2319.codfw.wmnet with reason: REIMAGE [production]
23:03 <dzahn@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2331.codfw.wmnet with reason: REIMAGE [production]
23:01 <dzahn@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2320.codfw.wmnet with reason: REIMAGE [production]
23:00 <dzahn@cumin1001> START - Cookbook sre.hosts.downtime for 2:00:00 on mw2331.codfw.wmnet with reason: REIMAGE [production]
22:59 <dzahn@cumin1001> START - Cookbook sre.hosts.downtime for 2:00:00 on mw2320.codfw.wmnet with reason: REIMAGE [production]
22:44 <dzahn@cumin1001> conftool action : set/pooled=yes; selector: name=mw1338.eqiad.wmnet [production]
22:34 <dzahn@cumin1001> conftool action : set/pooled=yes; selector: name=mw2322.codfw.wmnet [production]
22:34 <dzahn@cumin1001> conftool action : set/pooled=yes; selector: name=mw2323.codfw.wmnet [production]
22:30 <dzahn@cumin1001> conftool action : set/pooled=no; selector: name=mw2322.codfw.wmnet [production]
22:29 <dzahn@cumin1001> conftool action : set/pooled=no; selector: name=mw2323.codfw.wmnet [production]
22:29 <dzahn@cumin1001> conftool action : set/pooled=no; selector: name=mw1338.eqiad.wmnet [production]
21:45 <cstone> civicrm revision changed from 3afb54f6f9 to dfb2ea2148 [production]
21:11 <pt1979@cumin2001> END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on cloudgw2002-dev.codfw.wmnet with reason: REIMAGE [production]