2021-04-28
05:01 <marostegui@cumin1001> dbctl commit (dc=all): 'Promote db1163 to s1 master and remove read-only from s1 T278214', diff saved to https://phabricator.wikimedia.org/P15600 and previous config saved to /var/cache/conftool/dbconfig/20210428-050138-marostegui.json [production]
05:00 <marostegui@cumin1001> dbctl commit (dc=all): 'Set s1 as read-only for maintenance T278214', diff saved to https://phabricator.wikimedia.org/P15599 and previous config saved to /var/cache/conftool/dbconfig/20210428-050041-marostegui.json [production]
05:00 <marostegui> Starting s1 eqiad failover from db1083 to db1163 - T278214 [production]
04:14 <ryankemper> T280382 `sudo -i cookbook sre.wdqs.data-transfer --source wdqs2001.codfw.wmnet --dest wdqs2007.codfw.wmnet --reason "transferring fresh wikidata journal following reimage" --blazegraph_instance blazegraph` on `ryankemper@cumin1001` tmux session `reimage` [production]
04:14 <ryankemper@cumin1001> START - Cookbook sre.wdqs.data-transfer [production]
04:13 <ryankemper@cumin1001> END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) [production]
04:08 <ryankemper> T280382 `sudo -i cookbook sre.wdqs.data-transfer --source wdqs2001.codfw.wmnet --dest wdqs2007.codfw.wmnet --reason "transferring fresh categories journal following reimage" --blazegraph_instance categories` on `ryankemper@cumin1001` tmux session `reimage` [production]
04:08 <marostegui> Start replication changes, connect everything to db1163 T278214 [production]
04:08 <ryankemper@cumin1001> START - Cookbook sre.wdqs.data-transfer [production]
04:07 <marostegui@cumin1001> dbctl commit (dc=all): 'Set db1163 with weight 0 before the switchover T278214', diff saved to https://phabricator.wikimedia.org/P15598 and previous config saved to /var/cache/conftool/dbconfig/20210428-040718-marostegui.json [production]
03:53 <ryankemper@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs2007.codfw.wmnet with reason: REIMAGE [production]
03:51 <ryankemper@cumin1001> START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs2007.codfw.wmnet with reason: REIMAGE [production]
03:49 <ryankemper@puppetmaster1001> conftool action : set/pooled=no; selector: name=wdqs2007.codfw.wmnet [production]
03:48 <ryankemper@puppetmaster1001> conftool action : set/pooled=no; selector: name=wdqs1013.eqiad.wmnet [production]
03:33 <ryankemper> `sudo systemctl restart wdqs-blazegraph` on `wdqs1012` to clear the `WDQS SPARQL` warning [production]
03:32 <ryankemper> T280382 `sudo -i wmf-auto-reimage-host -p T280382 wdqs2007.codfw.wmnet` on `ryankemper@cumin1001` tmux session `reimage` [production]
03:32 <ryankemper> T280382 `sudo -i wmf-auto-reimage-host -p T280382 wdqs1013.eqiad.wmnet` on `ryankemper@cumin1001` tmux session `reimage` [production]
02:33 <robh@cumin1001> END (PASS) - Cookbook sre.dns.netbox (exit_code=0) [production]
02:28 <robh@cumin1001> START - Cookbook sre.dns.netbox [production]
01:06 <robh@cumin1001> END (PASS) - Cookbook sre.dns.netbox (exit_code=0) [production]
01:00 <robh@cumin1001> START - Cookbook sre.dns.netbox [production]
00:03 <robh@cumin1001> END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on snapshot1015.eqiad.wmnet with reason: REIMAGE [production]
00:01 <robh@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on snapshot1014.eqiad.wmnet with reason: REIMAGE [production]
2021-04-27
23:58 <robh@cumin1001> START - Cookbook sre.hosts.downtime for 2:00:00 on snapshot1015.eqiad.wmnet with reason: REIMAGE [production]
23:57 <robh@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on snapshot1013.eqiad.wmnet with reason: REIMAGE [production]
23:57 <robh@cumin1001> START - Cookbook sre.hosts.downtime for 2:00:00 on snapshot1014.eqiad.wmnet with reason: REIMAGE [production]
23:55 <robh@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on snapshot1012.eqiad.wmnet with reason: REIMAGE [production]
23:54 <robh@cumin1001> START - Cookbook sre.hosts.downtime for 2:00:00 on snapshot1013.eqiad.wmnet with reason: REIMAGE [production]
23:53 <robh@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on snapshot1011.eqiad.wmnet with reason: REIMAGE [production]
23:52 <robh@cumin1001> START - Cookbook sre.hosts.downtime for 2:00:00 on snapshot1012.eqiad.wmnet with reason: REIMAGE [production]
23:51 <robh@cumin1001> START - Cookbook sre.hosts.downtime for 2:00:00 on snapshot1011.eqiad.wmnet with reason: REIMAGE [production]
21:07 <legoktm@cumin1001> END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts rdb[2005-2006].codfw.wmnet [production]
20:55 <legoktm@cumin1001> START - Cookbook sre.hosts.decommission for hosts rdb[2005-2006].codfw.wmnet [production]
20:54 <legoktm@cumin1001> END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts rdb[2003-2004].codfw.wmnet [production]
20:42 <legoktm@cumin1001> START - Cookbook sre.hosts.decommission for hosts rdb[2003-2004].codfw.wmnet [production]
20:32 <bblack> re-pooling codfw public traffic - T279457 [production]
20:11 <jhuneidi@deploy1002> Synchronized php-1.37.0-wmf.3/includes/rcfeed/IRCColourfulRCFeedFormatter.php: Backport rcfeed: Remove reference assignment (T281226) to 1.37.0-wmf.3 (duration: 01m 12s) [production]
20:08 <herron@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-main2005.codfw.wmnet with reason: REIMAGE [production]
20:06 <herron@cumin1001> START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-main2005.codfw.wmnet with reason: REIMAGE [production]
19:44 <dzahn@cumin1001> END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host people1003.eqiad.wmnet [production]
19:37 <herron@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-main2004.codfw.wmnet with reason: REIMAGE [production]
19:35 <papaul> powerdown ms-backup2001 for maintenance [production]
19:35 <herron@cumin1001> START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-main2004.codfw.wmnet with reason: REIMAGE [production]
19:07 <papaul> powerdown logstash2035 for maintenance [production]
19:03 <dzahn@cumin1001> START - Cookbook sre.ganeti.makevm for new host people1003.eqiad.wmnet [production]
19:00 <dzahn@cumin1001> END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts people1003.eqiad.wmnet [production]
18:50 <mutante> people1003 - destroying VM and recreating again from scratch to test if issue of no console and no access is repeatable [production]
18:50 <dzahn@cumin1001> START - Cookbook sre.hosts.decommission for hosts people1003.eqiad.wmnet [production]
18:37 <herron@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-main1005.eqiad.wmnet with reason: REIMAGE [production]
18:35 <herron@cumin1001> START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-main1005.eqiad.wmnet with reason: REIMAGE [production]