2021-04-28
§
|
07:10 |
<elukey@cumin1001> |
END (PASS) - Cookbook sre.dns.netbox (exit_code=0) |
[production] |
07:05 |
<elukey@cumin1001> |
START - Cookbook sre.dns.netbox |
[production] |
07:04 |
<elukey> |
add AAAA record for kafka-main2002.codfw.wmnet |
[production] |
07:03 |
<marostegui> |
Deploy schema change on db2089:3316 and db1098:3316 T266486 T268392 T273360 |
[production] |
06:26 |
<legoktm> |
created mailman3 superusers for Administrator (noc@), Ladsgroup and Legoktm |
[production] |
06:23 |
<legoktm> |
legoktm@lists1001:~$ sudo mailman-web set_default_site --name lists.wikimedia.org --domain lists.wikimedia.org |
[production] |
06:14 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'db1112 (re)pooling @ 100%: Repool db1112', diff saved to https://phabricator.wikimedia.org/P15607 and previous config saved to /var/cache/conftool/dbconfig/20210428-061426-root.json |
[production] |
06:00 |
<marostegui> |
Stop MySQL on db2096 (x1 codfw) T281135 |
[production] |
05:59 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'db1112 (re)pooling @ 75%: Repool db1112', diff saved to https://phabricator.wikimedia.org/P15606 and previous config saved to /var/cache/conftool/dbconfig/20210428-055922-root.json |
[production] |
05:51 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'Pool db1167 in s8 T258361', diff saved to https://phabricator.wikimedia.org/P15605 and previous config saved to /var/cache/conftool/dbconfig/20210428-055144-marostegui.json |
[production] |
05:44 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'db1112 (re)pooling @ 50%: Repool db1112', diff saved to https://phabricator.wikimedia.org/P15604 and previous config saved to /var/cache/conftool/dbconfig/20210428-054419-root.json |
[production] |
05:29 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'db1112 (re)pooling @ 25%: Repool db1112', diff saved to https://phabricator.wikimedia.org/P15603 and previous config saved to /var/cache/conftool/dbconfig/20210428-052915-root.json |
[production] |
05:15 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'Depool db1112 for schema change', diff saved to https://phabricator.wikimedia.org/P15602 and previous config saved to /var/cache/conftool/dbconfig/20210428-051526-marostegui.json |
[production] |
05:07 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'Depool db1083 (old s1 master) for schema change', diff saved to https://phabricator.wikimedia.org/P15601 and previous config saved to /var/cache/conftool/dbconfig/20210428-050754-marostegui.json |
[production] |
05:01 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'Promote db1163 to s1 master and remove read-only from s1 T278214', diff saved to https://phabricator.wikimedia.org/P15600 and previous config saved to /var/cache/conftool/dbconfig/20210428-050138-marostegui.json |
[production] |
05:00 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'Set s1 as read-only for maintenance T278214', diff saved to https://phabricator.wikimedia.org/P15599 and previous config saved to /var/cache/conftool/dbconfig/20210428-050041-marostegui.json |
[production] |
05:00 |
<marostegui> |
Starting s1 eqiad failover from db1083 to db1163 - T278214 |
[production] |
04:14 |
<ryankemper> |
T280382 `sudo -i cookbook sre.wdqs.data-transfer --source wdqs2001.codfw.wmnet --dest wdqs2007.codfw.wmnet --reason "transferring fresh wikidata journal following reimage" --blazegraph_instance blazegraph` on `ryankemper@cumin1001` tmux session `reimage` |
[production] |
04:14 |
<ryankemper@cumin1001> |
START - Cookbook sre.wdqs.data-transfer |
[production] |
04:13 |
<ryankemper@cumin1001> |
END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) |
[production] |
04:08 |
<ryankemper> |
T280382 `sudo -i cookbook sre.wdqs.data-transfer --source wdqs2001.codfw.wmnet --dest wdqs2007.codfw.wmnet --reason "transferring fresh categories journal following reimage" --blazegraph_instance categories` on `ryankemper@cumin1001` tmux session `reimage` |
[production] |
04:08 |
<marostegui> |
Start replication changes, connect everything to db1163 T278214 |
[production] |
04:08 |
<ryankemper@cumin1001> |
START - Cookbook sre.wdqs.data-transfer |
[production] |
04:07 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'Set db1163 with weight 0 before the switchover T278214', diff saved to https://phabricator.wikimedia.org/P15598 and previous config saved to /var/cache/conftool/dbconfig/20210428-040718-marostegui.json |
[production] |
03:53 |
<ryankemper@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs2007.codfw.wmnet with reason: REIMAGE |
[production] |
03:51 |
<ryankemper@cumin1001> |
START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs2007.codfw.wmnet with reason: REIMAGE |
[production] |
03:49 |
<ryankemper@puppetmaster1001> |
conftool action : set/pooled=no; selector: name=wdqs2007.codfw.wmnet |
[production] |
03:48 |
<ryankemper@puppetmaster1001> |
conftool action : set/pooled=no; selector: name=wdqs1013.eqiad.wmnet |
[production] |
03:33 |
<ryankemper> |
`sudo systemctl restart wdqs-blazegraph` on `wdqs1012` to clear the `WDQS SPARQL` warning |
[production] |
03:32 |
<ryankemper> |
T280382 `sudo -i wmf-auto-reimage-host -p T280382 wdqs2007.codfw.wmnet` on `ryankemper@cumin1001` tmux session `reimage` |
[production] |
03:32 |
<ryankemper> |
T280382 `sudo -i wmf-auto-reimage-host -p T280382 wdqs1013.eqiad.wmnet` on `ryankemper@cumin1001` tmux session `reimage` |
[production] |
02:33 |
<robh@cumin1001> |
END (PASS) - Cookbook sre.dns.netbox (exit_code=0) |
[production] |
02:28 |
<robh@cumin1001> |
START - Cookbook sre.dns.netbox |
[production] |
01:06 |
<robh@cumin1001> |
END (PASS) - Cookbook sre.dns.netbox (exit_code=0) |
[production] |
01:00 |
<robh@cumin1001> |
START - Cookbook sre.dns.netbox |
[production] |
00:03 |
<robh@cumin1001> |
END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on snapshot1015.eqiad.wmnet with reason: REIMAGE |
[production] |
00:01 |
<robh@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on snapshot1014.eqiad.wmnet with reason: REIMAGE |
[production] |
2021-04-27
§
|
23:58 |
<robh@cumin1001> |
START - Cookbook sre.hosts.downtime for 2:00:00 on snapshot1015.eqiad.wmnet with reason: REIMAGE |
[production] |
23:57 |
<robh@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on snapshot1013.eqiad.wmnet with reason: REIMAGE |
[production] |
23:57 |
<robh@cumin1001> |
START - Cookbook sre.hosts.downtime for 2:00:00 on snapshot1014.eqiad.wmnet with reason: REIMAGE |
[production] |
23:55 |
<robh@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on snapshot1012.eqiad.wmnet with reason: REIMAGE |
[production] |
23:54 |
<robh@cumin1001> |
START - Cookbook sre.hosts.downtime for 2:00:00 on snapshot1013.eqiad.wmnet with reason: REIMAGE |
[production] |
23:53 |
<robh@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on snapshot1011.eqiad.wmnet with reason: REIMAGE |
[production] |
23:52 |
<robh@cumin1001> |
START - Cookbook sre.hosts.downtime for 2:00:00 on snapshot1012.eqiad.wmnet with reason: REIMAGE |
[production] |
23:51 |
<robh@cumin1001> |
START - Cookbook sre.hosts.downtime for 2:00:00 on snapshot1011.eqiad.wmnet with reason: REIMAGE |
[production] |
21:07 |
<legoktm@cumin1001> |
END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts rdb[2005-2006].codfw.wmnet |
[production] |
20:55 |
<legoktm@cumin1001> |
START - Cookbook sre.hosts.decommission for hosts rdb[2005-2006].codfw.wmnet |
[production] |
20:54 |
<legoktm@cumin1001> |
END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts rdb[2003-2004].codfw.wmnet |
[production] |
20:42 |
<legoktm@cumin1001> |
START - Cookbook sre.hosts.decommission for hosts rdb[2003-2004].codfw.wmnet |
[production] |
20:32 |
<bblack> |
re-pooling codfw public traffic - T279457 |
[production] |