2023-02-08
§
|
12:36 |
<marostegui@cumin1001> |
START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db1096.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1001" |
[production] |
12:34 |
<marostegui@cumin1001> |
START - Cookbook sre.dns.netbox |
[production] |
12:29 |
<marostegui@cumin1001> |
START - Cookbook sre.hosts.decommission for hosts db1096.eqiad.wmnet |
[production] |
12:28 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'Repooling after maintenance db1180 (T328817)', diff saved to https://phabricator.wikimedia.org/P43803 and previous config saved to /var/cache/conftool/dbconfig/20230208-122829-marostegui.json |
[production] |
12:26 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'Depooling db1180 (T328817)', diff saved to https://phabricator.wikimedia.org/P43802 and previous config saved to /var/cache/conftool/dbconfig/20230208-122620-marostegui.json |
[production] |
12:26 |
<marostegui@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1180.eqiad.wmnet with reason: Maintenance |
[production] |
12:26 |
<marostegui@cumin1001> |
START - Cookbook sre.hosts.downtime for 12:00:00 on db1180.eqiad.wmnet with reason: Maintenance |
[production] |
12:26 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'Repooling after maintenance db1165 (T328817)', diff saved to https://phabricator.wikimedia.org/P43801 and previous config saved to /var/cache/conftool/dbconfig/20230208-122559-marostegui.json |
[production] |
12:21 |
<jmm@cumin2002> |
END (PASS) - Cookbook sre.ldap.roll-restart-reboot-replica (exit_code=0) rolling restart_daemons on A:ldap-replicas |
[production] |
12:19 |
<jmm@cumin2002> |
START - Cookbook sre.ldap.roll-restart-reboot-replica rolling restart_daemons on A:ldap-replicas |
[production] |
12:18 |
<eoghan@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on gitlab-runner1002.eqiad.wmnet with reason: host reimage |
[production] |
12:15 |
<eoghan@cumin1001> |
START - Cookbook sre.hosts.downtime for 2:00:00 on gitlab-runner1002.eqiad.wmnet with reason: host reimage |
[production] |
12:13 |
<jmm@cumin2002> |
END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: kafka-stretch2002.codfw.wmnet |
[production] |
12:13 |
<jmm@cumin2002> |
START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: kafka-stretch2002.codfw.wmnet |
[production] |
12:10 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'Repooling after maintenance db1165', diff saved to https://phabricator.wikimedia.org/P43800 and previous config saved to /var/cache/conftool/dbconfig/20230208-121053-marostegui.json |
[production] |
12:03 |
<eoghan@cumin1001> |
START - Cookbook sre.hosts.reimage for host gitlab-runner1002.eqiad.wmnet with OS bullseye |
[production] |
11:59 |
<btullis@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on dse-k8s-worker1001.eqiad.wmnet with reason: Attempting to move some GPUs |
[production] |
11:59 |
<btullis@cumin1001> |
START - Cookbook sre.hosts.downtime for 8:00:00 on dse-k8s-worker1001.eqiad.wmnet with reason: Attempting to move some GPUs |
[production] |
11:58 |
<btullis@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on an-worker1097.eqiad.wmnet with reason: Attempting to move some GPUs |
[production] |
11:57 |
<btullis@cumin1001> |
START - Cookbook sre.hosts.downtime for 8:00:00 on an-worker1097.eqiad.wmnet with reason: Attempting to move some GPUs |
[production] |
11:57 |
<btullis@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on an-worker1096.eqiad.wmnet with reason: Attempting to move some GPUs |
[production] |
11:57 |
<btullis@cumin1001> |
START - Cookbook sre.hosts.downtime for 8:00:00 on an-worker1096.eqiad.wmnet with reason: Attempting to move some GPUs |
[production] |
11:56 |
<jmm@cumin2002> |
END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: moss-be1001.eqiad.wmnet |
[production] |
11:56 |
<jmm@cumin2002> |
START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: moss-be1001.eqiad.wmnet |
[production] |
11:55 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'Repooling after maintenance db1165', diff saved to https://phabricator.wikimedia.org/P43799 and previous config saved to /var/cache/conftool/dbconfig/20230208-115546-marostegui.json |
[production] |
11:53 |
<jmm@cumin2002> |
END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: flowspec1001.eqiad.wmnet |
[production] |
11:53 |
<jmm@cumin2002> |
START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: flowspec1001.eqiad.wmnet |
[production] |
11:40 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'Repooling after maintenance db1165 (T328817)', diff saved to https://phabricator.wikimedia.org/P43798 and previous config saved to /var/cache/conftool/dbconfig/20230208-114040-marostegui.json |
[production] |
11:38 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'Depooling db1165 (T328817)', diff saved to https://phabricator.wikimedia.org/P43797 and previous config saved to /var/cache/conftool/dbconfig/20230208-113832-marostegui.json |
[production] |
11:38 |
<marostegui@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance |
[production] |
11:38 |
<marostegui@cumin1001> |
START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance |
[production] |
11:38 |
<marostegui@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1165.eqiad.wmnet with reason: Maintenance |
[production] |
11:37 |
<marostegui@cumin1001> |
START - Cookbook sre.hosts.downtime for 12:00:00 on db1165.eqiad.wmnet with reason: Maintenance |
[production] |
11:13 |
<marostegui> |
Stop mysql on db1096 (s5,s6) T329147 |
[production] |
11:05 |
<marostegui@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1140.eqiad.wmnet with reason: Maintenance |
[production] |
11:05 |
<marostegui@cumin1001> |
START - Cookbook sre.hosts.downtime for 12:00:00 on db1140.eqiad.wmnet with reason: Maintenance |
[production] |
11:05 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'Repooling after maintenance db1113:3316 (T328817)', diff saved to https://phabricator.wikimedia.org/P43796 and previous config saved to /var/cache/conftool/dbconfig/20230208-110507-marostegui.json |
[production] |
10:57 |
<zabe@deploy1002> |
Finished scap: Backport for [[gerrit:887748|Remove cul_reason comment table migration code (T233004 T329151)]] (duration: 08m 05s) |
[production] |
10:51 |
<zabe@deploy1002> |
zabe: Backport for [[gerrit:887748|Remove cul_reason comment table migration code (T233004 T329151)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet |
[production] |
10:50 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'Repooling after maintenance db1113:3316', diff saved to https://phabricator.wikimedia.org/P43793 and previous config saved to /var/cache/conftool/dbconfig/20230208-105001-marostegui.json |
[production] |
10:49 |
<zabe@deploy1002> |
Started scap: Backport for [[gerrit:887748|Remove cul_reason comment table migration code (T233004 T329151)]] |
[production] |
10:38 |
<jmm@cumin2002> |
END (PASS) - Cookbook sre.swift.roll-restart-reboot-swift-thanos-proxies (exit_code=0) rolling restart_daemons on A:thanos-fe |
[production] |
10:35 |
<jmm@cumin2002> |
START - Cookbook sre.swift.roll-restart-reboot-swift-thanos-proxies rolling restart_daemons on A:thanos-fe |
[production] |
10:34 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'Repooling after maintenance db1113:3316', diff saved to https://phabricator.wikimedia.org/P43791 and previous config saved to /var/cache/conftool/dbconfig/20230208-103455-marostegui.json |
[production] |
10:33 |
<volans> |
deploying python3-wmflib_1.2.1 to the fleet |
[production] |
10:28 |
<zabe@deploy1002> |
Finished scap: Backport for [[gerrit:887747|Revert "slwiki: Raise AF emergency disable treshold+count" (T328366)]] (duration: 08m 49s) |
[production] |
10:26 |
<marostegui> |
Failover m2-master from dbproxy1013 to dbproxy1015 T329073 |
[production] |
10:21 |
<zabe@deploy1002> |
zabe: Backport for [[gerrit:887747|Revert "slwiki: Raise AF emergency disable treshold+count" (T328366)]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet |
[production] |
10:19 |
<zabe@deploy1002> |
Started scap: Backport for [[gerrit:887747|Revert "slwiki: Raise AF emergency disable treshold+count" (T328366)]] |
[production] |
10:19 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'Repooling after maintenance db1113:3316 (T328817)', diff saved to https://phabricator.wikimedia.org/P43790 and previous config saved to /var/cache/conftool/dbconfig/20230208-101948-marostegui.json |
[production] |