2023-01-26
|
07:25 |
<dcausse> |
T322869: depooling wdqs2009, wdqs2010, wdqs2011, wdqs2012; these hosts should not serve user traffic yet, as they don't have the database loaded |
[production] |
07:23 |
<marostegui> |
Failover m1 from db1195 to db1176 - T327800 |
[production] |
07:20 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'db1198 (re)pooling @ 10%: After DIMM replacement', diff saved to https://phabricator.wikimedia.org/P43356 and previous config saved to /var/cache/conftool/dbconfig/20230126-072017-root.json |
[production] |
07:18 |
<root@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on backup1001.eqiad.wmnet with reason: m1 switchover |
[production] |
07:17 |
<root@cumin1001> |
START - Cookbook sre.hosts.downtime for 2:00:00 on backup1001.eqiad.wmnet with reason: m1 switchover |
[production] |
07:17 |
<root@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on backupmon1001.eqiad.wmnet with reason: m1 switchover |
[production] |
07:17 |
<root@cumin1001> |
START - Cookbook sre.hosts.downtime for 2:00:00 on backupmon1001.eqiad.wmnet with reason: m1 switchover |
[production] |
07:16 |
<marostegui@deploy1002> |
marostegui: Backport for [[gerrit:883699|ProductionServices.php: Depool pc2011 (T327925)]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet |
[production] |
07:14 |
<marostegui@deploy1002> |
Started scap: Backport for [[gerrit:883699|ProductionServices.php: Depool pc2011 (T327925)]] |
[production] |
07:12 |
<marostegui@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db[2132,2160].codfw.wmnet,db[1117,1176,1195].eqiad.wmnet with reason: Primary switchover m1 T327800 |
[production] |
07:12 |
<marostegui@cumin1001> |
START - Cookbook sre.hosts.downtime for 2:00:00 on db[2132,2160].codfw.wmnet,db[1117,1176,1195].eqiad.wmnet with reason: Primary switchover m1 T327800 |
[production] |
07:05 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'db1198 (re)pooling @ 5%: After DIMM replacement', diff saved to https://phabricator.wikimedia.org/P43354 and previous config saved to /var/cache/conftool/dbconfig/20230126-070512-root.json |
[production] |
07:02 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'Add some weight to db1103', diff saved to https://phabricator.wikimedia.org/P43353 and previous config saved to /var/cache/conftool/dbconfig/20230126-070220-marostegui.json |
[production] |
07:01 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'Depool db1120 T327861', diff saved to https://phabricator.wikimedia.org/P43352 and previous config saved to /var/cache/conftool/dbconfig/20230126-070158-root.json |
[production] |
07:00 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'Promote db1103 to x1 primary and set section read-write T327861', diff saved to https://phabricator.wikimedia.org/P43351 and previous config saved to /var/cache/conftool/dbconfig/20230126-070035-marostegui.json |
[production] |
07:00 |
<marostegui> |
Starting x1 eqiad failover from db1120 to db1103 - T327861 |
[production] |
06:48 |
<brett@cumin1001> |
conftool action : set/pooled=yes; selector: name=cp6015.drmrs.wmnet |
[production] |
06:48 |
<brett@cumin1001> |
END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp6015.drmrs.wmnet with OS bullseye |
[production] |
06:32 |
<ladsgroup@deploy1002> |
Synchronized private/PrivateSettings.php: Rotating wikiuser password (T326802) (duration: 07m 23s) |
[production] |
06:20 |
<brett@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp6015.drmrs.wmnet with reason: host reimage |
[production] |
06:18 |
<brett@cumin1001> |
START - Cookbook sre.hosts.downtime for 2:00:00 on cp6015.drmrs.wmnet with reason: host reimage |
[production] |
06:17 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'Set db1103 with weight 0 T327861', diff saved to https://phabricator.wikimedia.org/P43350 and previous config saved to /var/cache/conftool/dbconfig/20230126-061751-root.json |
[production] |
06:17 |
<marostegui@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 10 hosts with reason: Primary switchover x1 T327861 |
[production] |
06:16 |
<marostegui@cumin1001> |
START - Cookbook sre.hosts.downtime for 1:00:00 on 10 hosts with reason: Primary switchover x1 T327861 |
[production] |
05:57 |
<brett@cumin1001> |
START - Cookbook sre.hosts.reimage for host cp6015.drmrs.wmnet with OS bullseye |
[production] |
05:53 |
<brett@cumin1001> |
conftool action : set/pooled=yes; selector: name=cp6006.drmrs.wmnet |
[production] |
05:53 |
<brett@cumin1001> |
END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp6006.drmrs.wmnet with OS bullseye |
[production] |
05:32 |
<brett@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp6006.drmrs.wmnet with reason: host reimage |
[production] |
05:28 |
<brett@cumin1001> |
START - Cookbook sre.hosts.downtime for 2:00:00 on cp6006.drmrs.wmnet with reason: host reimage |
[production] |
05:10 |
<brett@cumin1001> |
START - Cookbook sre.hosts.reimage for host cp6006.drmrs.wmnet with OS bullseye |
[production] |
05:09 |
<brett@cumin1001> |
conftool action : set/pooled=yes; selector: name=cp6014.drmrs.wmnet |
[production] |
05:07 |
<brett@cumin1001> |
END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp6014.drmrs.wmnet with OS bullseye |
[production] |
04:45 |
<brett@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp6014.drmrs.wmnet with reason: host reimage |
[production] |
04:42 |
<brett@cumin1001> |
START - Cookbook sre.hosts.downtime for 2:00:00 on cp6014.drmrs.wmnet with reason: host reimage |
[production] |
04:24 |
<brett@cumin1001> |
START - Cookbook sre.hosts.reimage for host cp6014.drmrs.wmnet with OS bullseye |
[production] |
04:22 |
<brett@cumin1001> |
conftool action : set/pooled=yes; selector: name=cp6005.drmrs.wmnet |
[production] |
04:17 |
<brett@cumin1001> |
END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp6005.drmrs.wmnet with OS bullseye |
[production] |
03:52 |
<brett@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp6005.drmrs.wmnet with reason: host reimage |
[production] |
03:49 |
<brett@cumin1001> |
START - Cookbook sre.hosts.downtime for 2:00:00 on cp6005.drmrs.wmnet with reason: host reimage |
[production] |
03:29 |
<brett@cumin1001> |
START - Cookbook sre.hosts.reimage for host cp6005.drmrs.wmnet with OS bullseye |
[production] |
03:27 |
<brett@cumin1001> |
conftool action : set/pooled=yes; selector: name=cp6013.drmrs.wmnet |
[production] |
03:27 |
<ejegg> |
payments-wiki upgraded from 08b8c3bc to 82d89841 |
[production] |
03:26 |
<brett@cumin1001> |
END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp6013.drmrs.wmnet with OS bullseye |
[production] |
03:04 |
<brett@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp6013.drmrs.wmnet with reason: host reimage |
[production] |
03:01 |
<brett@cumin1001> |
START - Cookbook sre.hosts.downtime for 2:00:00 on cp6013.drmrs.wmnet with reason: host reimage |
[production] |
02:41 |
<brett@cumin1001> |
START - Cookbook sre.hosts.reimage for host cp6013.drmrs.wmnet with OS bullseye |
[production] |
02:30 |
<sukhe@cumin2002> |
END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp2027.codfw.wmnet with OS bullseye |
[production] |
02:17 |
<sukhe@cumin2002> |
START - Cookbook sre.hosts.reimage for host cp2027.codfw.wmnet with OS bullseye |
[production] |
02:17 |
<sukhe@cumin2002> |
END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp2027.codfw.wmnet with OS bullseye |
[production] |
01:58 |
<ejegg> |
restarted fundraising scheduled jobs after queue server reboot |
[production] |