2021-08-09
08:46 | <godog> | upgrade prometheus on prometheus2004 - T222113 | [production]
08:41 | <godog> | upgrade prometheus on prometheus1004 - T222113 | [production]
08:36 | <jynus@cumin2002> | END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dbprov2002.codfw.wmnet with reason: REIMAGE | [production]
08:34 | <jynus@cumin2002> | START - Cookbook sre.hosts.downtime for 2:00:00 on dbprov2002.codfw.wmnet with reason: REIMAGE | [production]
08:24 | <marostegui> | Upgrade db1117 (all sections) to 10.4.19 | [production]
08:03 | <ariel@deploy1002> | Finished deploy [dumps/dumps@142e91c]: fix for T288192 runnerutils bug (duration: 00m 03s) | [production]
08:03 | <ariel@deploy1002> | Started deploy [dumps/dumps@142e91c]: fix for T288192 runnerutils bug | [production]
07:52 | <marostegui@cumin1001> | dbctl commit (dc=all): 'Repool db1160 T288273', diff saved to https://phabricator.wikimedia.org/P16971 and previous config saved to /var/cache/conftool/dbconfig/20210809-075212-marostegui.json | [production]
07:50 | <mwdebug-deploy@deploy1002> | helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' . | [production]
07:45 | <mwdebug-deploy@deploy1002> | helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' . | [production]
07:30 | <ladsgroup@deploy1002> | Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:710919|Enable shellbox for constraint for all of wikidata (T176312)]] (duration: 00m 58s) | [production]
07:15 | <marostegui> | Stop db1117:3323 to clone db1107 - T288197 | [production]
07:05 | <kart__> | Updated cxserver to 2021-08-06-062053-production (T288272) | [production]
07:04 | <marostegui@cumin1001> | END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1107.eqiad.wmnet with reason: REIMAGE | [production]
07:02 | <marostegui@cumin1001> | START - Cookbook sre.hosts.downtime for 2:00:00 on db1107.eqiad.wmnet with reason: REIMAGE | [production]
06:53 | <kartik@deploy1002> | helmfile [codfw] Ran 'sync' command on namespace 'cxserver' for release 'production' . | [production]
06:45 | <kartik@deploy1002> | helmfile [eqiad] Ran 'sync' command on namespace 'cxserver' for release 'production' . | [production]
05:56 | <XioNoX> | enable cloudsw1-c8 interfaces toward cloudsw2-c8 - T277340 | [production]
05:23 | <marostegui> | Lag in s4 (commonswiki) will appear on clouddb* hosts (wiki replicas) T288273 | [production]
05:22 | <marostegui> | Optimize commonswiki.image on eqiad, lag will appear - T288273 | [production]
2021-08-06
19:17 | <cmjohnson@cumin1001> | END (PASS) - Cookbook sre.dns.netbox (exit_code=0) | [production]
19:12 | <cmjohnson@cumin1001> | START - Cookbook sre.dns.netbox | [production]
19:04 | <cmjohnson@cumin1001> | END (PASS) - Cookbook sre.dns.netbox (exit_code=0) | [production]
18:53 | <cmjohnson@cumin1001> | START - Cookbook sre.dns.netbox | [production]
18:53 | <cmjohnson@cumin1001> | END (ERROR) - Cookbook sre.dns.netbox (exit_code=97) | [production]
18:52 | <cmjohnson@cumin1001> | START - Cookbook sre.dns.netbox | [production]
18:45 | <cmjohnson@cumin1001> | END (PASS) - Cookbook sre.dns.netbox (exit_code=0) | [production]
18:41 | <cmjohnson@cumin1001> | START - Cookbook sre.dns.netbox | [production]
18:40 | <cmjohnson@cumin1001> | END (PASS) - Cookbook sre.dns.netbox (exit_code=0) | [production]
18:36 | <cmjohnson@cumin1001> | START - Cookbook sre.dns.netbox | [production]
17:39 | <brennen> | gitlab: run ansible to apply [[gerrit:710529|remove backup warning for config backups]] (T288324) | [production]
16:59 | <hnowlan@puppetmaster1001> | conftool action : set/pooled=yes; selector: name=maps2005.codfw.wmnet | [production]
16:56 | <mwdebug-deploy@deploy1002> | helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' . | [production]
16:50 | <mwdebug-deploy@deploy1002> | helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' . | [production]
16:38 | <dzahn@cumin1001> | END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts peek2001.codfw.wmnet | [production]
16:34 | <mwdebug-deploy@deploy1002> | helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' . | [production]
16:34 | <hnowlan@cumin1001> | END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on maps1005.eqiad.wmnet with reason: Awaiting reimaging, depooled. | [production]
16:34 | <hnowlan@cumin1001> | START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on maps1005.eqiad.wmnet with reason: Awaiting reimaging, depooled. | [production]
16:30 | <mwdebug-deploy@deploy1002> | helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' . | [production]
16:30 | <dzahn@cumin1001> | START - Cookbook sre.hosts.decommission for hosts peek2001.codfw.wmnet | [production]
16:29 | <dzahn@cumin1001> | END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8 days, 4:00:00 on peek2001.codfw.wmnet with reason: decom | [production]
16:29 | <dzahn@cumin1001> | START - Cookbook sre.hosts.downtime for 8 days, 4:00:00 on peek2001.codfw.wmnet with reason: decom | [production]
16:03 | <mwdebug-deploy@deploy1002> | helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' . | [production]
16:02 | <mwdebug-deploy@deploy1002> | helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' . | [production]
15:14 | <hnowlan> | removing maps1005 from old maps cassandra cluster before reimaging | [production]
14:35 | <hnowlan@puppetmaster1001> | conftool action : set/pooled=no; selector: name=maps1005.eqiad.wmnet | [production]
14:29 | <hnowlan@cumin1001> | END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on maps2005.codfw.wmnet with reason: Reimaging | [production]
14:29 | <hnowlan@cumin1001> | START - Cookbook sre.hosts.downtime for 3:00:00 on maps2005.codfw.wmnet with reason: Reimaging | [production]
14:26 | <hnowlan@cumin2002> | END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on maps2005.codfw.wmnet with reason: REIMAGE | [production]
14:24 | <hnowlan@cumin2002> | START - Cookbook sre.hosts.downtime for 2:00:00 on maps2005.codfw.wmnet with reason: REIMAGE | [production]