2021-09-07
|
17:18 |
<jgiannelos@deploy1002> |
helmfile [codfw] Ran 'sync' command on namespace 'push-notifications' for release 'main' . |
[production] |
17:09 |
<jgiannelos@deploy1002> |
helmfile [eqiad] Ran 'sync' command on namespace 'push-notifications' for release 'main' . |
[production] |
17:01 |
<jgiannelos@deploy1002> |
helmfile [staging] Ran 'sync' command on namespace 'push-notifications' for release 'main' . |
[production] |
16:39 |
<moritzm> |
installing jetty9 security updates on buster |
[production] |
16:30 |
<dzahn@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 8:00:00 on planet1002.eqiad.wmnet with reason: known issue |
[production] |
16:30 |
<dzahn@cumin1001> |
START - Cookbook sre.hosts.downtime for 5 days, 8:00:00 on planet1002.eqiad.wmnet with reason: known issue |
[production] |
16:30 |
<dancy@deploy1002> |
Synchronized README: testing (duration: 00m 59s) |
[production] |
15:18 |
<akosiaris> |
run_benchmarky.py against mwdebug.svc.codfw.wmnet for performance tests |
[production] |
15:07 |
<akosiaris@deploy1002> |
helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' . |
[production] |
15:04 |
<jbond> |
upload python-prometheus-client_0.6.0 to stretch-wikimedia |
[production] |
14:50 |
<mutante> |
snapshot1015 - manually removed prometheus-puppet-agent-stats from crontab which was sending spam and is now a timer |
[production] |
14:33 |
<mutante> |
CI - migrating zuul-merger cronjob to systemd timer (contint*) |
[production] |
14:23 |
<XioNoX> |
re-pool esams-eqiad - T288503 |
[production] |
14:23 |
<cmjohnson@cumin1001> |
END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on cloudcephosd1024.eqiad.wmnet with reason: REIMAGE |
[production] |
14:23 |
<cmjohnson@cumin1001> |
START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1024.eqiad.wmnet with reason: REIMAGE |
[production] |
14:22 |
<cmjohnson@cumin1001> |
END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on cloudcephosd1023.eqiad.wmnet with reason: REIMAGE |
[production] |
14:22 |
<cmjohnson@cumin1001> |
START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1023.eqiad.wmnet with reason: REIMAGE |
[production] |
14:17 |
<marostegui> |
No more db maintenance on eqiad T288594 |
[production] |
14:08 |
<mutante> |
alert1001 - temp disabled puppet, stopped icinga-wm |
[production] |
14:07 |
<mutante> |
temp killed icinga-wm because of flooding |
[production] |
14:01 |
<Emperor> |
removing pc2010 from orchestrator T289117 |
[production] |
13:59 |
<Emperor> |
removing pc2010 from tendril and zarcillo T289117 |
[production] |
13:57 |
<pt1979@cumin2002> |
END (PASS) - Cookbook sre.dns.netbox (exit_code=0) |
[production] |
13:57 |
<XioNoX> |
drain esams-eqiad for circuit maintenance - T288503 |
[production] |
13:54 |
<pt1979@cumin2002> |
START - Cookbook sre.dns.netbox |
[production] |
13:51 |
<jayme> |
uncordoned kubestage2001 |
[production] |
13:50 |
<jiji@deploy1002> |
helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' . |
[production] |
13:49 |
<mutante> |
mw2264 - scap pulled and repooled after T290242 |
[production] |
13:49 |
<dzahn@cumin1001> |
conftool action : set/pooled=yes; selector: name=mw2264.codfw.wmnet |
[production] |
13:43 |
<jiji@deploy1002> |
helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' . |
[production] |
13:40 |
<mvernon@cumin1001> |
END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts pc2010.codfw.wmnet |
[production] |
13:25 |
<mvernon@cumin1001> |
START - Cookbook sre.hosts.decommission for hosts pc2010.codfw.wmnet |
[production] |
13:21 |
<Emperor> |
removing pc2009 from orchestrator T289116 |
[production] |
13:21 |
<Emperor> |
removing pc2009 from tendril and zarcillo T289116 |
[production] |
13:02 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'fix s8 weights T288594', diff saved to https://phabricator.wikimedia.org/P17248 and previous config saved to /var/cache/conftool/dbconfig/20210907-130244-marostegui.json |
[production] |
12:59 |
<mvernon@cumin1001> |
END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts pc2009.codfw.wmnet |
[production] |
12:51 |
<mvernon@deploy1002> |
Synchronized wmf-config/ProductionServices.php: Remove old decommissioned pc hosts T284825 (duration: 01m 02s) |
[production] |
12:45 |
<mvernon@cumin1001> |
START - Cookbook sre.hosts.decommission for hosts pc2009.codfw.wmnet |
[production] |
12:27 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'fix s1 weights T288594', diff saved to https://phabricator.wikimedia.org/P17247 and previous config saved to /var/cache/conftool/dbconfig/20210907-122747-marostegui.json |
[production] |
12:27 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'fix s1 weights T288594', diff saved to https://phabricator.wikimedia.org/P17246 and previous config saved to /var/cache/conftool/dbconfig/20210907-122708-marostegui.json |
[production] |
11:46 |
<btullis@cumin1001> |
END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for 6 hosts |
[production] |
11:46 |
<btullis@cumin1001> |
START - Cookbook sre.hosts.remove-downtime for 6 hosts |
[production] |
11:36 |
<awight> |
EU backport complete |
[production] |
11:33 |
<awight@deploy1002> |
Synchronized php-1.37.0-wmf.21/extensions/CodeMirror/extension.json: Backport: [[gerrit:719170|Change line numbers default to null (T290226)]] (duration: 00m 59s) |
[production] |
11:28 |
<awight@deploy1002> |
Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:717192|Set template namespace for code mirror line numbering (T290226)]] (duration: 00m 59s) |
[production] |
10:51 |
<Emperor> |
removing pc2008 from orchestrator T289115 |
[production] |
10:49 |
<Emperor> |
removing pc2008 from tendril and zarcillo T289115 |
[production] |
10:46 |
<mvernon@cumin1001> |
END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts pc2008.codfw.wmnet |
[production] |
10:35 |
<mvernon@cumin1001> |
START - Cookbook sre.hosts.decommission for hosts pc2008.codfw.wmnet |
[production] |
10:29 |
<btullis@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 0:00:00 on 6 hosts with reason: commissioning aqs_new hosts |
[production] |