2021-11-19
20:05 |
<legoktm@cumin1001> |
START - Cookbook sre.hosts.decommission for hosts thumbor2002.codfw.wmnet |
[production] |
20:00 |
<dzahn@cumin1001> |
END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts mw2280.codfw.wmnet |
[production] |
19:55 |
<pt1979@cumin2002> |
START - Cookbook sre.hosts.reimage for host kubernetes2018.codfw.wmnet with OS stretch |
[production] |
19:51 |
<mutante> |
shutting down undead server mw2280 - not in icinga or puppetdb, but still in debmonitor and still has an IP and puppet cert |
[production] |
19:45 |
<dzahn@cumin1001> |
START - Cookbook sre.hosts.decommission for hosts mw2280.codfw.wmnet |
[production] |
18:54 |
<hnowlan@cumin1001> |
END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:restbase-codfw: Restarting to pick up Java security updates - hnowlan@cumin1001 |
[production] |
18:10 |
<andrew@deploy1002> |
Finished deploy [horizon/deploy@ba16257]: moving the proxy endpoint behind keystone (duration: 04m 19s) |
[production] |
18:06 |
<andrew@deploy1002> |
Started deploy [horizon/deploy@ba16257]: moving the proxy endpoint behind keystone |
[production] |
17:45 |
<pt1979@cumin2002> |
END (PASS) - Cookbook sre.dns.netbox (exit_code=0) |
[production] |
17:41 |
<pt1979@cumin2002> |
START - Cookbook sre.dns.netbox |
[production] |
17:25 |
<andrew@deploy1002> |
Finished deploy [horizon/deploy@ee83e27]: fixing sudo rule editing (duration: 04m 10s) |
[production] |
17:21 |
<andrew@deploy1002> |
Started deploy [horizon/deploy@ee83e27]: fixing sudo rule editing |
[production] |
17:19 |
<mwdebug-deploy@deploy1002> |
helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' . |
[production] |
17:10 |
<mwdebug-deploy@deploy1002> |
helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' . |
[production] |
16:54 |
<mwdebug-deploy@deploy1002> |
helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' . |
[production] |
16:50 |
<mwdebug-deploy@deploy1002> |
helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' . |
[production] |
16:42 |
<thcipriani@deploy1002> |
rebuilt and synchronized wikiversions files: Revert "group1 wikis to 1.38.0-wmf.9 refs T293950 T296098" |
[production] |
16:35 |
<thcipriani> |
rolling back to group0 for T296098 |
[production] |
16:20 |
<hnowlan@cumin1001> |
START - Cookbook sre.cassandra.roll-restart for nodes matching A:restbase-codfw: Restarting to pick up Java security updates - hnowlan@cumin1001 |
[production] |
15:31 |
<akosiaris> |
roll restart wtp10* php7.2-fpm excluding wtp1025, wtp1041 |
[production] |
15:29 |
<akosiaris> |
depooling wtp1041, wtp1025 from traffic. The entire parsoid cluster is under memory pressure; a rolling restart of php-fpm looks like it will alleviate the pressure and give us some time to drill into the problem before the pressure builds up again. |
[production] |
15:28 |
<akosiaris@cumin1001> |
conftool action : set/pooled=no; selector: cluster=parsoid,name=wtp1025.eqiad.wmnet |
[production] |
15:28 |
<akosiaris@cumin1001> |
conftool action : set/pooled=no; selector: cluster=parsoid,name=wtp1041.eqiad.wmnet |
[production] |
14:52 |
<jmm@cumin2002> |
END (PASS) - Cookbook sre.ganeti.addnode (exit_code=0) for new host ganeti-test2001.codfw.wmnet to ganeti-test01.svc.codfw.wmnet |
[production] |
14:49 |
<jmm@cumin2002> |
START - Cookbook sre.ganeti.addnode for new host ganeti-test2001.codfw.wmnet to ganeti-test01.svc.codfw.wmnet |
[production] |
14:44 |
<jmm@cumin2002> |
END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti-test2001.codfw.wmnet |
[production] |
14:39 |
<jmm@cumin2002> |
START - Cookbook sre.hosts.reboot-single for host ganeti-test2001.codfw.wmnet |
[production] |
14:30 |
<jmm@cumin2002> |
END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti-test2001.codfw.wmnet with OS buster |
[production] |
14:15 |
<jayme> |
fleet wide updated wmf-certificates to 0~20211119-1 |
[production] |
13:56 |
<jmm@cumin2002> |
START - Cookbook sre.hosts.reimage for host ganeti-test2001.codfw.wmnet with OS buster |
[production] |
13:23 |
<moritzm> |
draining instances from ganeti-test2001 for reimage T284811 |
[production] |
13:02 |
<jgiannelos@deploy1002> |
helmfile [eqiad] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' . |
[production] |
12:10 |
<jgiannelos@deploy1002> |
helmfile [codfw] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' . |
[production] |
12:06 |
<jgiannelos@deploy1002> |
helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' . |
[production] |
11:54 |
<hnowlan> |
roll-restarting cassandra on eqiad maps for java updates |
[production] |
11:36 |
<jayme> |
imported wmf-certificates 0~20211119-1 to stretch-wikimedia,buster-wikimedia,bullseye-wikimedia |
[production] |
09:53 |
<XioNoX> |
run `commit full` on asw-b-codfw - T295118 |
[production] |
09:30 |
<XioNoX> |
re-enable cr2-codfw<->asw-b7-codfw link after disabling inet6 on cr2-codfw:ae2 - T295118 |
[production] |
09:06 |
<elukey@cumin1001> |
END (PASS) - Cookbook sre.kafka.roll-restart-brokers (exit_code=0) for Kafka A:kafka-test-eqiad cluster: Roll restart of jvm daemons for openjdk upgrade. |
[production] |
08:46 |
<elukey@cumin1001> |
START - Cookbook sre.kafka.roll-restart-brokers for Kafka A:kafka-test-eqiad cluster: Roll restart of jvm daemons for openjdk upgrade. |
[production] |
08:31 |
<mwdebug-deploy@deploy1002> |
helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' . |
[production] |
08:30 |
<ayounsi@cumin1001> |
END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) homer to cumin2002.codfw.wmnet with reason: update wmf-netbox - ayounsi@cumin1001 |
[production] |
08:29 |
<ayounsi@cumin1001> |
START - Cookbook sre.deploy.python-code homer to cumin2002.codfw.wmnet with reason: update wmf-netbox - ayounsi@cumin1001 |
[production] |
08:27 |
<mwdebug-deploy@deploy1002> |
helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' . |
[production] |
08:26 |
<ladsgroup@deploy1002> |
Synchronized php-1.38.0-wmf.9/includes: Backport: [[gerrit:739841|Revert "Title: use PageStore instead of LinkCache"]] (duration: 01m 03s) |
[production] |
08:23 |
<ayounsi@deploy1002> |
Finished deploy [homer/deploy@dc007aa]: Homer CR738905 (duration: 01m 25s) |
[production] |
08:22 |
<ayounsi@deploy1002> |
Started deploy [homer/deploy@dc007aa]: Homer CR738905 |
[production] |
08:17 |
<moritzm> |
installing mariadb-10.5 security updates on bullseye (as packaged in Debian, not the wmf-internal packages) |
[production] |
06:55 |
<marostegui> |
Reboot db1132 to pick up new kernel T288720 |
[production] |
06:23 |
<marostegui> |
Upgrade clouddb1019 |
[production] |