2021-11-19 §
23:52 <pt1979@cumin2002> END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host prometheus2005.codfw.wmnet with OS bullseye [production]
23:25 <pt1979@cumin2002> START - Cookbook sre.hosts.reimage for host prometheus2005.codfw.wmnet with OS bullseye [production]
23:24 <pt1979@cumin2002> END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host prometheus2005.codfw.wmnet with OS bullseye [production]
23:15 <mutante> LDAP - added mmartorana to wmf (91354e9e-5706-4289-9a60-98e8a7632853) T295789 [production]
22:59 <pt1979@cumin2002> START - Cookbook sre.hosts.reimage for host prometheus2005.codfw.wmnet with OS bullseye [production]
20:24 <pt1979@cumin2002> END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes2018.codfw.wmnet with OS stretch [production]
20:21 <mutante> phabricator - adding eigyan to WMF-NDA (phab project 61 - https://phabricator.wikimedia.org/project/members/61/ ) - since that is now standard when adding people to the wmf LDAP group (T295928) [production]
20:20 <legoktm@cumin1001> END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts thumbor2002.codfw.wmnet [production]
20:05 <legoktm@cumin1001> START - Cookbook sre.hosts.decommission for hosts thumbor2002.codfw.wmnet [production]
20:00 <dzahn@cumin1001> END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts mw2280.codfw.wmnet [production]
19:55 <pt1979@cumin2002> START - Cookbook sre.hosts.reimage for host kubernetes2018.codfw.wmnet with OS stretch [production]
19:51 <mutante> shutting down undead server mw2280 - not in icinga or puppetdb, but still in debmonitor and still has an IP and puppet cert [production]
19:45 <dzahn@cumin1001> START - Cookbook sre.hosts.decommission for hosts mw2280.codfw.wmnet [production]
18:54 <hnowlan@cumin1001> END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:restbase-codfw: Restarting to pick up Java security updates - hnowlan@cumin1001 [production]
18:10 <andrew@deploy1002> Finished deploy [horizon/deploy@ba16257]: moving the proxy endpoint behind keystone (duration: 04m 19s) [production]
18:06 <andrew@deploy1002> Started deploy [horizon/deploy@ba16257]: moving the proxy endpoint behind keystone [production]
17:45 <pt1979@cumin2002> END (PASS) - Cookbook sre.dns.netbox (exit_code=0) [production]
17:41 <pt1979@cumin2002> START - Cookbook sre.dns.netbox [production]
17:25 <andrew@deploy1002> Finished deploy [horizon/deploy@ee83e27]: fixing sudo rule editing (duration: 04m 10s) [production]
17:21 <andrew@deploy1002> Started deploy [horizon/deploy@ee83e27]: fixing sudo rule editing [production]
17:19 <mwdebug-deploy@deploy1002> helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' . [production]
17:10 <mwdebug-deploy@deploy1002> helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' . [production]
16:54 <mwdebug-deploy@deploy1002> helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' . [production]
16:50 <mwdebug-deploy@deploy1002> helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' . [production]
16:42 <thcipriani@deploy1002> rebuilt and synchronized wikiversions files: Revert "group1 wikis to 1.38.0-wmf.9 refs T293950 T296098" [production]
16:35 <thcipriani> rolling back to group0 for T296098 [production]
16:20 <hnowlan@cumin1001> START - Cookbook sre.cassandra.roll-restart for nodes matching A:restbase-codfw: Restarting to pick up Java security updates - hnowlan@cumin1001 [production]
15:31 <akosiaris> roll restart wtp10* php7.2-fpm excluding wtp1025, wtp1041 [production]
15:29 <akosiaris> depooling wtp1041, wtp1025 from traffic. The entire parsoid cluster is under memory pressure; it looks like a rolling restart of php-fpm will alleviate the pressure and give us some time to dig into the problem before the pressure builds up again. [production]
15:28 <akosiaris@cumin1001> conftool action : set/pooled=no; selector: cluster=parsoid,name=wtp1025.eqiad.wmnet [production]
15:28 <akosiaris@cumin1001> conftool action : set/pooled=no; selector: cluster=parsoid,name=wtp1041.eqiad.wmnet [production]
14:52 <jmm@cumin2002> END (PASS) - Cookbook sre.ganeti.addnode (exit_code=0) for new host ganeti-test2001.codfw.wmnet to ganeti-test01.svc.codfw.wmnet [production]
14:49 <jmm@cumin2002> START - Cookbook sre.ganeti.addnode for new host ganeti-test2001.codfw.wmnet to ganeti-test01.svc.codfw.wmnet [production]
14:44 <jmm@cumin2002> END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti-test2001.codfw.wmnet [production]
14:39 <jmm@cumin2002> START - Cookbook sre.hosts.reboot-single for host ganeti-test2001.codfw.wmnet [production]
14:30 <jmm@cumin2002> END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti-test2001.codfw.wmnet with OS buster [production]
14:15 <jayme> fleet wide updated wmf-certificates to 0~20211119-1 [production]
13:56 <jmm@cumin2002> START - Cookbook sre.hosts.reimage for host ganeti-test2001.codfw.wmnet with OS buster [production]
13:23 <moritzm> draining instances from ganeti-test2001 for reimage T284811 [production]
13:02 <jgiannelos@deploy1002> helmfile [eqiad] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' . [production]
12:10 <jgiannelos@deploy1002> helmfile [codfw] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' . [production]
12:06 <jgiannelos@deploy1002> helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' . [production]
11:54 <hnowlan> roll-restarting cassandra on eqiad maps for java updates [production]
11:36 <jayme> imported wmf-certificates 0~20211119-1 to stretch-wikimedia,buster-wikimedia,bullseye-wikimedia [production]
09:53 <XioNoX> run `commit full` on asw-b-codfw - T295118 [production]
09:30 <XioNoX> re-enable cr2-codfw<->asw-b7-codfw link after disabling inet6 on cr2-codfw:ae2 - T295118 [production]
09:06 <elukey@cumin1001> END (PASS) - Cookbook sre.kafka.roll-restart-brokers (exit_code=0) for Kafka A:kafka-test-eqiad cluster: Roll restart of jvm daemons for openjdk upgrade. [production]
08:46 <elukey@cumin1001> START - Cookbook sre.kafka.roll-restart-brokers for Kafka A:kafka-test-eqiad cluster: Roll restart of jvm daemons for openjdk upgrade. [production]
08:31 <mwdebug-deploy@deploy1002> helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' . [production]
08:30 <ayounsi@cumin1001> END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) homer to cumin2002.codfw.wmnet with reason: update wmf-netbox - ayounsi@cumin1001 [production]