2020-06-25
10:00 <akosiaris@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) [production]
10:00 <akosiaris@cumin1001> START - Cookbook sre.hosts.downtime [production]
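The sre.hosts.downtime entries above are the spicerack cookbook that silences Icinga alerts for a host before maintenance. A minimal sketch of how such a run is typically started from a cumin host; the hostname, duration, and task ID below are illustrative, and the exact flags may differ between cookbook versions.

    # Illustrative only: downtime a host for 2 hours with a reason (hostname and task ID are made up)
    sudo cookbook sre.hosts.downtime --hours 2 -r "pre-reboot maintenance" -t T000000 'example1001.eqiad.wmnet'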
09:59 <jmm@cumin2001> START - Cookbook sre.hosts.reboot-single [production]
09:58 <jmm@cumin2001> END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) [production]
09:57 <akosiaris@cumin1001> END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) [production]
09:57 <akosiaris@cumin1001> START - Cookbook sre.ganeti.makevm [production]
09:53 <jmm@cumin2001> START - Cookbook sre.hosts.reboot-single [production]
09:37 <volans@cumin1001> END (PASS) - Cookbook sre.dns.netbox (exit_code=0) [production]
09:34 <volans@cumin1001> START - Cookbook sre.dns.netbox [production]
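sre.dns.netbox regenerates and propagates DNS records from the data in Netbox. A hedged sketch of a typical invocation; the commit message is illustrative and the exact CLI arguments may differ.

    # Illustrative: propagate Netbox-generated DNS changes with a commit message
    sudo cookbook sre.dns.netbox "add records for new ganeti VM"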
09:28 <akosiaris> schedule downtime for eqiad wikifeeds as it's flapping too much without yet knowing why. T256358 [production]
09:28 <godog> extend lv on thanos-fe2001 and restart thanos-compact [production]
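The thanos-fe2001 entry combines growing a logical volume with a service restart. A sketch of the usual sequence, assuming an ext4 filesystem; the VG/LV names and the size are assumptions, not taken from the log.

    # Illustrative: grow the LV, resize the filesystem online, then restart the service
    sudo lvextend -L +100G /dev/vg0/thanos-compact    # VG/LV names are assumptions
    sudo resize2fs /dev/vg0/thanos-compact            # assumes ext4; use xfs_growfs for XFS
    sudo systemctl restart thanos-compact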
09:21 <vgutierrez> rolling restart of ncredir instances to catch up on kernel updates [production]
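A rolling restart like the ncredir one is typically driven from cumin, restarting one batch of hosts at a time with a pause in between. A sketch under assumed host selection and service name; the alias and the unit restarted are illustrative.

    # Illustrative: restart a service across a host group, 1 host at a time, 30s between batches
    sudo cumin -b 1 -s 30 'A:ncredir' 'systemctl restart nginx'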
09:13 <joal@deploy1001> Finished deploy [analytics/refinery@4aba370] (thin): Analytics fix over weekly train THIN [analytics/refinery@4aba370] (duration: 00m 10s) [production]
09:13 <joal@deploy1001> Started deploy [analytics/refinery@4aba370] (thin): Analytics fix over weekly train THIN [analytics/refinery@4aba370] [production]
09:13 <joal@deploy1001> Finished deploy [analytics/refinery@4aba370]: Analytics fix over weekly train [analytics/refinery@4aba370] (duration: 16m 27s) [production]
09:01 <vgutierrez> restarting acme-chief instances to catch up on kernel updates [production]
08:56 <joal@deploy1001> Started deploy [analytics/refinery@4aba370]: Analytics fix over weekly train [analytics/refinery@4aba370] [production]
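The analytics/refinery entries are scap deployments run from the deployment host, with the "(thin)" run using a separate, smaller deploy configuration. A minimal sketch, assuming the standard scap workflow from the repository checkout on deploy1001; the path and message are illustrative.

    # Illustrative: deploy the current checkout with a log message
    cd /srv/deployment/analytics/refinery    # checkout path is an assumption
    scap deploy "Analytics fix over weekly train"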
08:42 <hashar> releases2002: restarted bacula-fd to take into account the puppet-provided configuration # T247652 [production]
08:14 <jynus> restarting bacula-dir on backup1001 [production]
08:09 <akosiaris> restart etherpad-lite on etherpad1002 [production]
08:03 <marostegui> Failover m1 from db1135 to db1097 - T254556 [production]
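The m1 failover (db1135 to db1097) swaps the active master in a MariaDB replication pair. A heavily simplified sketch of the core read_only flip, assuming replication is already caught up and ignoring the proxy/DNS repointing; only the host names come from the log, everything else is illustrative.

    # Illustrative, simplified: flip read_only during the m1 master swap
    sudo mysql -h db1135 -e "SET GLOBAL read_only = 1;"   # old master: stop writes
    sudo mysql -h db1097 -e "SET GLOBAL read_only = 0;"   # new master: accept writes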
07:52 <jynus> stop bacula-director on backup1001 for db maintenance T254556 [production]
07:49 <akosiaris@cumin1001> END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) [production]
07:49 <akosiaris@cumin1001> START - Cookbook sre.ganeti.makevm [production]
07:49 <akosiaris@cumin1001> END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) [production]
07:49 <akosiaris@cumin1001> START - Cookbook sre.ganeti.makevm [production]
07:49 <akosiaris@cumin1001> END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) [production]
07:48 <akosiaris@cumin1001> START - Cookbook sre.ganeti.makevm [production]
07:48 <akosiaris@cumin1001> END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) [production]
07:47 <akosiaris@cumin1001> START - Cookbook sre.ganeti.makevm [production]
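The repeated sre.ganeti.makevm failures above (exit_code=99) show the VM creation kept aborting. When that happens, a common first check is the Ganeti cluster itself; a sketch using standard Ganeti commands on the cluster master, not the actual debugging steps taken here.

    # Illustrative: check node capacity and cluster health on the Ganeti master
    sudo gnt-node list          # free memory/disk per node
    sudo gnt-cluster verify     # overall cluster consistency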
07:36 <elukey> reboot an-launcher1001 for kernel upgrades [production]
07:18 <elukey> reboot kafkamon* vms for kernel upgrades [production]
07:08 <marostegui> Start pre switchover steps on m1 T254556 [production]
06:40 <elukey> reboot matomo1002 for kernel upgrades [production]
06:35 <elukey> reboot archiva1002 (new vm, not yet in service) for kernel upgrades [production]
06:34 <elukey> reboot archiva for kernel upgrades [production]
06:31 <elukey> force puppet run on ores1003/1005 to restore celery (killed by the oom) [production]
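Forcing a puppet run on the two ores hosts can be done from cumin in one shot. A sketch assuming the generic agent invocation and the eqiad.wmnet domain; Wikimedia hosts also ship wrapper scripts for this, which are not shown.

    # Illustrative: trigger an immediate puppet run on both hosts
    sudo cumin 'ores1003.eqiad.wmnet,ores1005.eqiad.wmnet' 'puppet agent --test'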
06:24 <elukey> reboot an-tool* vms for kernel upgrades [production]
06:23 <elukey> reboot analytics-tool1004 for kernel upgrades (Superset host) [production]
06:22 <elukey> reboot analytics-tool1001 for kernel upgrades [production]
06:19 <elukey> execute ip addr flush ens5 on an-airflow1001 to clear RTNETLINK answers: File exists (error from ifup@ens5.service) [production]
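The "RTNETLINK answers: File exists" error from ifup@ens5.service usually means the interface already carries an address that ifup then tries to add again; flushing the stale address lets the unit apply its configuration cleanly. A sketch of the commands, run on the affected host.

    # Illustrative: drop stale addresses from ens5, then re-run the ifup unit
    sudo ip addr flush dev ens5
    sudo systemctl restart ifup@ens5.service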
06:03 <elukey> reboot an-airflow1001 for kernel upgrades [production]
04:26 <marostegui> Remove triggers from db2095:3312 - T238966 [production]
04:25 <marostegui> Deploy schema change on s2 codfw - T238966 [production]
00:48 <twentyafterfour> restart php-fpm on phab1001 to fix T256343 [production]
00:12 <twentyafterfour> phabricator updated, all seems normal [production]
00:11 <twentyafterfour> updating phabricator to release/2020-06-25/1, momentary (<1 minute) downtime expected. [production]
2020-06-24
23:44 <mutante> releases2002 - systemctl stop jenkins, kill 15244 (rogue jenkins process), start jenkins with systemctl start jenkins (T247652) [production]
23:43 <mutante> releases1002 - kill rogue jenkins process, start jenkins with systemctl start jenkins (T247652) [production]
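The two releases entries above describe the same fix: a leftover Jenkins process that systemd no longer tracked had to be killed by PID before the unit could be started cleanly. A sketch of that sequence; the PID below is the one from the releases2002 entry and would differ on releases1002.

    # Illustrative: find the stray Jenkins process, stop the unit, kill the stray, start cleanly
    pgrep -af jenkins                 # identify the rogue PID
    sudo systemctl stop jenkins
    sudo kill 15244                   # PID taken from the releases2002 log entry
    sudo systemctl start jenkins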
23:02 <mutante> releases1002/2002 - disabling puppet, removing failing cron job to pull deployment_charts (because /srv/deployment-charts does not exist yet) [production]
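Disabling puppet before hand-editing managed state, such as removing the failing deployment-charts cron, keeps the agent from re-creating it on the next run. A sketch using the generic puppet agent switch; Wikimedia hosts also have wrapper scripts for this, and the crontab step assumes the job lives in root's crontab.

    # Illustrative: pause puppet with a reason, then remove the failing cron job by hand
    sudo puppet agent --disable "removing failing deployment-charts cron"
    sudo crontab -e                   # delete the deployment-charts pull entry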