2020-06-25
|
09:53 |
<jmm@cumin2001> |
START - Cookbook sre.hosts.reboot-single |
[production] |
09:37 |
<volans@cumin1001> |
END (PASS) - Cookbook sre.dns.netbox (exit_code=0) |
[production] |
09:34 |
<volans@cumin1001> |
START - Cookbook sre.dns.netbox |
[production] |
09:28 |
<akosiaris> |
schedule downtime for eqiad wikifeeds, which is flapping too much for reasons not yet known. T256358 |
[production] |
09:28 |
<godog> |
extend lv on thanos-fe2001 and restart thanos-compact |
[production] |
09:21 |
<vgutierrez> |
rolling restart of ncredir instances to catch up on kernel updates |
[production] |
09:13 |
<joal@deploy1001> |
Finished deploy [analytics/refinery@4aba370] (thin): Analytics fix over weekly train THIN [analytics/refinery@4aba370] (duration: 00m 10s) |
[production] |
09:13 |
<joal@deploy1001> |
Started deploy [analytics/refinery@4aba370] (thin): Analytics fix over weekly train THIN [analytics/refinery@4aba370] |
[production] |
09:13 |
<joal@deploy1001> |
Finished deploy [analytics/refinery@4aba370]: Analytics fix over weekly train [analytics/refinery@4aba370] (duration: 16m 27s) |
[production] |
09:01 |
<vgutierrez> |
restarting acme-chief instances to catch up on kernel updates |
[production] |
08:56 |
<joal@deploy1001> |
Started deploy [analytics/refinery@4aba370]: Analytics fix over weekly train [analytics/refinery@4aba370] |
[production] |
08:42 |
<hashar> |
releases2002: restarted bacula-fd to take into account the puppet-provided configuration # T247652 |
[production] |
08:14 |
<jynus> |
restarting bacula-dir on backup1001 |
[production] |
08:09 |
<akosiaris> |
restart etherpad-lite on etherpad1002 |
[production] |
08:03 |
<marostegui> |
Failover m1 from db1135 to db1097 - T254556 |
[production] |
07:52 |
<jynus> |
stop bacula-director on backup1001 for db maintenance T254556 |
[production] |
07:49 |
<akosiaris@cumin1001> |
END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) |
[production] |
07:49 |
<akosiaris@cumin1001> |
START - Cookbook sre.ganeti.makevm |
[production] |
07:49 |
<akosiaris@cumin1001> |
END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) |
[production] |
07:49 |
<akosiaris@cumin1001> |
START - Cookbook sre.ganeti.makevm |
[production] |
07:49 |
<akosiaris@cumin1001> |
END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) |
[production] |
07:48 |
<akosiaris@cumin1001> |
START - Cookbook sre.ganeti.makevm |
[production] |
07:48 |
<akosiaris@cumin1001> |
END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) |
[production] |
07:47 |
<akosiaris@cumin1001> |
START - Cookbook sre.ganeti.makevm |
[production] |
07:36 |
<elukey> |
reboot an-launcher1001 for kernel upgrades |
[production] |
07:18 |
<elukey> |
reboot kafkamon* vms for kernel upgrades |
[production] |
07:08 |
<marostegui> |
Start pre switchover steps on m1 T254556 |
[production] |
06:40 |
<elukey> |
reboot matomo1002 for kernel upgrades |
[production] |
06:35 |
<elukey> |
reboot archiva1002 (new vm, not yet in service) for kernel upgrades |
[production] |
06:34 |
<elukey> |
reboot archiva for kernel upgrades |
[production] |
06:31 |
<elukey> |
force puppet run on ores1003/1005 to restore celery (killed by the oom) |
[production] |
06:24 |
<elukey> |
reboot an-tool* vms for kernel upgrades |
[production] |
06:23 |
<elukey> |
reboot analytics-tool1004 for kernel upgrades (Superset host) |
[production] |
06:22 |
<elukey> |
reboot analytics-tool1001 for kernel upgrades |
[production] |
06:19 |
<elukey> |
execute "ip addr flush ens5" on an-airflow1001 to clear the "RTNETLINK answers: File exists" error (from ifup@ens5.service) |
[production] |
06:03 |
<elukey> |
reboot an-airflow1001 for kernel upgrades |
[production] |
04:26 |
<marostegui> |
Remove triggers from db2095:3312 - T238966 |
[production] |
04:25 |
<marostegui> |
Deploy schema change on s2 codfw - T238966 |
[production] |
00:48 |
<twentyafterfour> |
restart php-fpm on phab1001 to fix T256343 |
[production] |
00:12 |
<twentyafterfour> |
phabricator updated, all seems normal |
[production] |
00:11 |
<twentyafterfour> |
updating phabricator to release/2020-06-25/1, momentary (<1 minute) downtime expected. |
[production] |
2020-06-24
|
23:44 |
<mutante> |
releases2002 - systemctl stop jenkins, kill 15244 (rogue jenkins process), start jenkins with systemctl start jenkins (T247652) |
[production] |
23:43 |
<mutante> |
releases1002 - kill rogue jenkins process, start jenkins with systemctl start jenkins (T247652) |
[production] |
23:02 |
<mutante> |
releases1002/2002 - disabling puppet, removing failing cron job to pull deployment_charts (because /srv/deployment-charts does not exist yet) |
[production] |
21:45 |
<shdubsh> |
install mtail 3.0.0~rc35+wmf2 on logstash1007 - T255776 |
[production] |
20:42 |
<brennen@deploy1001> |
Synchronized php: group1 wikis to 1.35.0-wmf.38 (duration: 01m 06s) |
[production] |
20:41 |
<brennen@deploy1001> |
rebuilt and synchronized wikiversions files: group1 wikis to 1.35.0-wmf.38 |
[production] |
20:41 |
<brennen> |
train 1.35.0-wmf.38: attempting to roll forward to group1 after php-fpm restart on mw1287 (T256305, T254175) |
[production] |
20:32 |
<cdanis> |
restarting php-fpm on mw1287 T256305 |
[production] |
20:32 |
<bsitzmann@deploy1001> |
helmfile [CODFW] Ran 'sync' command on namespace 'mobileapps' for release 'production' . |
[production] |