2022-01-15
|
00:46 |
<jforrester@deploy1002> |
Finished scap: Revert "LinksUpdate refactor" and follow-ups for T299244 re. T293958 (duration: 03m 58s) |
[production] |
00:45 |
<mwdebug-deploy@deploy1002> |
helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn |
[production] |
00:45 |
<mwdebug-deploy@deploy1002> |
helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn |
[production] |
00:44 |
<mwdebug-deploy@deploy1002> |
helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn |
[production] |
00:42 |
<jforrester@deploy1002> |
Started scap: Revert "LinksUpdate refactor" and follow-ups for T299244 re. T293958 |
[production] |
00:28 |
<mwdebug-deploy@deploy1002> |
helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn |
[production] |
00:27 |
<mwdebug-deploy@deploy1002> |
helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn |
[production] |
00:27 |
<mwdebug-deploy@deploy1002> |
helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn |
[production] |
00:26 |
<mwdebug-deploy@deploy1002> |
helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn |
[production] |
00:14 |
<dduvall@deploy1002> |
rebuilt and synchronized wikiversions files: Revert "all/group1 wikis to 1.38.0-wmf.17" |
[production] |
2022-01-14
|
23:07 |
<ryankemper@cumin2002> |
END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2051.codfw.wmnet with OS stretch |
[production] |
22:26 |
<ryankemper@cumin2002> |
START - Cookbook sre.hosts.reimage for host elastic2051.codfw.wmnet with OS stretch |
[production] |
18:09 |
<hnowlan@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 15 days, 0:00:00 on restbase2009.codfw.wmnet with reason: not in restbase cluster, used for testing |
[production] |
18:09 |
<hnowlan@cumin1001> |
START - Cookbook sre.hosts.downtime for 15 days, 0:00:00 on restbase2009.codfw.wmnet with reason: not in restbase cluster, used for testing |
[production] |
17:44 |
<bblack> |
drmrs asw: removed native-vlan-id from config on secondary (x-rack) interfaces of lvses to debug network issue |
[production] |
17:26 |
<bblack> |
reboot lvs600[23] |
[production] |
16:55 |
<bblack> |
reboot lvs6001 |
[production] |
16:30 |
<bblack> |
rebooting cp60xx where x is 6, 7, 8, 14, 15, 16 (downtimed) |
[production] |
16:15 |
<dancy@deploy1002> |
Synchronized README: Testing php-fpm restart (duration: 03m 18s) |
[production] |
16:04 |
<hnowlan@cumin1001> |
END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host restbase2009.codfw.wmnet with OS buster |
[production] |
15:40 |
<hnowlan@cumin1001> |
START - Cookbook sre.hosts.reimage for host restbase2009.codfw.wmnet with OS buster |
[production] |
15:39 |
<bblack> |
lvs6001 + all services downtimed |
[production] |
15:29 |
<bblack@cumin1001> |
conftool action : set/pooled=yes; selector: dc=drmrs |
[production] |
15:00 |
<bblack> |
silenced site=drmrs in alertmanager for one month, I think |
[production] |
15:00 |
<bblack> |
silenced site=drmrs in alertmanager, I think |
[production] |
13:31 |
<marostegui@cumin1001> |
END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc2011.codfw.wmnet with OS bullseye |
[production] |
13:20 |
<hnowlan@cumin1001> |
END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host restbase2009.codfw.wmnet with OS buster |
[production] |
12:59 |
<marostegui@cumin1001> |
START - Cookbook sre.hosts.reimage for host pc2011.codfw.wmnet with OS bullseye |
[production] |
12:53 |
<hnowlan@cumin1001> |
START - Cookbook sre.hosts.reimage for host restbase2009.codfw.wmnet with OS buster |
[production] |
12:51 |
<hnowlan@cumin1001> |
END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host restbase2009.codfw.wmnet with OS buster |
[production] |
12:49 |
<jmm@cumin2002> |
END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti1024.eqiad.wmnet with OS buster |
[production] |
12:22 |
<jmm@cumin2002> |
START - Cookbook sre.hosts.reimage for host ganeti1024.eqiad.wmnet with OS buster |
[production] |
12:20 |
<hnowlan@cumin1001> |
START - Cookbook sre.hosts.reimage for host restbase2009.codfw.wmnet with OS buster |
[production] |
12:18 |
<hnowlan@cumin1001> |
END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host restbase2009.codfw.wmnet with OS buster |
[production] |
11:51 |
<hnowlan@cumin1001> |
START - Cookbook sre.hosts.reimage for host restbase2009.codfw.wmnet with OS buster |
[production] |
11:49 |
<hnowlan@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on restbase2009.codfw.wmnet with reason: not in restbase cluster, used for testing |
[production] |
11:48 |
<hnowlan@cumin1001> |
START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on restbase2009.codfw.wmnet with reason: not in restbase cluster, used for testing |
[production] |
11:45 |
<jmm@cumin2002> |
END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti1023.eqiad.wmnet with OS buster |
[production] |
11:18 |
<jmm@cumin2002> |
START - Cookbook sre.hosts.reimage for host ganeti1023.eqiad.wmnet with OS buster |
[production] |
11:01 |
<jmm@cumin2002> |
END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM archiva1002.wikimedia.org |
[production] |
11:00 |
<moritzm> |
systemctl reset-failed ifup@ens5.service on archiva1002 T273026 |
[production] |
10:56 |
<moritzm> |
rebooting archiva1002 (running archiva.wikimedia.org) |
[production] |
10:56 |
<jmm@cumin2002> |
START - Cookbook sre.ganeti.reboot-vm for VM archiva1002.wikimedia.org |
[production] |
10:55 |
<bking@cumin2002> |
END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host elastic2051.codfw.wmnet with OS stretch |
[production] |
10:50 |
<moritzm> |
systemctl reset-failed ifup@ens5.service on an-test-ui1001 T273026 |
[production] |
10:50 |
<jmm@cumin2002> |
END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM an-test-ui1001.eqiad.wmnet |
[production] |
10:42 |
<jmm@cumin2002> |
START - Cookbook sre.ganeti.reboot-vm for VM an-test-ui1001.eqiad.wmnet |
[production] |
10:21 |
<jmm@cumin2002> |
END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM an-test-presto1001.eqiad.wmnet |
[production] |
10:17 |
<jmm@cumin2002> |
START - Cookbook sre.ganeti.reboot-vm for VM an-test-presto1001.eqiad.wmnet |
[production] |
10:07 |
<jmm@cumin2002> |
END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM matomo1002.eqiad.wmnet |
[production] |