2022-01-15 §
00:57 <dduvall@deploy1002> rebuilt and synchronized wikiversions files: all wikis to 1.38.0-wmf.17 refs T293958 [production]
00:57 <mwdebug-deploy@deploy1002> helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn [production]
00:52 <dduvall@deploy1002> Synchronized php: group1 wikis to 1.38.0-wmf.17 refs T293958 (duration: 00m 52s) [production]
00:51 <dduvall@deploy1002> rebuilt and synchronized wikiversions files: group1 wikis to 1.38.0-wmf.17 refs T293958 [production]
00:46 <mwdebug-deploy@deploy1002> helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn [production]
00:46 <jforrester@deploy1002> Finished scap: Revert "LinksUpdate refactor" and follow-ups for T299244 re. T293958 (duration: 03m 58s) [production]
00:45 <mwdebug-deploy@deploy1002> helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn [production]
00:45 <mwdebug-deploy@deploy1002> helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn [production]
00:44 <mwdebug-deploy@deploy1002> helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn [production]
00:42 <jforrester@deploy1002> Started scap: Revert "LinksUpdate refactor" and follow-ups for T299244 re. T293958 [production]
00:28 <mwdebug-deploy@deploy1002> helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn [production]
00:27 <mwdebug-deploy@deploy1002> helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn [production]
00:27 <mwdebug-deploy@deploy1002> helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn [production]
00:26 <mwdebug-deploy@deploy1002> helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn [production]
00:14 <dduvall@deploy1002> rebuilt and synchronized wikiversions files: Revert "all/group1 wikis to 1.38.0-wmf.17" [production]
2022-01-14 §
23:07 <ryankemper@cumin2002> END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2051.codfw.wmnet with OS stretch [production]
22:26 <ryankemper@cumin2002> START - Cookbook sre.hosts.reimage for host elastic2051.codfw.wmnet with OS stretch [production]
18:09 <hnowlan@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 15 days, 0:00:00 on restbase2009.codfw.wmnet with reason: not in restbase cluster, used for testing [production]
18:09 <hnowlan@cumin1001> START - Cookbook sre.hosts.downtime for 15 days, 0:00:00 on restbase2009.codfw.wmnet with reason: not in restbase cluster, used for testing [production]
17:44 <bblack> drmrs asw: removed native-vlan-id from config on secondary (x-rack) interfaces of lvses to debug network issue [production]
17:26 <bblack> reboot lvs600[23] [production]
16:55 <bblack> reboot lvs6001 [production]
16:30 <bblack> rebooting cp60xx where x is 6, 7, 8, 14, 15, 16 (downtimed) [production]
16:15 <dancy@deploy1002> Synchronized README: Testing php-fpm restart (duration: 03m 18s) [production]
16:04 <hnowlan@cumin1001> END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host restbase2009.codfw.wmnet with OS buster [production]
15:40 <hnowlan@cumin1001> START - Cookbook sre.hosts.reimage for host restbase2009.codfw.wmnet with OS buster [production]
15:39 <bblack> lvs6001 + all services downtimed [production]
15:29 <bblack@cumin1001> conftool action : set/pooled=yes; selector: dc=drmrs [production]
15:00 <bblack> silenced site=drmrs in alertmanager for one month, I think [production]
15:00 <bblack> silenced site=drmrs in alertmanager, I think [production]
13:31 <marostegui@cumin1001> END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc2011.codfw.wmnet with OS bullseye [production]
13:20 <hnowlan@cumin1001> END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host restbase2009.codfw.wmnet with OS buster [production]
12:59 <marostegui@cumin1001> START - Cookbook sre.hosts.reimage for host pc2011.codfw.wmnet with OS bullseye [production]
12:53 <hnowlan@cumin1001> START - Cookbook sre.hosts.reimage for host restbase2009.codfw.wmnet with OS buster [production]
12:51 <hnowlan@cumin1001> END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host restbase2009.codfw.wmnet with OS buster [production]
12:49 <jmm@cumin2002> END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti1024.eqiad.wmnet with OS buster [production]
12:22 <jmm@cumin2002> START - Cookbook sre.hosts.reimage for host ganeti1024.eqiad.wmnet with OS buster [production]
12:20 <hnowlan@cumin1001> START - Cookbook sre.hosts.reimage for host restbase2009.codfw.wmnet with OS buster [production]
12:18 <hnowlan@cumin1001> END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host restbase2009.codfw.wmnet with OS buster [production]
11:51 <hnowlan@cumin1001> START - Cookbook sre.hosts.reimage for host restbase2009.codfw.wmnet with OS buster [production]
11:49 <hnowlan@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on restbase2009.codfw.wmnet with reason: not in restbase cluster, used for testing [production]
11:48 <hnowlan@cumin1001> START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on restbase2009.codfw.wmnet with reason: not in restbase cluster, used for testing [production]
11:45 <jmm@cumin2002> END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti1023.eqiad.wmnet with OS buster [production]
11:18 <jmm@cumin2002> START - Cookbook sre.hosts.reimage for host ganeti1023.eqiad.wmnet with OS buster [production]
11:01 <jmm@cumin2002> END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM archiva1002.wikimedia.org [production]
11:00 <moritzm> systemctl reset-failed ifup@ens5.service on archiva1002 T273026 [production]
10:56 <moritzm> rebooting archiva1002 (running archiva.wikimedia.org) [production]
10:56 <jmm@cumin2002> START - Cookbook sre.ganeti.reboot-vm for VM archiva1002.wikimedia.org [production]
10:55 <bking@cumin2002> END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host elastic2051.codfw.wmnet with OS stretch [production]
10:50 <moritzm> systemctl reset-failed ifup@ens5.service on an-test-ui1001 T273026 [production]