2020-11-16
|
23:28 |
<mutante> |
cumin1001 - sudo systemctl start cumin-check-aliases (to confirm switching cron to timer worked) T265138 |
[production] |
22:22 |
<otto@deploy1001> |
helmfile [staging] Ran 'sync' command on namespace 'eventgate-main' for release 'production' . |
[production] |
22:19 |
<otto@deploy1001> |
helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' . |
[production] |
22:19 |
<otto@deploy1001> |
helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics' for release 'canary' . |
[production] |
22:17 |
<otto@deploy1001> |
helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' . |
[production] |
22:09 |
<otto@deploy1001> |
helmfile [staging] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' . |
[production] |
22:06 |
<mutante> |
planet - fixed updates of uk.planet which failed due to non-ASCII chars in a URL - since updates are systemd timers now that affects the entire systemd state monitoring |
[production] |
21:40 |
<rzl@cumin1001> |
conftool action : set/pooled=yes; selector: name=mw2250.codfw.wmnet |
[production] |
21:40 |
<rzl@cumin1001> |
conftool action : set/weight=1; selector: name=mw2250.codfw.wmnet,cluster=videoscaler,service=canary |
[production] |
21:38 |
<rzl@cumin1001> |
conftool action : set/pooled=yes; selector: name=mw2250.codfw.wmnet,cluster=jobrunner |
[production] |
21:30 |
<mutante> |
peek2001 - mv /var/lib/peek/git to git.old ; run puppet ; let it fix git checkout |
[production] |
21:07 |
<rzl> |
disable puppet on jobrunners T264991 |
[production] |
20:40 |
<mutante> |
planet1002/planet2002 - delete entire crontab of user planet, drop update cronjobs after switching to systemd timers with gerrit:636105 (T265138) |
[production] |
20:06 |
<pt1979@cumin2001> |
END (PASS) - Cookbook sre.dns.netbox (exit_code=0) |
[production] |
20:06 |
<mutante> |
releases2002 systemctl reset-failed should clear Icinga systemd alert after gerrit:641228 |
[production] |
20:05 |
<dwisehaupt> |
disabling process-control jobs and moving to maintenance mode for maint window |
[production] |
19:57 |
<pt1979@cumin2001> |
START - Cookbook sre.dns.netbox |
[production] |
19:53 |
<ebernhardson@deploy1001> |
Finished deploy [wikimedia/discovery/analytics@4a953ca]: query_clicks_hourly: handle wmf.webrequest page_id change from int to bigint (duration: 02m 27s) |
[production] |
19:51 |
<ebernhardson@deploy1001> |
Started deploy [wikimedia/discovery/analytics@4a953ca]: query_clicks_hourly: handle wmf.webrequest page_id change from int to bigint |
[production] |
19:48 |
<effie> |
disable puppet on parsoid servers - T264991 |
[production] |
19:01 |
<hnowlan@cumin1001> |
END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) |
[production] |
18:59 |
<mutante> |
mw2255 - is pooled and puppet works on next run, after it removed php 7.2 config files |
[production] |
18:56 |
<mutante> |
running puppet on mw2313 and mw2255 which were listed in puppetboard as failed puppet runs |
[production] |
18:15 |
<rzl> |
disable puppet on 'A:mw-api and not A:mw-api-canary' T264991 |
[production] |
18:05 |
<effie> |
disable puppet on all appservers |
[production] |
17:48 |
<elukey> |
enable and run puppet on kafka-main2003 (it will start kafka services) - T267865 |
[production] |
17:42 |
<dwisehaupt> |
frmon1001 upgraded to buster |
[production] |
17:36 |
<volans> |
moved interfaces in Netbox from old to new switch - T267865 |
[production] |
17:24 |
<vgutierrez> |
switching back from lvs2010 to lvs2007 - T267865 |
[production] |
17:21 |
<vgutierrez> |
repooling cp2037 and cp2038 - T267865 |
[production] |
16:46 |
<elukey@cumin1001> |
END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) |
[production] |
16:40 |
<elukey@cumin1001> |
START - Cookbook sre.hosts.decommission |
[production] |
16:16 |
<XioNoX> |
update c7 serial in row C VC config - T267865 |
[production] |
16:16 |
<rzl> |
disable puppet on A:mw-api-canary T264991 |
[production] |
16:14 |
<hnowlan@cumin1001> |
START - Cookbook sre.cassandra.roll-restart |
[production] |
16:08 |
<effie> |
disable puppet in appservers canaries to install ICU 63 - T264991 |
[production] |
16:07 |
<vgutierrez@puppetmaster1001> |
conftool action : set/pooled=no; selector: name=cp2038.codfw.wmnet |
[production] |
16:07 |
<vgutierrez@puppetmaster1001> |
conftool action : set/pooled=no; selector: name=cp2037.codfw.wmnet |
[production] |
16:06 |
<hnowlan@cumin1001> |
END (FAIL) - Cookbook sre.cassandra.roll-restart (exit_code=99) |
[production] |
16:03 |
<hnowlan> |
joined maps2006 to maps codfw cassandra cluster |
[production] |
16:01 |
<elukey@cumin1001> |
END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) |
[production] |
15:57 |
<hnowlan@cumin1001> |
START - Cookbook sre.cassandra.roll-restart |
[production] |
15:57 |
<hnowlan> |
roll-restarting eqiad restbase for java security updates |
[production] |
15:56 |
<hnowlan@cumin1001> |
END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) |
[production] |
15:50 |
<elukey@cumin1001> |
START - Cookbook sre.hosts.decommission |
[production] |
15:40 |
<cdanis@cumin1001> |
END (PASS) - Cookbook sre.network.cf (exit_code=0) |
[production] |
15:40 |
<cdanis@cumin1001> |
START - Cookbook sre.network.cf |
[production] |
14:16 |
<hnowlan@cumin1001> |
START - Cookbook sre.cassandra.roll-restart |
[production] |
14:12 |
<marostegui@deploy1001> |
Synchronized wmf-config/db-eqiad.php: Repool pc1007 in pc1 after restarting mysql T266483 (duration: 00m 59s) |
[production] |