2020-11-23
11:25 <hnowlan> starting cassandra bootstrap of maps2008 [production]
11:20 <effie> enable puppet on cp* hosts [production]
11:16 <moritzm> installing poppler security updates on stretch [production]
11:13 <elukey@cumin1001> END (FAIL) - Cookbook sre.hosts.decommission (exit_code=99) [production]
11:13 <elukey@cumin1001> START - Cookbook sre.hosts.decommission [production]
11:05 <XioNoX> eqiad row A, standardize interfaces descriptions and ranges order [production]
10:35 <jmm@cumin2001> END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) [production]
10:26 <effie> disable puppet on cp* hosts to merge 641730 [production]
10:26 <jmm@cumin2001> START - Cookbook sre.hosts.reboot-single [production]
10:26 <moritzm> rebooting serpens [production]
10:21 <XioNoX> eqiad row B, split LVS, Ganeti, Cloud, interface-ranges to individual terms [production]
09:48 <XioNoX> eqiad row B, standardize interfaces descriptions and ranges order [production]
08:46 <elukey> drop kerberos keytabs for analytics10[28-41] from krb1001:/srv/kerberos/keytabs, decommed nodes (old hadoop test cluster) [production]
08:43 <godog> start stress testing on ms-be106* - T268435 [production]
08:41 <elukey> drop kerberos principals from krb1001 for analytics10[29-41], decommed nodes (old hadoop test cluster) [production]
08:36 <elukey> drop analytics1028's krb principals from krb1001 - old decommed node [production]
08:35 <moritzm> installing remaining krb5 security updates for Stretch [production]
07:27 <marostegui> Stop MySQL on db1125:3314 to clone clouddb1015 and clouddb1019 - lag will appear on commonswiki on wikireplicas - T267090 [production]
07:06 <marostegui@cumin1001> END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) [production]
07:00 <marostegui@cumin1001> START - Cookbook sre.hosts.decommission [production]
06:46 <marostegui> Restart clouddb1013 clouddb1015 clouddb1017 clouddb1019 for testing T267090 [production]
2020-11-21
09:18 <joal> Drop historical logs of 'Wikidata Concepts Monitor ETL' on HDFS keeping one example - freeing 60Tb [production]
09:17 <joal> Drop historical logs of ' [production]
08:28 <ariel@deploy1001> Finished deploy [dumps/dumps@1a76a9a]: revinfo updates (duration: 00m 05s) [production]
08:28 <ariel@deploy1001> Started deploy [dumps/dumps@1a76a9a]: revinfo updates [production]
08:10 <elukey> remove big stderr log file in /var/lib/hadoop/data/d/yarn/logs/application_1605880843685_1450 on an-worker1110 [production]
08:05 <elukey> remove big stderr log file in /var/lib/hadoop/data/e/yarn/logs/application_1605880843685_1450 on an-worker1105 [production]
2020-11-20
23:38 <mutante> synced puppet-compiler facts - new hosts should be usable in compiler [production]
22:30 <mutante> cumin1001 - sudo systemctl start cumin-check-aliases -> <+icinga-wm> RECOVERY - Check systemd state on cumin1001 is OK T268369 [production]
21:30 <razzi@cumin1001> END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) [production]
20:26 <razzi@cumin1001> START - Cookbook sre.ganeti.makevm [production]
20:09 <razzi@cumin1001> END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) [production]
19:52 <mutante> releases2002 - systemctl disable wmf_auto_restart_rsync; rm /usr/lib/systemd/system/wmf_auto_restart_rsync.* ; systemctl daemon-reload ; systemctl reset-failed - clear up systemd unit that was not absented and fix Icinga alerts [production]
19:45 <mutante> releases2002 systemctl reset-failed (wmf_auto_restart_rsync.service failed but hopefully fixed) [production]
19:39 <mutante> Icinga: ACKing all the "unhandled CRIT" alerts on clouddb* and an-coord* that have disabled notifications, to remove monitoring noise; from 72 to 25 active alerts [production]
19:14 <razzi@cumin1001> START - Cookbook sre.ganeti.makevm [production]
18:47 <elukey@cumin1001> END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) [production]
18:42 <elukey@cumin1001> START - Cookbook sre.hosts.decommission [production]
18:37 <elukey@cumin1001> END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) [production]
18:36 <razzi@cumin1001> END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) [production]
18:31 <elukey@cumin1001> START - Cookbook sre.hosts.decommission [production]
18:31 <elukey@cumin1001> END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) [production]
18:18 <elukey@cumin1001> START - Cookbook sre.hosts.decommission [production]
18:14 <dwisehaupt> shifting 100% of thank_you mail through frmxs ahead of tomorrow's banner test - T267259 [production]
17:37 <pt1979@cumin2001> END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) [production]
17:35 <pt1979@cumin2001> START - Cookbook sre.hosts.downtime [production]
17:32 <razzi@cumin1001> START - Cookbook sre.ganeti.makevm [production]
17:24 <razzi@cumin1001> END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) [production]
16:48 <volans@cumin1001> END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) [production]
16:40 <volans@cumin1001> START - Cookbook sre.hosts.decommission [production]