2020-11-23 §
11:25 <hnowlan> starting cassandra bootstrap of maps2008 [production]
11:20 <effie> enable puppet on cp* hosts [production]
11:16 <moritzm> installing poppler security updates on stretch [production]
11:13 <elukey@cumin1001> END (FAIL) - Cookbook sre.hosts.decommission (exit_code=99) [production]
11:13 <elukey@cumin1001> START - Cookbook sre.hosts.decommission [production]
11:12 <dcaro> Launching control-6, to replace control-3 (T267140) [toolsbeta]
11:05 <XioNoX> eqiad row A, standardize interfaces descriptions and ranges order [production]
10:45 <dcaro> Taking out control-2 node, replaced by control-5 (I saw one 503 reply on the proxy when creating control-5, fyi) (T267140) [toolsbeta]
10:35 <jmm@cumin2001> END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) [production]
10:32 <dcaro> Creating new control-5 node (will replace control-2) (T267140) [toolsbeta]
10:26 <effie> disable puppet on cp* hosts to merge 641730 [production]
10:26 <jmm@cumin2001> START - Cookbook sre.hosts.reboot-single [production]
10:26 <moritzm> rebooting serpens [production]
10:21 <XioNoX> eqiad row B, split LVS, Ganeti, Cloud, interface-ranges to individual terms [production]
09:58 <dcaro> Remove control-1 node from the pool (was replaced by control-4) (T267140) [toolsbeta]
09:57 <dcaro> Remove control-1 node from the pool (was replaced by control-4) (T267195) [toolsbeta]
09:48 <XioNoX> eqiad row B, standardize interfaces descriptions and ranges order [production]
08:46 <elukey> drop kerberos keytabs for analytics10[28-41] from krb1001:/srv/kerberos/keytabs, decommed nodes (old hadoop test cluster) [production]
08:43 <godog> start stress testing on ms-be106* - T268435 [production]
08:41 <elukey> drop kerberos principals from krb1001 for analytics10[29-41], decommed nodes (old hadoop test cluster) [production]
08:36 <elukey> drop analytics1028's krb principals from krb1001 - old decommed node [production]
08:35 <moritzm> installing remaining krb5 security updates for Stretch [production]
07:27 <marostegui> Stop MySQL on db1125:3314 to clone clouddb1015 and clouddb1019 - lag will appear on Commonswiki on wikireplicas - T267090 [production]
07:06 <marostegui@cumin1001> END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) [production]
07:00 <marostegui@cumin1001> START - Cookbook sre.hosts.decommission [production]
06:46 <marostegui> Restart clouddb1013 clouddb1015 clouddb1017 clouddb1019 for testing T267090 [production]
2020-11-22 §
17:40 <andrewbogott> apt-get upgrade on cloudservices1003/1004 [admin]
17:32 <andrewbogott> upgrading Designate on cloudservices1003/1004 to Stein [admin]
2020-11-21 §
21:25 <wm-bot> <lucaswerkmeister> deployed 1608cc4dd9 (gender-dependent messages) [tools.lexeme-forms]
09:18 <joal> Drop historical logs of 'Wikidata Concepts Monitor ETL' on HDFS keeping one example - freeing 60TB [production]
09:17 <joal> Drop historical logs of ' [production]
08:28 <ariel@deploy1001> Finished deploy [dumps/dumps@1a76a9a]: revinfo updates (duration: 00m 05s) [production]
08:28 <ariel@deploy1001> Started deploy [dumps/dumps@1a76a9a]: revinfo updates [production]
08:10 <elukey> remove big stderr log file in /var/lib/hadoop/data/d/yarn/logs/application_1605880843685_1450 on an-worker1110 [analytics]
08:10 <elukey> remove big stderr log file in /var/lib/hadoop/data/d/yarn/logs/application_1605880843685_1450 on an-worker1110 [production]
08:05 <elukey> remove big stderr log file in /var/lib/hadoop/data/e/yarn/logs/application_1605880843685_1450 on an-worker1105 [analytics]
08:05 <elukey> remove big stderr log file in /var/lib/hadoop/data/e/yarn/logs/application_1605880843685_1450 on an-worker1105 [production]
2020-11-20 §
23:38 <mutante> synced puppet-compiler facts - new hosts should be usable in compiler [production]
23:15 <mutante> syncing facts from production masters [puppet-diffs]
22:30 <mutante> cumin1001 - sudo systemctl start cumin-check-aliases -> <+icinga-wm> RECOVERY - Check systemd state on cumin1001 is OK T268369 [production]
21:30 <razzi@cumin1001> END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) [production]
21:09 <razzi> truncate /var/lib/hadoop/data/u/yarn/logs/application_1605880843685_0581/container_e27_1605880843685_0581_01_000171/stderr logfile on an-worker1098 [analytics]
20:40 <mutante> added new member razzi [puppet-diffs]
20:26 <razzi@cumin1001> START - Cookbook sre.ganeti.makevm [production]
20:09 <razzi@cumin1001> END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) [production]
19:52 <mutante> releases2002 - systemctl disable wmf_auto_restart_rsync; rm /usr/lib/systemd/system/wmf_auto_restart_rsync.* ; systemctl daemon-reload ; systemctl reset-failed - clear up systemd unit that was not absented and fix Icinga alerts [production]
19:45 <mutante> releases2002 systemctl reset-failed (wmf_auto_restart_rsync.service failed but hopefully fixed) [production]
19:39 <mutante> Icinga: ACKing all the "unhandled CRIT" alerts on clouddb* and an-coord* that have disabled notifications to remove monitoring noise; from 72 to 25 active alerts [production]
19:17 <Jayprakash12345> Deploying app (T267488) [tools.book2scrollv2]
19:14 <razzi@cumin1001> START - Cookbook sre.ganeti.makevm [production]