2020-11-20
23:38 <mutante> synced puppet-compiler facts - new hosts should be usable in compiler [production]
22:30 <mutante> cumin1001 - sudo systemctl start cumin-check-aliases -> <+icinga-wm> RECOVERY - Check systemd state on cumin1001 is OK T268369 [production]
21:30 <razzi@cumin1001> END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) [production]
20:26 <razzi@cumin1001> START - Cookbook sre.ganeti.makevm [production]
20:09 <razzi@cumin1001> END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) [production]
19:52 <mutante> releases2002 - systemctl disable wmf_auto_restart_rsync; rm /usr/lib/systemd/system/wmf_auto_restart_rsync.* ; systemctl daemon-reload ; systemctl reset-failed - clear up systemd unit that was not absented and fix Icinga alerts [production]
19:45 <mutante> releases2002 systemctl reset-failed (wmf_auto_restart_rsync.service failed but hopefully fixed) [production]
19:39 <mutante> Icinga: ACKing all the "unhandled CRIT" alerts on clouddb* and an-coord* that have disabled notifications to remove monitoring noise. From 72 to 25 active alerts [production]
19:14 <razzi@cumin1001> START - Cookbook sre.ganeti.makevm [production]
18:47 <elukey@cumin1001> END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) [production]
18:42 <elukey@cumin1001> START - Cookbook sre.hosts.decommission [production]
18:37 <elukey@cumin1001> END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) [production]
18:36 <razzi@cumin1001> END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) [production]
18:31 <elukey@cumin1001> START - Cookbook sre.hosts.decommission [production]
18:31 <elukey@cumin1001> END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) [production]
18:18 <elukey@cumin1001> START - Cookbook sre.hosts.decommission [production]
18:14 <dwisehaupt> shifting 100% of thank_you mail through frmxs ahead of tomorrow's banner test - T267259 [production]
17:37 <pt1979@cumin2001> END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) [production]
17:35 <pt1979@cumin2001> START - Cookbook sre.hosts.downtime [production]
17:32 <razzi@cumin1001> START - Cookbook sre.ganeti.makevm [production]
17:24 <razzi@cumin1001> END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) [production]
16:48 <volans@cumin1001> END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) [production]
16:40 <volans@cumin1001> START - Cookbook sre.hosts.decommission [production]
16:29 <razzi@cumin1001> START - Cookbook sre.ganeti.makevm [production]
16:29 <razzi@cumin1001> END (ERROR) - Cookbook sre.ganeti.makevm (exit_code=97) [production]
16:28 <razzi> removed canceled ip address records for kafka-test1002 from netbox [production]
16:11 <pt1979@cumin2001> END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) [production]
16:09 <pt1979@cumin2001> START - Cookbook sre.hosts.downtime [production]
16:01 <razzi@cumin1001> START - Cookbook sre.ganeti.makevm [production]
16:01 <razzi@cumin1001> END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) [production]
15:42 <razzi@cumin1001> START - Cookbook sre.ganeti.makevm [production]
15:09 <andrew@cumin1001> END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) [production]
15:01 <andrew@cumin1001> START - Cookbook sre.hosts.decommission [production]
14:59 <andrew@cumin1001> END (ERROR) - Cookbook sre.hosts.decommission (exit_code=97) [production]
14:58 <andrew@cumin1001> START - Cookbook sre.hosts.decommission [production]
14:30 <elukey> force umount/mount for /mnt/hdfs on all stat1* nodes to pick up new openjdk settings [production]
14:28 <elukey@cumin1001> END (PASS) - Cookbook sre.hadoop.roll-restart-masters (exit_code=0) [production]
14:00 <elukey> restart hadoop daemons on an-master[1001-1002] (Hadoop masters) to pick up new rack settings and openjdk upgrades [production]
13:59 <elukey@cumin1001> START - Cookbook sre.hadoop.roll-restart-masters [production]
13:34 <liw> finished trying to test scap on beta cluster [production]
13:24 <bblack> cp*: remove remnants of expiring globalsign-2019 unified cert, including ocsp config+outputs [production]
13:12 <liw> testing upcoming Scap release on beta [production]
13:00 <bblack> dns*: upgrade remainder of fleet to gdnsd 3.4.1 [production]
12:54 <elukey@cumin1001> END (PASS) - Cookbook sre.kafka.roll-restart-brokers (exit_code=0) [production]
12:29 <moritzm> uploaded wmf-sre-laptop 0.3 to buster-wikimedia/component/wmf-sre-laptop [production]
12:16 <marostegui@cumin1001> dbctl commit (dc=all): 'Set original weight to db1089', diff saved to https://phabricator.wikimedia.org/P13351 and previous config saved to /var/cache/conftool/dbconfig/20201120-121645-marostegui.json [production]
12:14 <marostegui> Run check private data on clouddb1013:3311 clouddb1013:3313 clouddb1015:3316 clouddb1017:3311 clouddb1017:3313 clouddb1019:3316 T267090 [production]
12:11 <Urbanecm> Start of mwscript extensions/AbuseFilter/maintenance/updateVarDumps.php --wiki=$wiki --print-orphaned-records-to=/tmp/urbanecm/$wiki-orphaned.log --progress-markers > $wiki.log in a tmux at mwmaint1002 (wiki=fawiki; T246539) [production]
11:50 <marostegui@cumin1001> dbctl commit (dc=all): 'More traffic to db1089', diff saved to https://phabricator.wikimedia.org/P13350 and previous config saved to /var/cache/conftool/dbconfig/20201120-115057-marostegui.json [production]
11:47 <marostegui@cumin1001> dbctl commit (dc=all): 'More traffic to db1089', diff saved to https://phabricator.wikimedia.org/P13349 and previous config saved to /var/cache/conftool/dbconfig/20201120-114758-marostegui.json [production]