2020-06-22 §
09:56 <marostegui@cumin2001> START - Cookbook sre.hosts.downtime [production]
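The downtime cookbook entries in this log are started from a cumin host; a minimal sketch of such an invocation, assuming the --hours/--reason flags and a target FQDN that are not recorded here:

    # Hypothetical downtime-cookbook run from a cumin host; flag names and
    # the target host are assumptions, not taken from this log.
    sudo cookbook sre.hosts.downtime --hours 4 --reason "reimage" db1094.eqiad.wmnet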
09:33 <marostegui@cumin2001> dbctl commit (dc=all): 'Depool db1094 for reimage', diff saved to https://phabricator.wikimedia.org/P11621 and previous config saved to /var/cache/conftool/dbconfig/20200622-093323-marostegui.json [production]
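The dbctl commit lines record only the final commit step; a plausible depool sequence behind such an entry, assuming the instance/config subcommands shown here:

    # Assumed dbctl workflow: mark the instance depooled, then commit the change.
    sudo dbctl instance db1094 depool
    sudo dbctl config commit -m "Depool db1094 for reimage"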
09:31 <godog> roll-restart logstash in codfw/eqiad to apply configuration change [production]
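Roll-restarts like this one are often driven by a batched Cumin run; a sketch, where the host alias, batch size, and sleep are illustrative assumptions:

    # Restart logstash one host at a time, pausing 120s between hosts.
    sudo cumin -b 1 -s 120 'A:logstash' 'systemctl restart logstash'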
08:59 <jmm@cumin2001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) [production]
08:56 <jmm@cumin2001> START - Cookbook sre.hosts.downtime [production]
08:33 <moritzm> reimaging cumin1001 to buster T245114 [production]
08:13 <godog> extend prometheus codfw ops filesystem to 1TB [production]
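Growing a filesystem to 1 TB typically means extending the logical volume and then the filesystem on top of it; a sketch assuming ext4 and placeholder VG/LV names:

    # Extend the LV to 1 TB, then grow the ext4 filesystem to fill it.
    sudo lvextend -L 1T /dev/vg0/prometheus-ops
    sudo resize2fs /dev/vg0/prometheus-ops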
08:02 <vgutierrez> upgrade to trafficserver 8.0.8~rc0-1wm1 on cp4026 and cp4032 [production]
08:02 <vgutierrez> upload trafficserver 8.0.8~rc0-1wm1 to apt.wm.o (buster) [production]
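Uploads to apt.wm.o are handled with reprepro; a plausible invocation, where the component, distribution name, and .changes filename are assumptions:

    # Include the built package in the buster-wikimedia distribution.
    sudo reprepro -C main include buster-wikimedia trafficserver_8.0.8~rc0-1wm1_amd64.changes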
07:33 <marostegui@cumin2001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) [production]
07:30 <marostegui@cumin2001> START - Cookbook sre.hosts.downtime [production]
07:16 <marostegui> Reimage db1117 (IRC haproxy alerts will be triggered) [production]
06:26 <marostegui@cumin2001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) [production]
06:24 <marostegui@cumin2001> START - Cookbook sre.hosts.downtime [production]
06:06 <marostegui> Stop MySQL on dbstore1005 for reimage to Buster - T254870 [production]
05:58 <marostegui> Compress InnoDB on db1118 T254462 [production]
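InnoDB compression is applied table by table; a minimal sketch, with placeholder database and table names:

    # Rebuild one table with compressed row format (names are placeholders).
    sudo mysql -e "ALTER TABLE revision ROW_FORMAT=COMPRESSED KEY_BLOCK_SIZE=8;" enwiki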
05:51 <marostegui@cumin2001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) [production]
05:49 <marostegui@cumin2001> START - Cookbook sre.hosts.downtime [production]
05:43 <marostegui> Stop haproxy on dbproxy1008 - T255406 [production]
05:33 <marostegui@cumin2001> dbctl commit (dc=all): 'Depool db1118 for reimage and InnoDB compression', diff saved to https://phabricator.wikimedia.org/P11617 and previous config saved to /var/cache/conftool/dbconfig/20200622-053334-marostegui.json [production]
05:31 <marostegui@cumin1001> dbctl commit (dc=all): 'Fully repool db1134', diff saved to https://phabricator.wikimedia.org/P11616 and previous config saved to /var/cache/conftool/dbconfig/20200622-053104-marostegui.json [production]
05:17 <marostegui@cumin1001> dbctl commit (dc=all): 'Slowly repool db1134', diff saved to https://phabricator.wikimedia.org/P11615 and previous config saved to /var/cache/conftool/dbconfig/20200622-051730-marostegui.json [production]
05:17 <marostegui@cumin1001> dbctl commit (dc=all): 'Slowly repool db1134', diff saved to https://phabricator.wikimedia.org/P11614 and previous config saved to /var/cache/conftool/dbconfig/20200622-051720-marostegui.json [production]
05:03 <marostegui@cumin1001> dbctl commit (dc=all): 'Slowly repool db1134', diff saved to https://phabricator.wikimedia.org/P11613 and previous config saved to /var/cache/conftool/dbconfig/20200622-050259-marostegui.json [production]
04:50 <marostegui> Deploy schema change on s3 primary master with a big sleep between wikis - T250066 [production]
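A "big sleep between wikis" deployment can be expressed as a loop over the section's wiki list; the dblist path, SQL file, and sleep interval below are assumptions:

    # Apply the schema change to each s3 wiki, pausing 5 minutes between them.
    for wiki in $(cat /srv/mediawiki/dblists/s3.dblist); do
        sudo mysql "$wiki" < schema_change.sql
        sleep 300
    done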
04:48 <marostegui@cumin1001> dbctl commit (dc=all): 'Slowly repool db1134', diff saved to https://phabricator.wikimedia.org/P11612 and previous config saved to /var/cache/conftool/dbconfig/20200622-044853-marostegui.json [production]
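The "Slowly repool db1134" sequence above (P11612 through P11616) corresponds to stepping the instance's pooled percentage up over time; a sketch, assuming dbctl accepts a percentage flag as shown:

    # Step db1134 back up to full traffic in stages; flag name and steps assumed.
    for pct in 10 25 50 75 100; do
        sudo dbctl instance db1134 pool -p "$pct"
        sudo dbctl config commit -m "Slowly repool db1134 (${pct}%)"
        sleep 900
    done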
2020-06-21 §
21:24 <RhinosF1> reset & updated mhphab config T255922 [tools.zppixbot-test]
21:24 <RhinosF1> reset & updated mhphab config T255922 [tools.zppixbot]
21:18 <RhinosF1> reset & updated status.py config T255922 [tools.zppixbot]
21:17 <RhinosF1> reset & updated status.py config T255922 [tools.zppixbot-test]
2020-06-20 §
23:16 <RhinosF1> scrambled email & password for a user [tools.zppixbot]
23:08 <RhinosF1> restart all pods to recover from fallout T255926 [tools.zppixbot]
23:05 <RhinosF1> restart all pods to recover from fallout T255926 [tools.zppixbot-test]
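On Toolforge Kubernetes, "restart all pods" usually means deleting them so the tool's deployment recreates them; a sketch, run as the tool user in its own namespace:

    # Delete every pod in the tool's namespace; the deployment respawns them.
    kubectl delete pod --all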
22:56 <cdanis@cumin2001> dbctl commit (dc=all): 'db1088 seems to have crashed', diff saved to https://phabricator.wikimedia.org/P11611 and previous config saved to /var/cache/conftool/dbconfig/20200620-225624-cdanis.json [production]
21:39 <RhinosF1> deploy a status config change (temporary, due to fallout) and add new Twitter API key - T255922 [tools.zppixbot]
21:39 <RhinosF1> deploy a status config change (temporary, due to fallout) and add new Twitter API key - T255922 [tools.zppixbot-test]
18:34 <RhinosF1> revoke access of EK for Terms of Use violation [tools.zppixbot-test]
18:34 <RhinosF1> revoke access of EK for Terms of Use violation [tools.zppixbot]
07:42 <elukey> powercycle an-worker1093 - kernel "BUG: soft lockup" on a CPU shown in mgmt console [production]
07:41 <elukey> powercycle an-worker1093 - kernel "BUG: soft lockup" on a CPU shown in mgmt console [analytics]
07:37 <elukey> powercycle an-worker1091 - kernel "BUG: soft lockup" on a CPU shown in mgmt console [analytics]
07:36 <elukey> powercycle an-worker1091 - kernel "BUG: soft lockup" on a CPU shown in mgmt console [production]
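A powercycle issued through the management interface commonly goes over IPMI; a sketch, where the mgmt FQDN and password handling (read from the IPMI_PASSWORD environment variable) are assumptions:

    # Hard power cycle via the host's management controller.
    ipmitool -I lanplus -H an-worker1093.mgmt.eqiad.wmnet -U root -E chassis power cycle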
2020-06-19 §
18:10 <otto@deploy1001> Synchronized wmf-config/InitialiseSettings.php: Bump eventlogging_Test schema version to 1.1.0 to pick up client_dt - T238230 (duration: 00m 59s) [production]
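The "Synchronized wmf-config/..." entries are emitted by scap; a typical invocation from the deployment host looks roughly like this:

    # Sync a single config file to the fleet with a log message.
    scap sync-file wmf-config/InitialiseSettings.php 'Bump eventlogging_Test schema version to 1.1.0 to pick up client_dt - T238230'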
16:07 <mutante> ganeti4003 - rebooting install4001 - trying to bootstrap OS install from install2003 [production]
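Rebooting a Ganeti VM is done from the cluster's master node; a sketch, where the exact instance name expected by Ganeti is an assumption:

    # Reboot the VM from the Ganeti master (instance FQDN assumed).
    sudo gnt-instance reboot install4001.wikimedia.org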
15:47 <dzahn@cumin1001> END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) [production]
15:28 <godog> roll-restart kibana to apply new settings [production]
13:01 <moritzm> installing cups security updates (client side libs/tools) [production]
12:31 <qchris> Disabling puppet on gerrit1002 (test instance) to do some more testing [production]
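Disabling Puppet with a reason keeps the motivation visible to the next operator; a minimal sketch, with an illustrative message:

    # Disable the agent on gerrit1002; the message is shown by later status checks.
    sudo puppet agent --disable "qchris: more testing on the gerrit test instance"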
12:14 <godog> delete march indices from logstash 5 eqiad to free up space [production]
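Old Logstash indices are removed through the Elasticsearch API; a sketch, where the index pattern and port are assumptions:

    # Drop all March 2020 logstash indices on the local cluster.
    curl -XDELETE 'http://localhost:9200/logstash-2020.03.*'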
12:12 <marostegui@cumin2001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) [production]