2022-08-16 §
22:39 <andrewbogott> replacing the now-rebuilt cloudvirt1025 in 'ceph' aggregate and removing it from the 'maintenance' aggregate [admin]
17:41 <andrewbogott> removing cloudvirt1025 from the 'ceph' aggregate and adding it to the 'maintenance' aggregate [admin]
17:40 <andrewbogott> reimaging cloudvirt1025 after I accidentally deleted the hw raid [admin]
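The aggregate moves above map onto the standard OpenStack client; a plausible sequence, with the aggregate and host names taken from the log but the exact invocation not verified against the operator's shell:

```
# Take the hypervisor out of the scheduling pool during the rebuild
openstack aggregate remove host ceph cloudvirt1025
openstack aggregate add host maintenance cloudvirt1025

# ...and reverse the move once the reimage is done
openstack aggregate remove host maintenance cloudvirt1025
openstack aggregate add host ceph cloudvirt1025
```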
17:38 <andrewbogott> root@cloudcontrol1005:~# cinder-manage volume update_host --currenthost cloudcontrol1003@rbd#RBD --newhost cloudcontrol1005@rbd#RBD [admin]
17:37 <andrewbogott> root@cloudcontrol1005:~# cinder-manage volume update_host --currenthost cloudcontrol1004@rbd#RBD --newhost cloudcontrol1006@rbd#RBD [admin]
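For context, `cinder-manage volume update_host` rewrites the stored `host` field on existing volumes so they follow a backend that has moved to a new control node. A minimal sketch of checking the registered hosts around such a change, assuming the stock `cinder-manage` subcommands:

```
# List the volume hosts Cinder currently has registered
cinder-manage host list

# Rehome volumes from the retiring backend to its replacement
cinder-manage volume update_host \
    --currenthost cloudcontrol1003@rbd#RBD \
    --newhost cloudcontrol1005@rbd#RBD
```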
16:26 <wm-bot2> Ceph cluster at eqiad1 set out of maintenance. - cookbook ran by dcaro@vulcanus [admin]
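Taking a Ceph cluster out of maintenance is mostly a matter of clearing cluster-wide OSD flags; roughly equivalent to the following, assuming the cookbook uses the usual noout/norebalance pair:

```
# Clear the flags that were holding back rebalancing during maintenance
sudo ceph osd unset noout
sudo ceph osd unset norebalance

# Confirm the cluster settles back to HEALTH_OK
sudo ceph -s
```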
15:43 <wm-bot2> Restarting the osd daemons from nodes cloudcephosd1001,cloudcephosd1002,cloudcephosd1003,cloudcephosd1004,cloudcephosd1005,cloudcephosd1006,cloudcephosd1007,cloudcephosd1008,cloudcephosd1009,cloudcephosd1010,cloudcephosd1011,cloudcephosd1012,cloudcephosd1013,cloudcephosd1014,cloudcephosd1015,cloudcephosd1016,cloudcephosd1017,cloudcephosd1018,cloudcephosd1019,cloudcephosd1020,cloudcephosd1021,cloudcephosd1022,cloudcephosd1023,c [admin]
15:42 <wm-bot2> Finished restarting all the OSD daemons from the nodes ['cloudcephosd2001-dev', 'cloudcephosd2002-dev', 'cloudcephosd2003-dev'] - cookbook ran by dcaro@vulcanus [admin]
15:38 <wm-bot2> Restarting the osd daemons from nodes cloudcephosd2001-dev,cloudcephosd2002-dev,cloudcephosd2003-dev - cookbook ran by dcaro@vulcanus [admin]
13:08 <wm-bot2> Restarting the osd daemons from nodes cloudcephosd2001-dev,cloudcephosd2002-dev,cloudcephosd2003-dev - cookbook ran by dcaro@vulcanus [admin]
13:07 <wm-bot2> Restarting the osd daemons from nodes cloudcephosd2001-dev,cloudcephosd2002-dev,cloudcephosd2003-dev - cookbook ran by dcaro@vulcanus [admin]
13:02 <wm-bot2> Restarting the osd daemons from nodes cloudcephosd2001-dev,cloudcephosd2002-dev,cloudcephosd2003-dev - cookbook ran by dcaro@vulcanus [admin]
13:01 <wm-bot2> Restarting the osd daemons from nodes cloudcephosd2001-dev,cloudcephosd2002-dev,cloudcephosd2003-dev - cookbook ran by dcaro@vulcanus [admin]
12:59 <wm-bot2> Restarting the osd daemons from nodes cloudcephosd2001-dev,cloudcephosd2002-dev,cloudcephosd2003-dev - cookbook ran by dcaro@vulcanus [admin]
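The cookbook runs above amount to a rolling restart of the OSD units with a health check between hosts; a hand-rolled sketch of the same idea:

```
# On one OSD host at a time, restart every local OSD daemon
sudo systemctl restart ceph-osd.target

# Wait for all PGs to return to active+clean before the next host
sudo ceph -s
```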
2022-08-14 §
18:36 <taavi> deleted the http keystone endpoints from the keystone service catalog [admin]
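Pruning the plain-http entries from the Keystone catalog can be done with the OpenStack client; a sketch, where the endpoint ID is a placeholder:

```
# Find catalog entries still pointing at http:// URLs
openstack endpoint list | grep 'http://'

# Delete each stale entry by its ID
openstack endpoint delete <endpoint-id>
```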
2022-08-11 §
13:57 <andrewbogott> decommissioning cloudcontrol1003 + cloudcontrol1004. I backed up $home in case anyone needs their files. [admin]
08:42 <wm-bot2> The cluster is now rebalanced after adding the new OSDs ['cloudcephosd1025.eqiad.wmnet'] (T314870) - cookbook ran by fran@MacBook-Pro.station [admin]
08:42 <wm-bot2> Added 1 new OSDs ['cloudcephosd1025.eqiad.wmnet'] (T314870) - cookbook ran by fran@MacBook-Pro.station [admin]
08:42 <wm-bot2> Added OSD cloudcephosd1025.eqiad.wmnet... (1/1) (T314870) - cookbook ran by fran@MacBook-Pro.station [admin]
08:40 <wm-bot2> Finished rebooting node cloudcephosd1025.eqiad.wmnet (T314870) - cookbook ran by fran@MacBook-Pro.station [admin]
08:36 <wm-bot2> Rebooting node cloudcephosd1025.eqiad.wmnet (T314870) - cookbook ran by fran@MacBook-Pro.station [admin]
08:36 <wm-bot2> Adding OSD cloudcephosd1025.eqiad.wmnet... (1/1) (T314870) - cookbook ran by fran@MacBook-Pro.station [admin]
08:36 <wm-bot2> Adding new OSDs ['cloudcephosd1025.eqiad.wmnet'] to the cluster (T314870) - cookbook ran by fran@MacBook-Pro.station [admin]
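After a cookbook run like the one above, the new OSD should appear in the CRUSH tree and the cluster should backfill onto it; a quick way to verify (sketch):

```
# Check that the new host and its OSDs appear in the CRUSH tree
sudo ceph osd tree | grep -A 8 cloudcephosd1025

# Watch recovery/backfill until the cluster is rebalanced
sudo ceph -s
```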
2022-08-10 §
13:10 <wm-bot2> Finished rebooting node cloudcephosd1025.eqiad.wmnet (T314870) - cookbook ran by fran@MacBook-Pro.station [admin]
13:06 <wm-bot2> Rebooting node cloudcephosd1025.eqiad.wmnet (T314870) - cookbook ran by fran@MacBook-Pro.station [admin]
13:06 <wm-bot2> Adding OSD cloudcephosd1025.eqiad.wmnet... (1/1) (T314870) - cookbook ran by fran@MacBook-Pro.station [admin]
13:06 <wm-bot2> Adding new OSDs ['cloudcephosd1025.eqiad.wmnet'] to the cluster (T314870) - cookbook ran by fran@MacBook-Pro.station [admin]
2022-08-04 §
17:16 <taavi> deleted all scheduler_fanout_ rabbit queues in an attempt to fix scheduling [admin]
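Clearing the stale fanout queues can be scripted with `rabbitmqctl`; a sketch, assuming a rabbitmqctl recent enough to ship `delete_queue`:

```
# List the scheduler fanout queues left behind by restarted services
sudo rabbitmqctl list_queues name | grep '^scheduler_fanout_'

# Delete each one
sudo rabbitmqctl list_queues name \
    | grep '^scheduler_fanout_' \
    | xargs -rn1 sudo rabbitmqctl delete_queue
```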
16:32 <taavi> restart neutron-l3-agent to pick up rabbit config changes [admin]
15:12 <andrewbogott> stopping rabbitmq on cloudcontrol1xxx [admin]
09:57 <taavi> stop wikitech_run_jobs.timer on labweb1001/1002, hosts pending decom [admin]
2022-08-03 §
20:55 <andrewbogott> root@tools-checker-04:~# systemctl restart uwsgi-toolschecker_cron.service [admin]
20:41 <andrewbogott> restarting neutron-l3-agent.service on cloudnet1003 and 1004. The agent was routing properly but had lost touch with rabbitmq [admin]
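An agent that has lost its rabbit connection typically shows up as dead in the agent list even while traffic still flows; a sketch of the check plus the restart used above:

```
# Agents that have stopped heartbeating over rabbit show as not alive
openstack network agent list --agent-type l3

# Restart the agent on the affected cloudnet hosts
sudo systemctl restart neutron-l3-agent.service
```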
2022-08-02 §
14:07 <andrewbogott> shutting down codfw1dev ceph cluster according to https://docs.mirantis.com/mcp/q4-18/mcp-operations-guide/scheduled-maintenance-power-outage/power-off-ceph-cluster.html [admin]
13:54 <andrewbogott> shutting down basically all of codfw1dev to support pdu maintenance -- all the ceph OSDs will lose power so best to have everything stopped. [admin]
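The linked Mirantis procedure boils down to freezing cluster state before cutting power: set the flags below, stop the OSDs, then the mon/mgr daemons, and undo everything in reverse on power-up.

```
# Freeze the cluster so nothing rebalances while hosts lose power
sudo ceph osd set noout
sudo ceph osd set norecover
sudo ceph osd set norebalance
sudo ceph osd set nobackfill
sudo ceph osd set nodown
sudo ceph osd set pause

# ...then stop ceph-osd.target on each OSD host, mons/mgrs last
```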
2022-07-27 §
19:32 <andrewbogott> switching the openstack.eqiad1.wikimedia.cloud endpoint from cloudcontrol1004 to 1006, https://gerrit.wikimedia.org/r/c/operations/dns/+/817878/2/templates/wikimediacloud.org#54 [admin]
16:33 <andrewbogott> here is a test message in the admin channel [admin]
2022-07-25 §
13:43 <andrewbogott> pooling cloudweb100[34] and depooling labweb100[12] for testing in prep for decomming labweb100[12] [admin]
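Pooling and depooling here goes through conftool; a plausible invocation, with the object names assumed from the log rather than taken from conftool-data:

```
# Pool the new cloudweb hosts
sudo confctl select 'name=cloudweb1003.wikimedia.org' set/pooled=yes
sudo confctl select 'name=cloudweb1004.wikimedia.org' set/pooled=yes

# Depool the labweb hosts ahead of decommissioning
sudo confctl select 'name=labweb1001.wikimedia.org' set/pooled=no
sudo confctl select 'name=labweb1002.wikimedia.org' set/pooled=no
```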
2022-07-22 §
16:41 <taavi> depool cloudweb1003/1004 since horizon seems to be having issues [admin]
16:22 <taavi> pooling cloudweb1003/1004 now that grant issues are sorted [admin]
2022-07-21 §
18:26 <andrewbogott> depooling cloudweb1003 and 1004 for wikitech, horizon, striker -- pending db grant changes [admin]
18:06 <andrewbogott> pooling cloudweb1003 and 1004 for wikitech, horizon, striker [admin]
2022-07-20 §
18:02 <dcaro> things seem stable, trying to bring up the last rabbit node, cloudcontrol1007 (T313400) [admin]
17:45 <bd808> `sudo service striker restart` on labweb1002 [admin]
17:43 <bd808> `sudo service striker restart` on labweb1001 [admin]
17:10 <dcaro> things seem stable, trying to bring up a fourth rabbit node, cloudcontrol1006 (T313400) [admin]
16:26 <dcaro> things seem stable, trying to bring up a third rabbit node, cloudcontrol1005 (T313400) [admin]
15:51 <dcaro> things seem stable now with one rabbit node, trying to bring up a second (T313400) [admin]
14:16 <dcaro> stopping rabbitmq on cloudcontrol1004, leaving only 1003 alive (T313400) [admin]
13:17 <dcaro> restarting the whole rabbit cluster (T313400) [admin]
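The sequence above (shrink the cluster to one node, then grow it back one member at a time) is a common way to reset a wedged rabbit cluster; in shell terms, roughly:

```
# Stop rabbit everywhere except the node being kept (cloudcontrol1003)
sudo systemctl stop rabbitmq-server

# Then, per node and in order, start it and verify before the next one
sudo systemctl start rabbitmq-server
sudo rabbitmqctl cluster_status
```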