2020-03-30 §
23:42 <bstorm_> deleted "Kubernetes Cluster" and "Kubernetes Performance" dashboards T246689 [admin]
16:44 <arturo> [codfw1dev] installing package neutron-openvswitch-agent in cloudvirt2002-dev (T248881) [admin]
16:42 <andrewbogott> restarting l3 agents on cloudnets in codfw1dev after applying https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/584188/ [admin]
2020-03-27 §
21:28 <bd808> Created huggle.wmcloud.org Designate zone and allocated it to the huggle project [admin]
19:51 <jeh> start haproxy on cloudcontrol2003-dev.wikimedia.org [admin]
2020-03-26 §
15:01 <arturo> icinga downtime cloudvirt* cloudcontrol* cloudnet* lab* cloudstore* [admin]
15:01 <andrewbogott> beginning openstack upgrade window for T242766 [admin]
12:32 <arturo> [codfw1dev] downgraded systemd, libsystemd0, udev and friends to the non-backports versions (T247013) [admin]
2020-03-25 §
19:29 <andrewbogott> dumping a bunch of VMs on cloudvirt1015 to see if it still crashes [admin]
17:56 <jeh> add labweb1002 back into the pool - completed horizon testing T240852 [admin]
17:09 <jeh> depool labweb1002 for horizon testing T240852 [admin]
2020-03-24 §
19:41 <jeh> switch cloudvirt1016 from maintenance to standard host aggregate T243327 [admin]
15:31 <andrewbogott> restarting nova-conductor and nova-api on cloudcontrol1003 and cloudcontrol1004 [admin]
2020-03-23 §
21:41 <jeh> restart neutron-l3-agent on cloudnet100[3,4] to pick up policy.yaml changes [admin]
13:28 <jeh> disable puppet on labweb100[1,2] to enable horizon event traces T240852 [admin]
10:26 <arturo> restarting apache on both labweb1001 and labweb1002 after reports of them returning 500s [admin]
2020-03-21 §
14:23 <andrewbogott> restarting apache2 on labweb1001 and 1002 [admin]
2020-03-18 §
19:17 <andrewbogott> deleted a bunch of records from the pdns database on cloudservices1003/1004 which had a record name but the content (where an IP address should be) was NULL, e.g. m.wikidata.beta.wmflabs.org. [admin]
10:55 <arturo> [codfw1dev] deleting BGP agent, undoing changes we did for T245606 [admin]
2020-03-14 §
17:40 <jeh> restart maintain-dbusers on labstore1004 T247654 [admin]
2020-03-13 §
12:39 <arturo> [codfw1dev] reintroduce address scopes for another round of testing T244851 [admin]
12:17 <arturo> [codfw1dev] enabling puppet in cloudnet200x-dev servers after merging https://gerrit.wikimedia.org/r/c/operations/puppet/+/579259 (T247505) [admin]
2020-03-12 §
22:29 <bstorm_> running puppet across all dumps mounts to make sure active links are shifted to labstore1006 [admin]
2020-03-11 §
18:38 <jeh> set icinga downtime until 2020-03-23 on CODFW cloud[control,net,virt] hosts during openstack upgrades [admin]
12:50 <arturo> [codfw1dev] several tests creating/deleting address scopes (T244727 T247135 T246887 T245606) [admin]
12:46 <arturo> [codfw1dev] disable routing_source_ip in l3 agents for testing proposal detailed at https://wikitech.wikimedia.org/wiki/Wikimedia_Cloud_Services_team/EnhancementProposals/Network_refresh#Eliminate_routing_source_ip_address (T244727) [admin]
2020-03-10 §
17:02 <arturo> [codfw1dev] deleting address scopes, bad interaction with our custom NAT setup T247135 [admin]
13:55 <arturo> [codfw1dev] rebooting cloudnet2003-dev into linux kernel 4.14 for testing stuff related to T247135 [admin]
2020-03-09 §
18:09 <arturo> enabling puppet in cloudvirt1006, all services have been restored [admin]
17:59 <arturo> deleted the neutron bridge on cloudvirt1006, for testing stuff related to the queens upgrade [admin]
17:58 <arturo> stopped neutron-linuxbridge-agent and nova-compute in cloudvirt1006 for testing stuff related to the queens upgrade [admin]
2020-03-06 §
14:54 <andrewbogott> draining all instances off of cloudvirt1006 for T246908 [admin]
2020-03-05 §
14:24 <arturo> [codfw1dev] we just enabled BGP session between cloudnet2xxx-dev and cr1-codfw (T245606) [admin]
13:07 <arturo> [codfw1dev] move the extra IP address for BGP in cloudnet200x-dev servers from eno2.2120 to the br-external bridge device (T245606) [admin]
13:06 <arturo> [codfw1dev] upgrade neutron-dynamic-routing packages in cloudnet200X-dev and cloudcontrol200X-dev servers to 11.0.0-2~bpo9+1 (T245606) [admin]
2020-03-04 §
22:22 <andrewbogott> upgrading designate on cloudservices1003/1004 to Queens [admin]
22:09 <andrewbogott> moving cloudvirt1006 into the maintenance aggregate for T246908 [admin]
21:37 <bd808> Running wmcs-wikireplica-dns to add service names for ngwikimedia.*.db.svc.eqiad.wmflabs (T240772) [admin]
21:14 <bd808> Running `sudo maintain-meta_p --all-databases --purge` on labsdb1009 (T246056) [admin]
21:10 <bd808> Running `sudo maintain-meta_p --all-databases --purge` on labsdb1010 (T246056) [admin]
21:08 <bd808> Running `sudo maintain-meta_p --all-databases --purge` on labsdb1011 (T246056) [admin]
21:05 <bd808> Running `sudo maintain-meta_p --all-databases --purge` on labsdb1002 (T246056) [admin]
2020-03-02 §
16:54 <arturo> [codfw1dev] deleted python3-os-ken debian package in cloudnet2003-dev which was installed by hand and had dependency issues [admin]
2020-02-29 §
16:32 <bstorm_> downtimed the SMART alert on cloudvirt1009 until Monday since apparently predictive failures flap T244986 [admin]
2020-02-26 §
22:03 <jeh> powering down cloudvirt1014 for hardware maintenance [admin]
2020-02-25 §
16:08 <andrewbogott> changing neutron's rabbitmq password because oslo is having trouble parsing some of the characters in the password [admin]
15:26 <andrewbogott> updated the cell_mapping record in the nova_api database to add the second rabbitmq server to the transport_url field [admin]
15:26 <andrewbogott> updated the cell_mapping record in the nova_api database to set the db uri to 'mysql+pymysql' -- this in response to a deprecation notice [admin]
2020-02-24 §
12:16 <arturo> [codfw1dev] `root@cloudcontrol2001-dev:~# neutron bgp-speaker-peer-add bgpspeaker cr2-codfw` (T245606) [admin]
12:16 <arturo> [codfw1dev] `root@cloudcontrol2001-dev:~# neutron bgp-speaker-peer-add bgpspeaker cr1-codfw` (T245606) [admin]