2020-09-30 §
11:40 <arturo> rebooting cloudnet1004 (standby) to pick up https://gerrit.wikimedia.org/r/c/operations/puppet/+/631167 (T262979) [admin]
11:38 <arturo> [codfw1dev] rebooting cloudnet2002-dev to pick up https://gerrit.wikimedia.org/r/c/operations/puppet/+/631167 [admin]
11:36 <arturo> [codfw1dev] rebooting cloudnet2003-dev to pick up https://gerrit.wikimedia.org/r/c/operations/puppet/+/631167 [admin]
11:33 <arturo> disabling puppet and downtiming every virt/net server in the fleet in preparation for merging https://gerrit.wikimedia.org/r/c/operations/puppet/+/631167 (T262979) [admin]
09:32 <arturo> rebooting cloudvirt1012 to investigate linuxbridge agent issues [admin]
2020-09-29 §
15:40 <arturo> downgrade linux kernel from linux-image-4.19.0-11-amd64 to linux-image-4.19.0-10-amd64 on cloudvirt1012 [admin]
14:47 <arturo> rebooting cloudvirt1012, chasing config weirdness in the linuxbridge agent [admin]
14:05 <andrewbogott> reimaging 1014 over and over in an attempt to get partman right [admin]
13:51 <arturo> rebooting cloudvirt1012 [admin]
2020-09-28 §
14:55 <arturo> [jbond42] upgraded facter to v3 across the VM fleet [admin]
13:54 <andrewbogott> moving cloudvirt1035 from aggregate 'spare' to 'ceph'. We're going to need all the capacity we can get while converting older cloudvirts to ceph [admin]
2020-09-24 §
15:47 <arturo> stopping/restarting rabbitmq-server in all cloudcontrol servers [admin]
15:45 <arturo> restarting rabbitmq-server in cloudcontrol103 [admin]
15:15 <arturo> restarting floating_ip_ptr_records_updater.service in all 3 cloudcontrol servers to reset state after a DNS failure [admin]
2020-09-18 §
10:16 <arturo> cloudvirt1039 libvirtd service issues were fixed with a reboot [admin]
09:56 <arturo> rebooting cloudvirt1039 (spare) to try to fix some weird libvirtd failure [admin]
09:50 <arturo> enabling puppet in cloudvirts and effectively merging patches from T262979 [admin]
08:59 <arturo> disable puppet in all buster cloudvirts (cloudvirt[1024,1031-1039].eqiad.wmnet) to merge a patch for T263205 and T262979 [admin]
08:50 <arturo> installing iptables from buster-bpo in cloudvirt1036 (T263205 and T262979) [admin]
2020-09-15 §
20:32 <andrewbogott> rebooting cloudvirt1038 to see if it resolves T262979 [admin]
13:58 <andrewbogott> draining cloudvirt1002 with wmcs-ceph-migrate [admin]
2020-09-14 §
14:21 <andrewbogott> draining cloudvirt1001, migrating all VMs with wmcs-ceph-migrate [admin]
10:41 <arturo> [codfw1dev] trying to get the bonding working for labtestvirt2003 (T261724) [admin]
09:47 <arturo> installed qemu security update in eqiad1 cloudvirts (T262386) [admin]
09:43 <arturo> [codfw1dev] installed qemu security update in codfw1dev cloudvirts (T262386) [admin]
2020-09-09 §
18:13 <andrewbogott> restarting ceph-mon@cloudcephmon1003 in hopes that the slow ops reported are phantoms [admin]
18:01 <andrewbogott> restarting ceph-mgr@cloudcephmon1003 in hopes that the slow ops reported are phantoms (https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/EOWNO3MDYRUZKAK6RMQBQ5WBPQNLHOPV/) [admin]
17:40 <andrewbogott> giving ceph pg autoscale another chance: ceph osd pool set eqiad1-compute pg_autoscale_mode on [admin]
00:05 <bd808> Running wmcs-novastats-dnsleaks (T262359) [admin]
2020-09-08 §
21:48 <bd808> Renamed FQDN prefixes to wikimedia.cloud scheme in cloudinfra-db01's labspuppet db (T260614) [admin]
14:29 <andrewbogott> restarting nova-compute on all cloudvirts (everyone is upset from the reset switch failure) [admin]
14:18 <arturo> restarting nova-fullstack service in cloudcontrol1003 [admin]
14:17 <andrewbogott> stopping apache2 on labweb1001 to make sure the Horizon outage is total [admin]
2020-09-03 §
09:31 <arturo> icinga downtime cloud* servers for 30 mins (T261866) [admin]
2020-09-02 §
08:46 <arturo> [codfw1dev] reimaging spare server labtestvirt2003 as debian buster (T261724) [admin]
2020-09-01 §
18:18 <andrewbogott> adding drives on cloudcephosd100[3-5] to ceph osd pool [admin]
13:40 <andrewbogott> adding drives on cloudcephosd101[0-2] to ceph osd pool [admin]
13:34 <andrewbogott> adding drives on cloudcephosd100[1-3] to ceph osd pool [admin]
11:27 <arturo> [codfw1dev] rebooting cloudnet2002-dev again after some network tests, to reset initial state (T261724) [admin]
11:09 <arturo> [codfw1dev] rebooting cloudnet2002-dev after some network tests, to reset initial state (T261724) [admin]
10:49 <arturo> disable puppet in cloudnet servers to merge https://gerrit.wikimedia.org/r/c/operations/puppet/+/623569/ [admin]
2020-08-31 §
23:26 <bd808> Removed stale lockfile at cloud-puppetmaster-03.cloudinfra.eqiad.wmflabs:/var/lib/puppet/volatile/GeoIP/.geoipupdate.lock [admin]
11:20 <arturo> [codfw1dev] livehacking https://gerrit.wikimedia.org/r/c/operations/puppet/+/615161 in the puppetmasters for tests before merging [admin]
2020-08-28 §
20:12 <bd808> Running `wmcs-novastats-dnsleaks --delete` from cloudcontrol1003 [admin]
2020-08-26 §
17:12 <bstorm> Running 'ionice -c 3 nice -19 find /srv/tools -type f -size +100M -printf "%k KB %p\n" > tools_large_files_20200826.txt' on labstore1004 (T261336) [admin]
2020-08-21 §
21:34 <andrewbogott> restarting nova-compute on cloudvirt1033; it seems stuck [admin]
2020-08-19 §
14:21 <andrewbogott> rebooting cloudweb2001-dev, labweb1001, labweb1002 to address mediawiki-induced memleak [admin]
2020-08-06 §
21:02 <andrewbogott> removing cloudvirt1004/1006 from nova's list of hypervisors; rebuilding them to use as backup test hosts [admin]
20:06 <bstorm> manually stopped the RAID check on cloudcontrol1003 (T259760) [admin]
2020-08-04 §
18:54 <bstorm> restarting mariadb on cloudcontrol1004 to set up parallel replication [admin]