2021-02-22 §
17:14 <bstorm> restarting nova-compute on cloudvirt1016 and cloudvirt1036 in case it helps T275411 [admin]
15:02 <dcaro> Re-uploaded the debian buster 10.0 image from rbd to glance, that worked, re-spawning all the broken instances (T275378) [admin]
11:11 <dcaro> Refreshing all the canary instances (T275354) [admin]
2021-02-18 §
14:50 <arturo> rebooting cloudnet1004 for T271058 [admin]
10:25 <dcaro> Rebooting cloudmetrics1001 to apply new kernel (T275116) [admin]
10:16 <dcaro> Rebooting cloudmetrics1002 to apply new kernel (T275116) [admin]
10:14 <dcaro> Upgrading grafana on cloudmetrics1002 (T275116) [admin]
10:12 <dcaro> Upgrading grafana on cloudmetrics1001 (T275116) [admin]
2021-02-17 §
15:58 <arturo> deploying https://gerrit.wikimedia.org/r/c/operations/puppet/+/664845 to cloudnet servers (T268335) [admin]
2021-02-15 §
16:25 <arturo> [codfw1dev] rebooting all cloudgw200x-dev / cloudnet200x-dev servers (T272963) [admin]
15:45 <arturo> [codfw1dev] drop subnet definition for cloud-instances-transport1-b-codfw (T272963) [admin]
15:45 <arturo> [codfw1dev] connect virtual router cloudinstances2b-gw to vlan cloud-gw-transport-codfw (185.15.57.10) (T272963) [admin]
2021-02-11 §
12:01 <arturo> [codfw1dev] drop instance `tools-codfw1dev-bastion-1` in `tools-codfw1dev` (was buster, cannot use it yet) [admin]
11:59 <arturo> [codfw1dev] create instance `tools-codfw1dev-bastion-2` (stretch) in `tools-codfw1dev` to test stuff related to T272397 [admin]
11:45 <arturo> [codfw1dev] create instance `tools-codfw1dev-bastion-1` in `tools-codfw1dev` to test stuff related to T272397 [admin]
11:42 <arturo> [codfw1dev] drop `tools` project, create `tools-codfw1dev` [admin]
11:38 <arturo> [codfw1dev] drop `coudinfra` project (we are using `cloudinfra-codfw1dev` there) [admin]
05:37 <bstorm> downtimed cloudnet1004 for another week T271058 [admin]
2021-02-09 §
15:23 <arturo> icinga-downtime for 2h everything *labs *cloud for openstack upgrades [admin]
11:14 <dcaro> Merged the osd scheduler change for all osds, applying on all cloudcephosd* (T273791) [admin]
2021-02-08 §
18:50 <bstorm> enabled puppet on cloudvirt1023 for now T274144 [admin]
18:44 <bstorm> restarted the backup_vms.service on cloudvirt1027 T274144 [admin]
17:51 <bstorm> deleted project pki T273175 [admin]
2021-02-05 §
10:59 <arturo> icinga-downtime labstore1004 tools share space check for 1 week (T272247) [admin]
10:21 <dcaro> The expired certs were affecting maps and several others; maps and project-proxy have been fixed (T273956) [admin]
09:19 <dcaro> Some certs around the infra are expired (T273956) [admin]
2021-02-04 §
10:12 <dcaro> Increasing the memory limit of osds in eqiad from 8589934592(8G) to 12884901888(12G) (T273851) [admin]
2021-02-03 §
09:59 <dcaro> Doing a full vm backup on cloudvirt1024 with the new script (T260692) [admin]
01:50 <bstorm> icinga-downtime cloudnet1004 for a week T271058 [admin]
2021-02-02 §
17:14 <dcaro> Changed osd memory limit from 4G to 8G (T273649) [admin]
11:00 <arturo> icinga-downtime cloudvirt-wdqs1001 for 1 week (T273579) [admin]
03:12 <andrewbogott> running /usr/local/sbin/wmcs-purge-backups and /usr/local/sbin/wmcs-backup-instances on cloudvirt1024 to see why the backup job paged [admin]
2021-01-29 §
15:36 <andrewbogott> disabling puppet and some services on eqiad1 cloudcontrol nodes; replacing nova-placement-api with placement-api [admin]
2021-01-28 §
19:44 <andrewbogott> shutting down cloudcontrol2001-dev because it's in a partially upgraded state; will revive when it's time for Train [admin]
2021-01-27 §
00:50 <bstorm> icinga-downtime cloudnet1004 for a week T271058 [admin]
2021-01-22 §
16:44 <andrewbogott> upgrading designate on cloudvirt1003/1004 to OpenStack 'train' [admin]
11:29 <dcaro> While doing some tests, removed the cloudcontrol1003 puppet cert; regenerating... [admin]
2021-01-21 §
11:35 <arturo> merging core router firewall changes https://gerrit.wikimedia.org/r/c/operations/homer/public/+/657439 (T209082) [admin]
11:30 <arturo> merging core router firewall changes https://gerrit.wikimedia.org/r/c/operations/homer/public/+/657358 (T272486, T209082) [admin]
2021-01-20 §
10:49 <arturo> merging core router firewall change https://gerrit.wikimedia.org/r/c/operations/homer/public/+/657302 (T209082) [admin]
10:05 <dcaro> Everything looks ok, created a new vm with a volume in ceph without issues, and no warnings/errors on ceph status, closing (T272303) [admin]
09:55 <dcaro> Eqiad ceph cluster upgraded, doing sanity checks (T272303) [admin]
09:46 <dcaro> 75% of the eqiad cluster upgraded... continuing (T272303) [admin]
09:37 <dcaro> 25% of the eqiad cluster upgraded... continuing (T272303) [admin]
09:24 <dcaro> Mgr daemons upgraded and running, upgrading osd daemons on servers cloudcephosd1*, this may take a bit longer (T272303) [admin]
09:22 <dcaro> Mon daemons upgraded and running, upgrading mgr daemons on servers cloudcephmon1* (T272303) [admin]
09:16 <dcaro> Starting eqiad ceph upgrade, upgrading the mon servers cloudcephmon1* (T272303) [admin]
09:01 <dcaro> Will start the ceph upgrade in 15 min, no downtime nor performance impact is expected (T272303) [admin]
2021-01-19 §
10:17 <arturo> icinga-downtime cloudnet1004 for 1 week (T271058) [admin]
2021-01-18 §
16:00 <dcaro> Codfw1 ceph cluster upgraded, will wait until tomorrow to see if there's any instability, but everything looks fine (T272303) [admin]