2022-04-29 §
20:59 <mutante> - restarting instance gitlab-prod-1001 - No route to host [devtools]
20:55 <mutante> - attempting to soft reboot instance deploy1004 (got the puppet fail mail and wasn't reachable by ssh), this happened lately as well to gitlab-prod-1001, same project, different instance, but this time it hasn't come back yet [devtools]
2022-04-20 §
17:03 <mutante> soft rebooting gitlab-prod-1001 which was sending "failed puppet" reports and was unreachable, just like the other day. [devtools]
2022-04-18 §
19:08 <mutante> - gitlab-prod-1001 is indeed back after soft rebooting the instance. uptime 1 min T297411 [devtools]
19:07 <mutante> - gitlab-prod-1001 randomly stopped working. we got the "puppet failed" mails without having made changes and can't ssh to the instance anymore when trying to check out why. trying soft reboot via Horizon T297411 [devtools]
2022-04-15 §
18:00 <mutante> - deleting deploy-1002 - use deploy-1004 instead - T306069 [devtools]
17:03 <mutante> - not sure if possible (for me) to create a bullseye deployment server in cloud, using scap: failed: Execution of '/usr/bin/scap deploy --init', missing PHP packages, missing prometheus-mcrouter-exporter and more T306069 [devtools]
17:02 <mutante> - not sure if possible (for me) to create a deployment server in cloud, using scap: failed: Execution of '/usr/bin/scap deploy --init' [devtools]
16:40 <mutante> : creating deploy1003 to replace deploy1002 T306069 [devtools]
16:36 <mutante> : deleting instance gitlab-runner-1001 - was just for testing, real runners are upgraded in their own project [devtools]
2022-03-02 §
22:22 <mutante> - creating gitlab-runner-1001 on bullseye - purely test for T297659 [devtools]
2022-03-01 §
18:16 <taavi> allocated secondary IP for gitlab-prod-1001 per request on T302803 [devtools]
2022-02-15 §
16:08 <taavi> created devtools.wmcloud.org dns zone for the devtools project T301793 [devtools]
2022-01-26 §
17:26 <arturo> bump quota, floating IP from 1 to 2 (T299561) [devtools]
15:56 <arturo> bump quota, RAM from 32 to 40, cores from 16 to 20 (T299561) [devtools]
2022-01-21 §
22:11 <mutante> - created new instance gitlab-prod-1001 T297411 [devtools]
21:57 <mutante> - deleted instances "doc" and "doc1002" to make room for gitlab instance T299561 - T297411 [devtools]
2022-01-19 §
17:36 <mutante> - added brennen, aokoth and jelto as users and projectadmins (T297411) [devtools]
2021-11-10 §
19:49 <mutante> - removing manually added things in Horizon Hiera that were already in the repo, please don't keep adding in web UI, we don't want to repeat the same thing we did in deployment-prep [devtools]
2021-07-28 §
16:39 <andrewbogott> rebooting gerrit-prod-1001; seemingly unreachable [devtools]
2021-03-10 §
10:58 <arturo> briefly stopped VM 'doc' to disable VMX cpu flag and live-migrate it [devtools]
2021-02-22 §
20:58 <mutante> fixed puppet run on deploy-1002 by adding empty array of wikimedia-sites to hiera [devtools]
20:01 <mutante> deploy-1002 is broken because mediawiki::sites is not in Hiera (yet) [devtools]
2020-10-28 §
17:01 <andrewbogott> fixed puppet runs on phabricator-stage-1001 (previously puppetmaster name mismatch) [devtools]
2020-09-01 §
00:16 <mutante> - unbreaking puppet run on the local deployment after it was broken since July due to changes in prod deployment_server role [devtools]
2020-06-30 §
20:22 <mutante> managed to let certbot get LE certs for gerrit.devtools.wmflabs.org and the floating IP [devtools]
2020-06-17 §
19:55 <paladox> ran `iptables -A INPUT -p tcp -m tcp --dport 80 -j ACCEPT` on phabricator-prod-1001 [devtools]
2020-05-08 §
07:01 <mutante> phabricator-prod-1001 - removing cron for public task dump (though puppet should have removed it) [devtools]
2020-05-07 §
09:24 <mutante> - cloud puppetmasters still affected by https://phabricator.wikimedia.org/T83447#5807825 [devtools]
09:07 <mutante> - puppetmaster-1001 - Permission denied @ rb_sysopen - /var/lib/puppet/volatile/GeoIP/.geoipupdate.lock [devtools]
09:06 <mutante> - avoiding the need for a second role for deployment_servers in cloud with https://gerrit.wikimedia.org/r/c/operations/puppet/+/594903 [devtools]
09:05 <mutante> - puppet fixed on deploy-1002 with https://gerrit.wikimedia.org/r/c/operations/puppet/+/594900 [devtools]
08:04 <mutante> - broken puppet again from prod changes. this time: deploy-1002 - '[]' is not applicable to an Undef Value. mediawiki/mcrouter_wancache.pp, line: 19 [devtools]
2020-04-13 §
10:00 <mutante> - phabricator-stage-1001: replace deployment-tin.deployment-rep with deploy-1002.devtools in deployment-cache/.config [devtools]
09:40 <mutante> set missing (and new) profile::tlsproxy::envoy::capitalize_headers: true to fix puppet errors [devtools]
09:35 <mutante> set phabricator::vcs::address::v6 to fe80 local address to fix puppet error on phabricator-stage-1001 [devtools]
2020-01-16 §
00:53 <mutante> deploy-1002 - become 'trebuchet' user and ssh to phabricator scap targets. to fix ssh host key verification issue on first deploy [devtools]
00:30 <mutante> deploy-1002 live hack /srv/deployment/phabricator/deployment/scap/phabricator-targets and replace prod server with cloud instances; scap deploy in phabricator repo [devtools]
2020-01-15 §
23:51 <paladox> deploy-1002 rm -rf /srv/deployment [devtools]
23:44 <mutante> deploy-1002 sudo git init in /srv/deployment ; scap deploy --init (now fails with 'fatal: Not a valid object name HEAD') [devtools]
23:42 <mutante> deploy-1002 mkdir /srv/deployments/.git ; chown trebuchet:wikidev .git ; manually run "scap deploy --init" as trebuchet user in an attempt to fix initial puppet run on deployment_server [devtools]
2020-01-14 §
00:59 <mutante> deleting instance codesearch-buster [devtools]
00:54 <mutante> - deleting instance codesearch-stretch, creating codesearch-buster [devtools]
2020-01-11 §
00:35 <mutante> deleting instance codesearch-buster, creating codesearch-stretch [devtools]
00:05 <mutante> s/cloudsearch/codesearch/g [devtools]
00:04 <mutante> creating throwaway instance "cloudsearch" [devtools]
00:04 <mutante> deleting instance deploy1001 (buster), creating deploy-1002 (stretch) instead [devtools]
2020-01-04 §
16:01 <bstorm_> moving vm puppetmaster-1001 from cloudvirt1024 to cloudvirt1009 due to hardware error T241884 [devtools]
2020-01-03 §
22:37 <mutante> - sudo vi /srv/deployment/phabricator/deployment-cache/.config on both phabricator instances to fix deployment server (remove deployment-tin (!)) [devtools]