2021-11-25
07:17 <jelto@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on 32 hosts with reason: helm3 de-deploy T251305 [production]
07:17 <jelto@cumin1001> START - Cookbook sre.hosts.downtime for 3:00:00 on 32 hosts with reason: helm3 de-deploy T251305 [production]
07:10 <jelto> downtime PyBal backends health check on lvs1015 and lvs1016 for helm3 de-deploy T251305. I'm keeping an eye on Icinga and will remove the downtime as soon as I'm finished [production]
07:09 <jelto> start re-deploy procedure in eqiad Kubernetes T251305 [production]
06:31 <marostegui> Restart tendril's DB [production]
05:51 <ryankemper> [WDQS Deploy] Deploy complete. Successful test query placed on query.wikidata.org, there are no relevant criticals in Icinga, and Grafana looks good [production]
04:45 <ryankemper@deploy1002> Finished deploy [wdqs/wdqs@29c5cd7] (wcqs): Deploy 0.3.93 to WCQS (duration: 05m 27s) [production]
04:43 <ryankemper> [WCQS Deploy] Tests look good following deploy of `0.3.93` to canary `wcqs1002.eqiad.wmnet`, proceeding to rest of fleet [production]
04:40 <ryankemper@deploy1002> Started deploy [wdqs/wdqs@29c5cd7] (wcqs): Deploy 0.3.93 to WCQS [production]
04:39 <ryankemper> [WDQS Deploy] Restarting `wdqs-categories` across lvs-managed hosts, one node at a time: `sudo -E cumin -b 1 'A:wdqs-all and not A:wdqs-test' 'depool && sleep 45 && systemctl restart wdqs-categories && sleep 45 && pool'` [production]
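A minimal sketch of the rolling-restart pattern in the entry above: take one host out of the load balancer, restart the service, put it back, then move on to the next host. Here `depool`, `systemctl restart`, and `pool` are stubbed with string appends so the sketch runs anywhere; in production the real commands (plus the `sleep 45` settle time) are driven one host at a time by `cumin -b 1`. The host names are hypothetical.

```shell
hosts="wdqs1004 wdqs1005 wdqs1006"   # hypothetical host list
log=""
for h in $hosts; do
  log="${log}depool:$h "             # stand-in for: depool (then sleep 45)
  log="${log}restart:$h "            # stand-in for: systemctl restart wdqs-categories
  log="${log}pool:$h "               # stand-in for: pool (after sleep 45)
done
echo "$log"
```

The point of `-b 1` is that each host is fully repooled before the next is depooled, so capacity never drops by more than one node.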
04:38 <ryankemper> [WDQS Deploy] Restarted `wdqs-categories` across all test hosts simultaneously: `sudo -E cumin 'A:wdqs-test' 'systemctl restart wdqs-categories'` [production]
04:38 <ryankemper> [WDQS Deploy] Restarted `wdqs-updater` across all hosts, 4 hosts at a time: `sudo -E cumin -b 4 'A:wdqs-all' 'systemctl restart wdqs-updater'` [production]
04:35 <ryankemper@deploy1002> Finished deploy [wdqs/wdqs@29c5cd7]: 0.3.93 (duration: 09m 23s) [production]
04:30 <ryankemper> [Elastic] Cleaning up dangling apt packages: `ryankemper@cumin1001:~$ sudo cumin -b 4 'elastic*' 'sudo apt autoremove -y'` [production]
04:27 <ryankemper> [WDQS Deploy] Tests passing following deploy of `0.3.93` on canary `wdqs1003`; proceeding to rest of fleet [production]
04:25 <ryankemper@deploy1002> Started deploy [wdqs/wdqs@29c5cd7]: 0.3.93 [production]
04:25 <ryankemper> [WDQS Deploy] Gearing up for deploy of wdqs `0.3.93`. Pre-deploy tests passing on canary `wdqs1003` [production]
03:12 <pt1979@cumin2002> END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2072.codfw.wmnet with OS buster [production]
02:42 <pt1979@cumin2002> START - Cookbook sre.hosts.reimage for host elastic2072.codfw.wmnet with OS buster [production]
02:34 <pt1979@cumin2002> END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2071.codfw.wmnet with OS buster [production]
02:23 <pt1979@cumin2002> END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2070.codfw.wmnet with OS buster [production]
02:04 <pt1979@cumin2002> START - Cookbook sre.hosts.reimage for host elastic2071.codfw.wmnet with OS buster [production]
01:54 <pt1979@cumin2002> START - Cookbook sre.hosts.reimage for host elastic2070.codfw.wmnet with OS buster [production]
01:49 <pt1979@cumin2002> END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2068.codfw.wmnet with OS buster [production]
01:34 <pt1979@cumin2002> END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2067.codfw.wmnet with OS buster [production]
01:19 <pt1979@cumin2002> START - Cookbook sre.hosts.reimage for host elastic2068.codfw.wmnet with OS buster [production]
01:04 <pt1979@cumin2002> START - Cookbook sre.hosts.reimage for host elastic2067.codfw.wmnet with OS buster [production]
00:37 <pt1979@cumin2002> END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2066.codfw.wmnet with OS buster [production]
2021-11-24
23:59 <pt1979@cumin2002> START - Cookbook sre.hosts.reimage for host elastic2066.codfw.wmnet with OS buster [production]
23:52 <pt1979@cumin2002> END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2065.codfw.wmnet with OS buster [production]
23:44 <mutante> [puppetmaster1001:~] $ sudo puppet cert sign gitlab-runner1001.eqiad.wmnet | sudo install_console gitlab-runner1001.eqiad.wmnet (T295481) [production]
23:26 <mutante> ganeti - bringing up new VM - sudo gnt-instance start gitlab-runner1001.eqiad.wmnet ; ran puppet on install1003; installing OS T295481 [production]
23:22 <pt1979@cumin2002> START - Cookbook sre.hosts.reimage for host elastic2065.codfw.wmnet with OS buster [production]
23:11 <pt1979@cumin2002> END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2064.codfw.wmnet with OS buster [production]
23:09 <mutante> mwmaint1002 - sudo /usr/bin/find /var/lib/puppet/clientbucket/ -type f -size 1M -delete - to fix Icinga alert about large files in client bucket [production]
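A note on the `find -size` predicate used above: with GNU find, `-size 1M` matches only files whose size rounds up to exactly 1 MiB, while `-size +1M` matches files strictly larger than that. A safe habit before any `-delete` sweep is to run the same expression as a listing first. The demo below uses throwaway files under a temp directory, not the real clientbucket path:

```shell
demo=$(mktemp -d)
dd if=/dev/zero of="$demo/small" bs=1024 count=10 2>/dev/null    # 10 KiB
dd if=/dev/zero of="$demo/big"   bs=1048576 count=2 2>/dev/null  # 2 MiB
# -size +1M: strictly larger than 1 MiB, so only "big" matches
found=$(find "$demo" -type f -size +1M -exec basename {} \;)
echo "$found"
rm -rf "$demo"
```

Swapping `-delete` in for `-exec basename {} \;` only after the listing looks right avoids deleting the wrong size class.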
23:08 <dzahn@cumin1001> END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host gitlab-runner1001.eqiad.wmnet [production]
23:03 <mutante> wcqs1001 - sudo systemctl restart wcqs-blazegraph - after <+jinxer-wm> (BlazegraphFreeAllocatorsDecreasingRapidly) firing: Blazegraph instance wcqs1001:9195 is burning free allocators [production]
22:52 <dzahn@cumin1001> START - Cookbook sre.ganeti.makevm for new host gitlab-runner1001.eqiad.wmnet [production]
22:50 <mutante> Creating a new Ganeti VM and wondering which row to put it in? [ganeti1009:~] $ for row in A B C D; do echo "row ${row}: $(sudo gnt-instance list -o name -F "pnode.group == 'row_${row}'" | wc -l) VMs"; done [production]
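The one-liner above counts existing VMs per Ganeti row so you can place a new VM in the least-loaded one. A stripped-down sketch of the same loop, with the `gnt-instance list ... | wc -l` pipeline replaced by a stub function so it runs without Ganeti (the per-row counts are made up for illustration):

```shell
# Stand-in for: sudo gnt-instance list -o name -F "pnode.group == 'row_${row}'" | wc -l
count_vms_in_row() {
  case "$1" in
    A) printf '12\n' ;;   # fabricated counts, illustration only
    B) printf '9\n'  ;;
    C) printf '15\n' ;;
    D) printf '7\n'  ;;
  esac
}
report=""
for row in A B C D; do
  report="${report}row ${row}: $(count_vms_in_row "$row") VMs; "
done
echo "$report"
```

With output like this, row D (fewest VMs) would be the natural placement target.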
22:43 <dzahn@cumin1001> END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts gitlab-runner1001.wikimedia.org [production]
22:41 <pt1979@cumin2002> START - Cookbook sre.hosts.reimage for host elastic2064.codfw.wmnet with OS buster [production]
22:39 <pt1979@cumin2002> END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2063.codfw.wmnet with OS buster [production]
22:38 <mutante> running decom cookbook on gitlab-runner1001.wikimedia.org VM, which was in state "ADMIN_down" and not used yet, to make room to recreate it as gitlab-runner1001.eqiad.wmnet T295481 [production]
22:36 <dzahn@cumin1001> START - Cookbook sre.hosts.decommission for hosts gitlab-runner1001.wikimedia.org [production]
22:08 <pt1979@cumin2002> START - Cookbook sre.hosts.reimage for host elastic2063.codfw.wmnet with OS buster [production]
22:03 <pt1979@cumin2002> END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2062.codfw.wmnet with OS buster [production]
21:40 <mwdebug-deploy@deploy1002> helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn'. [production]
21:37 <mwdebug-deploy@deploy1002> helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn'. [production]
21:35 <legoktm@deploy1002> Synchronized wmf-config/: Improve docs on $wmgUseGlobalAbuseFilters and sort list of wikis (duration: 00m 57s) [production]
21:33 <pt1979@cumin2002> START - Cookbook sre.hosts.reimage for host elastic2062.codfw.wmnet with OS buster [production]