__all__ SAL

2023-02-16
10:59	<wm-bot2>	Depooling OSDs with ids in [55, 54, 53, 52, 51, 50] on cloudcephosd1001 from eqiad1 (T329498) - cookbook ran by dcaro@vulcanus	[admin]
10:55	<claime>	repool parse1012 for monitoring of possible CPU1 issues	[production]
10:45	<slyngshede@cumin1001>	END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host idm1001.wikimedia.org with OS bullseye	[production]
10:37	<elukey@deploy1002>	helmfile [codfw] DONE helmfile.d/services/eventgate-logging-external: sync	[production]
10:36	<elukey@deploy1002>	helmfile [codfw] START helmfile.d/services/eventgate-logging-external: sync	[production]
10:35	<arturo>	aborrero@tools-k8s-control-1:~$ sudo -i kubectl apply -f /etc/kubernetes/psp/base-pod-security-policies.yaml	[tools]
10:34	<slyngshede@cumin1001>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on idm1001.wikimedia.org with reason: host reimage	[production]
10:32	<arturo>	aborrero@toolsbeta-test-k8s-control-4:~$ sudo -i kubectl apply -f /etc/kubernetes/psp/base-pod-security-policies.yaml	[toolsbeta]
10:31	<slyngshede@cumin1001>	START - Cookbook sre.hosts.downtime for 2:00:00 on idm1001.wikimedia.org with reason: host reimage	[production]
10:31	<jnuche>	restarted Zuul to clear stale events	[releng]
10:22	<moritzm>	installing postgresql-11 security updates on maps*	[production]
10:20	<slyngshede@cumin1001>	START - Cookbook sre.ganeti.reimage for host idm1001.wikimedia.org with OS bullseye	[production]
10:15	<dcaro>	purges osd daemons 48 and 40 from eqiad ceph cluster (T329709)	[admin]
10:12	<slyngshede@cumin1001>	END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host idm1001.wikimedia.org	[production]
10:02	<slyngshede@cumin1001>	END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) idm1001.wikimedia.org on all recursors	[production]
10:02	<slyngshede@cumin1001>	START - Cookbook sre.dns.wipe-cache idm1001.wikimedia.org on all recursors	[production]
10:02	<slyngshede@cumin1001>	END (PASS) - Cookbook sre.dns.netbox (exit_code=0)	[production]
10:02	<slyngshede@cumin1001>	END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM idm1001.wikimedia.org - slyngshede@cumin1001"	[production]
10:01	<slyngshede@cumin1001>	START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM idm1001.wikimedia.org - slyngshede@cumin1001"	[production]
09:59	<slyngshede@cumin1001>	START - Cookbook sre.dns.netbox	[production]
09:59	<slyngshede@cumin1001>	START - Cookbook sre.ganeti.makevm for new host idm1001.wikimedia.org	[production]
09:58	<godog>	issue test page with: amtool alert add TestPage address=6.6.6.6 team=sre severity=page job=testjob --annotation=runbook=lol --annotation=description='this is a test page, please ignore' --annotation=dashboard=no	[production]
09:48	<arturo>	grid engine was failed over to shadow server, manually put it back into normal https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Grid#GridEngine_Master	[tools]
09:39	<arturo>	aborrero@tools-sgegrid-shadow:~$ sudo truncate -s 1G /var/log/syslog (was 17G, full root disk)	[tools]
09:35	<mvernon@cumin1001>	END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-be2001.codfw.wmnet	[production]
09:35	<godog>	puppet cert clean labstore100[67] - T319217	[production]
09:27	<mvernon@cumin1001>	START - Cookbook sre.hosts.reboot-single for host thanos-be2001.codfw.wmnet	[production]
09:07	<moritzm>	uploaded openjdk-8 8u362-ga-4~deb10u1 to component/jdk8 for buster-wikimedia (forward port of latest Java 8 security release)	[production]
08:36	<ayounsi@cumin1001>	END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 9584	[production]
08:36	<ayounsi@cumin1001>	START - Cookbook sre.network.peering with action 'configure' for AS: 9584	[production]
08:25	<moritzm>	upgrading cassandra-dev to Java 8u362-ga-4	[production]
08:17	<apergos>	UTC morning backport and config training window done	[production]
08:15	<kartik@deploy1002>	Finished scap: Backport for [[gerrit:889656|Enable Section Translation in 9 Wikipedias (T323825 T304865)]] (duration: 12m 38s)	[production]
08:05	<kartik@deploy1002>	kartik: Backport for [[gerrit:889656|Enable Section Translation in 9 Wikipedias (T323825 T304865)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet	[production]
08:03	<kartik@deploy1002>	Started scap: Backport for [[gerrit:889656|Enable Section Translation in 9 Wikipedias (T323825 T304865)]]	[production]
07:41	<elukey>	depool parse1012 to allow the service ops team to check it	[production]
07:39	<elukey>	powercycle parse1012 - CPU1 errors registered in `racadm getsel`	[production]
07:25	<bking@cumin1001>	END (FAIL) - Cookbook sre.ganeti.reimage (exit_code=99) for host an-airflow1005.eqiad.wmnet with OS buster	[production]
06:15	<kart_>	Updated cxserver to 2023-02-15-085109-production (T328310, T110190, T116466)	[production]
06:11	<kartik@deploy1002>	helmfile [eqiad] DONE helmfile.d/services/cxserver: apply	[production]
06:11	<kartik@deploy1002>	helmfile [eqiad] START helmfile.d/services/cxserver: apply	[production]
06:06	<kartik@deploy1002>	helmfile [codfw] DONE helmfile.d/services/cxserver: apply	[production]
06:05	<kartik@deploy1002>	helmfile [codfw] START helmfile.d/services/cxserver: apply	[production]
06:00	<kartik@deploy1002>	helmfile [staging] DONE helmfile.d/services/cxserver: apply	[production]
06:00	<kartik@deploy1002>	helmfile [staging] START helmfile.d/services/cxserver: apply	[production]
2023-02-15
23:30	<dduvall@deploy1002>	Synchronized php: group1 wikis to 1.40.0-wmf.23 refs T325586 (duration: 06m 43s)	[production]
23:23	<dduvall@deploy1002>	rebuilt and synchronized wikiversions files: group1 wikis to 1.40.0-wmf.23 refs T325586	[production]
23:15	<ladsgroup@deploy1002>	Finished scap: Backport for [[gerrit:889608|Change linter maintenance scripts to use existing config varaibles (T329342)]] (duration: 08m 12s)	[production]
23:08	<ladsgroup@deploy1002>	ladsgroup: Backport for [[gerrit:889608|Change linter maintenance scripts to use existing config varaibles (T329342)]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet	[production]
23:06	<ladsgroup@deploy1002>	Started scap: Backport for [[gerrit:889608|Change linter maintenance scripts to use existing config varaibles (T329342)]]	[production]