production SAL

2022-10-04
07:11	<elukey@cumin1001>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:20:00 on kafka-logging1001.eqiad.wmnet with reason: Kafka PKI upgrade	[production]
07:11	<elukey@cumin1001>	START - Cookbook sre.hosts.downtime for 0:20:00 on kafka-logging1001.eqiad.wmnet with reason: Kafka PKI upgrade	[production]
07:06	<marostegui@cumin1001>	dbctl commit (dc=all): 'db1189 (re)pooling @ 75%: After HW maintenance', diff saved to https://phabricator.wikimedia.org/P35328 and previous config saved to /var/cache/conftool/dbconfig/20221004-070653-root.json	[production]
06:51	<marostegui@cumin1001>	dbctl commit (dc=all): 'db1189 (re)pooling @ 50%: After HW maintenance', diff saved to https://phabricator.wikimedia.org/P35327 and previous config saved to /var/cache/conftool/dbconfig/20221004-065148-root.json	[production]
06:43	<mwdebug-deploy@deploy1002>	helmfile [codfw] DONE helmfile.d/services/mwdebug: apply	[production]
06:42	<mwdebug-deploy@deploy1002>	helmfile [codfw] START helmfile.d/services/mwdebug: apply	[production]
06:42	<mwdebug-deploy@deploy1002>	helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply	[production]
06:39	<mwdebug-deploy@deploy1002>	helmfile [eqiad] START helmfile.d/services/mwdebug: apply	[production]
06:36	<marostegui@cumin1001>	dbctl commit (dc=all): 'db1189 (re)pooling @ 25%: After HW maintenance', diff saved to https://phabricator.wikimedia.org/P35326 and previous config saved to /var/cache/conftool/dbconfig/20221004-063643-root.json	[production]
06:33	<ayounsi@cumin1001>	END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 25885	[production]
06:32	<ayounsi@cumin1001>	START - Cookbook sre.network.peering with action 'configure' for AS: 25885	[production]
06:21	<marostegui@cumin1001>	dbctl commit (dc=all): 'db1189 (re)pooling @ 10%: After HW maintenance', diff saved to https://phabricator.wikimedia.org/P35325 and previous config saved to /var/cache/conftool/dbconfig/20221004-062138-root.json	[production]
06:06	<marostegui@cumin1001>	dbctl commit (dc=all): 'db1189 (re)pooling @ 5%: After HW maintenance', diff saved to https://phabricator.wikimedia.org/P35324 and previous config saved to /var/cache/conftool/dbconfig/20221004-060633-root.json	[production]
05:51	<marostegui@cumin1001>	dbctl commit (dc=all): 'db1189 (re)pooling @ 3%: After HW maintenance', diff saved to https://phabricator.wikimedia.org/P35323 and previous config saved to /var/cache/conftool/dbconfig/20221004-055128-root.json	[production]
05:36	<marostegui@cumin1001>	dbctl commit (dc=all): 'db1189 (re)pooling @ 1%: After HW maintenance', diff saved to https://phabricator.wikimedia.org/P35322 and previous config saved to /var/cache/conftool/dbconfig/20221004-053623-root.json	[production]
03:12	<mwdebug-deploy@deploy1002>	helmfile [codfw] DONE helmfile.d/services/mwdebug: apply	[production]
03:09	<mwdebug-deploy@deploy1002>	helmfile [codfw] START helmfile.d/services/mwdebug: apply	[production]
03:09	<mwdebug-deploy@deploy1002>	helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply	[production]
03:07	<mwdebug-deploy@deploy1002>	helmfile [eqiad] START helmfile.d/services/mwdebug: apply	[production]
02:31	<mwdebug-deploy@deploy1002>	helmfile [codfw] DONE helmfile.d/services/mwdebug: apply	[production]
02:30	<mwdebug-deploy@deploy1002>	helmfile [codfw] START helmfile.d/services/mwdebug: apply	[production]
02:30	<mwdebug-deploy@deploy1002>	helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply	[production]
02:28	<mwdebug-deploy@deploy1002>	helmfile [eqiad] START helmfile.d/services/mwdebug: apply	[production]
02:13	<mwdebug-deploy@deploy1002>	helmfile [codfw] DONE helmfile.d/services/mwdebug: apply	[production]
02:09	<mwdebug-deploy@deploy1002>	helmfile [codfw] START helmfile.d/services/mwdebug: apply	[production]
02:09	<mwdebug-deploy@deploy1002>	helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply	[production]
02:05	<mwdebug-deploy@deploy1002>	helmfile [eqiad] START helmfile.d/services/mwdebug: apply	[production]
2022-10-03
21:45	<robh@cumin2002>	END (PASS) - Cookbook sre.dns.netbox (exit_code=0)	[production]
21:44	<robh@cumin2002>	START - Cookbook sre.dns.netbox	[production]
21:44	<robh@cumin2002>	END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host dns4003.wikimedia.org with OS bullseye	[production]
21:18	<robh@cumin2002>	START - Cookbook sre.hosts.reimage for host dns4003.wikimedia.org with OS bullseye	[production]
19:41	<ryankemper>	[Elastic] Unbanned `elastic1066`	[production]
19:37	<ryankemper>	[Elastic] Restarted psi on `elastic1066`; will unban host after process is up and running	[production]
19:32	<robh>	msw1-ulsfo swap successful, mgmt recovering in icinga and tested connection with 3 servers all work	[production]
19:25	<robh>	msw1-ulsfo swap, some mgmt flapping expected, swap complete but not powered back up yet	[production]
19:22	<ryankemper>	[Elastic] Banned `elastic1066` (`curl -H 'Content-Type: application/json' -XPUT http://localhost:9600/_cluster/settings -d '{"transient":{"cluster.routing.allocation.exclude":{"_host": "","_name": "elastic1066-production-search-psi-eqiad"}}}'`); will restart elasticsearch-psi after shards drain	[production]
19:15	<robh@cumin2002>	END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host dns4003.wikimedia.org with OS bullseye	[production]
18:48	<robh@cumin2002>	START - Cookbook sre.hosts.reimage for host dns4003.wikimedia.org with OS bullseye	[production]
18:41	<robh@cumin2002>	END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host dns4003.wikimedia.org with OS bullseye	[production]
18:34	<robh@cumin2002>	START - Cookbook sre.hosts.reimage for host dns4003.wikimedia.org with OS bullseye	[production]
18:30	<robh@cumin2002>	END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host dns4003.mgmt.ulsfo.wmnet with reboot policy FORCED	[production]
18:30	<bblack@cumin1001>	END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cp4045.ulsfo.wmnet with OS buster	[production]
18:21	<robh@cumin2002>	START - Cookbook sre.hosts.provision for host dns4003.mgmt.ulsfo.wmnet with reboot policy FORCED	[production]
18:12	<robh@cumin2002>	END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host dns4003.mgmt.ulsfo.wmnet with reboot policy FORCED	[production]
18:06	<robh@cumin2002>	START - Cookbook sre.hosts.provision for host dns4003.mgmt.ulsfo.wmnet with reboot policy FORCED	[production]
18:04	<robh@cumin2002>	END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host dns4003.mgmt.ulsfo.wmnet with reboot policy FORCED	[production]
18:00	<robh@cumin2002>	START - Cookbook sre.hosts.provision for host dns4003.mgmt.ulsfo.wmnet with reboot policy FORCED	[production]
17:52	<robh@cumin2002>	END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host dns4003.mgmt.ulsfo.wmnet with reboot policy FORCED	[production]
17:42	<robh@cumin2002>	START - Cookbook sre.hosts.provision for host dns4003.mgmt.ulsfo.wmnet with reboot policy FORCED	[production]
17:41	<robh@cumin2002>	END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host dns4003	[production]