production SAL

1551-1600 of 10000 results (47ms)

2022-02-10 §
22:39	<mutante>	etherpad - succesfully switched to etherpad1003 (bullseye) and etherpad 1.8.16 - on second attempt after making it listen on IPv6 to work behind envoy (T300568) - https://gerrit.wikimedia.org/r/c/operations/puppet/+/761727/	[production]
22:34	<bblack@cumin1001>	START - Cookbook sre.dns.netbox	[production]
22:31	<ladsgroup@cumin1001>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1102.eqiad.wmnet with reason: Maintenance	[production]
22:31	<ladsgroup@cumin1001>	START - Cookbook sre.hosts.downtime for 6:00:00 on db1102.eqiad.wmnet with reason: Maintenance	[production]
22:28	<bblack@cumin1001>	END (ERROR) - Cookbook sre.dns.netbox (exit_code=97)	[production]
22:27	<bblack@cumin1001>	END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs1013.eqiad.wmnet with OS buster	[production]
22:26	<bblack@cumin1001>	START - Cookbook sre.dns.netbox	[production]
22:24	<mutante>	etherpad - one more short downtime for maintenance - downtimed in alertmanager and icinga	[production]
22:04	<bblack@cumin1001>	START - Cookbook sre.hosts.reimage for host lvs1013.eqiad.wmnet with OS buster	[production]
21:54	<ladsgroup@cumin1001>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance	[production]
21:53	<ladsgroup@cumin1001>	START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance	[production]
21:53	<ladsgroup@cumin1001>	dbctl commit (dc=all): 'Repooling after maintenance db1112 (T298554)', diff saved to https://phabricator.wikimedia.org/P20589 and previous config saved to /var/cache/conftool/dbconfig/20220210-215354-ladsgroup.json	[production]
21:38	<ladsgroup@cumin1001>	dbctl commit (dc=all): 'Repooling after maintenance db1112', diff saved to https://phabricator.wikimedia.org/P20588 and previous config saved to /var/cache/conftool/dbconfig/20220210-213849-ladsgroup.json	[production]
21:23	<ladsgroup@cumin1001>	dbctl commit (dc=all): 'Repooling after maintenance db1112', diff saved to https://phabricator.wikimedia.org/P20587 and previous config saved to /var/cache/conftool/dbconfig/20220210-212344-ladsgroup.json	[production]
21:16	<bblack>	cr1-eqiad - manual config, static fallback for high-traffic1 to lvs1017	[production]
21:08	<ladsgroup@cumin1001>	dbctl commit (dc=all): 'Repooling after maintenance db1112 (T298554)', diff saved to https://phabricator.wikimedia.org/P20586 and previous config saved to /var/cache/conftool/dbconfig/20220210-210839-ladsgroup.json	[production]
21:08	<bblack>	lvs1017 - bringing pybal online with real routing, flips high-traffic (text-cluster) traffic from lvs1020 -> lvs1017	[production]
20:48	<ladsgroup@cumin1001>	dbctl commit (dc=all): 'Depooling db1112 (T298554)', diff saved to https://phabricator.wikimedia.org/P20585 and previous config saved to /var/cache/conftool/dbconfig/20220210-204831-ladsgroup.json	[production]
20:48	<ladsgroup@cumin1001>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance	[production]
20:48	<ladsgroup@cumin1001>	START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance	[production]
20:48	<ladsgroup@cumin1001>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1112.eqiad.wmnet with reason: Maintenance	[production]
20:48	<ladsgroup@cumin1001>	START - Cookbook sre.hosts.downtime for 6:00:00 on db1112.eqiad.wmnet with reason: Maintenance	[production]
20:48	<ladsgroup@cumin1001>	dbctl commit (dc=all): 'Repooling after maintenance db1166 (T298554)', diff saved to https://phabricator.wikimedia.org/P20584 and previous config saved to /var/cache/conftool/dbconfig/20220210-204818-ladsgroup.json	[production]
20:33	<ladsgroup@cumin1001>	dbctl commit (dc=all): 'Repooling after maintenance db1166', diff saved to https://phabricator.wikimedia.org/P20583 and previous config saved to /var/cache/conftool/dbconfig/20220210-203313-ladsgroup.json	[production]
20:18	<ladsgroup@cumin1001>	dbctl commit (dc=all): 'Repooling after maintenance db1166', diff saved to https://phabricator.wikimedia.org/P20582 and previous config saved to /var/cache/conftool/dbconfig/20220210-201808-ladsgroup.json	[production]
20:17	<mwdebug-deploy@deploy1002>	helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn	[production]
20:15	<mwdebug-deploy@deploy1002>	helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn	[production]
20:15	<mwdebug-deploy@deploy1002>	helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn	[production]
20:14	<mwdebug-deploy@deploy1002>	helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn	[production]
20:08	<jhuneidi@deploy1002>	rebuilt and synchronized wikiversions files: all wikis to 1.38.0-wmf.21 refs T300197	[production]
20:03	<ladsgroup@cumin1001>	dbctl commit (dc=all): 'Repooling after maintenance db1166 (T298554)', diff saved to https://phabricator.wikimedia.org/P20581 and previous config saved to /var/cache/conftool/dbconfig/20220210-200304-ladsgroup.json	[production]
19:45	<ladsgroup@cumin1001>	dbctl commit (dc=all): 'Depooling db1166 (T298554)', diff saved to https://phabricator.wikimedia.org/P20580 and previous config saved to /var/cache/conftool/dbconfig/20220210-194518-ladsgroup.json	[production]
19:45	<ladsgroup@cumin1001>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1166.eqiad.wmnet with reason: Maintenance	[production]
19:45	<ladsgroup@cumin1001>	START - Cookbook sre.hosts.downtime for 6:00:00 on db1166.eqiad.wmnet with reason: Maintenance	[production]
19:45	<ladsgroup@cumin1001>	dbctl commit (dc=all): 'Repooling after maintenance db1179 (T298554)', diff saved to https://phabricator.wikimedia.org/P20579 and previous config saved to /var/cache/conftool/dbconfig/20220210-194510-ladsgroup.json	[production]
19:30	<ladsgroup@cumin1001>	dbctl commit (dc=all): 'Repooling after maintenance db1179', diff saved to https://phabricator.wikimedia.org/P20578 and previous config saved to /var/cache/conftool/dbconfig/20220210-193005-ladsgroup.json	[production]
19:25	<bblack>	lvs1017 reboot again for clean network config - T301142	[production]
19:23	<mwdebug-deploy@deploy1002>	helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn	[production]
19:19	<mwdebug-deploy@deploy1002>	helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn	[production]
19:19	<mwdebug-deploy@deploy1002>	helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn	[production]
19:18	<mwdebug-deploy@deploy1002>	helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn	[production]
19:15	<ladsgroup@cumin1001>	dbctl commit (dc=all): 'Repooling after maintenance db1179', diff saved to https://phabricator.wikimedia.org/P20577 and previous config saved to /var/cache/conftool/dbconfig/20220210-191501-ladsgroup.json	[production]
19:13	<jgiannelos@deploy1002>	Finished deploy [kartotherian/deploy@828a428] (eqiad): Configure geoshapes postgres max conns (duration: 01m 29s)	[production]
19:13	<mwdebug-deploy@deploy1002>	helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn	[production]
19:13	<urbanecm@deploy1002>	Synchronized wmf-config/flaggedrevs.php: 72f3b31: Migrate $wmfStandardAutoPromote to $wmgStandardAutoPromote (T45956) (duration: 00m 49s)	[production]
19:12	<mwdebug-deploy@deploy1002>	helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn	[production]
19:12	<mwdebug-deploy@deploy1002>	helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn	[production]
19:12	<jgiannelos@deploy1002>	Started deploy [kartotherian/deploy@828a428] (eqiad): Configure geoshapes postgres max conns	[production]
19:11	<mwdebug-deploy@deploy1002>	helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn	[production]
19:11	<bblack>	lvs1017 rebooting for sanity-check after prod config - T301142	[production]