production SAL

2024-02-05
08:19	<marostegui@cumin1002>	dbctl commit (dc=all): 'Repooling after maintenance db2128', diff saved to https://phabricator.wikimedia.org/P56182 and previous config saved to /var/cache/conftool/dbconfig/20240205-081914-marostegui.json	[production]
08:10	<jmm@cumin2002>	END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Nskaggs out of all services on: 2205 hosts	[production]
08:09	<jmm@cumin2002>	START - Cookbook sre.idm.logout Logging Nskaggs out of all services on: 2205 hosts	[production]
08:04	<marostegui@cumin1002>	dbctl commit (dc=all): 'Repooling after maintenance db2128 (T355609)', diff saved to https://phabricator.wikimedia.org/P56181 and previous config saved to /var/cache/conftool/dbconfig/20240205-080407-marostegui.json	[production]
07:58	<marostegui@cumin1002>	dbctl commit (dc=all): 'Depooling db2128 (T355609)', diff saved to https://phabricator.wikimedia.org/P56180 and previous config saved to /var/cache/conftool/dbconfig/20240205-075856-marostegui.json	[production]
07:58	<marostegui@cumin1002>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2186.codfw.wmnet with reason: Maintenance	[production]
07:58	<marostegui@cumin1002>	START - Cookbook sre.hosts.downtime for 12:00:00 on db2186.codfw.wmnet with reason: Maintenance	[production]
07:58	<marostegui@cumin1002>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2128.codfw.wmnet with reason: Maintenance	[production]
07:58	<marostegui@cumin1002>	START - Cookbook sre.hosts.downtime for 6:00:00 on db2128.codfw.wmnet with reason: Maintenance	[production]
07:58	<marostegui@cumin1002>	dbctl commit (dc=all): 'Repooling after maintenance db2123 (T355609)', diff saved to https://phabricator.wikimedia.org/P56179 and previous config saved to /var/cache/conftool/dbconfig/20240205-075818-marostegui.json	[production]
07:56	<jmm@cumin2002>	END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netboxdb1002.eqiad.wmnet	[production]
07:55	<zabe>	zabe@mwmaint2002:/tmp/uploads$ mwscript importImages.php --wiki=commonswiki --comment-ext=txt --user="Illegitimate Barrister" . # T356607	[production]
07:51	<jmm@cumin2002>	START - Cookbook sre.hosts.reboot-single for host netboxdb1002.eqiad.wmnet	[production]
07:50	<jmm@cumin2002>	END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netboxdb2002.codfw.wmnet	[production]
07:46	<jmm@cumin2002>	START - Cookbook sre.hosts.reboot-single for host netboxdb2002.codfw.wmnet	[production]
07:43	<marostegui@cumin1002>	dbctl commit (dc=all): 'Repooling after maintenance db2123', diff saved to https://phabricator.wikimedia.org/P56178 and previous config saved to /var/cache/conftool/dbconfig/20240205-074312-marostegui.json	[production]
07:28	<marostegui@cumin1002>	dbctl commit (dc=all): 'Repooling after maintenance db2123', diff saved to https://phabricator.wikimedia.org/P56177 and previous config saved to /var/cache/conftool/dbconfig/20240205-072805-marostegui.json	[production]
07:12	<marostegui@cumin1002>	dbctl commit (dc=all): 'Repooling after maintenance db2123 (T355609)', diff saved to https://phabricator.wikimedia.org/P56176 and previous config saved to /var/cache/conftool/dbconfig/20240205-071259-marostegui.json	[production]
07:07	<marostegui@cumin1002>	dbctl commit (dc=all): 'Depooling db2123 (T355609)', diff saved to https://phabricator.wikimedia.org/P56175 and previous config saved to /var/cache/conftool/dbconfig/20240205-070745-marostegui.json	[production]
07:07	<marostegui@cumin1002>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2123.codfw.wmnet with reason: Maintenance	[production]
07:07	<marostegui@cumin1002>	START - Cookbook sre.hosts.downtime for 6:00:00 on db2123.codfw.wmnet with reason: Maintenance	[production]
07:07	<marostegui@cumin1002>	dbctl commit (dc=all): 'Repooling after maintenance db2111 (T355609)', diff saved to https://phabricator.wikimedia.org/P56174 and previous config saved to /var/cache/conftool/dbconfig/20240205-070723-marostegui.json	[production]
06:56	<marostegui>	dbmaint Drop indexes on site table on s8 T356417	[production]
06:56	<marostegui>	dbmaint Drop mathoid, mathlatexml tables T355050	[production]
06:54	<marostegui>	Drop indexes on site table on s8 T356417	[production]
06:52	<marostegui@cumin1002>	dbctl commit (dc=all): 'Repooling after maintenance db2111', diff saved to https://phabricator.wikimedia.org/P56173 and previous config saved to /var/cache/conftool/dbconfig/20240205-065216-marostegui.json	[production]
06:37	<marostegui@cumin1002>	dbctl commit (dc=all): 'Repooling after maintenance db2111', diff saved to https://phabricator.wikimedia.org/P56172 and previous config saved to /var/cache/conftool/dbconfig/20240205-063709-marostegui.json	[production]
06:22	<marostegui@cumin1002>	dbctl commit (dc=all): 'Repooling after maintenance db2111 (T355609)', diff saved to https://phabricator.wikimedia.org/P56171 and previous config saved to /var/cache/conftool/dbconfig/20240205-062203-marostegui.json	[production]
06:15	<marostegui@cumin1002>	dbctl commit (dc=all): 'Depooling db2111 (T355609)', diff saved to https://phabricator.wikimedia.org/P56170 and previous config saved to /var/cache/conftool/dbconfig/20240205-061511-marostegui.json	[production]
06:15	<marostegui@cumin1002>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2111.codfw.wmnet with reason: Maintenance	[production]
06:14	<marostegui@cumin1002>	START - Cookbook sre.hosts.downtime for 6:00:00 on db2111.codfw.wmnet with reason: Maintenance	[production]
06:11	<marostegui>	Drop mathoid, mathlatexml tables T355050	[production]
06:10	<marostegui@cumin1002>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2101.codfw.wmnet with reason: Maintenance	[production]
06:09	<marostegui@cumin1002>	START - Cookbook sre.hosts.downtime for 6:00:00 on db2101.codfw.wmnet with reason: Maintenance	[production]
00:59	<rzl@deploy2002>	helmfile [codfw] DONE helmfile.d/admin 'apply'.	[production]
00:59	<rzl@deploy2002>	helmfile [codfw] START helmfile.d/admin 'apply'.	[production]
2024-02-04
01:53	<urandom>	decommissioning cassandra, restbase2018-{a,b,c} — T352469	[production]
01:49	<eevans@cumin1002>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 30 days, 0:00:00 on restbase2018.codfw.wmnet with reason: Decommissioning — T352469	[production]
01:49	<eevans@cumin1002>	START - Cookbook sre.hosts.downtime for 30 days, 0:00:00 on restbase2018.codfw.wmnet with reason: Decommissioning — T352469	[production]
2024-02-03
13:30	<eevans@cumin1002>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 30 days, 0:00:00 on restbase2017.codfw.wmnet with reason: Decommissioning — T352469	[production]
13:30	<eevans@cumin1002>	START - Cookbook sre.hosts.downtime for 30 days, 0:00:00 on restbase2017.codfw.wmnet with reason: Decommissioning — T352469	[production]
08:19	<ryankemper>	[cloudelastic] Replica shards have re-initialized; cluster is back to green. Will probably see a wall of `ElasticSearch unassigned shard check - 9400` resolve messages soon, fingers crossed	[production]
08:15	<ryankemper>	[cloudelastic] Re-enabled replica allocation on `cloudelastic-omega-eqiad` => `curl -H 'Content-Type: application/json' -XPUT https://cloudelastic.wikimedia.org:9443/_cluster/settings -d '{"transient":{"cluster.routing.allocation":{"enable": "all"}}}'`	[production]
08:10	<ryankemper>	[cloudelastic] Seeing `replica allocations are forbidden due to cluster setting [cluster.routing.allocation.enable=primaries]`; that likely explains the many unassigned shards of cloudelastic.wikimedia.org:9400 ... feels like a previous cookbook run didn't back out successfully, leaving replica allocation disabled	[production]
08:09	<ryankemper>	[cloudelastic] current state: `{"cluster_name":"cloudelastic-omega-eqiad","status":"yellow","number_of_nodes":10,"number_of_data_nodes":10,"active_primary_shards":798,"active_shards":1438,"relocating_shards":0,"initializing_shards":0,"unassigned_shards":160,"delayed_unassigned_shards":0,"active_shards_percent_as_number":89.98748435544431}`	[production]
01:13	<marostegui@cumin1002>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance	[production]
01:13	<marostegui@cumin1002>	START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance	[production]
01:13	<marostegui@cumin1002>	dbctl commit (dc=all): 'Repooling after maintenance db1249 (T355609)', diff saved to https://phabricator.wikimedia.org/P56168 and previous config saved to /var/cache/conftool/dbconfig/20240203-011337-marostegui.json	[production]
00:58	<marostegui@cumin1002>	dbctl commit (dc=all): 'Repooling after maintenance db1249', diff saved to https://phabricator.wikimedia.org/P56167 and previous config saved to /var/cache/conftool/dbconfig/20240203-005830-marostegui.json	[production]
00:43	<marostegui@cumin1002>	dbctl commit (dc=all): 'Repooling after maintenance db1249', diff saved to https://phabricator.wikimedia.org/P56166 and previous config saved to /var/cache/conftool/dbconfig/20240203-004324-marostegui.json	[production]