production SAL

2019-04-15
17:15	<mutante>	restarted wikibugs because it stopped talking	[production]
16:08	<onimisionipe>	pooling maps2001 - postgres reinit is complete	[production]
15:55	<Reedy>	changed /srv/mediawiki/docroot/wikimedia.org to a symlink to standard-docroot	[production]
15:53	<XioNoX>	add cloud-in4 firewall filter to codfw - T211921	[production]
15:31	<onimisionipe>	restarting prometheus-wmf-elasticsearch-exporter-9* on all elastic nodes	[production]
15:30	<onimisionipe>	restarting prometheus-wmf-elasticsearch-exporter-9200 on all elastic nodes	[production]
15:28	<_joe_>	systemctl reset-failed on ms-be1027, debmonitor session	[production]
15:24	<Amir1>	end of foreachwikiindblist wikidataclient extensions/Wikibase/lib/maintenance/populateSitesTable.php --force-protocol https (T219871)	[production]
14:55	<gehel>	deploying tilerator to maps1001 to validate deployment is working - T220982	[production]
14:55	<Amir1>	start of foreachwikiindblist wikidataclient extensions/Wikibase/lib/maintenance/populateSitesTable.php --force-protocol https (T219871)	[production]
14:43	<_joe_>	running apply-config-tilerator on maps1001	[production]
14:40	<_joe_>	running apply-config-kartotherian on maps1001	[production]
14:22	<cdanis>	T220982 cdanis@cumin1001.eqiad.wmnet ~ % sudo cumin 'maps1*' 'sudo chmod -R a+r /srv/deployment/tilerator /srv/deployment/kartotherian'	[production]
14:21	<cdanis>	cdanis@cumin1001.eqiad.wmnet ~ % sudo cumin 'maps1*' "disable-puppet 'bad permissions - T220982 - cdanis'"	[production]
14:18	<cdanis>	cdanis@cumin1001.eqiad.wmnet ~ % sudo cumin 'maps*' 'sudo chmod -R a+r /srv/deployment/tilerator /srv/deployment/kartotherian'	[production]
14:18	<gehel>	resetting permissions on maps servers for /srv/deployment/kartotherian and /srv/deployment/tilerator	[production]
14:04	<moritzm>	rebooting ms-fe1005 for combined kernel/glibc/OpenSSL update	[production]
13:57	<jbond42>	upgrading puppet 4 -> 5 and facter 2 -> 3 on mediawiki::canary_appserver, mediawiki::appserver::canary_api and cache::cache roles	[production]
13:56	<gehel>	restart tilerator / kartotherian on all maps servers for openssl update	[production]
13:55	<godog>	start ms-be1013 decom - T220590	[production]
13:42	<godog>	reboot ms-be1013	[production]
13:09	<moritzm>	installing wget security updates on trusty hosts	[production]
12:59	<moritzm>	restarting archiva on archiva1001 for OpenJDK security update	[production]
12:50	<moritzm>	restarting Apache on matomo1001 to pick up OpenSSL update	[production]
12:14	<moritzm>	rolling restart of HHVM/Apache on deployment servers to pick up OpenSSL update	[production]
11:59	<fsero>	pointing boron docker builds to the new registry temporarily (docker builds on boron might fail)	[production]
11:35	<Amir1>	EU swat is done	[production]
11:26	<moritzm>	rolling restart of HHVM/Apache on labweb* to pick up OpenSSL update	[production]
09:58	<moritzm>	installing openssl1.0 security updates	[production]
09:18	<gehel>	unbanning elastic1029 from cluster	[production]
08:58	<moritzm>	updating mediawiki servers in eqiad to version 1.8.1 of the PHP extension for wikidiff	[production]
08:29	<onimisionipe>	increase wal_keep_segments on codfw maps master	[production]
08:19	<moritzm>	updating mediawiki servers in codfw to version 1.8.1 of the PHP extension for wikidiff	[production]
07:50	<Amir1>	ladsgroup@mwmaint1002:~$ mwscript maintenance/initSiteStats.php --wiki=hywwiki --active (T220936)	[production]
05:31	<marostegui>	Upgrade db1100	[production]
05:07	<marostegui>	powercycle mw1280 (crashed)	[production]
2019-04-14
06:10	<ebernhardson>	unban elastic1027 from eqiad-psi	[production]
05:36	<ebernhardson>	unbanning elastic1027 after about half the shards left and load dropped	[production]
05:31	<ebernhardson>	ban elastic1027 from elasticsearch-psi in eqiad	[production]
04:59	<ebernhardson>	restart elasticsearch_6@production-search-psi-eqiad on elastic1027 due to 100% CPU for the last 30+ minutes	[production]
2019-04-13
18:46	<godog>	3h downtime for cloudvirt1015	[production]
15:58	<ebernhardson>	restart elasticsearch on elastic1027	[production]
15:34	<shdubsh>	restart recommendation_api on scb1001	[production]
15:33	<shdubsh>	restart recommendation_api on scb2001	[production]
10:46	<onimisionipe>	depooling maps2001 for postgres init	[production]
08:05	<gehel>	repooling wdqs1008 - data transfer completed - T220830	[production]
00:32	<krinkle@deploy1001>	Synchronized php-1.33.0-wmf.25/includes/: Idc19cc29764a / T220854 - hot fix (duration: 05m 37s)	[production]
2019-04-12
21:16	<Krinkle>	scap was unable to sync to 1 apache (connect to host cloudweb2001-dev.wikimedia.org port 22: Connection timed out)	[production]
21:10	<krinkle@deploy1001>	Synchronized php-1.33.0-wmf.25/extensions/ImageMap/includes/ImageMap.php: I0ee84f059da / T217087 (duration: 05m 12s)	[production]
19:27	<dzahn@cumin1001>	END (FAIL) - Cookbook sre.hosts.decommission (exit_code=99)	[production]