production SAL

2021-08-10
12:23	<kormat>	non-destructive (🤞) testing of db-switchover against s2/eqiad T288500	[production]
12:17	<ppchelko@deploy1002>	Started deploy [restbase/deploy@5791a7a]: Add count parameter to recommendations API T287227	[production]
11:27	<dzahn@cumin1001>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 8:00:00 on planet1002.eqiad.wmnet with reason: known issue	[production]
11:27	<dzahn@cumin1001>	START - Cookbook sre.hosts.downtime for 5 days, 8:00:00 on planet1002.eqiad.wmnet with reason: known issue	[production]
10:56	<marostegui>	Install 10.4.21 on db1169 (s1)	[production]
10:54	<jayme@deploy1002>	helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .	[production]
10:53	<mutante>	etherpad deleting 2 pads as requested in T288328	[production]
10:52	<marostegui>	Install 10.4.21 on db1096 (s5 and s6)	[production]
10:34	<elukey@deploy1002>	helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.	[production]
10:34	<elukey@deploy1002>	helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.	[production]
10:33	<elukey@deploy1002>	helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.	[production]
10:33	<elukey@deploy1002>	helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.	[production]
10:28	<oblivian@deploy1002>	helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .	[production]
10:27	<oblivian@deploy1002>	helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .	[production]
10:24	<oblivian@deploy1002>	helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .	[production]
09:55	<lucaswerkmeister-wmde@deploy1002>	Synchronized wmf-config/InitialiseSettings-labs.php: Config: [[gerrit:708309|Remove $wmgWikibaseClientRepoDatabase (T257260)]] (2/2, beta) (duration: 00m 57s)	[production]
09:54	<lucaswerkmeister-wmde@deploy1002>	Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:708309|Remove $wmgWikibaseClientRepoDatabase (T257260)]] (1/2, prod) (duration: 00m 57s)	[production]
09:50	<lucaswerkmeister-wmde@deploy1002>	Synchronized wmf-config/Wikibase.php: Config: [[gerrit:708308|Stop setting $wgWBClientSettings['repoDatabase'] (T257260)]] (duration: 00m 58s)	[production]
09:47	<jayme@deploy1002>	helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .	[production]
09:23	<ariel@deploy1002>	Finished deploy [dumps/dumps@72ff209]: refuse to use info from corrupt run settings file (duration: 00m 03s)	[production]
09:22	<ariel@deploy1002>	Started deploy [dumps/dumps@72ff209]: refuse to use info from corrupt run settings file	[production]
09:17	<kormat>	running non-destructive test against s7/codfw (db2107/db2014) T288500	[production]
09:05	<jayme@deploy1002>	helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .	[production]
09:04	<moritzm>	removing stale Java 8 packages from logstash1024/1025/2023/2024/2025 (ELK7 Logstash cluster is on Java 11 for a while now)	[production]
09:00	<oblivian@deploy1002>	helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .	[production]
08:58	<ariel@deploy1002>	Finished deploy [dumps/dumps@170e394]: more resilience when reading bad run cache settings files (duration: 00m 03s)	[production]
08:58	<ariel@deploy1002>	Started deploy [dumps/dumps@170e394]: more resilience when reading bad run cache settings files	[production]
08:49	<oblivian@deploy1002>	helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .	[production]
08:20	<elukey@deploy1002>	helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.	[production]
08:20	<elukey@deploy1002>	helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.	[production]
08:19	<jayme@deploy1002>	helmfile [codfw] DONE helmfile.d/admin 'apply'.	[production]
08:18	<jayme@deploy1002>	helmfile [codfw] START helmfile.d/admin 'apply'.	[production]
08:16	<jayme@deploy1002>	helmfile [eqiad] DONE helmfile.d/admin 'apply'.	[production]
08:16	<jayme@deploy1002>	helmfile [eqiad] START helmfile.d/admin 'apply'.	[production]
08:15	<jayme@deploy1002>	helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.	[production]
08:15	<jayme@deploy1002>	helmfile [staging-eqiad] START helmfile.d/admin 'apply'.	[production]
08:15	<jayme@deploy1002>	helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.	[production]
08:14	<jayme@deploy1002>	helmfile [staging-codfw] START helmfile.d/admin 'apply'.	[production]
08:06	<godog>	upload thanos 0.21.1-1 and upgrade prometheus1004 / thanos-fe2001 to it - T288326	[production]
08:03	<moritzm>	installing openjdk-8 security updates on stretch	[production]
07:33	<moritzm>	installing lynx security updates	[production]
05:56	<marostegui@cumin1001>	dbctl commit (dc=all): 'db2104 (re)pooling @ 100%: repool after failed switchover', diff saved to https://phabricator.wikimedia.org/P16987 and previous config saved to /var/cache/conftool/dbconfig/20210810-055642-root.json	[production]
05:41	<marostegui@cumin1001>	dbctl commit (dc=all): 'db2104 (re)pooling @ 75%: repool after failed switchover', diff saved to https://phabricator.wikimedia.org/P16986 and previous config saved to /var/cache/conftool/dbconfig/20210810-054139-root.json	[production]
05:26	<marostegui@cumin1001>	dbctl commit (dc=all): 'db2104 (re)pooling @ 50%: repool after failed switchover', diff saved to https://phabricator.wikimedia.org/P16985 and previous config saved to /var/cache/conftool/dbconfig/20210810-052635-root.json	[production]
05:11	<marostegui@cumin1001>	dbctl commit (dc=all): 'db2104 (re)pooling @ 25%: repool after failed switchover', diff saved to https://phabricator.wikimedia.org/P16984 and previous config saved to /var/cache/conftool/dbconfig/20210810-051131-root.json	[production]
05:06	<marostegui@cumin1001>	dbctl commit (dc=all): 'Set s2 as read-write again - master has not been swapped T287454', diff saved to https://phabricator.wikimedia.org/P16983 and previous config saved to /var/cache/conftool/dbconfig/20210810-050604-root.json	[production]
05:00	<marostegui@cumin1001>	dbctl commit (dc=all): 'Set s2 codfw as read-only for maintenance - T287454', diff saved to https://phabricator.wikimedia.org/P16982 and previous config saved to /var/cache/conftool/dbconfig/20210810-050051-root.json	[production]
05:00	<marostegui>	Starting s2 codfw failover from db2107 to db2104 - T287454	[production]
04:23	<marostegui@cumin1001>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 24 hosts with reason: Master switchover s2 T287454	[production]
04:23	<marostegui@cumin1001>	START - Cookbook sre.hosts.downtime for 1:00:00 on 24 hosts with reason: Master switchover s2 T287454	[production]