production SAL

2021-06-08
10:53	<kormat@cumin1001>	dbctl commit (dc=all): 'db1157 (re)pooling @ 25%: reimaged to buster T283131', diff saved to https://phabricator.wikimedia.org/P16326 and previous config saved to /var/cache/conftool/dbconfig/20210608-105346-kormat.json	[production]
10:50	<jbond@deploy1002>	Finished deploy [netbox/deploy@c70df91]: Force deploy of gerrit/672831 to netbox-next (4) (duration: 00m 53s)	[production]
10:49	<jbond@deploy1002>	Started deploy [netbox/deploy@c70df91]: Force deploy of gerrit/672831 to netbox-next (4)	[production]
10:16	<liw>	testing upcoming Scap release on beta	[production]
10:01	<XioNoX>	upgrade Routinator 3000 to 0.9.0 on rpki2001 - T282469	[production]
09:58	<jbond@deploy1002>	Finished deploy [netbox/deploy@c70df91]: Force deploy of gerrit/672831 to netbox-next (4) (duration: 00m 54s)	[production]
09:57	<jbond@deploy1002>	Started deploy [netbox/deploy@c70df91]: Force deploy of gerrit/672831 to netbox-next (4)	[production]
09:52	<oblivian@deploy1002>	helmfile [staging] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .	[production]
09:04	<jayme>	removing docker-images from registry: releng/ci-jessie, releng/ci-src-setup, releng/composer-php56, releng/composer-test-php56, releng/npm, releng/npm-test, releng/npm-test-3d2png, releng/npm-test-graphoid, releng/npm-test-librdkafka, releng/npm-test-maps-service, releng/php56, releng/quibble-jessie, releng/quibble-jessie-hhvm, releng/quibble-jessie-php56 - T251918	[production]
08:31	<dcausse>	depooling wdqs1006 (lag)	[production]
08:29	<dcausse>	restarting blazegraph on wdqs1006	[production]
08:19	<elukey@cumin1001>	END (PASS) - Cookbook sre.dns.netbox (exit_code=0)	[production]
08:13	<oblivian@deploy1002>	helmfile [staging] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .	[production]
08:13	<elukey@cumin1001>	START - Cookbook sre.dns.netbox	[production]
07:49	<jmm@cumin1001>	END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cumin2002.codfw.wmnet	[production]
07:41	<jmm@cumin1001>	START - Cookbook sre.hosts.reboot-single for host cumin2002.codfw.wmnet	[production]
07:40	<oblivian@deploy1002>	helmfile [staging] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .	[production]
07:37	<oblivian@deploy1002>	helmfile [staging] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .	[production]
07:35	<oblivian@deploy1002>	helmfile [staging] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .	[production]
07:29	<marostegui@cumin1001>	dbctl commit (dc=all): 'db1161 (re)pooling @ 100%: Repool after upgrade', diff saved to https://phabricator.wikimedia.org/P16324 and previous config saved to /var/cache/conftool/dbconfig/20210608-072937-root.json	[production]
07:14	<marostegui@cumin1001>	dbctl commit (dc=all): 'db1161 (re)pooling @ 75%: Repool after upgrade', diff saved to https://phabricator.wikimedia.org/P16323 and previous config saved to /var/cache/conftool/dbconfig/20210608-071433-root.json	[production]
06:59	<marostegui@cumin1001>	dbctl commit (dc=all): 'db1161 (re)pooling @ 50%: Repool after upgrade', diff saved to https://phabricator.wikimedia.org/P16322 and previous config saved to /var/cache/conftool/dbconfig/20210608-065930-root.json	[production]
06:52	<tgr>	T283606: running mwscript extensions/GrowthExperiments/maintenance/fixLinkRecommendationData.php --wiki={ar,bn,cs,vi}wiki --verbose --search-index with gerrit:696307 applied	[production]
06:44	<marostegui@cumin1001>	dbctl commit (dc=all): 'db1161 (re)pooling @ 25%: Repool after upgrade', diff saved to https://phabricator.wikimedia.org/P16321 and previous config saved to /var/cache/conftool/dbconfig/20210608-064426-root.json	[production]
06:40	<marostegui@cumin1001>	dbctl commit (dc=all): 'Depool db1161 for upgrade', diff saved to https://phabricator.wikimedia.org/P16320 and previous config saved to /var/cache/conftool/dbconfig/20210608-064055-marostegui.json	[production]
06:27	<elukey>	clean some airflow logs on an-airflow1001 as one off to free space (had a chat with the Search team first)	[production]
05:46	<marostegui@cumin1001>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2123.codfw.wmnet with reason: REIMAGE	[production]
05:44	<marostegui@cumin1001>	START - Cookbook sre.hosts.downtime for 2:00:00 on db2123.codfw.wmnet with reason: REIMAGE	[production]
05:17	<marostegui@cumin1001>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2123.codfw.wmnet with reason: REIMAGE	[production]
05:15	<marostegui@cumin1001>	START - Cookbook sre.hosts.downtime for 2:00:00 on db2123.codfw.wmnet with reason: REIMAGE	[production]
04:54	<marostegui>	Repool clouddb1019:3314	[production]
04:07	<ryankemper@cumin1001>	END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)	[production]
02:38	<ryankemper@cumin1001>	START - Cookbook sre.wdqs.data-transfer	[production]
02:38	<ryankemper>	T284445 `sudo -i cookbook sre.wdqs.data-transfer --source wdqs1011.eqiad.wmnet --dest wdqs1012.eqiad.wmnet --reason "repairing overinflated blazegraph journal" --blazegraph_instance blazegraph` on `ryankemper@cumin1001` tmux session `wdqs`	[production]
02:37	<ryankemper>	T284445 after manually stopping blazegraph/wdqs-updater, `sudo rm -fv /srv/wdqs/wikidata.jnl` on `wdqs1012` (clearing old overinflated journal file away before xferring new one)	[production]
02:34	<ryankemper>	[WDQS] `ryankemper@wdqs1005:~$ sudo depool` (catching up on ~7h of lag)	[production]
2021-06-07
21:26	<otto@cumin1001>	END (PASS) - Cookbook sre.kafka.roll-restart-brokers (exit_code=0)	[production]
21:12	<sbassett>	Deployed security patch for T284364	[production]
19:30	<ryankemper>	T284479 [Cirrussearch] We'll keep monitoring. For now this incident is resolved. Glancing at our current volume relative to what we'd expect, the numbers we see match what we'd expect. If we're accidentally banning any innocent requests they must be an incredibly small percentage of the total otherwise we'd see significantly lower volume than expected	[production]
19:25	<ryankemper>	T284479 [Cirrussearch] Seeing the expected drop in `entity_full_text` requests here: https://grafana-rw.wikimedia.org/d/000000455/elasticsearch-percentiles?viewPanel=47&orgId=1&from=now-12h&to=now As a result we're no longer rejecting any requests	[production]
19:21	<ryankemper>	T284479 [Cirrussearch] We're working on rolling out https://gerrit.wikimedia.org/r/698607, which will ban search API requests that match the Google App Engine IP range `2600:1900::0/28` AND whose user agent includes `HeadlessChrome`	[production]
19:19	<cdanis>	T284479 ✔️ cdanis@cumin1001.eqiad.wmnet ~ 🕞🍵 sudo cumin -b16 'A:cp-text' "run-puppet-agent"	[production]
19:07	<andrew@deploy1002>	Finished deploy [horizon/deploy@6199b67]: disable shelve/unshelve T284462 (duration: 04m 53s)	[production]
19:02	<andrew@deploy1002>	Started deploy [horizon/deploy@6199b67]: disable shelve/unshelve T284462	[production]
19:01	<andrew@deploy1002>	Finished deploy [horizon/deploy@6199b67]: disable shelve/unshelve (duration: 02m 01s)	[production]
18:59	<andrew@deploy1002>	Started deploy [horizon/deploy@6199b67]: disable shelve/unshelve	[production]
18:57	<herron>	prometheus3001: moved /srv back to vda1 filesystem T243057	[production]
18:25	<urbanecm>	[urbanecm@mwmaint1002 /srv/mediawiki/php-1.37.0-wmf.7]$ mwscript extensions/GrowthExperiments/maintenance/initWikiConfig.php --wiki=skwiki --phab=T284149	[production]
18:24	<urbanecm@deploy1002>	Synchronized php-1.37.0-wmf.7/extensions/GrowthExperiments/includes/WelcomeSurvey.php: 368b5d9: 0e79aee: WelcomeSurvey backports (T284127, T284257; 2/2) (duration: 00m 57s)	[production]
18:22	<urbanecm@deploy1002>	Synchronized php-1.37.0-wmf.7/extensions/GrowthExperiments/extension.json: 368b5d9: 0e79aee: WelcomeSurvey backports (T284127, T284257; 1/2) (duration: 00m 56s)	[production]