__all__ SAL

2021-10-04 §
06:56	<joal>	Kill-restart pageview-monthly_dump-coord to apply fix for SLA	[analytics]
06:44	<elukey>	depool + restart blazegraph + restart updater on wdqs1004	[production]
05:50	<ladsgroup@deploy1002>	helmfile [eqiad] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .	[production]
05:49	<ladsgroup@deploy1002>	helmfile [codfw] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .	[production]
05:47	<ladsgroup@deploy1002>	helmfile [staging] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'staging' .	[production]
2021-10-03 §
21:30	<bstorm>	rebuilding buster containers since they are also affected T291387 T292355	[tools]
21:29	<bstorm>	rebuilt stretch containers for potential issues with LE cert updates T291387	[tools]
18:35	<chicocvenancio>	building python 2.7.18 to use in python 3.9 container T292355	[tools.video2commons]
17:28	<Operator873|CVN>	restarted bots 5, 12, 28, and 29, which had failed to regain nick.	[cvn]
14:45	<_joe_>	restarting acmechief on acmechief1001	[production]
12:55	<kormat@cumin1001>	dbctl commit (dc=all): 'Depool db1127, bad ram', diff saved to https://phabricator.wikimedia.org/P17414 and previous config saved to /var/cache/conftool/dbconfig/20211003-125530-kormat.json	[production]
12:02	<majavah>	update to python 3.9 after it broke due to recent LE changes; was using python 3.4 / jessie	[tools.sge-jobs]
08:24	<elukey>	powercycle cp5006 (unresponsive to ssh, remote tty available but not able to login as root, no prometheus metrics in hours)	[production]
08:23	<elukey@puppetmaster1001>	conftool action : set/pooled=no; selector: name=cp5006.eqsin.wmnet	[production]
2021-10-02 §
21:31	<Krinkle>	krinkle@cvn-app8 Idem	[cvn]
21:29	<Krinkle>	krinkle@cvn-app9 `sudo sed -i 's#mozilla/DST_Root_CA_X3.crt#!mozilla/DST_Root_CA_X3.crt#' /etc/ca-certificates.conf && sudo update-ca-certificates` ref T292289, ref https://github.com/mono/mono/issues/21233	[cvn]
21:24	<Krinkle>	/cs flags #cvn-wp-es LuchoCR local_op ; verified nick and sysop at es.wikipedia	[cvn]
21:11	<Krinkle>	/cs flags #cvn-wp-en tn local_op	[cvn]
21:04	<Krinkle>	/cs flags #cvn-wp-en tn voiced - verified nick	[cvn]
17:28	<bd808@deploy1002>	helmfile [eqiad] Ran 'sync' command on namespace 'toolhub' for release 'main' .	[production]
16:10	<bd808@deploy1002>	helmfile [eqiad] Ran 'sync' command on namespace 'toolhub' for release 'main' .	[production]
2021-10-01 §
23:19	<bd808@deploy1002>	helmfile [eqiad] Ran 'sync' command on namespace 'toolhub' for release 'main' .	[production]
22:27	<mutante>	puppetmaster2001 - systemctl reset-failed	[production]
22:16	<mutante>	puppetmaster2001 systemctl disable geoip_update_ipinfo.timer	[production]
22:15	<mutante>	puppetmaster2001 - sudo /usr/local/bin/geoipupdate_job after adding new shell command and timer - successfully downloaded enterprise database for T288844	[production]
21:59	<bd808>	clush -w @all -b 'sudo sed -i "s#mozilla/DST_Root_CA_X3.crt#!mozilla/DST_Root_CA_X3.crt#" /etc/ca-certificates.conf && sudo update-ca-certificates' for T292289	[tools]
21:56	<bd808@deploy1002>	helmfile [eqiad] Ran 'sync' command on namespace 'toolhub' for release 'main' .	[production]
21:44	<mutante>	puppetmasters - temp. disabling puppet one more time, now for a different deploy, to fetch an additional MaxMind database - T288844	[production]
21:19	<mutante>	puppetmaster2001 - puppet removed cron sync_volatile and cron sync_ca - starting and verifying new timers: 'systemctl status sync-puppet-volatile', 'systemctl status sync-puppet-ca' T273673	[production]
21:12	<mutante>	puppetmaster1002, puppetmaster1003, puppetmaster2002, puppetmaster2003: re-enabled puppet, they are backends. backends don't have the sync cron/job/timer, so noop as well, just like 1004/1005/2004/2005. this just leaves the actual change on 2001 - T273673	[production]
21:07	<mutante>	puppetmaster1004, puppetmaster1005, puppetmaster2004, puppetmaster2005: re-enabled puppet, they are "insetup" role	[production]
21:06	<mbsantos@deploy1002>	Finished deploy [kartotherian/deploy@d309a6e] (eqiad): tegola: reduce load to 50% during the weekend (duration: 00m 54s)	[production]
21:05	<mbsantos@deploy1002>	Started deploy [kartotherian/deploy@d309a6e] (eqiad): tegola: reduce load to 50% during the weekend	[production]
21:05	<mutante>	puppetmaster1001 - re-enabled puppet, noop as expected, the passive host pulls from the active one, so only 2001 has the cron/job/timer	[production]
21:05	<mwdebug-deploy@deploy1002>	helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .	[production]
21:02	<mwdebug-deploy@deploy1002>	helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .	[production]
21:01	<legoktm@deploy1002>	Synchronized wmf-config/CommonSettings.php: Revert "Have PdfHandler use Shellbox on Commons for 10% of requests" (duration: 00m 59s)	[production]
20:58	<mutante>	temp disabling puppet on puppetmasters - deploying gerrit:724115 (gerrit:723310) T273673	[production]
18:58	<robh@cumin1001>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-db1002.eqiad.wmnet with reason: REIMAGE	[production]
18:56	<robh@cumin1001>	END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-db1001.eqiad.wmnet with reason: REIMAGE	[production]
18:55	<robh@cumin1001>	START - Cookbook sre.hosts.downtime for 2:00:00 on an-db1002.eqiad.wmnet with reason: REIMAGE	[production]
18:53	<robh@cumin1001>	START - Cookbook sre.hosts.downtime for 2:00:00 on an-db1001.eqiad.wmnet with reason: REIMAGE	[production]
18:07	<robh@cumin1001>	END (ERROR) - Cookbook sre.experimental.reimage (exit_code=97) for host an-db1001.eqiad.wmnet	[production]
18:05	<robh@cumin1001>	START - Cookbook sre.experimental.reimage for host an-db1001.eqiad.wmnet	[production]
17:58	<effie>	depool mw1025, mw1319, mw1312 for test	[production]
16:20	<dancy>	testing upcoming Scap 4.0.2 release on beta	[production]
15:11	<btullis>	sudo -u analytics kerberos-run-command analytics /usr/local/bin/refine_eventlogging_legacy --ignore_failure_flag=true --table_include_regex='editoractivation' --since='2021-09-29T22:00:00.000Z' --until='2021-09-30T23:00:00.000Z'	[analytics]
14:04	<bblack>	C:envoyproxy (appservers and others): restarting envoyproxy	[production]
14:04	<bblack>	C:envoyproxy (appservers and others): ca-certificates updated via cumin to workaround T292291 issues	[production]
13:45	<elukey@deploy1002>	helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.	[production]