2018-10-07
22:16 <bd808> Got webservice to connect to gateway properly with: webservice stop; rm $HOME/service.manifest; webservice --backend=kubernetes python start [tools.ldap]
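The fix above is the stop / manifest removal / Kubernetes restart sequence from the entry, spelled out one command per line with comments:

    webservice stop                               # stop the failing webservice
    rm $HOME/service.manifest                     # drop the stale service manifest
    webservice --backend=kubernetes python start  # start fresh on the Kubernetes backend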
22:07 <bd808> Restarted, then stopped and started webservice to attempt to fix gateway timeout errors. Failures continue. Will investigate further [tools.ldap]
21:57 <zhuyifei1999_> restarted maintain-kubeusers on tools-k8s-master-01 T194859 [tools]
21:48 <zhuyifei1999_> maintain-kubeusers on tools-k8s-master-01 seems to be stuck in an infinite loop with a 10-second cycle; installed python3-dbg [tools]
21:44 <zhuyifei1999_> the journal on tools-k8s-master-01 is full of etcd failures; did a puppet run, nothing interesting happened [tools]
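python3-dbg was presumably installed to get a Python-level view of the looping process; a minimal sketch of that kind of inspection (the actual commands used are not recorded in the log):

    # Attach gdb to the running maintain-kubeusers process and print a
    # Python-level backtrace; py-bt relies on the python3-dbg symbols.
    sudo gdb -batch -ex py-bt -p "$(pgrep -f maintain-kubeusers)"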
16:40 <dereckson> Reset user email for account "Dominic Mayers" (T206421) [production]
16:35 <elukey> run a script in tmux (my username) on mw2201 to poll the status of a mcrouter key/route every 10s using its admin api (very lightweight but kill if needed) [production]
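A minimal sketch of the polling loop described above, run inside a detached tmux session; the host, port, and the exact mcrouter admin query are assumptions, as the entry does not record them:

    # Poll every 10 seconds; mcrouter answers memcached-protocol commands such
    # as "stats" on its listening port (127.0.0.1:11213 here is a placeholder).
    tmux new-session -d -s mcrouter-poll \
      'while true; do echo stats | nc -q 1 127.0.0.1 11213; sleep 10; done'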
14:52 <onimisionipe> repooling wdqs2003. Caught up on lag; lag issues also seem to be creeping up on wdqs200[1|2] [production]
04:29 <SMalyshev> temp depooled wdqs2003 [production]
03:12 <ejegg> disabled all fundraising scheduled jobs due to what looks like disk issues on civi1001 [production]
2018-10-06
21:20 <gehel> repooling wdqs2003: caught up on updater lag [production]
20:43 <_joe_> restarting apache2 on puppetmaster1001 [production]
19:16 <onimisionipe> depooling wdqs2003 [production]
18:10 <elukey> restart Yarn Resource Manager on an-master1002 to force an-master1001 to take the active role back (failed over due to a zk conn issue) [analytics]
18:09 <elukey> restart Yarn Resource Manager on an-master1002 to force an-master1001 to take the active role back (failed over due to a zk conn issue) [production]
17:07 <onimisionipe> restarting wdqs-blazegraph on wdqs2003 [production]
17:02 <framawiki> qdeled 5794887 too, stuck unblock job [tools.totoazero]
16:50 <framawiki> qdeled 5323359 and 5794089, maj_articles_recents jobs that had been stuck since Mon Sep 17 and Thu Sep 27 [tools.totoazero]
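Both removals above used gridengine's qdel with the job IDs from these entries; the equivalent one-liner:

    qdel 5323359 5794089 5794887   # delete the stuck jobs by grid job ID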
16:30 <framawiki> deployed 8550956 on quarry-web-01 [quarry]
13:59 <Reedy> cleared some large folders out of /tmp on deployment-deploy01 [releng]
13:48 <bblack> multatuli: update gdnsd package to 2.99.9930-beta-1+wmf1 [production]
13:47 <bblack> authdns1001: update gdnsd package to 2.99.9930-beta-1+wmf1 (correction to last msg) [production]
13:46 <bblack> authdns1001: update gdnsd package to 2.99.9161-beta-1+wmf1 [production]
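The updates above install a specific package build; a sketch of the usual apt invocation for that, using the corrected version string from the entries:

    sudo apt-get update
    sudo apt-get install gdnsd=2.99.9930-beta-1+wmf1   # pin the exact +wmf1 build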
12:57 <bblack> rebooting cp1076 [production]
12:49 <bblack> depool cp1076, apparently has disk issues [production]
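Depooling a cache host is normally a conftool operation; a sketch assuming the standard confctl selector syntax and the host's eqiad FQDN (neither appears in the entry):

    sudo confctl select 'name=cp1076.eqiad.wmnet' set/pooled=no   # mark the host depooled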
06:00 <wikibugs> Updated channels.yaml to: 88c0a4fb868be28e9bdf37afdaa970f0d5ab61d7 Add some tags that might get added on their own, without Scoring-platform-team [tools.wikibugs]
2018-10-05
23:50 <bblack> repooling eqiad edge caches, a few days ahead of intended switchback next Weds, to alleviate some traffic engineering concerns over the weekend [production]
20:48 <mutante> T191183 - it's still showing the error page as before but that isn't due to apache issues, it just needs additional ferm rules [production]
20:44 <mutante> gerrit - adding gerrit.wmfusercontent.org virtual host for avatars. applied first on gerrit2001, then on cobalt (T191183) [production]
20:03 <ejegg> updated fundraising CiviCRM from ebc2e0076c to 7a0d14015e [production]
19:48 <banyek> repooling labsdb1009 (T195747) [production]
19:47 <marxarelli> bringing integration-slave-docker-1040 back online [releng]
19:44 <smalyshev@deploy1001> Finished deploy [wdqs/wdqs@f8776de]: Redeploy 1009 (duration: 00m 26s) [production]
19:44 <smalyshev@deploy1001> Started deploy [wdqs/wdqs@f8776de]: Redeploy 1009 [production]
19:05 <marxarelli> taking integration-slave-docker-1040 offline for docker daemon restart [releng]
19:04 <marxarelli> bringing integration-slave-docker-1038/1041/1043 back online [releng]
19:02 <marxarelli> taking integration-slave-docker-1038/1041/1043 offline for docker daemon restart [releng]
19:01 <marxarelli> bringing integration-slave-docker-1033/1037 back online [releng]
18:58 <marxarelli> taking integration-slave-docker-1033/1037 offline for docker daemon restart [releng]
18:56 <marxarelli> bringing integration-slave-docker-1034 back online [releng]
18:56 <marxarelli> integration-puppetmaster01:/var/lib/git/operations/puppet is up-to-date again after manually updating submodules and subsequent automated git-sync-upstream [releng]
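The manual submodule fix described above typically amounts to the following (a sketch; the exact commands are not in the entry):

    cd /var/lib/git/operations/puppet
    sudo git submodule update --init --recursive   # realign submodules with the superproject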
18:37 <bblack> authdns2001: upgraded gdnsd to 2.99.9930-beta [production]
18:33 <marxarelli> taking integration-slave-docker-1034 offline for docker daemon restart [releng]
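Each offline/restart/online cycle above is the same per-agent operation; the daemon restart itself is one systemd action (a sketch; how the Jenkins node was toggled offline/online is not shown in the entries):

    sudo systemctl restart docker   # restart the docker daemon on the agent
    systemctl is-active docker      # confirm it is running before re-onlining the node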
18:31 <bblack> gdnsd-2.99.9930-beta-1+wmf1 uploaded to stretch-wikimedia [production]
18:26 <mutante> icinga - noop on all servers, no change, puppet re-enabled, operations normal [production]
18:08 <mutante> disabling puppet on icinga for 5 min for extra safety before a change that should be noop [production]
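The disable/verify/re-enable sequence above maps onto the standard puppet agent commands (a sketch; the reason string is an assumption):

    sudo puppet agent --disable "pre-change safety window"   # block automatic runs
    sudo puppet agent --test --noop                          # dry-run to confirm the change is a no-op
    sudo puppet agent --enable                               # resume normal runs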
17:58 <banyek> depooling labsdb1009 (T195747) [production]
17:50 <banyek> repooling labsdb1011 (T195747) [production]
17:12 <elukey> set etcd in codfw as read/write (was readonly) and eqiad as readonly (was read/write) [production]
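The read-only flip above is an etcd write per datacenter; a sketch in etcdctl v2 syntax with placeholder key paths, since the real ones are not recorded here:

    etcdctl set /placeholder/codfw/read-only false   # codfw becomes read/write
    etcdctl set /placeholder/eqiad/read-only true    # eqiad becomes read-only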
14:57 <banyek> depooling labsdb1011 (T195747) [production]