2016-07-05
10:05 <elukey@palladium> conftool action : set/weight=30; selector: mw1274.eqiad.wmnet [production]
10:05 <elukey@palladium> conftool action : set/weight=30; selector: mw1273.eqiad.wmnet [production]
09:59 <elukey@palladium> conftool action : set/weight=30; selector: mw1272.eqiad.wmnet [production]
09:31 <_joe_> shutting down mw1009-16 for decommissioning [production]
09:06 <_joe_> decommissioning mw1009-16 [production]
08:38 <elukey@palladium> conftool action : set/pooled=yes; selector: mw1275.eqiad.wmnet [production]
08:36 <elukey@palladium> conftool action : set/pooled=yes; selector: mw1274.eqiad.wmnet [production]
08:32 <gehel> deleting enwikisource_titlesuggest on elasticsearch codfw (index creation issue during cluster restart) [production]
08:31 <elukey@palladium> conftool action : set/pooled=yes; selector: mw1273.eqiad.wmnet [production]
08:23 <elukey@palladium> conftool action : set/pooled=yes; selector: mw1272.eqiad.wmnet [production]
08:21 <elukey> adding and pooling new appservers - mw127[2345].eqiad [production]
08:07 <godog> swift codfw-prod: ms-be202[567] weight 1500 [production]
07:55 <jynus> dropping etherpad_restore2 database from m1 T138516 [production]
07:40 <akosiaris> T138516 forcing a puppet run on cache::misc hosts after merging https://gerrit.wikimedia.org/r/297352 [production]
07:29 <akosiaris> T138516 stop the secondary etherpad instance on etherpad1001. etherpad-restore.wikimedia.org has served its purpose, killing it [production]
02:44 <l10nupdate@tin> ResourceLoader cache refresh completed at Tue Jul 5 02:44:09 UTC 2016 (duration 6m 12s) [production]
02:37 <mwdeploy@tin> scap sync-l10n completed (1.28.0-wmf.8) (duration: 17m 13s) [production]
2016-07-04
20:28 <jynus> removing /tmp/joal/sstables on all analytics10* hosts [production]
20:22 <jynus> deleted 21GB worth of temporary files from analytics1050 [production]
19:58 <aaron@tin> Synchronized wmf-config/filebackend-production.php: Increase redis lockmanager timeout to 2 (duration: 00m 31s) [production]
19:57 <legoktm@tin> Synchronized php-1.28.0-wmf.8/extensions/MassMessage/: MassMessage is no longer accepting lists in the MassMessageList content model - T139303 (duration: 00m 39s) [production]
17:37 <jynus> testing slave_parallel_threads=5 on db1073 [production]
14:27 <moritzm> rebooting lithium for kernel update [production]
14:22 <moritzm> installing tomcat7/ libservlet3.0-java security update on the kafka brokers [production]
14:06 <_joe_> shutting down mw1001-1008 for decommissioning [production]
14:03 <gehel> rolling restart of elasticsearch codfw/eqiad for kernel upgrade (T138811) [production]
13:47 <_joe_> stopping jobrunner on mw1011-16 as well, before decommissioning [production]
13:46 <moritzm> depooling mw1153-mw1160 (trusty image scalers), replaced by mw1291-mw1298 (jessie image scalers) [production]
13:44 <godog> ack all mr1-codfw related alerts in librenms [production]
13:43 <akosiaris> restart smokeping on netmon1001, temporarily disabled msw1-codfw [production]
13:38 <gehel> resuming writes on Cirrus / elasticsearch, this did not speedup cluster recovery [production]
13:18 <godog> bounce redis on rcs1001 [production]
13:16 <gehel> restarting elastic1021 for kernel upgrade (T138811) [production]
13:07 <elukey> Bootstrapping again Cassandra on aqs100[456] (rack awareness + 2.2.6 - testing environment) [production]
13:02 <gehel> pausing writes on Cirrus / elasticsearch for faster cluster restart [production]
12:43 <hashar> Nodepool back up with 10 instances (instead of 20) to accommodate for labs capacity T139285 [production]
12:39 <godog> nodetool-b stop -- COMPACTION on restbase1014 [production]
12:29 <moritzm> rolling reboot of rcs* cluster for kernel security update [production]
12:10 <moritzm> rolling reboot of ocg* cluster for kernel security update [production]
11:40 <jynus@tin> Synchronized wmf-config/db-eqiad.php: Failover db1053 to db1072 (duration: 00m 40s) [production]
10:56 <moritzm> rolling reboot of swift frontends in eqiad for kernel security update [production]
10:30 <yuvipanda> stop nodepool on labnodepool1001 and disable puppet to keep it down, to allow stabilizing labs first [production]
10:28 <yuvipanda> restart rabbitmq-server on labcontrol1001 [production]
10:14 <moritzm> installing chromium security update on osmium [production]
10:07 <moritzm> installing xerces-c security updates on Ubuntu systems (jessie already fixed) [production]
10:01 <_joe_> stopping jobchron and jobrunner on mw1001-10 before decommission [production]
09:50 <godog> reimage ms-be300[234] with jessie [production]
09:44 <hashar> Labs infra can't delete instances anymore (impacts CI as well) T139285 [production]
09:41 <moritzm> installing p7zip security updates [production]
09:38 <hashar> CI is out of Nodepool instances, the pool has drained because instances can no longer be deleted over the OpenStack API [production]