2018-04-02
08:50 <marostegui> Deploy schema change on s3 codfw master db2043 (this will generate lag on codfw) - T187089 T185128 T153182 [production]
08:33 <joal> rerun wikidata-specialentitydata_metrics-wf-2018-4-1 [analytics]
08:21 <jynus> stop mariadb at labsdb1009 and labsdb1010 [production]
08:15 <marostegui@tin> Synchronized wmf-config/db-codfw.php: Specify current m5 codfw master (duration: 01m 17s) [production]
08:11 <jynus> depool labsdb1011 from web wikireplicas [production]
07:21 <apergos> restarted pdfrender on scb1004 after poking around there a bit [production]
07:01 <apergos> restarted pdfrender on scb1001 and scb1002; the service paged and no jobs were being processed [production]
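A minimal sketch of the restart-and-verify step behind the two pdfrender entries above, assuming pdfrender runs as a systemd unit of the same name on the scb hosts (the unit name is taken from the log, the verification step is an assumption):

    # restart the renderer and confirm it is picking up jobs again
    sudo systemctl restart pdfrender
    sudo journalctl -u pdfrender --since "15 min ago"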
06:06 <marostegui> Drop localisation table from the hosts where it still existed - T119811 [production]
02:50 <l10nupdate@tin> scap sync-l10n completed (1.31.0-wmf.26) (duration: 12m 53s) [production]
2018-03-31
21:42 <Hauskatze> Ran sudo puppet agent --enable and sudo puppet agent -tv on deployment-maps03 to fix puppet staleness [releng]
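The commands from the entry above, as they would be run on the instance; both are standard Puppet agent invocations:

    # clear a previous `puppet agent --disable` lock, then do a verbose one-off run
    sudo puppet agent --enable
    sudo puppet agent -tv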
21:15 <mutante> bast1001 has been shut down and decommissioned as planned. If you have any issues with shell access, make sure you have replaced it with bast1002 or any other bastion host [production]
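For users whose SSH setup still pointed at bast1001, a hypothetical ~/.ssh/config stanza switching the jump host to bast1002 (the host pattern and the choice of ProxyJump are assumptions, not taken from the log):

    # route internal *.wmnet hosts through the new bastion
    Host *.wmnet
        ProxyJump bast1002.wikimedia.org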
11:26 <urandom> removing corrupt commitlog segment, restbase1009-c [production]
11:25 <urandom> removing corrupt commitlog segment, restbase1009-b [production]
11:19 <urandom> starting restbase1009-c [production]
11:18 <urandom> truncating hints, restbase1009-a [production]
11:14 <urandom> restarting restbase1009-b [production]
11:13 <urandom> stopping restbase1009-a (high hints storage) [production]
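A hedged sketch of the Cassandra maintenance pattern in the restbase1009 entries above (entries read newest-first); only nodetool truncatehints is a standard Cassandra command, while the per-instance unit names and commitlog path are assumptions:

    # stop the instance that accumulated hints, drop the hints, start it again
    sudo systemctl stop cassandra-a
    nodetool truncatehints
    sudo systemctl start cassandra-a
    # if a restart fails on a corrupt commitlog segment, remove that segment and retry
    sudo rm /srv/cassandra-b/commitlog/CommitLog-<segment-id>.log
    sudo systemctl start cassandra-b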
2018-03-30
22:40 <zhuyifei1999_> copied over many prefix puppet configurations in Horizon from Toolforge T190893 [toolsbeta]
14:16 <akosiaris> T189076 upload apertium-fra-cat to apt.wikimedia.org/jessie-wikimedia/main [production]
13:48 <elukey> restart overlord+middlemanager on druid100[23] to avoid consistency issues [analytics]
13:41 <elukey> restart overlord+middlemanager on druid1001 after failures in real time indexing (overlord leader) [analytics]
12:47 <akosiaris> T189076 upload apertium-cat to apt.wikimedia.org/jessie-wikimedia/main [production]
12:47 <akosiaris> T189075 upload apertium-lex-tools to apt.wikimedia.org/jessie-wikimedia/main [production]
12:47 <akosiaris> T189075 upload apertium-separable to apt.wikimedia.org/jessie-wikimedia/main [production]
12:47 <akosiaris> T189076 upload apertium-fra to apt.wikimedia.org/jessie-wikimedia/main [production]
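The apertium uploads above add packages to the jessie-wikimedia/main component on apt.wikimedia.org; a hedged sketch of how such an import is commonly done with reprepro on the apt host (the filename and the exact local invocation are assumptions):

    # import a built package into the main component of the jessie-wikimedia distribution
    sudo -i reprepro -C main include jessie-wikimedia apertium-fra-cat_*_amd64.changes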
11:44 <dcausse> running forceSearchIndex from terbium to clean up elastic indices for (testwiki, mediawikiwiki, labswiki, labtestwiki, svwiki) (T189694) [production]
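A hedged sketch of the reindexing run described above, assuming the usual mwscript wrapper; the exact CirrusSearch maintenance-script path and any extra flags are assumptions, since the log only names the script:

    # rebuild the CirrusSearch index for each affected wiki
    for wiki in testwiki mediawikiwiki labswiki labtestwiki svwiki; do
        mwscript extensions/CirrusSearch/maintenance/forceSearchIndex.php --wiki="$wiki"
    done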
11:40 <dcausse> elastic@codfw cluster restarts complete (T189239) [production]
11:38 <dcausse> deployment-prep: reindexing all beta wikis with forceSearchIndex (T189694) [releng]
10:55 <dcausse> resuming elastic@codfw cluster restarts [production]
10:17 <elukey> roll restart of zookeeper daemons on druid100[123] (Druid analytics cluster) to pick up the new prometheus jmx agent [production]
09:56 <hashar> Nuking /srv/zuul/git/labs/tools/stewardbots on zuul-merger hosts (contint1001 and contint2001). Fetch fails with org.eclipse.jgit.transport.UploadPackInternalServerErrorException | T191077 [releng]
09:44 <elukey> re-enable camus [analytics]
09:31 <elukey> restart oozie/hive daemons on an1003 for openjdk-8 upgrades [production]
08:38 <elukey> rolling restart of hadoop-hdfs-datanode on all the hadoop worker nodes after https://gerrit.wikimedia.org/r/423000 [production]
08:26 <elukey> stopped camus to drain the cluster - prep for easy restart of analytics1003's jvm daemons [analytics]
07:39 <elukey> rolling restart of yarn-hadoop-nodemanagers on all the hadoop worker nodes after https://gerrit.wikimedia.org/r/423000 [production]
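A minimal sketch of the rolling-restart pattern behind the two hadoop entries above, assuming plain SSH in a loop; in practice this is usually driven by an orchestration tool, and the worker host list file is an assumption (the unit name comes from the log):

    # restart the datanode (or nodemanager) on one worker at a time, letting each rejoin before the next
    for host in $(cat hadoop-workers.txt); do
        ssh "$host" 'sudo systemctl restart hadoop-hdfs-datanode'
        sleep 60
    done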
2018-03-29
23:47 <ebernhardson@tin> Synchronized wmf-config/InitialiseSettings.php: SWAT: T189252: Enable perf oversampling for remaining countries in Asia (duration: 01m 16s) [production]
23:40 <ebernhardson@tin> Synchronized php-1.31.0-wmf.27/extensions/WikimediaEvents/modules/all/ext.wikimediaEvents.searchSatisfaction.js: SWAT: T187148: Start cirrus AB test (duration: 01m 16s) [production]
23:37 <ebernhardson@tin> Synchronized php-1.31.0-wmf.26/extensions/WikimediaEvents/modules/all/ext.wikimediaEvents.searchSatisfaction.js: SWAT: T187148: Start cirrus AB test (duration: 01m 16s) [production]
23:12 <ebernhardson@tin> Synchronized wmf-config/InitialiseSettings.php: SWAT: T187148: Configure 5 buckets for cirrus AB test (duration: 01m 17s) [production]
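The "Synchronized" entries above are what scap records after a file sync; a hedged sketch of the corresponding operator command, run from the deployment host (the file path and message are taken from the last entry, the rest is standard scap usage of that period):

    # sync one config file to the cluster, with the SWAT message that ends up in this log
    scap sync-file wmf-config/InitialiseSettings.php 'SWAT: T187148: Configure 5 buckets for cirrus AB test'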
22:10 <andrew@tin> Finished deploy [horizon/deploy@14d3e7d]: Updating Horizon with possible fix for T189706 (duration: 03m 16s) [production]
22:06 <andrew@tin> Started deploy [horizon/deploy@14d3e7d]: Updating Horizon with possible fix for T189706 [production]
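The Started/Finished pair above is the usual trace of a scap3 deploy; a minimal sketch, assuming the horizon/deploy checkout lives under /srv/deployment on the deployment host (that path is an assumption):

    # deploy the current horizon/deploy checkout; the message is echoed into this log
    cd /srv/deployment/horizon/deploy
    scap deploy 'Updating Horizon with possible fix for T189706'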
20:55 <milimetric> accidentally killed mediawiki-geowiki-monthly-coord, and then restarted it [analytics]
20:12 <ottomata> blacklisted mediawiki.job topics from main -> jumbo MirrorMaker again; don't want to page over the weekend while this is still not stable. T189464 [analytics]
20:09 <chicocvenancio> killed interactive processes in tools-bastion-03 [tools]
20:07 <robh> shut down cp2022 for hw testing [production]
19:56 <chicocvenancio> several interactive jobs are running on bastion-03; I am writing to the connected users and will kill the jobs once done [tools]
19:39 <Amir1> ladsgroup@deployment-tin:~$ mwscript updateCollation.php --wiki=fawiki (T190965) [releng]