2019-04-04
§
|
11:43 |
<moritzm> |
upgrading HHVM on mwdebug servers in eqiad along with update to hhvm-wikidiff 1.8.1 |
[production] |
11:35 |
<moritzm> |
uploaded nodejs 10.15.2~dfsg-1+wmf1 to the component/node10 component of apt.wikimedia.org/stretch-wikimedia (updated to latest 10.x release and a change to ensure zlib binary compat with NodeSource) (T215562) |
[production] |
11:34 |
<Amir1> |
EU SWAT is done |
[production] |
11:32 |
<ladsgroup@deploy1001> |
Synchronized wmf-config/CommonSettings.php: SWAT: [[gerrit:500976|Add mediawiki.org to the URL shortener whitelist]] (duration: 00m 58s) |
[production] |
11:28 |
<jbond42> |
rolling security updates for apache on jessie |
[production] |
11:25 |
<ladsgroup@deploy1001> |
Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:498371|Enable ReferencePreviews beta feature on de- and ar-wiki (T218766)]] (duration: 01m 00s) |
[production] |
11:21 |
<arturo> |
T219626 reimaging cloudcontrol2001-dev again |
[production] |
11:08 |
<arturo> |
drop python-psutil from jessie-wikimedia/openstack-mitaka-jessie, related to T219626 |
[production] |
10:56 |
<moritzm> |
uploaded hhvm-wikidiff 1.8.1 to apt.wikimedia.org/stretch-wikimedia (source package is named php-wikidiff2 for legacy reasons) (T203069) |
[production] |
10:21 |
<arturo> |
T219626 reimaging cloudcontrol2001-dev again |
[production] |
10:01 |
<moritzm> |
installing openssl1.0 security updates on stretch-based DB hosts |
[production] |
08:36 |
<moritzm> |
rolling restart of parsoid to pick up OpenSSL security update |
[production] |
08:06 |
<moritzm> |
uploaded Apache 2.4.10-10+deb8u14+wmf1 to apt.wikimedia.org/jessie-wikimedia (latest jessie security update rebased with our local patches) |
[production] |
05:39 |
<marostegui> |
Stop MySQL on db2033 for decommission - T219493 |
[production] |
05:32 |
<marostegui> |
Remove db2033 from tendril and zarcillo - T219493 |
[production] |
05:19 |
<marostegui@deploy1001> |
Synchronized wmf-config/db-eqiad.php: Remove db2033 for decommission T219493 (duration: 00m 59s) |
[production] |
05:18 |
<marostegui@deploy1001> |
Synchronized wmf-config/db-codfw.php: Remove db2033 for decommission T219493 (duration: 00m 59s) |
[production] |
04:58 |
<marostegui> |
Deploy schema change on labswiki for the job table - T219887 |
[production] |
00:40 |
<chaomodus> |
restart pdfrender on scb1003 - T174916 |
[production] |
2019-04-03
§
|
23:51 |
<catrope@deploy1001> |
Synchronized wmf-config/InitialiseSettings.php: Enable Flow beta feature on zhwikisource (T219588) (duration: 00m 58s) |
[production] |
23:50 |
<catrope@deploy1001> |
Synchronized dblists/flow.dblist: Enable Flow on zhwikisource (T219588) (duration: 00m 57s) |
[production] |
23:38 |
<catrope@deploy1001> |
Synchronized wmf-config/InitialiseSettings.php: Enable GrowthExperiments homepage EventLogging on testwiki (duration: 00m 59s) |
[production] |
23:20 |
<catrope@deploy1001> |
Synchronized wmf-config/InitialiseSettings.php: Configure GrowthExperiments homepage tutorial pages on cswiki, kowiki, viwiki (dark deploy) (duration: 00m 59s) |
[production] |
23:18 |
<catrope@deploy1001> |
sync-file aborted: (no justification provided) (duration: 00m 00s) |
[production] |
23:14 |
<catrope@deploy1001> |
Synchronized wmf-config/InitialiseSettings.php: Configure GrowthExperiments homepage on testwiki (duration: 01m 01s) |
[production] |
21:32 |
<elukey> |
start hadoop-hdfs-namenode on an-master1002 after outage due to big job hitting HDFS |
[production] |
20:40 |
<gehel> |
excluding elastic2048 from cluster and depooling - T220038 |
[production] |
20:29 |
<arlolra> |
Updated Parsoid to 0b3bb10 (T219337) |
[production] |
20:20 |
<arlolra@deploy1001> |
Finished deploy [parsoid/deploy@4f740e3]: Updating Parsoid to 0b3bb10 (duration: 05m 44s) |
[production] |
20:14 |
<arlolra@deploy1001> |
Started deploy [parsoid/deploy@4f740e3]: Updating Parsoid to 0b3bb10 |
[production] |
20:09 |
<marxarelli> |
1.33.0-wmf.24 is holding at group0 following rollback. filed T220037. cc: T206678 |
[production] |
19:56 |
<marxarelli> |
log correction: group1 reverted to 1.33.0-wmf.23 |
[production] |
19:56 |
<dduvall@deploy1001> |
rebuilt and synchronized wikiversions files: Revert group1 to 1.33.0-wmf.24 |
[production] |
19:55 |
<marxarelli> |
111,185 and counting DBTransactionError for jobrunner.discovery.wmnet |
[production] |
19:53 |
<marxarelli> |
rolling back group1 |
[production] |
19:53 |
<marxarelli> |
massive spike in DBTransactionError ([{exception_id}] {exception_url} Wikimedia\Rdbms\DBTransactionError from line 246 of /srv/mediawiki/php-1.33.0-wmf.24/includes/libs/rdbms/lbfactory/LBFactory.php: RefreshLinksJob::runForTitle: transaction round 'RefreshLinksJob::run' already started.) |
[production] |
19:51 |
<dduvall@deploy1001> |
Synchronized php: group1 wikis to 1.33.0-wmf.24 (duration: 01m 49s) |
[production] |
19:50 |
<marxarelli> |
dduvall@deploy1001 rebuilt and synchronized wikiversions files: group1 wikis to 1.33.0-wmf.24 |
[production] |
19:49 |
<dduvall@deploy1001> |
rebuilt and synchronized wikiversions files: group1 wikis to 1.33.0-wmf.24 |
[production] |
19:34 |
<smalyshev@deploy1001> |
Finished deploy [wdqs/wdqs@50b2af9]: Deploy new Updater for more cache-friendly update strategy (duration: 10m 54s) |
[production] |
19:23 |
<smalyshev@deploy1001> |
Started deploy [wdqs/wdqs@50b2af9]: Deploy new Updater for more cache-friendly update strategy |
[production] |
18:14 |
<thcipriani> |
gerrit back on 2.15.12 |
[production] |
18:12 |
<thcipriani> |
restarting gerrit for 2.15.12 update |
[production] |
18:11 |
<thcipriani@deploy1001> |
Finished deploy [gerrit/gerrit@e416edf]: Gerrit to 2.15.12 on cobalt (restart to follow) (duration: 00m 11s) |
[production] |
18:11 |
<thcipriani@deploy1001> |
Started deploy [gerrit/gerrit@e416edf]: Gerrit to 2.15.12 on cobalt (restart to follow) |
[production] |
18:09 |
<thcipriani@deploy1001> |
Finished deploy [gerrit/gerrit@e416edf]: Gerrit to 2.15.12 on gerrit2001 only (duration: 00m 11s) |
[production] |
18:09 |
<thcipriani@deploy1001> |
Started deploy [gerrit/gerrit@e416edf]: Gerrit to 2.15.12 on gerrit2001 only |
[production] |
17:57 |
<elukey> |
restart hadoop-hdfs-namenode on an-master1001 as precautionary measure after the outage (currently standby) |
[production] |
17:44 |
<herron> |
briefly postponing restarts of eventbus and kafka services for security updates due to unrelated firefighting - repooling kafka1001 |
[production] |
17:19 |
<elukey> |
restart hadoop-hdfs-namenode on an-master1002 after forced shutdown due to errors |
[production] |