2016-02-18
|
18:04 |
<mobrovac> |
restbase deploy end of a42976cc82 |
[production] |
18:03 |
<twentyafterfour> |
applied a hotfix from https://secure.phabricator.com/D15306 on iridium to test a fix for https://phabricator.wikimedia.org/T127290 |
[production] |
18:00 |
<godog> |
reenable puppet on restbase1008 |
[production] |
17:49 |
<mobrovac> |
restbase deploy start of a42976cc82 |
[production] |
17:47 |
<elukey> |
manual failover of hadoop master node (analytics1001) to secondary (analytics1002) for maintenance (plus service restarts) |
[production] |
17:41 |
<urandom> |
upgrading Cassandra to 2.1.13 on cerium.eqiad.wmnet (restbase staging) T126629 |
[production] |
17:28 |
<mobrovac> |
restbase deploying a42976cc82 to restbase1002 |
[production] |
17:27 |
<urandom> |
Cassandra on xenon.eqiad.wmnet killed by kernel after Cassandra package upgrade (coincidence?): [1482254.046078] Out of memory: Kill process 21854 (java) score 595 or sacrifice child : T126629 |
[production] |
17:26 |
<urandom> |
Cassandra on xenon.eqiad.wmnet killed by kernel after Cassandra package upgrade (coincidence): [1482254.046078] Out of memory: Kill process 21854 (java) score 595 or sacrifice child |
[production] |
17:21 |
<urandom> |
upgrading Cassandra to 2.1.13 on xenon.eqiad.wmnet (restbase staging) T126629 |
[production] |
17:20 |
<elukey> |
disabled puppet on analytics1027 to prevent any Camus jobs from running |
[production] |
17:04 |
<dcausse> |
updating completion suggester indices in eqiad |
[production] |
16:54 |
<elukey> |
restarting hadoop services on analytics105* nodes for security updates |
[production] |
16:49 |
<gehel> |
removing cirrus maintenance crons from mw1152 (T127322) |
[production] |
15:52 |
<dcausse> |
creating adywiki indices in codfw |
[production] |
15:44 |
<elukey> |
restarting hadoop services on analytics104* nodes for security updates |
[production] |
15:37 |
<elukey> |
restarting hadoop services on analytics102* nodes for security update |
[production] |
15:33 |
<moritzm> |
restarting apache on silver/wikitech |
[production] |
15:10 |
<elukey> |
restarting hadoop services on analytics103* hosts for security upgrades |
[production] |
14:06 |
<bblack> |
restarting apache on gallium (integration) |
[production] |
13:13 |
<mark> |
decreased raid md2 sync_speed_max to 6000 on restbase1008 |
[production] |
12:55 |
<elukey> |
rebooted kafka1022.eqiad.wmnet for kernel upgrade |
[production] |
12:51 |
<godog> |
decrease raid min_speed to 8000 on restbase1008 |
[production] |
12:50 |
<hoo@tin> |
Synchronized wmf-config/Wikibase.php: Bump $wgCacheEpoch for Wikidata (duration: 01m 54s) |
[production] |
12:41 |
<elukey> |
rebooted kafka1020 for kernel upgrade. |
[production] |
12:40 |
<godog> |
decrease raid min_speed to 10000 on restbase1008 |
[production] |
12:24 |
<godog> |
increase stripe_cache_size to 32470 on restbase1008 |
[production] |
12:21 |
<godog> |
expand raid0 on restbase1008 to sdd and sde |
[production] |
11:36 |
<paravoid> |
upgrading mr1-ulsfo to its pre-recovery version and rebooting (T127295) |
[production] |
11:34 |
<hashar> |
Hard restarting Jenkins T127294 |
[production] |
11:32 |
<jynus> |
logical import of db1021 starting for data consistency check and defragmenting purposes |
[production] |
11:29 |
<paravoid> |
mr1-ulsfo: "request system snapshot media internal slice alternate" + reboot (T127295) |
[production] |
11:27 |
<hashar> |
Jenkins web UI busy with 'jenkins.model.RunIdMigrator doMigrate' while it migrates build records. I did a bunch of cleanup yesterday. Jenkins runs jobs in the background just fine though. T127294 |
[production] |
11:12 |
<hashar> |
Jenkins: reloading configuration from disk. Some metadata are corrupted T127294 |
[production] |
10:48 |
<elukey> |
rebooted kafka1018 for maintenance |
[production] |
10:17 |
<elukey> |
rebooted kafka1014 for maintenance |
[production] |
10:10 |
<moritzm> |
restarting hhvm on mw1* to put glibc update into effect |
[production] |
09:49 |
<godog> |
remove old restbase metrics under restbase.* from graphite1001 and graphite2001 |
[production] |
03:13 |
<twentyafterfour> |
running puppet one last time on iridium. Phabricator upgrade successful with just a few minor issues now resolved. |
[production] |
03:01 |
<l10nupdate@tin> |
ResourceLoader cache refresh completed at Thu Feb 18 03:01:01 UTC 2016 (duration 9m 24s) |
[production] |
02:51 |
<mwdeploy@tin> |
sync-l10n completed (1.27.0-wmf.14) (duration: 11m 20s) |
[production] |
02:29 |
<mwdeploy@tin> |
sync-l10n completed (1.27.0-wmf.13) (duration: 13m 55s) |
[production] |
02:18 |
<twentyafterfour> |
phabricator is back online, sprint extension is broken, I'm investigating |
[production] |
01:57 |
<mutante> |
powercycled frozen mw1147 |
[production] |
01:51 |
<twentyafterfour> |
phab pre-upgrade: http://pastebin.com/RTmXfDhp |
[production] |
01:49 |
<twentyafterfour> |
about to bring down phabricator to do the upgrade |
[production] |
01:49 |
<twentyafterfour> |
ran puppet on iridium for testing |
[production] |
01:08 |
<twentyafterfour> |
stopped phd and started dumping phabricator's database to /srv/dumps/20160218.phabricator.sql.gz (just in case I need to roll back the update) |
[production] |
00:34 |
<catrope@tin> |
Synchronized php-1.27.0-wmf.13/extensions/Flow: Trying again (duration: 01m 50s) |
[production] |
00:28 |
<RoanKattouw> |
00:28:25 64 apaches had sync errors, /usr/bin/sync-common missing |
[production] |