2014-06-09
20:08 |
<subbu> |
deployed Parsoid 9b673587 (deploy sha 7d0097a1) |
[production] |
19:23 |
<ottomata> |
disabling puppet on analytics1012 |
[production] |
19:00 |
<ottomata> |
decommissioning analytics1012 in hadoop cluster, this will become a Kafka broker |
[production] |
17:59 |
<manybubbles> |
elastic1004-1006 upgraded without trouble - cluster is working on filling elastic1006 before moving on to 1007 and the rest |
[production] |
17:04 |
<andrewbogott> |
switching labs to puppet3 |
[production] |
17:03 |
<awight> |
update crm from b38497a9d0ef75fe2b20b03b649ac13a5e3f47a7 to b6815d29de97b80a0ab65db576213a604f0c7cb9 |
[production] |
16:30 |
<manybubbles> |
upgrading elastic1003 - upgrade is going well so far so I'm going to stop watching it as closely and let it be more automated |
[production] |
15:28 |
<manybubbles> |
elastic1001 went well, doing 1002 by hand again |
[production] |
15:17 |
<anomie> |
Synchronized php-1.24wmf8/extensions/Wikidata: SWAT: Wikidata entity suggester bug fixes [[gerrit:138339]] (duration: 00m 16s) |
[production] |
15:12 |
<greg-g> |
mw1151 still "permission denied" during deploys |
[production] |
15:12 |
<anomie> |
Synchronized wmf-config/InitialiseSettings.php: SWAT: Enable TemplateData GUI on Portuguese Wikipedia [[gerrit:137986]] (duration: 00m 14s) |
[production] |
15:09 |
<anomie> |
Synchronized php-1.24wmf7/extensions/VisualEditor/modules/ve-mw/ui/dialogs/ve.ui.MWSaveDialog.js: SWAT: VE fix for focus regression [[gerrit:137978]] (duration: 00m 15s) |
[production] |
15:06 |
<andrewbogott> |
beta updating all instances to puppet 3 via a cherry-pick of https://gerrit.wikimedia.org/r/#/c/137898/ on deployment-salt |
[production] |
15:05 |
<anomie> |
Synchronized php-1.24wmf8/extensions/VisualEditor/modules/ve-mw/: SWAT: VE fix for focus regression and alignment issues [[gerrit:137971]] [[gerrit:138122]] (duration: 00m 14s) |
[production] |
15:01 |
<manybubbles> |
successfully synced plugins, upgrading elastic1001 to make sure everything is working ok with it - then we'll run through the others more quickly |
[production] |
14:57 |
<manybubbles> |
syncing elasticsearch plugins for 1.2.1 - any elasticsearch restart from here on out needs to come with 1.2.1 or the node will break. |
[production] |
14:54 |
<manybubbles> |
starting Elasticsearch upgrade with elastic1001 |
[production] |
07:14 |
<springle> |
disabled puppet on analytics1021 to avoid kafka broker restarting with missing mount |
[production] |
05:15 |
<springle> |
xtrabackup clone db1046 to db1020 |
[production] |
04:44 |
<springle> |
umount /dev/sdf on analytics1021, fs in r/o mode, kafka broker not running. no checks yet |
[production] |
03:24 |
<LocalisationUpdate> |
ResourceLoader cache refresh completed at Mon Jun 9 03:23:05 UTC 2014 (duration 23m 4s) |
[production] |
02:29 |
<LocalisationUpdate> |
completed (1.24wmf8) at 2014-06-09 02:28:08+00:00 |
[production] |
02:15 |
<LocalisationUpdate> |
completed (1.24wmf7) at 2014-06-09 02:14:46+00:00 |
[production] |
2014-06-07
23:48 |
<hoo> |
Fixed four CentralAuth log entries on meta which were logged for WikiSets/0 |
[production] |
21:36 |
<manybubbles> |
that means I turned off puppet and shut down Elasticsearch on elastic1017 - you can expect the cluster to go yellow for half an hour or so while the other nodes rebuild the redundancy that elastic1017 had |
[production] |
21:35 |
<manybubbles> |
after consulting logs - elastic1017 has had high io wait since it was deployed - I'm taking it out of rotation |
[production] |
21:31 |
<manybubbles> |
elastic1017 is sick - thrashing to death on io - restarting Elasticsearch to see if it recovers unthrashed |
[production] |
17:56 |
<godog> |
restarted ES on elastic1017.eqiad.wmnet (at 17:22 UTC) |
[production] |
03:24 |
<LocalisationUpdate> |
ResourceLoader cache refresh completed at Sat Jun 7 03:23:32 UTC 2014 (duration 23m 31s) |
[production] |
02:31 |
<LocalisationUpdate> |
completed (1.24wmf8) at 2014-06-07 02:29:57+00:00 |
[production] |
02:17 |
<LocalisationUpdate> |
completed (1.24wmf7) at 2014-06-07 02:16:30+00:00 |
[production] |
2014-06-06
23:51 |
<Krinkle> |
Restarted Jenkins, force-stopped Zuul, started Zuul, configured Jenkins via web interface (disable Gearman, save, enable Gearman); seems to be back up now, finally. |
[production] |
22:52 |
<mutante> |
same for rhenium, titanium, bast1001, calcium, carbon, ytterbium, stat1003 |
[production] |
22:43 |
<RoanKattouw> |
Restarting Jenkins didn't help, jobs still aren't making it across from Zuul into Jenkins |
[production] |
22:36 |
<RoanKattouw> |
Restarting stuck Jenkins |
[production] |
22:35 |
<mutante> |
same for holmium, hafnium, silver, netmon1001, magnesium, neon, antimony |
[production] |
22:17 |
<mutante> |
upgraded ssl packages on zirconium |
[production] |
21:57 |
<Krinkle> |
Took Jenkins slave on gallium temporarily offline and back online to resolve possible stagnation |
[production] |
20:56 |
<awight_> |
updated crm from ded541894a70922e098fb3ea48306c8ec0f0f6aa to b38497a9d0ef75fe2b20b03b649ac13a5e3f47a7 |
[production] |
18:25 |
<mwalker> |
updating payments from e823354822c7a35e6c2069d3e72180a45dbc89dc to b4c5cf1bceb70d65eae28cdd0873036dc33c8992 for globalcollect oid hack |
[production] |
14:04 |
<hashar> |
Gerrit back. chase rebooted it :) |
[production] |
13:55 |
<hashar> |
Gerrit having some troubles: error: RPC failed; result=22, HTTP code = 503 (while cloning CirrusSearch) |
[production] |
12:58 |
<cmjohnson1> |
replacing raid controller db1020 |
[production] |
06:12 |
<Tim> |
installed nodejs on osmium for testing |
[production] |
04:24 |
<LocalisationUpdate> |
ResourceLoader cache refresh completed at Fri Jun 6 04:23:08 UTC 2014 (duration 23m 7s) |
[production] |
03:13 |
<LocalisationUpdate> |
completed (1.24wmf8) at 2014-06-06 03:12:19+00:00 |
[production] |