2016-01-27
ยง
|
21:19 <cscott> updated OCG to version 64050af0456a43344b32e3e93561a79207565eaf (should be no-op after yesterday's deploy) [releng]
20:24 <chasemp> master stop, truncate accounting log to accounting.01272016, master start [tools]
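(The rotation above amounts to roughly the following; the gridengine init script name and default cell path are assumptions, not taken from the log:

    service gridengine-master stop
    cd /var/lib/gridengine/default/common   # assumed default cell path
    cp accounting accounting.01272016       # keep the old records aside
    truncate -s 0 accounting                # empty the live log in place
    service gridengine-master start
)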
19:48 <YuviPanda> started nfs-exports daemon on labstore1001, had been dead for a few days [production]
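(A minimal sketch of that recovery, assuming the daemon is managed as an init service named after the entry — the unit name is a guess:

    sudo service nfs-exports start
    sudo service nfs-exports status   # confirm it stays up this time
)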
19:34 <chasemp> started the grid master [tools]
19:31 <mutante> stat1002 - redis.exceptions.ConnectionError: Error connecting to mira.codfw.wmnet:6379. timed out. [production]
19:31 <mutante> stat1002 - running puppet; its last run was reported about 4 hours ago, but the agent had not been disabled [production]
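(The manual run described above is the standard agent invocation; nothing host-specific assumed:

    sudo puppet agent --test     # one-shot run with verbose output
    # only needed if the agent had been disabled (here it had not):
    # sudo puppet agent --enable
)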
19:23 <chasemp> stopped the grid master [tools]
19:14 <dduvall@mira> rebuilt wikiversions.php and synchronized wikiversions files: group1 wikis to 1.27.0-wmf.11 [production]
19:11 <YuviPanda> depooled tools-webgrid-1405 to prep for restart; lots of stuck processes [tools]
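(One way to depool a gridengine exec node so no new jobs land on it; the FQDN pattern is an assumption:

    qmod -d '*@tools-webgrid-1405.tools.eqiad.wmflabs'   # disable all queue instances on the host
    qstat -f | grep tools-webgrid-1405                   # disabled queues show state 'd'
)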
18:49 <jynus@mira> Synchronized wmf-config/db-eqiad.php: Repool pc1006 after cloning (duration: 02m 25s) [production]
18:48 <bd808> HHVM on mw1019 still dying on a regular basis with "Lost parent, LightProcess exiting" [production]
18:29 <valhallasw`cloud> job 2551539 is ifttt, which is also running as 2700629; killing 2551539 [tools]
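(Killing the stale duplicate is standard gridengine; the job id comes from the entry:

    qstat -j 2551539   # double-check which registration this is
    qdel -f 2551539    # force-delete the stale copy; 2700629 keeps running
)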
18:26 <valhallasw`cloud> the messages file repeatedly reports "01/27/2016 18:26:17|worker|tools-grid-master|E|execd@tools-webgrid-generic-1405.tools.eqiad.wmflabs reports running job (2551539.1/master) in queue "webgrid-generic@tools-webgrid-generic-1405.tools.eqiad.wmflabs" that was not supposed to be there - killing". SSH'ing there to investigate [tools]
18:24 <valhallasw`cloud> 'sleep' test job also seems to work without issues [tools]
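(A throwaway scheduler smoke test like that is typically just the following; the flags are standard qsub, the job name is hypothetical:

    qsub -b y -N sleeptest /bin/sleep 60   # -b y: submit the command directly, no wrapper script
    qstat                                  # job should go qw -> r, then drain away
)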
18:23 <valhallasw`cloud> no errors in log file, qstat works [tools]
18:23 <chasemp> SGE master restarted after the jobs db dump and reload [tools]
18:22 <valhallasw`cloud> messages file reports 'Wed Jan 27 18:21:39 UTC 2016 db_load_sge_maint_pre_jobs_dump_01272016' [tools]
18:20 <chasemp> on master: db_load -f /root/sge_maint_pre_jobs_dump_01272016 sge_job [tools]
18:19 <valhallasw`cloud> dumped jobs database to /root/sge_maint_pre_jobs_dump_01272016, 4.6M [tools]
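(Read bottom-up, the two entries above are the classic BerkeleyDB dump-and-reload cycle for the SGE job spool; the spooldb path is an assumption:

    cd /var/lib/gridengine/default/spooldb                        # assumed spool location
    db_dump -f /root/sge_maint_pre_jobs_dump_01272016 sge_job     # 18:19: dump the jobs db (4.6M)
    db_load -f /root/sge_maint_pre_jobs_dump_01272016 sge_job     # 18:20: reload it cleanly
)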
18:17 <valhallasw`cloud> SGE Configuration successfully saved to /root/sge_maint_01272016 directory. [tools]
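(That success message matches the output of gridengine's bundled config backup helper; the install path below is an assumption:

    /usr/share/gridengine/util/upgrade_modules/save_sge_config.sh /root/sge_maint_01272016
)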
18:14 <chasemp> grid master stopped [tools]
18:00 <csteipp> deploy patch for T103239 [production]
17:50 <csteipp> deploy patch for T97157 [production]
17:46 <jynus> migrating ruthenium parsoid-test database to m5-master [production]
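(For a small database, a migration like this is usually a straight dump-and-pipe; the exact flags and the pre-creation step are assumptions:

    mysql -h m5-master.eqiad.wmnet -e 'CREATE DATABASE `parsoid-test`'
    mysqldump --single-transaction parsoid-test \
      | mysql -h m5-master.eqiad.wmnet parsoid-test
)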
17:27 <elukey> rebooting analytics105* hosts to upgrade their kernel [production]
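(The recurring "rebooting ... for kernel upgrade" entries in this log follow the same rolling pattern; a hedged sketch, with the host range and wait loop as assumptions:

    for h in analytics10{50..57}.eqiad.wmnet; do   # host range is a guess at "analytics105*"
      ssh "$h" sudo reboot || true
      sleep 30                                     # give it time to actually go down
      until ssh -o ConnectTimeout=5 "$h" true 2>/dev/null; do
        sleep 15                                   # poll until it is back before the next host
      done
    done
)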
17:16 <elukey> rebooting analytics1035.eqiad.wmnet for kernel upgrade [production]
16:22 <thcipriani@mira> Synchronized php-1.27.0-wmf.11/extensions/CentralAuth/includes/CentralAuthUtils.php: SWAT: Preserve certain keys when updating central session [[gerrit:266672]] (duration: 02m 28s) [production]
16:11 <thcipriani@mira> Synchronized php-1.27.0-wmf.11/extensions/CentralAuth/includes/session/CentralAuthSessionProvider.php: SWAT: Avoid forceHTTPS cookie flapping if core and CA are setting the same cookie [[gerrit:266671]] (duration: 02m 26s) [production]
16:03 <elukey> rebooting analytics1043 -> analytics1050 for kernel upgrade [production]
15:47 <elukey> rebooting analytics1026, analytics1040 -> analytics1042 for kernel upgrade [production]
14:58 <jynus> cloning parsercache contents from pc1003 to pc1006 [production]
14:45 <elukey> rebooting analytics1036 to analytics1039 for kernel upgrade [production]
14:35 <elukey> analytics1035 hasn't been rebooted because it is a Hadoop Journal Node (it will be rebooted last) [production]
14:04 <elukey> rebooting analytics1032 to analytics1035 for kernel upgrades [production]
14:03 <jynus@mira> Synchronized wmf-config/db-eqiad.php: Depool pc1003 for cloning to pc1006 (duration: 02m 30s) [production]
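(This depool, the 14:58 clone, and the 18:49 repool make up one cycle. The sync step matches the "Synchronized wmf-config/db-eqiad.php" scap output seen in this log; the edit itself is paraphrased:

    # 1. comment pc1003 out of the parsercache section of wmf-config/db-eqiad.php
    # 2. push the change to the cluster:
    sync-file wmf-config/db-eqiad.php 'Depool pc1003 for cloning to pc1006'
)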
13:59 <jynus> about to move the parsercache service to new hardware/OS, mariadb-only [production]
13:32 <elukey> rebooting analytics1030/1031 for kernel upgrade [production]
13:15 <akosiaris> rebooting fermium for kernel upgrades [production]
13:10 <elukey> rebooting analytics1029 for kernel upgrade [production]
12:29 <moritzm> rebooting analytics1028 for kernel update [production]
10:29 <hashar> triggered a bunch of browser tests; deployment-redis01 was dead/faulty [releng]
10:25 <ema> restarting apache2 and hhvm on mw1119 [production]
10:08 <hashar> mass restarting redis-server process on deployment-redis01 (for https://phabricator.wikimedia.org/T124677 ) [releng]
10:07 <hashar> mass restarting redis-server process on deployment-redis01 [releng]
09:00 <hashar> beta: commenting out the "latency-monitor-threshold 100" parameter from every /etc/redis/redis.conf we have ( https://phabricator.wikimedia.org/T124677 ). Puppet will not reapply it unless the distribution is Jessie [releng]
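(Mechanically, that beta-cluster change is just commenting one directive out and bouncing redis; iterating over the hosts is elided:

    sudo sed -i 's/^latency-monitor-threshold/# &/' /etc/redis/redis.conf
    sudo service redis-server restart   # pick up the edited config
)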
03:19 <ebernhardson@mira> Synchronized wmf-config/CirrusSearch-production.php: Correct invalid cirrus shard configuration (duration: 02m 59s) [production]
02:55 <l10nupdate@tin> ResourceLoader cache refresh completed at Wed Jan 27 02:55:21 UTC 2016 (duration 7m 13s) [production]
02:48 <mwdeploy@tin> sync-l10n completed (1.27.0-wmf.11) (duration: 10m 25s) [production]
02:23 <mwdeploy@tin> sync-l10n completed (1.27.0-wmf.10) (duration: 09m 51s) [production]
01:59 <ori@mira> Synchronized docroot and w: Icc4f6134b0: Add a speed experiment which inlines the top stylesheet (duration: 02m 28s) [production]