2016-01-27
|
23:49 |
<jzerebecki> |
integration-slave-precise-1011:~$ sudo -i /etc/init.d/salt-minion restart |
[releng] |
23:46 |
<jzerebecki> |
work around https://phabricator.wikimedia.org/T117710 : salt --show-timeout '*slave*' cmd.run 'rm -rf /mnt/jenkins-workspace/workspace/mwext-testextension-hhvm/src/skins/BlueSky' |
[releng] |
23:07 |
<YuviPanda> |
removed all members of templatetiger, added self instead, removed active shell sessions |
[tools] |
22:36 |
<robh> |
restarting parsoid-rt-client service on ruthenium |
[production] |
22:29 |
<ottomata> |
starting mysqldump of MobileWebSectionUsage_14321266 from db1047 into m4-master |
[production] |
22:29 |
<ottomata> |
starting mysqldump of MobileWebSectionUsage_14321266 from db1047 into m4-master |
[analytics] |
21:45 |
<yurik> |
updated graphoid on scb* |
[production] |
21:29 |
<mdholloway> |
mobileapps deployed 6f35859 |
[production] |
21:26 |
<cscott> |
updated OCG to version 64050af0456a43344b32e3e93561a79207565eaf |
[production] |
21:26 |
<ori@mira> |
Synchronized docroot and w: (no message) (duration: 02m 26s) |
[production] |
21:19 |
<cscott> |
updated OCG to version 64050af0456a43344b32e3e93561a79207565eaf (should be no-op after yesterday's deploy) |
[releng] |
20:24 |
<chasemp> |
master stop, truncate accounting log to accounting.01272016, master start |
[tools] |
19:48 |
<YuviPanda> |
started nfs-exports daemon on labstore1001, had been dead for a few days |
[production] |
19:34 |
<chasemp> |
master start grid master |
[tools] |
19:31 |
<mutante> |
stat1002 - redis.exceptions.ConnectionError: Error connecting to mira.codfw.wmnet:6379. timed out. |
[production] |
19:31 |
<mutante> |
stat1002 - running puppet, was reported as last run about 4 hours ago but not deactivated |
[production] |
19:23 |
<chasemp> |
stopped master |
[tools] |
19:14 |
<dduvall@mira> |
rebuilt wikiversions.php and synchronized wikiversions files: group1 wikis to 1.27.0-wmf.11 |
[production] |
19:11 |
<YuviPanda> |
depooled tools-webgrid-1405 to prep for restart, lots of stuck processes |
[tools] |
18:49 |
<jynus@mira> |
Synchronized wmf-config/db-eqiad.php: Repool pc1006 after cloning (duration: 02m 25s) |
[production] |
18:48 |
<bd808> |
HHVM on mw1019 still dying on a regular basis with "Lost parent, LightProcess exiting" |
[production] |
18:29 |
<valhallasw`cloud> |
job 2551539 is ifttt, which is also running as 2700629. Killing 2551539. |
[tools] |
18:26 |
<valhallasw`cloud> |
messages repeatedly reports "01/27/2016 18:26:17|worker|tools-grid-master|E|execd@tools-webgrid-generic-1405.tools.eqiad.wmflabs reports running job (2551539.1/master) in queue "webgrid-generic@tools-webgrid-generic-1405.tools.eqiad.wmflabs" that was not supposed to be there - killing". SSH'ing there to investigate |
[tools] |
18:24 |
<valhallasw`cloud> |
'sleep' test job also seems to work without issues |
[tools] |
18:23 |
<valhallasw`cloud> |
no errors in log file, qstat works |
[tools] |
18:23 |
<chasemp> |
master sge restarted post dump and restart for jobs db |
[tools] |
18:22 |
<valhallasw`cloud> |
messages file reports 'Wed Jan 27 18:21:39 UTC 2016 db_load_sge_maint_pre_jobs_dump_01272016' |
[tools] |
18:20 |
<chasemp> |
master db_load -f /root/sge_maint_pre_jobs_dump_01272016 sge_job |
[tools] |
18:19 |
<valhallasw`cloud> |
dumped jobs database to /root/sge_maint_pre_jobs_dump_01272016, 4.6M |
[tools] |
18:17 |
<valhallasw`cloud> |
SGE Configuration successfully saved to /root/sge_maint_01272016 directory. |
[tools] |
18:14 |
<chasemp> |
grid master stopped |
[tools] |
18:00 |
<csteipp> |
deploy patch for T103239 |
[production] |
17:50 |
<csteipp> |
deploy patch for T97157 |
[production] |
17:46 |
<jynus> |
migrating ruthenium parsoid-test database to m5-master |
[production] |
17:27 |
<elukey> |
rebooting analytics105* hosts to upgrade their kernel |
[production] |
17:16 |
<elukey> |
rebooting analytics1035.eqiad.wmnet for kernel upgrade |
[production] |
16:22 |
<thcipriani@mira> |
Synchronized php-1.27.0-wmf.11/extensions/CentralAuth/includes/CentralAuthUtils.php: SWAT: Preserve certain keys when updating central session [[gerrit:266672]] (duration: 02m 28s) |
[production] |
16:11 |
<thcipriani@mira> |
Synchronized php-1.27.0-wmf.11/extensions/CentralAuth/includes/session/CentralAuthSessionProvider.php: SWAT: Avoid forceHTTPS cookie flapping if core and CA are setting the same cookie [[gerrit:266671]] (duration: 02m 26s) |
[production] |
16:03 |
<elukey> |
rebooting analytics 1043 -> 1050 for kernel upgrade. |
[production] |
15:47 |
<elukey> |
rebooting analytics 1026, 1040 -> 1042 due to kernel upgrade. |
[production] |
14:58 |
<jynus> |
cloning parsercache contents from pc1003 to pc1006 |
[production] |
14:45 |
<elukey> |
rebooting analytics 1036 to 1039 for kernel upgrade |
[production] |
14:35 |
<elukey> |
analytics 1035 hasn't been rebooted because it is a Hadoop Journal Node (will be restarted at the end) |
[production] |