2018-04-11
10:50 <moritzm> installing openssl updates [production]
10:43 <marostegui> Drop table prefstats in s2 - T154490 [production]
10:33 <marostegui> Drop table prefstats in s4 - T154490 [production]
10:31 <marostegui> Drop table prefstats in s6 - T154490 [production]
10:28 <marostegui> Drop table prefstats in s5 - T154490 [production]
10:04 <jynus> start reimage of es2015 [production]
10:00 <moritzm> installing java security updates on kafka/jumbo cluster [production]
09:57 <jynus@tin> Synchronized wmf-config/db-codfw.php: Repool es2014, depool es2015 (duration: 01m 02s) [production]
09:52 <moritzm> installing java security updates on kafka/analytics cluster [production]
09:29 <arturo> doing some testing in labtestvirt2001 mounting instance's qcow2 files into /home/aborrero/mnt [production]
09:17 <jynus> start reimage of es2014 [production]
09:08 <jynus@tin> Synchronized wmf-config/db-codfw.php: Depool es2014 (duration: 01m 03s) [production]
09:03 <ema> restart pybal on lvs1003 for UDP monitoring config changes https://gerrit.wikimedia.org/r/#/c/425251/ [production]
08:59 <moritzm> reimaging mw1265 to stretch (T174431) [production]
08:18 <jynus> rerunning eqiad misc backups [production]
08:03 <marostegui@tin> Synchronized wmf-config/db-codfw.php: Repool db2069 as candidate master for x1 - T191275 (duration: 01m 03s) [production]
07:45 <ema> cp2022: restart varnish-be due to child process crash https://phabricator.wikimedia.org/P6979 T191229 [production]
07:27 <marostegui> Stop MySQL on db2033 to copy its data away before reimaging - T191275 [production]
07:08 <vgutierrez> Reimaging lvs5003.eqsin as stretch (2nd attempt) [production]
06:49 <elukey> restart Yarn Resource Manager daemons on analytics100[12] to pick up the new Prometheus configuration file [production]
06:20 <marostegui> Stop MySQL on db2033 to clone db2069 - T191275 [production]
06:17 <marostegui@tin> Synchronized wmf-config/db-eqiad.php: Add db2069 to the config as depooled x1 slave - T191275 (duration: 01m 03s) [production]
06:15 <marostegui@tin> Synchronized wmf-config/db-codfw.php: Add db2069 to the config as depooled x1 slave - T191275 (duration: 01m 01s) [production]
05:28 <Krinkle> manual coal back-fill still running with the normal coal disabled via systemd. Will restore normal coal when I wake up. [production]
05:22 <marostegui> Deploy schema change on codfw s8 master (db2045) with replication enabled (this will generate lag on codfw) - T187089 T185128 T153182 [production]
05:17 <marostegui> Reload haproxy on dbproxy1010 to repool labsdb1010 [production]
02:36 <l10nupdate@tin> scap sync-l10n completed (1.31.0-wmf.28) (duration: 05m 41s) [production]
01:26 <zhuyifei1999_> undo that. load went down (2.03 -> 2.01). probably not worth it when it's running in a hypervisor T191572 [video]
00:52 <zhuyifei1999_> set cpu affinity (gfg01$ taskset -p -c 0 29926; taskset -p -c 1 30181) on the main threads of two ffmpeg processes in an attempt to speed it up T191572 [video]
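The affinity-pinning step in the 00:52 entry can be sketched with `taskset` from util-linux. The PIDs in the log (29926, 30181) were the ffmpeg main threads on gfg01; the current shell's own PID (`$$`) stands in for them here:

```shell
# Pin a process to a single CPU core, then read back its affinity mask.
# $$ (this shell) stands in for the ffmpeg main-thread PIDs from the log.
taskset -p -c 0 $$                      # restrict the process to CPU 0
taskset -p "$$" | awk '{print $NF}'     # prints the new hex mask: 1
```

As the 01:26 follow-up entry notes, pinning bought almost nothing there (load 2.03 -> 2.01), likely because the guest's vCPUs were already being scheduled by the hypervisor underneath.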
00:12 <bstorm_> Updated views and indexes on labsdb1011 [production]
00:03 <legoktm> deploying https://gerrit.wikimedia.org/r/425446 [releng]
2018-04-10
23:32 <XioNoX> depooled eqsin due to router issue [production]
23:04 <Krinkle> Seemingly from 22:53 - 23:03 global traffic dropped by 30-60%, presumably due to issues in eqiad, where 10 Gbit/s dropped to 3 Gbit/s more sharply than ever before. [production]
22:49 <joal@tin> Finished deploy [analytics/refinery@33448cd]: Deploying fixes after today's deploy errors (duration: 04m 46s) [production]
22:45 <joal@tin> Started deploy [analytics/refinery@33448cd]: Deploying fixes after today's deploy errors [production]
22:43 <joal> Deploying refinery with scap [analytics]
22:42 <joal> Refinery-source 0.0.61 deployed on archiva [analytics]
21:18 <sbisson@tin> Finished deploy [kartotherian/deploy@8f3a903]: Rollback kartotherian to v0.0.35 (duration: 06m 27s) [production]
21:12 <sbisson@tin> Started deploy [kartotherian/deploy@8f3a903]: Rollback kartotherian to v0.0.35 [production]
20:43 <ottomata> bouncing main -> jumbo mirrormakers to blacklist job topics until we have time to investigate more [analytics]
20:41 <sbisson@tin> Finished deploy [kartotherian/deploy@bdf70ed]: Deploying kartotherian pre-i18n everywhere (downgrade snapshot) (duration: 03m 45s) [production]
20:38 <ottomata> restarted event* camus and refine cron jobs, puppet is re-enabled on analytics1003 [analytics]
20:37 <sbisson@tin> Started deploy [kartotherian/deploy@bdf70ed]: Deploying kartotherian pre-i18n everywhere (downgrade snapshot) [production]
20:30 <mutante> deploy1001 - reinstalled with stretch - re-adding to puppet (T175288) [production]
20:30 <mutante> deploy1001 - reinstalled with jessie - re-adding to puppet (T175288) [production]
20:14 <ottomata> restart mirrormakers main -> jumbo (AGAIN) [analytics]
20:13 <urandom> increasing change-prop sample rate to 20% (from 10%) in dev environment -- T186751 [production]
20:06 <thcipriani@tin> rebuilt and synchronized wikiversions files: testwiki back to 1.31.0-wmf.28 [production]
20:02 <sbisson@tin> Finished deploy [kartotherian/deploy@6e4d666]: Deploying kartotherian pre-i18n everywhere (duration: 04m 34s) [production]
19:58 <sbisson@tin> Started deploy [kartotherian/deploy@6e4d666]: Deploying kartotherian pre-i18n everywhere [production]