2017-07-05
|
13:18 |
<apergos> |
power cycled dataset1001, crashed, unresponsive on mgmt console |
[production] |
13:18 |
<zfilipin@tin> |
Synchronized dblists/closed.dblist: SWAT: [[gerrit:361686|Reopen nlwikinews (T168764)]] (duration: 02m 50s) |
[production] |
13:16 |
<elukey> |
reboot conf2001 for kernel updates |
[production] |
13:09 |
<moritzm> |
rebooting restbase1010 for kernel update |
[production] |
12:49 |
<marostegui> |
Force BBU relearn on db1016 - T166344 |
[production] |
12:36 |
<marostegui> |
Move labsdb1010 main general replication thread to a named replication thread called db1095 - T153743 |
[production] |
12:33 |
<marostegui> |
Stop all replication threads on db1095 for maintenance - T153743 |
[production] |
12:32 |
<marostegui@tin> |
Synchronized wmf-config/db-eqiad.php: Repool db1085 - T153743 (duration: 02m 49s) |
[production] |
12:29 |
<marostegui@tin> |
Synchronized wmf-config/db-eqiad.php: Repool db1051 - T168661 (duration: 02m 50s) |
[production] |
12:16 |
<marostegui@tin> |
Synchronized wmf-config/db-eqiad.php: Depool db1051 - T168661 (duration: 02m 51s) |
[production] |
12:11 |
<apergos> |
puppet is currently disabled again on snapshots 1,5,6,7 and on dataset1001; we saw the same nfs issue shortly after reboot, with no dump processes going, as snapshots 5,6,7 had not remounted the filesystem |
[production] |
11:20 |
<moritzm> |
rebooting wtp2* servers for kernel update |
[production] |
11:14 |
<moritzm> |
rebooting restbase1009 for kernel update |
[production] |
10:56 |
<hashar> |
restarting Jenkins for plugin upgrades |
[production] |
10:45 |
<marostegui@tin> |
Synchronized wmf-config/db-eqiad.php: Repool db1072 - T168661 (duration: 02m 59s) |
[production] |
10:41 |
<marostegui> |
Run redact_sanitarium on s6 databases db1102 - T153743 |
[production] |
10:41 |
<moritzm> |
rebooting wtp1001 for kernel update |
[production] |
10:37 |
<moritzm> |
rebooting restbase1008 for kernel update |
[production] |
10:32 |
<apergos> |
rebooting snapshot hosts to clean up hung nfs client processes |
[production] |
10:30 |
<marostegui@tin> |
Synchronized wmf-config/db-eqiad.php: Repool db1072 - T168661 (duration: 02m 51s) |
[production] |
10:24 |
<apergos> |
rebooted dataset1001 to unstick nfsd and pick up new kernel, re-enabled puppet |
[production] |
10:14 |
<marostegui@tin> |
Synchronized wmf-config/db-eqiad.php: Repool db1066 - T168661 (duration: 02m 50s) |
[production] |
10:11 |
<moritzm> |
rebooting restbase1007 for kernel update |
[production] |
10:01 |
<marostegui@tin> |
Synchronized wmf-config/db-eqiad.php: Depool db1066 - T168661 (duration: 02m 50s) |
[production] |
09:57 |
<marostegui> |
Deploy alter table on s1 eqiad hosts - T168661 |
[production] |
09:48 |
<godog> |
move 'instances' graphite hierarchy out of the way, do not delete yet - T143405 |
[production] |
09:27 |
<marostegui> |
Stop MySQL on db1085 for maintenance - T153743 |
[production] |
09:21 |
<godog> |
upload nginx_1.11.10-1+wmf2 to jessie-wikimedia and nginx_1.11.10-1+wmf2~stretch1 to stretch-wikimedia |
[production] |
09:17 |
<marostegui@tin> |
Synchronized wmf-config/db-eqiad.php: Depool db1085 - T153743 (duration: 02m 50s) |
[production] |
08:44 |
<apergos> |
puppet disabled and processes accessing dataset1001 exported filesystem shot, on: stat1002,3, snapshot1001,5,6,7, while investigation continues |
[production] |
07:27 |
<moritzm> |
rebooting restbase-dev* for kernel update |
[production] |
07:13 |
<moritzm> |
rebooting notebook* hosts |
[production] |
05:18 |
<marostegui> |
Deploy alter table on s3 master - db1075 - T168661 |
[production] |
05:13 |
<marostegui> |
Deploy alter table on s7 master - db1062 - T168661 |
[production] |
05:08 |
<marostegui> |
Force a relearn on db1046's BBU - T166141 |
[production] |
02:27 |
<l10nupdate@tin> |
scap sync-l10n completed (1.30.0-wmf.7) (duration: 10m 23s) |
[production] |
2017-07-04
|
21:40 |
<volans> |
ACK'ed puppet not running on stat100[2-3],snapshot100[1,5-7] due to NFS overloaded on dataset1001 - T169680 |
[production] |
16:54 |
<jynus> |
dropping ukwikimedia from several labsdb hosts |
[production] |
16:10 |
<moritzm> |
rebooting radium for kernel update |
[production] |
15:09 |
<mobrovac@tin> |
Finished deploy [citoid/deploy@9d22567]: Fallback to crossRef (T165105) and use MarcXML (T165105) (duration: 02m 52s) |
[production] |
15:06 |
<mobrovac@tin> |
Started deploy [citoid/deploy@9d22567]: Fallback to crossRef (T165105) and use MarcXML (T165105) |
[production] |
15:02 |
<godog> |
set operations/debs/nginx as hidden and update description |
[production] |
14:57 |
<ema> |
pybal 1.13.7 uploaded to apt.w.o, testing it on pybal-test2001 T82747 T154759 |
[production] |
14:31 |
<godog> |
copy nginx from jessie-wikimedia to stretch-wikimedia |
[production] |
14:15 |
<paravoid> |
reset db2038's iLO |
[production] |
13:06 |
<filippo@puppetmaster1001> |
conftool action : set/pooled=yes; selector: name=ms-fe2005.codfw.wmnet |
[production] |
11:47 |
<marostegui@tin> |
Synchronized wmf-config/db-eqiad.php: Remove comments from db1039 status - T166208 (duration: 02m 50s) |
[production] |
11:25 |
<joal@tin> |
Finished deploy [analytics/refinery@88cbb9e]: Regular weekly deploy (2) - Bug patch (duration: 03m 38s) |
[production] |
11:21 |
<joal@tin> |
Started deploy [analytics/refinery@88cbb9e]: Regular weekly deploy (2) - Bug patch |
[production] |
11:15 |
<elukey> |
powercycle elastic1018, host unreachable |
[production] |