2016-03-07
14:41 |
<jynus> |
powercycling mw1033 (unresponsive) |
[production] |
14:39 |
<jynus@tin> |
Synchronized wmf-config/db-codfw.php: Depool db2038, db2039, db2040 (duration: 02m 59s) |
[production] |
13:42 |
<jynus> |
performing schema change on db2038 (s5: T120513), lag on that server expected |
[production] |
12:52 |
<godog> |
repool ms-fe1003 |
[production] |
12:13 |
<godog> |
depool ms-fe1003 for trusty upgrade T125024 |
[production] |
10:51 |
<moritzm> |
reimaging iron to jessie |
[production] |
10:24 |
<godog> |
disable puppet on graphite1001 / graphite2001 / labmon1001 before merging https://gerrit.wikimedia.org/r/#/c/274716 |
[production] |
10:04 |
<moritzm> |
uploaded kernel-wedge 2.93+wmf1 for jessie-wikimedia to carbon (needed to build modern kernels) |
[production] |
08:41 |
<_joe_> |
disabled puppet on mw1026-69, cleaning up puppet facts and certs, then shutting them down |
[production] |
02:32 |
<mwdeploy@tin> |
sync-l10n completed (1.27.0-wmf.15) (duration: 14m 27s) |
[production] |
2016-03-04
22:41 |
<gwicke> |
restbase1005: `nodetool stop -- CLEANUP; nodetool stop -- COMPACTION` |
[production] |
22:29 |
<matt_flaschen> |
Ran P2709 against DB manually to work around T127693 |
[production] |
21:58 |
<mutante> |
bast2001 - if your ssh client shows the fingerprint as base64 SHA256 (the new default), you can run `ssh -o FingerprintHash=md5 bast2001.wikimedia.org` to compare |
[production] |
21:29 |
<mutante> |
bast2001 - reinstalled with jessie, fingerprints on https://wikitech.wikimedia.org/wiki/Help:SSH_Fingerprints/bast2001.wikimedia.org |
[production] |
21:17 |
<mutante> |
bast2001 - revoke and sign new puppet cert / salt keys |
[production] |
21:01 |
<mutante> |
bast2001 - rebooting into PXE for T128899 |
[production] |
20:00 |
<volans> |
Added logging to post-merge hook on palladium T128895 |
[production] |
17:28 |
<ebernhardson@tin> |
Synchronized wmf-config/CirrusSearch-labs.php: prod nop, enables https in beta cluster for elasticsearch connections (duration: 00m 33s) |
[production] |
17:25 |
<jynus> |
chgrp recursive on tin to wikidev on .git/objects |
[production] |
16:25 |
<jynus> |
changing db1047 replication filters live |
[production] |
15:58 |
<ottomata> |
puppet disabled on stat1003 for reportupdater deployment, paused until dan is out of meetings |
[production] |
14:56 |
<cmjohnson1> |
rebooting iron to fix virtual console problem |
[production] |
14:34 |
<jynus> |
upgrade and restart dbstore2002 to apply new replication filters |
[production] |
14:28 |
<apergos> |
all services back in operation from dataset1001 |
[production] |
14:20 |
<apergos> |
web service restored for dumps/download.wikimedia.org |
[production] |
14:16 |
<moritzm> |
installing perl security updates |
[production] |
14:04 |
<moritzm> |
installing postgres security updates on labsdb1004 |
[production] |
14:02 |
<bblack> |
puppet back online for all caches (ipsec changes complete) |
[production] |
13:48 |
<moritzm> |
installing pillow security updates |
[production] |
13:41 |
<bblack> |
disabling puppet on esams,ulsfo,codfw caches for ipsec changes, to minimize alertspam... |
[production] |
13:39 |
<urandom> |
canceling doomed bootstrap on restbase1009-a.eqiad.wmnet |
[production] |
13:31 |
<apergos> |
dumps/download.wikimedia.org service interrupted now while server is being upgraded |
[production] |
13:03 |
<apergos> |
nfs filesystem from dataset1001 now unavailable as we prep for upgrade |
[production] |
11:19 |
<jynus> |
deploying new replication check algorithm cross-fleet |
[production] |
10:30 |
<volans> |
Start copying data from es200[124] to es201[123] (ETA ~16-17h) T127330 |
[production] |
10:07 |
<volans@tin> |
Synchronized wmf-config/db-codfw.php: Update codfw external storage servers topology T127330 (duration: 00m 39s) |
[production] |
09:10 |
<moritzm> |
re-imaging iron with jessie |
[production] |
08:41 |
<jynus> |
downtiming lag alerts for all mysql replicas for 2 hours to test new alert check |
[production] |
06:24 |
<mutante> |
gerrit being restarted for config change 274741 |
[production] |
03:34 |
<urandom> |
Starting `nodetool cleanup` on restbase100{1,2,7-a,7-b}.eqiad.wmnet and restbase1010-a : T95253 |
[production] |
03:28 |
<urandom> |
starting decommission of restbase1009.eqiad.wmnet : T95253 |
[production] |