2018-03-06
§
|
14:07 |
<gehel> |
rebooting maps1* (eqiad) for kernel security update |
[production] |
14:07 |
<moritzm> |
rebooting pybal-test for kernel security update |
[production] |
14:00 |
<_joe_> |
SWAT is suspended for investigation on tin's git status |
[production] |
14:00 |
<moritzm> |
rebooting oxygen for kernel security update |
[production] |
13:21 |
<arturo> |
T188994: on some servers apt-upgrade and puppet raced for the dpkg lock. I also forgot to set DEBIAN_FRONTEND=noninteractive, so debconf prompts appeared and stalled dpkg operations. Already resolved, but some puppet alerts were produced |
[tools] |
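The two problems described in the 13:21 entry (contention for the dpkg lock and debconf prompts stalling dpkg) can be guarded against roughly as follows. This is a minimal sketch, not the procedure actually used for T188994: the function names, `MAX_TRIES`, `SLEEP_SECS`, and the choice of `fuser` to detect a lock holder are all assumptions, and there is still a small window between the lock check and the `apt-get` call in which another process could take the lock.

```shell
#!/usr/bin/env bash
# Sketch only: wait for the dpkg lock to be free, then upgrade without prompts.
# LOCK_FILE, MAX_TRIES, SLEEP_SECS and the function names are illustrative.

LOCK_FILE="${LOCK_FILE:-/var/lib/dpkg/lock}"
MAX_TRIES="${MAX_TRIES:-30}"
SLEEP_SECS="${SLEEP_SECS:-10}"

# Returns 0 while some process has the lock file open.
# fuser comes from psmisc and needs root to see other users' processes.
lock_is_held() {
    fuser "$LOCK_FILE" >/dev/null 2>&1
}

# Polls lock_is_held; returns 0 once the lock is free, 1 after MAX_TRIES polls.
wait_for_dpkg_lock() {
    local tries=0
    while lock_is_held; do
        tries=$((tries + 1))
        if [ "$tries" -ge "$MAX_TRIES" ]; then
            return 1
        fi
        sleep "$SLEEP_SECS"
    done
    return 0
}

# Upgrades with debconf prompts suppressed; on a config-file conflict,
# takes the default action if there is one, otherwise keeps the old file.
noninteractive_upgrade() {
    if ! wait_for_dpkg_lock; then
        echo "dpkg lock still held, giving up" >&2
        return 1
    fi
    DEBIAN_FRONTEND=noninteractive apt-get -y \
        -o Dpkg::Options::=--force-confdef \
        -o Dpkg::Options::=--force-confold \
        upgrade
}
```

Pausing the puppet agent for the duration of the upgrade (`puppet agent --disable "reason"`, then `--enable` afterwards) would be another way to avoid the contention; which approach fits depends on how long the agent can safely stay disabled.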
13:16 |
<moritzm> |
powercycling ms-be1038, stuck after reboot |
[production] |
13:10 |
<marostegui> |
Deploy schema change on db1094 - T187089 T185128 T153182 |
[production] |
13:09 |
<marostegui@tin> |
Synchronized wmf-config/db-eqiad.php: Depool db1094 for alter table (duration: 00m 58s) |
[production] |
12:58 |
<arturo> |
T188994 upgrading packages in jessie nodes from the oldstable source |
[tools] |
12:55 |
<moritzm> |
rebooting URL downloaders for kernel security update |
[production] |
12:51 |
<mobrovac@tin> |
Finished deploy [cpjobqueue/deploy@9b0b947]: refreshLinks: Increase concurrency to 100 - T185052 (duration: 00m 34s) |
[production] |
12:50 |
<mobrovac@tin> |
Started deploy [cpjobqueue/deploy@9b0b947]: refreshLinks: Increase concurrency to 100 - T185052 |
[production] |
12:43 |
<marostegui@tin> |
Synchronized wmf-config/db-eqiad.php: Repool db1086 after alter table (duration: 00m 58s) |
[production] |
12:33 |
<moritzm> |
rebooting mwlog* for kernel security update |
[production] |
12:04 |
<moritzm> |
rebooting graphite hosts in eqiad for kernel security update |
[production] |
11:42 |
<arturo> |
clush -w @all "sudo DEBIAN_FRONTEND=noninteractive apt-get autoclean" <-- free space in filesystem |
[tools] |
11:41 |
<arturo> |
aborrero@tools-clushmaster-01:~$ clush -w @all "sudo DEBIAN_FRONTEND=noninteractive apt-get autoremove -y" <-- we ran this on the canary servers last week and it went fine, so now running it fleet-wide |
[tools] |
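The canary-first approach mentioned in the 11:41 entry can be sketched as a small wrapper. This is an illustration under assumptions: the `@canary` group name, the `staged_run` function, and the `RUNNER` variable are hypothetical, and the log describes a week-long gap between the canary and fleet runs rather than an immediate follow-up. The `-S` flag asks clush to return the largest remote exit code, so a failure on any canary stops the rollout.

```shell
#!/usr/bin/env bash
# Sketch only: run a command on a canary group first, and continue to the
# wider group only if every canary node succeeded.
# RUNNER, staged_run and the group names passed in are illustrative.

RUNNER="${RUNNER:-clush}"

staged_run() {
    local canary_group="$1" fleet_group="$2" remote_cmd="$3"

    # -S makes clush exit with the largest return code seen on any node.
    if ! "$RUNNER" -S -w "$canary_group" "$remote_cmd"; then
        echo "canary run failed on $canary_group; not touching $fleet_group" >&2
        return 1
    fi

    "$RUNNER" -S -w "$fleet_group" "$remote_cmd"
}

# Example invocation (not executed here):
#   staged_run @canary @all "sudo DEBIAN_FRONTEND=noninteractive apt-get autoremove -y"
```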
11:36 |
<arturo> |
(ubuntu) removed linux-image-3.13.0-142-generic and linux-image-3.13.0-137-generic (T188911) |
[tools] |
11:33 |
<arturo> |
removing unused kernel packages in ubuntu nodes |
[tools] |
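For the kernel cleanup in the 11:33 and 11:36 entries, candidate packages can be listed before anything is removed. The following is a rough sketch, not the method used for T188911: the function name `unused_kernel_images` is hypothetical, and the filter only excludes the running kernel, so a newer kernel that is installed but not yet booted would also be listed and should be reviewed by hand.

```shell
#!/usr/bin/env bash
# Sketch only: print versioned linux-image packages other than the running one.
# Reads package names on stdin, one per line; $1 is the running kernel release.
# The function name is illustrative.

unused_kernel_images() {
    local running="$1"
    # Keep versioned kernel images (this skips metapackages such as
    # linux-image-generic), then drop the one that matches the running kernel.
    grep -E '^linux-image-[0-9]' | grep -v -F -x "linux-image-${running}" || true
}

# Example invocation (not executed here); only packages in state "ii" are fed in:
#   dpkg-query -W -f='${db:Status-Abbrev} ${Package}\n' 'linux-image-*' \
#       | awk '$1 == "ii" { print $2 }' \
#       | unused_kernel_images "$(uname -r)"
```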
11:29 |
<moritzm> |
rebooting k8s masters for kernel security update |
[production] |
11:08 |
<arturo> |
aborrero@tools-clushmaster-01:~$ clush -w @all "sudo rm /etc/apt/preferences.d/* ; sudo puppet agent -t -v" <--- rebuild the directory; it contains stale files across the whole cluster |
[tools] |
11:05 |
<elukey> |
reboot analytics10[28,35,52] for kernel updates (one at a time, hadoop hdfs journal nodes) |
[production] |
10:46 |
<moritzm> |
powercycling ms-be1021, stuck after reboot |
[production] |
10:45 |
<akosiaris@tin> |
Synchronized wmf-config/CommonSettings.php: (no justification provided) (duration: 01m 22s) |
[production] |
10:43 |
<moritzm> |
rearming keyholder on naos after reboot |
[production] |
10:39 |
<akosiaris> |
emergency: adding a captcha to metawiki contact pages such as https://meta.wikimedia.org/wiki/Special:Contact/Stewards to stop bot abuse. Phabricator task to be filed later |
[production] |
10:39 |
<godog> |
reboot ms-be1013 to try to fix disk ordering |
[production] |
10:35 |
<moritzm> |
rebooting naos for kernel security update |
[production] |
10:32 |
<moritzm> |
rearming keyholder on tin after reboot |
[production] |
10:30 |
<gehel> |
kafka poller active on all production wdqs nodes - T188252 |
[production] |
10:28 |
<moritzm> |
rebooting tin for kernel security update |
[production] |
10:20 |
<gehel> |
reboot completed for maps2* and maps-test* |
[production] |
10:19 |
<elukey> |
restart webrequest-load-wf-upload-2018-3-6-7 (failed due to reboots) |
[analytics] |
10:08 |
<elukey> |
re-starting mysql consumers on eventlog1001 |
[analytics] |
09:51 |
<moritzm> |
rebooting graphite hosts in codfw for kernel security update |
[production] |
09:42 |
<marostegui> |
Stop MySQL on db1107 for mariadb and kernel upgrade |
[production] |
09:41 |
<vgutierrez> |
uploaded pybal_1.15.0_all.deb to apt.wikimedia.org jessie-wikimedia |
[production] |
09:41 |
<elukey> |
stop eventlogging's mysql consumers for db1107 (el master) kernel updates |
[analytics] |
09:40 |
<marostegui> |
Start proxysql on wasat |
[production] |
09:38 |
<moritzm> |
rebooting wezen for kernel security update |
[production] |
09:27 |
<elukey> |
reboot kafka2001 (eventbus codfw) for kernel updates |
[production] |
09:24 |
<marostegui> |
Deploy schema change on db1086 - T187089 T185128 T153182 |
[production] |
09:18 |
<marostegui> |
Stop and reboot db1086 for kernel and mariadb upgrade |
[production] |
09:17 |
<marostegui@tin> |
Synchronized wmf-config/db-eqiad.php: Depool db1086 for alter table (duration: 00m 57s) |
[production] |
09:17 |
<moritzm> |
rebooting swift backend servers in eqiad for kernel security update |
[production] |
09:17 |
<moritzm> |
rebooting swift backend servers in eqiad for kernel security update |
[production] |
09:13 |
<marostegui@tin> |
Synchronized wmf-config/db-eqiad.php: Repool db1101:3317 after alter table (duration: 00m 57s) |
[production] |
09:05 |
<gehel> |
rolling restart of maps* for kernel upgrade |
[production] |
08:50 |
<elukey> |
reboot meitnerium (archiva) for kernel updates |
[production] |
08:38 |
<paravoid> |
rebooting furud |
[production] |