2018-03-06
15:03 <chasemp> rebooted tools-worker 1001-1008 [tools]
15:02 <hashar@tin> Synchronized wmf-config/InitialiseSettings.php: Article counts: Change 'comma' method to 'any' - T188472 (duration: 01m 00s) [production]
14:58 <arturo> drain and reboot tools-worker-1010 [tools]
14:50 <vgutierrez> update pybal to 1.15.0 on lvs1010 [production]
14:46 <hashar> tin: /srv/mediawiki-staging/php-1.31.0-wmf.23 rebased on tip of https://gerrit.wikimedia.org/r/#/c/416686/ (which reverts a merge of master branch) [production]
14:42 <gehel> rebooting maps1* (eqiad) for kernel security update completed [production]
14:37 <chasemp> @tools-bastion-03:~$ webservice restart --backend=kubernetes [tools.replag]
14:36 <ottomata> beginning migration of webrequest text varnishkafka logs from Kafka analytics to Kafka jumbo-eqiad T185136 [production]
14:27 <chasemp> multiple tools running on k8s workers report issues reading replica.my.cnf file atm [tools]
14:27 <chasemp> reboot tools-worker-100[12] [tools]
14:23 <chasemp> downtime icinga alert for k8s workers ready [tools]
14:21 <moritzm> rebooting labweb* for kernel security update [production]
14:13 <moritzm> rebooting sca* for kernel security update [production]
14:07 <gehel> rebooting maps1* (eqiad) for kernel security update [production]
14:07 <moritzm> rebooting pybal-test for kernel security update [production]
14:00 <_joe_> SWAT is suspended for investigation on tin's git status [production]
14:00 <moritzm> rebooting oxygen for kernel security update [production]
13:21 <arturo> T188994 in some servers there was some race in the dpkg lock between apt-upgrade and puppet. Also, I forgot to use DEBIAN_FRONTEND=noninteractive, so debconf prompts happened and stalled dpkg operations. Already solved, but some puppet alerts were produced [tools]
13:16 <moritzm> powercycling ms-be1038, stuck after reboot [production]
13:10 <marostegui> Deploy schema change on db1094 - T187089 T185128 T153182 [production]
13:09 <marostegui@tin> Synchronized wmf-config/db-eqiad.php: Depool db1094 for alter table (duration: 00m 58s) [production]
12:58 <arturo> T188994 upgrading packages in jessie nodes from the oldstable source [tools]
12:55 <moritzm> rebooting URL downloaders for kernel security update [production]
12:51 <mobrovac@tin> Finished deploy [cpjobqueue/deploy@9b0b947]: refreshLinks: Increase concurrency to 100 - T185052 (duration: 00m 34s) [production]
12:50 <mobrovac@tin> Started deploy [cpjobqueue/deploy@9b0b947]: refreshLinks: Increase concurrency to 100 - T185052 [production]
12:43 <marostegui@tin> Synchronized wmf-config/db-eqiad.php: Repool db1086 after alter table (duration: 00m 58s) [production]
12:33 <moritzm> rebooting mwlog* for kernel security update [production]
12:04 <moritzm> rebooting graphite hosts in eqiad for kernel security update [production]
11:42 <arturo> clush -w @all "sudo DEBIAN_FRONTEND=noninteractive apt-get autoclean" <-- free space in filesystem [tools]
11:41 <arturo> aborrero@tools-clushmaster-01:~$ clush -w @all "sudo DEBIAN_FRONTEND=noninteractive apt-get autoremove -y" <-- we did this on canary servers last week and it went fine, so running it fleet-wide [tools]
11:36 <arturo> (ubuntu) removed linux-image-3.13.0-142-generic and linux-image-3.13.0-137-generic (T188911) [tools]
11:33 <arturo> removing unused kernel packages in ubuntu nodes [tools]
11:29 <moritzm> rebooting k8s masters for kernel security update [production]
11:08 <arturo> aborrero@tools-clushmaster-01:~$ clush -w @all "sudo rm /etc/apt/preferences.d/* ; sudo puppet agent -t -v" <--- rebuild directory, it contains stale files across all the cluster [tools]
11:05 <elukey> reboot analytics10[28,35,52] for kernel updates (one at a time, hadoop hdfs journal nodes) [production]
10:46 <moritzm> powercycling ms-be1021, stuck after reboot [production]
10:45 <akosiaris@tin> Synchronized wmf-config/CommonSettings.php: (no justification provided) (duration: 01m 22s) [production]
10:43 <moritzm> rearming keyholder on naos after reboot [production]
10:39 <akosiaris> emergency add a captcha in metawiki contact pages like https://meta.wikimedia.org/wiki/Special:Contact/Stewards to stop bot abuse. phab Task to be filed later on [production]
10:39 <godog> reboot ms-be1013 to try fix disk ordering [production]
10:35 <moritzm> rebooting naos for kernel security update [production]
10:32 <moritzm> rearming keyholder on tin after reboot [production]
10:30 <gehel> kafka poller active on all production wdqs nodes - T188252 [production]
10:28 <moritzm> rebooting tin for kernel security update [production]
10:20 <gehel> reboot completed for maps2* and maps-test* [production]
10:19 <elukey> restart webrequest-load-wf-upload-2018-3-6-7 (failed due to reboots) [analytics]
10:08 <elukey> re-starting mysql consumers on eventlog1001 [analytics]
09:51 <moritzm> rebooting graphite hosts in codfw for kernel security update [production]
09:42 <marostegui> Stop MySQL on db1107 for mariadb and kernel upgrade [production]
09:41 <vgutierrez> pybal_1.15.0_all.deb to apt.wikimedia.org jessie-wikimedia [production]