2018-03-06
15:50 <oblivian@tin> Synchronized wmf-config: Expose etcd last modified index (duration: 01m 00s) [production]
15:50 <madhuvishy> Rebooting tools-worker-1011 [tools]
15:45 <moritzm> rebooting ununpentium for kernel security update [production]
15:39 <oblivian@tin> Finished scap: Deploying Expose the latest modified index seen by EtcdConfig (duration: 09m 49s) [production]
15:34 <bd808> Restarting webservice (T188998) [tools.orphantalk]
15:29 <oblivian@tin> Started scap: Deploying Expose the latest modified index seen by EtcdConfig [production]
15:28 <moritzm> rebooting bromine for kernel security update [production]
15:19 <mobrovac@tin> Synchronized php-1.31.0-wmf.23/includes/jobqueue/JobQueueSecondTestQueue.php: [JobQueueSecondTestQueue] Support read-only mode - T185052 (duration: 00m 58s) [production]
15:09 <vgutierrez> update to pybal 1.15.0 on lvs5003 [production]
15:08 <chasemp> tools-k8s-master-01:~# kubectl uncordon tools-worker-1011.tools.eqiad.wmflabs [tools]
15:03 <arturo> drain and reboot tools-worker-1011 [tools]
15:03 <chasemp> rebooted tools-worker 1001-1008 [tools]
15:02 <hashar@tin> Synchronized wmf-config/InitialiseSettings.php: Article counts: Change 'comma' method to 'any' - T188472 (duration: 01m 00s) [production]
14:58 <arturo> drain and reboot tools-worker-1010 [tools]
14:50 <vgutierrez> update pybal to 1.15.0 on lvs1010 [production]
14:46 <hashar> tin: /srv/mediawiki-staging/php-1.31.0-wmf.23 rebased on tip of https://gerrit.wikimedia.org/r/#/c/416686/ (that revert a merge of master branch) [production]
14:42 <gehel> rebooting maps1* (eqiad) for kernel security update completed [production]
14:37 <chasemp> @tools-bastion-03:~$ webservice restart --backend=kubernetes [tools.replag]
14:36 <ottomata> beginning migration of webrequest text varnishkafka logs from Kafka analytics to Kafka jumbo-eqiad T185136 [production]
14:27 <chasemp> multiple tools running on k8s workers report issues reading replica.my.cnf file atm [tools]
14:27 <chasemp> reboot tools-worker-100[12] [tools]
14:23 <chasemp> downtime icinga alert for k8s workers ready [tools]
14:21 <moritzm> rebooting labweb* for kernel security update [production]
14:13 <moritzm> rebooting sca* for kernel security update [production]
14:07 <gehel> rebooting maps1* (eqiad) for kernel security update [production]
14:07 <moritzm> rebooting pybal-test for kernel security update [production]
14:00 <_joe_> SWAT is suspended for investigation on tin's git status [production]
14:00 <moritzm> rebooting oxygen for kernel security update [production]
13:21 <arturo> T188994 in some servers there was some race in the dpkg lock between apt-upgrade and puppet. Also, I forgot to use DEBIAN_FRONTEND=noninteractive, so debconf prompts happened and stalled dpkg operations. Already solved, but some puppet alerts were produced [tools]
13:16 <moritzm> powercycling ms-be1038, stuck after reboot [production]
13:10 <marostegui> Deploy schema change on db1094 - T187089 T185128 T153182 [production]
13:09 <marostegui@tin> Synchronized wmf-config/db-eqiad.php: Depool db1094 for alter table (duration: 00m 58s) [production]
12:58 <arturo> T188994 upgrading packages in jessie nodes from the oldstable source [tools]
12:55 <moritzm> rebooting URL downloaders for kernel security update [production]
12:51 <mobrovac@tin> Finished deploy [cpjobqueue/deploy@9b0b947]: refreshLinks: Increase concurrency to 100 - T185052 (duration: 00m 34s) [production]
12:50 <mobrovac@tin> Started deploy [cpjobqueue/deploy@9b0b947]: refreshLinks: Increase concurrency to 100 - T185052 [production]
12:43 <marostegui@tin> Synchronized wmf-config/db-eqiad.php: Repool db1086 after alter table (duration: 00m 58s) [production]
12:33 <moritzm> rebooting mwlog* for kernel security update [production]
12:04 <moritzm> rebooting graphite hosts in eqiad for kernel security update [production]
11:42 <arturo> clush -w @all "sudo DEBIAN_FRONTEND=noninteractive apt-get autoclean" <-- free space in filesystem [tools]
11:41 <arturo> aborrero@tools-clushmaster-01:~$ clush -w @all "sudo DEBIAN_FRONTEND=noninteractive apt-get autoremove -y" <-- we did in canary servers last week and it went fine. So run in fleet-wide [tools]
11:36 <arturo> (ubuntu) removed linux-image-3.13.0-142-generic and linux-image-3.13.0-137-generic (T188911) [tools]
11:33 <arturo> removing unused kernel packages in ubuntu nodes [tools]
11:29 <moritzm> rebooting k8s masters for kernel security update [production]
11:08 <arturo> aborrero@tools-clushmaster-01:~$ clush -w @all "sudo rm /etc/apt/preferences.d/* ; sudo puppet agent -t -v" <--- rebuild directory, it contains stale files across all the cluster [tools]
11:05 <elukey> reboot analytics10[28,35,52] for kernel updates (one at a time, hadoop hdfs journal nodes) [production]
10:46 <moritzm> powercycling ms-be1021, stuck after reboot [production]
10:45 <akosiaris@tin> Synchronized wmf-config/CommonSettings.php: (no justification provided) (duration: 01m 22s) [production]
10:43 <moritzm> rearming keyholder on naos after reboot [production]
10:39 <akosiaris> emergency add a captcha in metawiki contact pages like https://meta.wikimedia.org/wiki/Special:Contact/Stewards to stop bot abuse. phab Task to be filed later on [production]