2022-12-19
14:14 <jmm@cumin2002> START - Cookbook sre.hosts.reboot-single for host moss-fe1002.eqiad.wmnet [production]
14:08 <oblivian@deploy1002> Synchronized README: Null sync to force a redeployment of the php-fpm base image (duration: 13m 04s) [production]
14:06 <moritzm> installing giflib security updates [production]
13:58 <moritzm> installing glibc security updates [production]
13:42 <_joe_> purge old docker images from deploy1002 by hand [production]
13:27 <moritzm> installing PHP 7.3 security updates on buster [production]
13:19 <phedenskog@deploy1002> Finished deploy [performance/navtiming@6aedc70]: (no justification provided) (duration: 00m 08s) [production]
13:19 <phedenskog@deploy1002> Started deploy [performance/navtiming@6aedc70]: (no justification provided) [production]
12:13 <btullis@cumin1001> END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for an-presto[1006-1010].eqiad.wnet [production]
12:13 <btullis@cumin1001> START - Cookbook sre.hosts.remove-downtime for an-presto[1006-1010].eqiad.wnet [production]
12:10 <btullis@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 30 days, 0:00:00 on an-presto[1001-1005].eqiad.wmnet with reason: Trying five of the new preto servers instead of the original five [production]
12:09 <btullis@cumin1001> START - Cookbook sre.hosts.downtime for 30 days, 0:00:00 on an-presto[1001-1005].eqiad.wmnet with reason: Trying five of the new preto servers instead of the original five [production]
11:43 <btullis@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 30 days, 0:00:00 on 10 hosts with reason: Reverting presto cluster size from 15 to 5 as a test [production]
11:43 <btullis@cumin1001> START - Cookbook sre.hosts.downtime for 30 days, 0:00:00 on 10 hosts with reason: Reverting presto cluster size from 15 to 5 as a test [production]
11:29 <elukey@cumin1001> END (PASS) - Cookbook sre.k8s.pool-depool-cluster (exit_code=0) check 1 in ml-serve-codfw: maintenance [production]
11:29 <elukey@cumin1001> END (PASS) - Cookbook sre.discovery.service-route (exit_code=0) [production]
11:29 <elukey@cumin1001> START - Cookbook sre.discovery.service-route [production]
11:29 <elukey@cumin1001> START - Cookbook sre.k8s.pool-depool-cluster check 1 in ml-serve-codfw: maintenance [production]
11:27 <btullis@cumin1001> END (PASS) - Cookbook sre.presto.roll-restart-workers (exit_code=0) for Presto analytics cluster: Roll restart of all Presto's jvm daemons. [production]
10:51 <btullis@cumin1001> START - Cookbook sre.presto.roll-restart-workers for Presto analytics cluster: Roll restart of all Presto's jvm daemons. [production]
10:36 <dcausse@deploy1002> Finished deploy [wikimedia/discovery/analytics@b4d31fb]: incoming_link: relax sensor timeout to default 7d (duration: 02m 28s) [production]
10:33 <dcausse@deploy1002> Started deploy [wikimedia/discovery/analytics@b4d31fb]: incoming_link: relax sensor timeout to default 7d [production]
10:28 <taavi@deploy1002> Finished scap: Backport for [[gerrit:868869|Only preload getPageData if there's thread data for the page (T325477)]] (duration: 07m 58s) [production]
10:23 <elukey@cumin1001> END (PASS) - Cookbook sre.k8s.pool-depool-cluster (exit_code=0) [production]
10:23 <elukey@cumin1001> END (PASS) - Cookbook sre.discovery.service-route (exit_code=0) [production]
10:23 <elukey@cumin1001> START - Cookbook sre.discovery.service-route [production]
10:23 <elukey@cumin1001> START - Cookbook sre.k8s.pool-depool-cluster [production]
10:21 <taavi@deploy1002> taavi and taavi: Backport for [[gerrit:868869|Only preload getPageData if there's thread data for the page (T325477)]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet [production]
10:20 <taavi@deploy1002> Started scap: Backport for [[gerrit:868869|Only preload getPageData if there's thread data for the page (T325477)]] [production]
09:59 <moritzm> update bullseye netboot image for Bullseye 11.6 point release T325186 [production]
09:49 <aqu@deploy1002> Finished deploy [airflow-dags/analytics@6ac3269]: Fix bug fix in HDFS usage pipeline [airflow-dags@6ac3269] (duration: 00m 13s) [production]
09:48 <aqu@deploy1002> Started deploy [airflow-dags/analytics@6ac3269]: Fix bug fix in HDFS usage pipeline [airflow-dags@6ac3269] [production]
09:47 <aqu@deploy1002> Finished deploy [airflow-dags/analytics_test@6ac3269]: Fix bug fix in HDFS usage pipeline TEST [airflow-dags@6ac3269] (duration: 00m 11s) [production]
09:47 <aqu@deploy1002> Started deploy [airflow-dags/analytics_test@6ac3269]: Fix bug fix in HDFS usage pipeline TEST [airflow-dags@6ac3269] [production]
09:30 <aqu@deploy1002> Finished deploy [analytics/refinery@2d53aff] (thin): Fix bug fix in HDFS usage pipeline THIN [analytics/refinery@2d53aff] (duration: 00m 08s) [production]
09:29 <aqu@deploy1002> Started deploy [analytics/refinery@2d53aff] (thin): Fix bug fix in HDFS usage pipeline THIN [analytics/refinery@2d53aff] [production]
09:29 <aqu@deploy1002> Finished deploy [analytics/refinery@2d53aff]: Fix bug fix in HDFS usage pipeline [analytics/refinery@2d53aff] (duration: 08m 02s) [production]
09:21 <aqu@deploy1002> Started deploy [analytics/refinery@2d53aff]: Fix bug fix in HDFS usage pipeline [analytics/refinery@2d53aff] [production]
09:20 <aqu@deploy1002> Finished deploy [analytics/refinery@2d53aff] (hadoop-test): Fix bug fix in HDFS usage pipeline TEST [analytics/refinery@2d53aff] (duration: 01m 14s) [production]
09:19 <aqu@deploy1002> Started deploy [analytics/refinery@2d53aff] (hadoop-test): Fix bug fix in HDFS usage pipeline TEST [analytics/refinery@2d53aff] [production]
09:17 <aqu> About to deploy analytics/refinery (bug fix in HDFS usage pipeline) [production]
09:15 <ladsgroup@deploy1002> Finished scap: Backport for [[gerrit:868867|Emergency: discussiontoolspageinfo return empty response in non-talk ns (T325477)]] (duration: 09m 24s) [production]
09:14 <elukey@deploy1002> helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' . [production]
09:11 <elukey@deploy1002> helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' . [production]
09:07 <ladsgroup@deploy1002> ladsgroup and ladsgroup: Backport for [[gerrit:868867|Emergency: discussiontoolspageinfo return empty response in non-talk ns (T325477)]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet [production]
09:06 <elukey@deploy1002> helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' . [production]
09:05 <ladsgroup@deploy1002> Started scap: Backport for [[gerrit:868867|Emergency: discussiontoolspageinfo return empty response in non-talk ns (T325477)]] [production]
09:04 <dcausse> restarting blazegraph on wdqs1015 (BlazegraphFreeAllocatorsDecreasingRapidly) [production]
09:03 <elukey@deploy1002> helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' . [production]
09:02 <elukey@deploy1002> helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' . [production]