2022-12-19
14:32 |
<oblivian@deploy1002> |
helmfile [codfw] DONE helmfile.d/services/mw-debug: apply |
[production] |
14:32 |
<oblivian@deploy1002> |
helmfile [codfw] START helmfile.d/services/mw-debug: apply |
[production] |
14:20 |
<jmm@cumin2002> |
END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host moss-fe1002.eqiad.wmnet |
[production] |
14:20 |
<btullis@cumin1001> |
END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for an-presto[1011-1015].eqiad.wnet |
[production] |
14:20 |
<btullis@cumin1001> |
START - Cookbook sre.hosts.remove-downtime for an-presto[1011-1015].eqiad.wnet |
[production] |
14:14 |
<jmm@cumin2002> |
START - Cookbook sre.hosts.reboot-single for host moss-fe1002.eqiad.wmnet |
[production] |
14:08 |
<oblivian@deploy1002> |
Synchronized README: Null sync to force a redeployment of the php-fpm base image (duration: 13m 04s) |
[production] |
14:06 |
<moritzm> |
installing giflib security updates |
[production] |
13:58 |
<moritzm> |
installing glibc security updates |
[production] |
13:42 |
<_joe_> |
purge old docker images from deploy1002 by hand |
[production] |
13:27 |
<moritzm> |
installing PHP 7.3 security updates on buster |
[production] |
13:19 |
<phedenskog@deploy1002> |
Finished deploy [performance/navtiming@6aedc70]: (no justification provided) (duration: 00m 08s) |
[production] |
13:19 |
<phedenskog@deploy1002> |
Started deploy [performance/navtiming@6aedc70]: (no justification provided) |
[production] |
12:13 |
<btullis@cumin1001> |
END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for an-presto[1006-1010].eqiad.wnet |
[production] |
12:13 |
<btullis@cumin1001> |
START - Cookbook sre.hosts.remove-downtime for an-presto[1006-1010].eqiad.wnet |
[production] |
12:10 |
<btullis@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 30 days, 0:00:00 on an-presto[1001-1005].eqiad.wmnet with reason: Trying five of the new presto servers instead of the original five |
[production] |
12:09 |
<btullis@cumin1001> |
START - Cookbook sre.hosts.downtime for 30 days, 0:00:00 on an-presto[1001-1005].eqiad.wmnet with reason: Trying five of the new presto servers instead of the original five |
[production] |
11:43 |
<btullis@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 30 days, 0:00:00 on 10 hosts with reason: Reverting presto cluster size from 15 to 5 as a test |
[production] |
11:43 |
<btullis@cumin1001> |
START - Cookbook sre.hosts.downtime for 30 days, 0:00:00 on 10 hosts with reason: Reverting presto cluster size from 15 to 5 as a test |
[production] |
11:29 |
<elukey@cumin1001> |
END (PASS) - Cookbook sre.k8s.pool-depool-cluster (exit_code=0) check 1 in ml-serve-codfw: maintenance |
[production] |
11:29 |
<elukey@cumin1001> |
END (PASS) - Cookbook sre.discovery.service-route (exit_code=0) |
[production] |
11:29 |
<elukey@cumin1001> |
START - Cookbook sre.discovery.service-route |
[production] |
11:29 |
<elukey@cumin1001> |
START - Cookbook sre.k8s.pool-depool-cluster check 1 in ml-serve-codfw: maintenance |
[production] |
11:27 |
<btullis@cumin1001> |
END (PASS) - Cookbook sre.presto.roll-restart-workers (exit_code=0) for Presto analytics cluster: Roll restart of all Presto's jvm daemons. |
[production] |
10:51 |
<btullis@cumin1001> |
START - Cookbook sre.presto.roll-restart-workers for Presto analytics cluster: Roll restart of all Presto's jvm daemons. |
[production] |
10:36 |
<dcausse@deploy1002> |
Finished deploy [wikimedia/discovery/analytics@b4d31fb]: incoming_link: relax sensor timeout to default 7d (duration: 02m 28s) |
[production] |
10:33 |
<dcausse@deploy1002> |
Started deploy [wikimedia/discovery/analytics@b4d31fb]: incoming_link: relax sensor timeout to default 7d |
[production] |
10:28 |
<taavi@deploy1002> |
Finished scap: Backport for [[gerrit:868869|Only preload getPageData if there's thread data for the page (T325477)]] (duration: 07m 58s) |
[production] |
10:23 |
<elukey@cumin1001> |
END (PASS) - Cookbook sre.k8s.pool-depool-cluster (exit_code=0) |
[production] |
10:23 |
<elukey@cumin1001> |
END (PASS) - Cookbook sre.discovery.service-route (exit_code=0) |
[production] |
10:23 |
<elukey@cumin1001> |
START - Cookbook sre.discovery.service-route |
[production] |
10:23 |
<elukey@cumin1001> |
START - Cookbook sre.k8s.pool-depool-cluster |
[production] |
10:21 |
<taavi@deploy1002> |
taavi and taavi: Backport for [[gerrit:868869|Only preload getPageData if there's thread data for the page (T325477)]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet |
[production] |
10:20 |
<taavi@deploy1002> |
Started scap: Backport for [[gerrit:868869|Only preload getPageData if there's thread data for the page (T325477)]] |
[production] |
09:59 |
<moritzm> |
update bullseye netboot image for Bullseye 11.6 point release T325186 |
[production] |
09:49 |
<aqu@deploy1002> |
Finished deploy [airflow-dags/analytics@6ac3269]: Fix bug fix in HDFS usage pipeline [airflow-dags@6ac3269] (duration: 00m 13s) |
[production] |
09:48 |
<aqu@deploy1002> |
Started deploy [airflow-dags/analytics@6ac3269]: Fix bug fix in HDFS usage pipeline [airflow-dags@6ac3269] |
[production] |
09:47 |
<aqu@deploy1002> |
Finished deploy [airflow-dags/analytics_test@6ac3269]: Fix bug fix in HDFS usage pipeline TEST [airflow-dags@6ac3269] (duration: 00m 11s) |
[production] |
09:47 |
<aqu@deploy1002> |
Started deploy [airflow-dags/analytics_test@6ac3269]: Fix bug fix in HDFS usage pipeline TEST [airflow-dags@6ac3269] |
[production] |
09:30 |
<aqu@deploy1002> |
Finished deploy [analytics/refinery@2d53aff] (thin): Fix bug fix in HDFS usage pipeline THIN [analytics/refinery@2d53aff] (duration: 00m 08s) |
[production] |
09:29 |
<aqu@deploy1002> |
Started deploy [analytics/refinery@2d53aff] (thin): Fix bug fix in HDFS usage pipeline THIN [analytics/refinery@2d53aff] |
[production] |
09:29 |
<aqu@deploy1002> |
Finished deploy [analytics/refinery@2d53aff]: Fix bug fix in HDFS usage pipeline [analytics/refinery@2d53aff] (duration: 08m 02s) |
[production] |
09:21 |
<aqu@deploy1002> |
Started deploy [analytics/refinery@2d53aff]: Fix bug fix in HDFS usage pipeline [analytics/refinery@2d53aff] |
[production] |
09:20 |
<aqu@deploy1002> |
Finished deploy [analytics/refinery@2d53aff] (hadoop-test): Fix bug fix in HDFS usage pipeline TEST [analytics/refinery@2d53aff] (duration: 01m 14s) |
[production] |
09:19 |
<aqu@deploy1002> |
Started deploy [analytics/refinery@2d53aff] (hadoop-test): Fix bug fix in HDFS usage pipeline TEST [analytics/refinery@2d53aff] |
[production] |
09:17 |
<aqu> |
About to deploy analytics/refinery (bug fix in HDFS usage pipeline) |
[production] |
09:15 |
<ladsgroup@deploy1002> |
Finished scap: Backport for [[gerrit:868867|Emergency: discussiontoolspageinfo return empty response in non-talk ns (T325477)]] (duration: 09m 24s) |
[production] |
09:14 |
<elukey@deploy1002> |
helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' . |
[production] |
09:11 |
<elukey@deploy1002> |
helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' . |
[production] |
09:07 |
<ladsgroup@deploy1002> |
ladsgroup and ladsgroup: Backport for [[gerrit:868867|Emergency: discussiontoolspageinfo return empty response in non-talk ns (T325477)]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet |
[production] |