2021-03-09
18:12 <brennen@deploy1002> Started scap: testwikis wikis to 1.36.0-wmf.34 [production]
18:10 <mbsantos@deploy1002> helmfile [eqiad] Ran 'sync' command on namespace 'mobileapps' for release 'production' . [production]
18:09 <Majavah> set deployment-db05 to read-only to avoid issues with T276968 [releng]
18:05 <mbsantos@deploy1002> helmfile [codfw] Ran 'sync' command on namespace 'mobileapps' for release 'production' . [production]
18:04 <marxarelli> deleting shut down memc* deployment-prep instances to free up quota for replacement db instances (T276968) [releng]
18:03 <mbsantos@deploy1002> helmfile [staging] Ran 'sync' command on namespace 'mobileapps' for release 'staging' . [production]
18:02 <marxarelli> deleting shut down memc* deployment-prep instances to free up quota for replacement db instances (T276968) [production]
18:02 <elukey@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1085.eqiad.wmnet with reason: REIMAGE [production]
18:00 <elukey@cumin1001> START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1085.eqiad.wmnet with reason: REIMAGE [production]
17:50 <papaul> rebooting db2073 for firmware upgrade [production]
17:25 <marxarelli> seeing "[ 2886.337845] EXT4-fs error (device vda3): ext4_validate_block_bitmap:" for deployment-db05 [releng]
17:22 <marxarelli> restarting deployment-db05 via horizon [releng]
17:22 <marxarelli> deployment-db05 seems to be acting up (intermittent connection failures) which is causing issues with beta-update-databases-eqiad, which is (possibly) causing post-merge jobs to pile up [releng]
17:01 <elukey@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on analytics1077.eqiad.wmnet with reason: REIMAGE [production]
17:00 <urbanecm@deploy1002> Synchronized wmf-config/InitialiseSettings.php: 3119d7a703a38b328fa634db64b2929d54829884: sqwiki: Fix deployment of Growth features (duration: 01m 00s) [production]
16:59 <elukey@cumin1001> START - Cookbook sre.hosts.downtime for 2:00:00 on analytics1077.eqiad.wmnet with reason: REIMAGE [production]
16:47 <marxarelli> still seeing "JobOffer[deployment-deploy01 #3] rejected beta-scap-eqiad: Waiting for next available executor on ‘deployment-deploy01’" despite available executors [releng]
16:46 <pt1979@cumin2001> END (PASS) - Cookbook sre.dns.netbox (exit_code=0) [production]
16:41 <pt1979@cumin2001> START - Cookbook sre.dns.netbox [production]
16:40 <elukey> reimage analytics1077 to buster [production]
16:40 <elukey> reimage analytics1077 to buster [analytics]
16:33 <aborrero@cumin1001> END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudvirt1027.eqiad.wmnet [production]
16:32 <jayme@deploy1002> helmfile [staging-codfw] DONE helmfile.d/admin 'sync'. [production]
16:31 <jayme@deploy1002> helmfile [staging-codfw] START helmfile.d/admin 'sync'. [production]
16:31 <brennen> 1.36.0-wmf.34 was branched at e175899921535f83e168145cbe942489475607db for T274938 [production]
16:27 <arturo> rebooting cloudvirt1027 (T275753) [admin]
16:27 <aborrero@cumin1001> START - Cookbook sre.hosts.reboot-single for host cloudvirt1027.eqiad.wmnet [production]
16:26 <marxarelli> builds once again being scheduled on deployment-deploy01 [releng]
16:24 <marxarelli> cycling gearman plugin on integration.wikimedia.org [releng]
16:21 <marostegui@cumin1001> dbctl commit (dc=all): 'db1175 (re)pooling @ 100%: 10', diff saved to https://phabricator.wikimedia.org/P14708 and previous config saved to /var/cache/conftool/dbconfig/20210309-162116-root.json [production]
16:16 <marxarelli> taking deployment-deploy01 agent offline to mitigate stuck post-merge jobs [releng]
16:06 <marostegui@cumin1001> dbctl commit (dc=all): 'db1175 (re)pooling @ 80%: 10', diff saved to https://phabricator.wikimedia.org/P14707 and previous config saved to /var/cache/conftool/dbconfig/20210309-160613-root.json [production]
15:56 <moritzm> imported prometheus-ircd-exporter 0.2 to apt.wikimedia.org T224579 [production]
15:51 <marostegui@cumin1001> dbctl commit (dc=all): 'db1175 (re)pooling @ 60%: 10', diff saved to https://phabricator.wikimedia.org/P14706 and previous config saved to /var/cache/conftool/dbconfig/20210309-155109-root.json [production]
15:45 <elukey@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on analytics1072.eqiad.wmnet with reason: REIMAGE [production]
15:43 <elukey@cumin1001> START - Cookbook sre.hosts.downtime for 2:00:00 on analytics1072.eqiad.wmnet with reason: REIMAGE [production]
15:37 <marostegui@cumin1001> dbctl commit (dc=all): 'db1096:3316 (re)pooling @ 100%: Repooling db1096:3316 after schema change', diff saved to https://phabricator.wikimedia.org/P14705 and previous config saved to /var/cache/conftool/dbconfig/20210309-153715-root.json [production]
15:36 <razzi> rebalance kafka partitions for webrequest_upload partition 13 [analytics]
15:36 <marostegui@cumin1001> dbctl commit (dc=all): 'db1175 (re)pooling @ 40%: 10', diff saved to https://phabricator.wikimedia.org/P14704 and previous config saved to /var/cache/conftool/dbconfig/20210309-153605-root.json [production]
15:35 <jmm@cumin2001> END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-fe1008.eqiad.wmnet [production]
15:29 <jmm@cumin2001> START - Cookbook sre.hosts.reboot-single for host ms-fe1008.eqiad.wmnet [production]
15:28 <jmm@cumin2001> END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-fe1007.eqiad.wmnet [production]
15:27 <otto@deploy1002> Synchronized wmf-config/InitialiseSettings.php: Declare KaiOS / Inuka event streams - T267344 T267345 T267346 (duration: 00m 58s) [production]
15:22 <marostegui@cumin1001> dbctl commit (dc=all): 'db1096:3316 (re)pooling @ 60%: Repooling db1096:3316 after schema change', diff saved to https://phabricator.wikimedia.org/P14703 and previous config saved to /var/cache/conftool/dbconfig/20210309-152212-root.json [production]
15:21 <marostegui@cumin1001> dbctl commit (dc=all): 'db1175 (re)pooling @ 30%: 10', diff saved to https://phabricator.wikimedia.org/P14702 and previous config saved to /var/cache/conftool/dbconfig/20210309-152102-root.json [production]
15:20 <otto@deploy1002> Synchronized wmf-config/InitialiseSettings.php: WikimediaEvents: Bump session_tick sampling rate to 10% (duration: 00m 58s) [production]
15:18 <elukey> reimage analytics1072 (hadoop hdfs journal node) to buster [production]
15:18 <elukey> reimage analytics1072 (hadoop hdfs journal node) to buster [analytics]
15:15 <jmm@cumin2001> START - Cookbook sre.hosts.reboot-single for host ms-fe1007.eqiad.wmnet [production]
15:15 <jmm@cumin2001> END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-fe1006.eqiad.wmnet [production]