851-900 of 10000 results (87ms)
2023-01-23 §
04:59 <ladsgroup@cumin1001> dbctl commit (dc=all): 'Depool db2113 T327611', diff saved to https://phabricator.wikimedia.org/P43210 and previous config saved to /var/cache/conftool/dbconfig/20230123-045939-ladsgroup.json [production]
04:57 <ladsgroup@cumin1001> dbctl commit (dc=all): 'Promote db2123 to s5 primary T327611', diff saved to https://phabricator.wikimedia.org/P43209 and previous config saved to /var/cache/conftool/dbconfig/20230123-045740-ladsgroup.json [production]
04:57 <Amir1> Starting s5 codfw failover from db2113 to db2123 - T327611 [production]
04:53 <ladsgroup@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2107.codfw.wmnet with reason: Maintenance [production]
04:53 <ladsgroup@cumin1001> START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2107.codfw.wmnet with reason: Maintenance [production]
04:33 <ladsgroup@cumin1001> dbctl commit (dc=all): 'Set db2123 with weight 0 T327611', diff saved to https://phabricator.wikimedia.org/P43208 and previous config saved to /var/cache/conftool/dbconfig/20230123-043324-ladsgroup.json [production]
04:33 <ladsgroup@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 25 hosts with reason: Primary switchover s5 T327611 [production]
04:32 <ladsgroup@cumin1001> START - Cookbook sre.hosts.downtime for 1:00:00 on 25 hosts with reason: Primary switchover s5 T327611 [production]
04:02 <ladsgroup@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2107.codfw.wmnet with reason: Maintenance [production]
04:02 <ladsgroup@cumin1001> START - Cookbook sre.hosts.downtime for 12:00:00 on db2107.codfw.wmnet with reason: Maintenance [production]
03:56 <ladsgroup@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2107.codfw.wmnet with reason: Maintenance [production]
03:56 <ladsgroup@cumin1001> START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2107.codfw.wmnet with reason: Maintenance [production]
03:54 <ladsgroup@cumin1001> dbctl commit (dc=all): 'Depool db2107 T327609', diff saved to https://phabricator.wikimedia.org/P43207 and previous config saved to /var/cache/conftool/dbconfig/20230123-035458-ladsgroup.json [production]
03:52 <Amir1> Starting s2 codfw failover from db2107 to db2104 - T327609 [production]
2023-01-20 §
18:22 <jynus> deploying new grants for backups on m1 T327155 [production]
16:15 <isaranto@deploy1002> helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' . [production]
16:15 <isaranto@deploy1002> helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' . [production]
16:15 <isaranto@deploy1002> helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' . [production]
16:14 <isaranto@deploy1002> helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' . [production]
16:14 <isaranto@deploy1002> helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' . [production]
16:14 <isaranto@deploy1002> helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' . [production]
16:14 <isaranto@deploy1002> helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' . [production]
14:28 <elukey@deploy1002> helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'. [production]
14:27 <elukey@deploy1002> helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'. [production]
14:24 <elukey@deploy1002> helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'. [production]
14:24 <elukey@deploy1002> helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'. [production]
13:08 <jmm@cumin2002> END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "new ping host - jmm@cumin2002" [production]
13:08 <moritzm> installing node-minimatch security updates [production]
13:01 <moritzm> installing libxstream-java security updates [production]
13:00 <sukhe> reprepro --ignore=wrongdistribution -C main include bullseye-wikimedia cadvisor_0.44.0+ds1-1~wmf1_amd64.changes: T325557 [production]
12:45 <jmm@cumin2002> START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "new ping host - jmm@cumin2002" [production]
12:38 <jiji@cumin1001> END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc2040.codfw.wmnet with OS bullseye [production]
12:23 <jiji@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc2040.codfw.wmnet with reason: host reimage [production]
12:20 <jiji@cumin1001> START - Cookbook sre.hosts.downtime for 2:00:00 on mc2040.codfw.wmnet with reason: host reimage [production]
12:17 <moritzm> installing ping1003 T273509 [production]
12:04 <jiji@cumin1001> START - Cookbook sre.hosts.reimage for host mc2040.codfw.wmnet with OS bullseye [production]
12:03 <jiji@deploy1002> helmfile [codfw] DONE helmfile.d/services/tegola-vector-tiles: apply [production]
12:02 <jiji@deploy1002> helmfile [codfw] START helmfile.d/services/tegola-vector-tiles: apply [production]
10:50 <jmm@cumin2002> END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "new ping host - jmm@cumin2002" [production]
10:49 <jmm@cumin2002> START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "new ping host - jmm@cumin2002" [production]
10:32 <elukey> restart kubelet on ml-staging200* nodes (some fs-inotify-related issues with the istio-proxy of newly created containers) [production]
10:27 <elukey@deploy1002> helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' . [production]
10:13 <moritzm> installing emacs security updates on bullseye [production]
10:13 <elukey@deploy1002> helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' . [production]
10:12 <moritzm> imported jenkins 2.375-2 to thirdparty/ci T326531 [production]
10:00 <jnuche@deploy1002> Installation of scap version "4.33.1" completed for 1 hosts [production]
10:00 <jnuche@deploy1002> Installing scap version "4.33.1" for 1 hosts [production]
08:59 <moritzm> installing ping2003 T273509 [production]
08:10 <elukey> restart kubelet on kubernetes2007 - node reported issues with it, marked as "notready" by the control plane [production]
07:58 <elukey> `apt-get clean` on doh4001 to free space (root partition almost filled) [production]