2023-01-23
04:02 <ladsgroup@cumin1001> START - Cookbook sre.hosts.downtime for 12:00:00 on db2107.codfw.wmnet with reason: Maintenance [production]
03:56 <ladsgroup@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2107.codfw.wmnet with reason: Maintenance [production]
03:56 <ladsgroup@cumin1001> START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2107.codfw.wmnet with reason: Maintenance [production]
03:54 <ladsgroup@cumin1001> dbctl commit (dc=all): 'Depool db2107 T327609', diff saved to https://phabricator.wikimedia.org/P43207 and previous config saved to /var/cache/conftool/dbconfig/20230123-035458-ladsgroup.json [production]
03:52 <Amir1> Starting s2 codfw failover from db2107 to db2104 - T327609 [production]
2023-01-20
18:22 <jynus> deploying new grants for backups on m1 T327155 [production]
16:15 <isaranto@deploy1002> helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' . [production]
16:15 <isaranto@deploy1002> helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' . [production]
16:15 <isaranto@deploy1002> helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' . [production]
16:14 <isaranto@deploy1002> helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' . [production]
16:14 <isaranto@deploy1002> helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' . [production]
16:14 <isaranto@deploy1002> helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' . [production]
16:14 <isaranto@deploy1002> helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' . [production]
14:28 <elukey@deploy1002> helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'. [production]
14:27 <elukey@deploy1002> helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'. [production]
14:24 <elukey@deploy1002> helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'. [production]
14:24 <elukey@deploy1002> helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'. [production]
13:08 <jmm@cumin2002> END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "new ping host - jmm@cumin2002" [production]
13:08 <moritzm> installing node-minimatch security updates [production]
13:01 <moritzm> installing libxstream-java security updates [production]
13:00 <sukhe> reprepro --ignore=wrongdistribution -C main include bullseye-wikimedia cadvisor_0.44.0+ds1-1~wmf1_amd64.changes: T325557 [production]
12:45 <jmm@cumin2002> START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "new ping host - jmm@cumin2002" [production]
12:38 <jiji@cumin1001> END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc2040.codfw.wmnet with OS bullseye [production]
12:23 <jiji@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc2040.codfw.wmnet with reason: host reimage [production]
12:20 <jiji@cumin1001> START - Cookbook sre.hosts.downtime for 2:00:00 on mc2040.codfw.wmnet with reason: host reimage [production]
12:17 <moritzm> installing ping1003 T273509 [production]
12:04 <jiji@cumin1001> START - Cookbook sre.hosts.reimage for host mc2040.codfw.wmnet with OS bullseye [production]
12:03 <jiji@deploy1002> helmfile [codfw] DONE helmfile.d/services/tegola-vector-tiles: apply [production]
12:02 <jiji@deploy1002> helmfile [codfw] START helmfile.d/services/tegola-vector-tiles: apply [production]
10:50 <jmm@cumin2002> END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "new ping host - jmm@cumin2002" [production]
10:49 <jmm@cumin2002> START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "new ping host - jmm@cumin2002" [production]
10:32 <elukey> restart kubelet on ml-staging200* nodes (some fs-inotify-related issues with the istio-proxy of newly created containers) [production]
10:27 <elukey@deploy1002> helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' . [production]
10:13 <moritzm> installing emacs security updates on bullseye [production]
10:13 <elukey@deploy1002> helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' . [production]
10:12 <moritzm> imported jenkins 2.375-2 to thirdparty/ci T326531 [production]
10:00 <jnuche@deploy1002> Installation of scap version "4.33.1" completed for 1 hosts [production]
10:00 <jnuche@deploy1002> Installing scap version "4.33.1" for 1 hosts [production]
08:59 <moritzm> installing ping2003 T273509 [production]
08:10 <elukey> restart kubelet on kubernetes2007 - node reported issues with it, marked as "notready" by the control plane [production]
07:58 <elukey> `apt-get clean` on doh4001 to free space (root partition almost filled) [production]
01:55 <ejegg> payments-wiki upgraded from 3cf03933 to 3d882ac7 [production]
01:12 <ejegg> payments-wiki upgraded from fcb9ab60 to 3cf03933 [production]
2023-01-19
21:46 <jiji@cumin1001> END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc2039.codfw.wmnet with OS bullseye [production]
21:42 <jdrewniak@deploy1002> Finished scap: Backport for [[gerrit:881677|Enable Page tools on viwiki and itwiki (T327348)]] (duration: 10m 38s) [production]
21:33 <jdrewniak@deploy1002> jdlrobson and jdrewniak: Backport for [[gerrit:881677|Enable Page tools on viwiki and itwiki (T327348)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet [production]
21:31 <jiji@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc2039.codfw.wmnet with reason: host reimage [production]
21:31 <jdrewniak@deploy1002> Started scap: Backport for [[gerrit:881677|Enable Page tools on viwiki and itwiki (T327348)]] [production]
21:27 <jdrewniak@deploy1002> Finished scap: Backport for [[gerrit:881612|Fix grid blowout with limited width turned off (T327423)]] (duration: 08m 26s) [production]
21:27 <jiji@cumin1001> START - Cookbook sre.hosts.downtime for 2:00:00 on mc2039.codfw.wmnet with reason: host reimage [production]