2023-01-26
12:42 <sukhe@cumin2002> START - Cookbook sre.hosts.downtime for 3:00:00 on cp3051.esams.wmnet with reason: T323717 [production]
12:42 <sukhe@puppetmaster1001> conftool action : set/pooled=no; selector: name=cp3051.esams.wmnet,service=ats-be [production]
12:42 <sukhe@puppetmaster1001> conftool action : set/pooled=no; selector: name=cp3051.esams.wmnet,service=cdn [production]
12:41 <sukhe> depool cp3051.esams.wmnet for firmware update testing: T323717 [production]
12:41 <btullis@cumin1001> START - Cookbook sre.hosts.reboot-single for host flerovium.eqiad.wmnet [production]
12:40 <btullis@cumin1001> END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host furud.codfw.wmnet [production]
12:29 <mvernon@cumin2002> END (PASS) - Cookbook sre.swift.roll-restart-reboot-proxies (exit_code=0) rolling restart_daemons on A:eqiad and not A:thanos-fe and A:swift-fe or A:thanos-fe [production]
12:15 <btullis@cumin1001> START - Cookbook sre.hosts.reboot-single for host furud.codfw.wmnet [production]
12:10 <hnowlan@puppetmaster1001> conftool action : set/pooled=inactive; selector: service=thumbor,name=kubernetes101[0123].eqiad.wmnet [production]
12:10 <mvernon@cumin2002> START - Cookbook sre.swift.roll-restart-reboot-proxies rolling restart_daemons on A:eqiad and not A:thanos-fe and A:swift-fe or A:thanos-fe [production]
12:03 <jbond> enable profile::base::firewall::defs_from_etcd: true globally [production]
11:56 <jbond@cumin1001> END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) _etcd-client-ssl._tcp.wikimedia.org on all recursors [production]
11:56 <jbond@cumin1001> START - Cookbook sre.dns.wipe-cache _etcd-client-ssl._tcp.wikimedia.org on all recursors [production]
11:49 <hnowlan@puppetmaster1001> conftool action : set/pooled=inactive; selector: service=thumbor,name=kubernetes1014.eqiad.wmnet [production]
11:49 <hnowlan@puppetmaster1001> conftool action : set/weight=10; selector: service=thumbor,name=kubernetes1010.eqiad.wmnet [production]
11:48 <ayounsi@cumin1001> END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts flowspec1001 [production]
11:48 <ayounsi@cumin1001> END (PASS) - Cookbook sre.dns.netbox (exit_code=0) [production]
11:48 <ayounsi@cumin1001> END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: flowspec1001 decommissioned, removing all IPs except the asset tag one - ayounsi@cumin1001" [production]
11:46 <ayounsi@cumin1001> START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: flowspec1001 decommissioned, removing all IPs except the asset tag one - ayounsi@cumin1001" [production]
11:44 <ayounsi@cumin1001> START - Cookbook sre.dns.netbox [production]
11:40 <ayounsi@cumin1001> START - Cookbook sre.hosts.decommission for hosts flowspec1001 [production]
11:36 <cgoubert@cumin1001> conftool action : set/pooled=true; selector: dnsdisc=k8s-ingress-aux [production]
11:29 <jgiannelos@deploy1002> helmfile [codfw] DONE helmfile.d/services/tegola-vector-tiles: sync [production]
11:29 <jgiannelos@deploy1002> helmfile [codfw] START helmfile.d/services/tegola-vector-tiles: sync [production]
11:28 <hnowlan@puppetmaster1001> conftool action : set/pooled=yes; selector: service=thumbor,name=kubernetes101[0123].eqiad.wmnet [production]
11:08 <marostegui@cumin1001> dbctl commit (dc=all): 'db1120 (re)pooling @ 100%: After switchover', diff saved to https://phabricator.wikimedia.org/P43405 and previous config saved to /var/cache/conftool/dbconfig/20230126-110822-root.json [production]
11:03 <hashar> Restarted Apache 2 on gerrit.wikimedia.org [production]
10:55 <jayme@deploy1002> helmfile [eqiad] DONE helmfile.d/services/toolhub: apply [production]
10:55 <cgoubert@cumin1001> END (PASS) - Cookbook sre.dns.netbox (exit_code=0) [production]
10:55 <cgoubert@cumin1001> END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Rename aux-k8s-ingress service to k8s-ingress-aux - cgoubert@cumin1001" [production]
10:54 <jayme@deploy1002> helmfile [eqiad] START helmfile.d/services/toolhub: apply [production]
10:54 <jayme@deploy1002> helmfile [codfw] DONE helmfile.d/services/toolhub: apply [production]
10:53 <marostegui@cumin1001> dbctl commit (dc=all): 'db1120 (re)pooling @ 75%: After switchover', diff saved to https://phabricator.wikimedia.org/P43404 and previous config saved to /var/cache/conftool/dbconfig/20230126-105317-root.json [production]
10:53 <jayme@deploy1002> helmfile [codfw] START helmfile.d/services/toolhub: apply [production]
10:46 <cgoubert@cumin1001> START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Rename aux-k8s-ingress service to k8s-ingress-aux - cgoubert@cumin1001" [production]
10:45 <moritzm> installing postgresql-13 security updates [production]
10:43 <cgoubert@cumin1001> START - Cookbook sre.dns.netbox [production]
10:42 <joal@deploy1002> Finished deploy [airflow-dags/analytics@e52205b]: (no justification provided) (duration: 00m 11s) [production]
10:42 <joal@deploy1002> Started deploy [airflow-dags/analytics@e52205b]: (no justification provided) [production]
10:41 <claime> cgoubert@authdns1001:~$ sudo -i authdns-update [production]
10:38 <marostegui@cumin1001> dbctl commit (dc=all): 'db1120 (re)pooling @ 50%: After switchover', diff saved to https://phabricator.wikimedia.org/P43403 and previous config saved to /var/cache/conftool/dbconfig/20230126-103812-root.json [production]
10:34 <marostegui@cumin1001> dbctl commit (dc=all): 'db2121 (re)pooling @ 100%: After switchover', diff saved to https://phabricator.wikimedia.org/P43402 and previous config saved to /var/cache/conftool/dbconfig/20230126-103448-root.json [production]
10:32 <joal@deploy1002> Finished deploy [analytics/refinery@8ed8435] (hadoop-test): Regular analytics weekly train TEST - third after failure [analytics/refinery@8ed8435] (duration: 01m 16s) [production]
10:31 <joal@deploy1002> Started deploy [analytics/refinery@8ed8435] (hadoop-test): Regular analytics weekly train TEST - third after failure [analytics/refinery@8ed8435] [production]
10:23 <marostegui@cumin1001> dbctl commit (dc=all): 'db1120 (re)pooling @ 25%: After switchover', diff saved to https://phabricator.wikimedia.org/P43401 and previous config saved to /var/cache/conftool/dbconfig/20230126-102307-root.json [production]
10:21 <joal@deploy1002> Finished deploy [analytics/refinery@8ed8435] (hadoop-test): Regular analytics weekly train TEST - Second after failure [analytics/refinery@8ed8435] (duration: 00m 04s) [production]
10:21 <joal@deploy1002> Started deploy [analytics/refinery@8ed8435] (hadoop-test): Regular analytics weekly train TEST - Second after failure [analytics/refinery@8ed8435] [production]
10:19 <marostegui@cumin1001> dbctl commit (dc=all): 'db2121 (re)pooling @ 75%: After switchover', diff saved to https://phabricator.wikimedia.org/P43400 and previous config saved to /var/cache/conftool/dbconfig/20230126-101943-root.json [production]
10:08 <jbond@cumin1001> END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=True) upgrade firmware for hosts sretest1002.eqiad.wmnet [production]
10:08 <jbond@cumin1001> END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sretest1002.eqiad.wmnet [production]