2023-01-26
|
12:42 |
<sukhe@cumin2002> |
START - Cookbook sre.hosts.downtime for 3:00:00 on cp3051.esams.wmnet with reason: T323717 |
[production] |
12:42 |
<sukhe@puppetmaster1001> |
conftool action : set/pooled=no; selector: name=cp3051.esams.wmnet,service=ats-be |
[production] |
12:42 |
<sukhe@puppetmaster1001> |
conftool action : set/pooled=no; selector: name=cp3051.esams.wmnet,service=cdn |
[production] |
12:41 |
<sukhe> |
depool cp3051.esams.wmnet for firmware update testing: T323717 |
[production] |
12:41 |
<btullis@cumin1001> |
START - Cookbook sre.hosts.reboot-single for host flerovium.eqiad.wmnet |
[production] |
12:40 |
<btullis@cumin1001> |
END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host furud.codfw.wmnet |
[production] |
12:29 |
<mvernon@cumin2002> |
END (PASS) - Cookbook sre.swift.roll-restart-reboot-proxies (exit_code=0) rolling restart_daemons on A:eqiad and not A:thanos-fe and A:swift-fe or A:thanos-fe |
[production] |
12:15 |
<btullis@cumin1001> |
START - Cookbook sre.hosts.reboot-single for host furud.codfw.wmnet |
[production] |
12:10 |
<hnowlan@puppetmaster1001> |
conftool action : set/pooled=inactive; selector: service=thumbor,name=kubernetes101[0123].eqiad.wmnet |
[production] |
12:10 |
<mvernon@cumin2002> |
START - Cookbook sre.swift.roll-restart-reboot-proxies rolling restart_daemons on A:eqiad and not A:thanos-fe and A:swift-fe or A:thanos-fe |
[production] |
12:03 |
<jbond> |
enable profile::base::firewall::defs_from_etcd: true globally |
[production] |
11:56 |
<jbond@cumin1001> |
END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) _etcd-client-ssl._tcp.wikimedia.org on all recursors |
[production] |
11:56 |
<jbond@cumin1001> |
START - Cookbook sre.dns.wipe-cache _etcd-client-ssl._tcp.wikimedia.org on all recursors |
[production] |
11:49 |
<hnowlan@puppetmaster1001> |
conftool action : set/pooled=inactive; selector: service=thumbor,name=kubernetes1014.eqiad.wmnet |
[production] |
11:49 |
<hnowlan@puppetmaster1001> |
conftool action : set/weight=10; selector: service=thumbor,name=kubernetes1010.eqiad.wmnet |
[production] |
11:48 |
<ayounsi@cumin1001> |
END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts flowspec1001 |
[production] |
11:48 |
<ayounsi@cumin1001> |
END (PASS) - Cookbook sre.dns.netbox (exit_code=0) |
[production] |
11:48 |
<ayounsi@cumin1001> |
END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: flowspec1001 decommissioned, removing all IPs except the asset tag one - ayounsi@cumin1001" |
[production] |
11:46 |
<ayounsi@cumin1001> |
START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: flowspec1001 decommissioned, removing all IPs except the asset tag one - ayounsi@cumin1001" |
[production] |
11:44 |
<ayounsi@cumin1001> |
START - Cookbook sre.dns.netbox |
[production] |
11:40 |
<ayounsi@cumin1001> |
START - Cookbook sre.hosts.decommission for hosts flowspec1001 |
[production] |
11:36 |
<cgoubert@cumin1001> |
conftool action : set/pooled=true; selector: dnsdisc=k8s-ingress-aux |
[production] |
11:29 |
<jgiannelos@deploy1002> |
helmfile [codfw] DONE helmfile.d/services/tegola-vector-tiles: sync |
[production] |
11:29 |
<jgiannelos@deploy1002> |
helmfile [codfw] START helmfile.d/services/tegola-vector-tiles: sync |
[production] |
11:28 |
<hnowlan@puppetmaster1001> |
conftool action : set/pooled=yes; selector: service=thumbor,name=kubernetes101[0123].eqiad.wmnet |
[production] |
11:08 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'db1120 (re)pooling @ 100%: After switchover', diff saved to https://phabricator.wikimedia.org/P43405 and previous config saved to /var/cache/conftool/dbconfig/20230126-110822-root.json |
[production] |
11:03 |
<hashar> |
Restarted Apache 2 on gerrit.wikimedia.org |
[production] |
10:55 |
<jayme@deploy1002> |
helmfile [eqiad] DONE helmfile.d/services/toolhub: apply |
[production] |
10:55 |
<cgoubert@cumin1001> |
END (PASS) - Cookbook sre.dns.netbox (exit_code=0) |
[production] |
10:55 |
<cgoubert@cumin1001> |
END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Rename aux-k8s-ingress service to k8s-ingress-aux - cgoubert@cumin1001" |
[production] |
10:54 |
<jayme@deploy1002> |
helmfile [eqiad] START helmfile.d/services/toolhub: apply |
[production] |
10:54 |
<jayme@deploy1002> |
helmfile [codfw] DONE helmfile.d/services/toolhub: apply |
[production] |
10:53 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'db1120 (re)pooling @ 75%: After switchover', diff saved to https://phabricator.wikimedia.org/P43404 and previous config saved to /var/cache/conftool/dbconfig/20230126-105317-root.json |
[production] |
10:53 |
<jayme@deploy1002> |
helmfile [codfw] START helmfile.d/services/toolhub: apply |
[production] |
10:46 |
<cgoubert@cumin1001> |
START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Rename aux-k8s-ingress service to k8s-ingress-aux - cgoubert@cumin1001" |
[production] |
10:45 |
<moritzm> |
installing postgresql-13 security updates |
[production] |
10:43 |
<cgoubert@cumin1001> |
START - Cookbook sre.dns.netbox |
[production] |
10:42 |
<joal@deploy1002> |
Finished deploy [airflow-dags/analytics@e52205b]: (no justification provided) (duration: 00m 11s) |
[production] |
10:42 |
<joal@deploy1002> |
Started deploy [airflow-dags/analytics@e52205b]: (no justification provided) |
[production] |
10:41 |
<claime> |
cgoubert@authdns1001:~$ sudo -i authdns-update |
[production] |
10:38 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'db1120 (re)pooling @ 50%: After switchover', diff saved to https://phabricator.wikimedia.org/P43403 and previous config saved to /var/cache/conftool/dbconfig/20230126-103812-root.json |
[production] |
10:34 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'db2121 (re)pooling @ 100%: After switchover', diff saved to https://phabricator.wikimedia.org/P43402 and previous config saved to /var/cache/conftool/dbconfig/20230126-103448-root.json |
[production] |
10:32 |
<joal@deploy1002> |
Finished deploy [analytics/refinery@8ed8435] (hadoop-test): Regular analytics weekly train TEST - third after failure [analytics/refinery@8ed8435] (duration: 01m 16s) |
[production] |
10:31 |
<joal@deploy1002> |
Started deploy [analytics/refinery@8ed8435] (hadoop-test): Regular analytics weekly train TEST - third after failure [analytics/refinery@8ed8435] |
[production] |
10:23 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'db1120 (re)pooling @ 25%: After switchover', diff saved to https://phabricator.wikimedia.org/P43401 and previous config saved to /var/cache/conftool/dbconfig/20230126-102307-root.json |
[production] |
10:21 |
<joal@deploy1002> |
Finished deploy [analytics/refinery@8ed8435] (hadoop-test): Regular analytics weekly train TEST - Second after failure [analytics/refinery@8ed8435] (duration: 00m 04s) |
[production] |
10:21 |
<joal@deploy1002> |
Started deploy [analytics/refinery@8ed8435] (hadoop-test): Regular analytics weekly train TEST - Second after failure [analytics/refinery@8ed8435] |
[production] |
10:19 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'db2121 (re)pooling @ 75%: After switchover', diff saved to https://phabricator.wikimedia.org/P43400 and previous config saved to /var/cache/conftool/dbconfig/20230126-101943-root.json |
[production] |
10:08 |
<jbond@cumin1001> |
END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=True) upgrade firmware for hosts sretest1002.eqiad.wmnet |
[production] |
10:08 |
<jbond@cumin1001> |
END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sretest1002.eqiad.wmnet |
[production] |