2024-09-18
§
|
11:46 |
<dreamyjazz@deploy1003> |
Started scap sync-world: Backport for [[gerrit:1073755|Hooks: Re-order checks to verify that request user is same as Special:Contributions user (T375061)]] |
[production] |
11:43 |
<XioNoX> |
update pfw3-codfw dhcp-relay target 0 T375011 |
[production] |
11:43 |
<tchin@deploy1003> |
Finished deploy [analytics/refinery@bc0be94] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@bc0be94a] (duration: 03m 57s) |
[production] |
11:39 |
<tchin@deploy1003> |
Started deploy [analytics/refinery@bc0be94] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@bc0be94a] |
[production] |
11:39 |
<tchin@deploy1003> |
Finished deploy [analytics/refinery@bc0be94] (thin): Regular analytics weekly train THIN [analytics/refinery@bc0be94a] (duration: 05m 50s) |
[production] |
11:33 |
<tchin@deploy1003> |
Started deploy [analytics/refinery@bc0be94] (thin): Regular analytics weekly train THIN [analytics/refinery@bc0be94a] |
[production] |
11:32 |
<tchin@deploy1003> |
Finished deploy [analytics/refinery@bc0be94]: Regular analytics weekly train [analytics/refinery@bc0be94a] (duration: 09m 06s) |
[production] |
11:23 |
<tchin@deploy1003> |
Started deploy [analytics/refinery@bc0be94]: Regular analytics weekly train [analytics/refinery@bc0be94a] |
[production] |
11:23 |
<tchin> |
Deploying refinery |
[production] |
11:16 |
<brouberol@deploy1003> |
helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply |
[production] |
11:15 |
<brouberol@deploy1003> |
helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply |
[production] |
10:54 |
<stevemunene@cumin1002> |
START - Cookbook sre.hosts.reimage for host an-worker1177.eqiad.wmnet with OS bullseye |
[production] |
10:25 |
<stevemunene@cumin1002> |
END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-worker1176.eqiad.wmnet with OS bullseye |
[production] |
10:20 |
<elukey> |
restart poolcounterd on poolcounter2003 (not serving any traffic atm, tried to clear old/stale conns) |
[production] |
10:14 |
<elukey@deploy1003> |
Finished scap sync-world: Backport for [[gerrit:1073427|Swap poolcounter2004 with poolcounter2006 (T332015)]] (duration: 07m 08s) |
[production] |
10:09 |
<elukey@deploy1003> |
elukey: Continuing with sync |
[production] |
10:09 |
<elukey@deploy1003> |
elukey: Backport for [[gerrit:1073427|Swap poolcounter2004 with poolcounter2006 (T332015)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) |
[production] |
10:07 |
<elukey@deploy1003> |
Started scap sync-world: Backport for [[gerrit:1073427|Swap poolcounter2004 with poolcounter2006 (T332015)]] |
[production] |
09:26 |
<tappof@cumin2002> |
END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host centrallog2002.codfw.wmnet |
[production] |
09:11 |
<stevemunene@cumin1002> |
START - Cookbook sre.hosts.reimage for host an-worker1176.eqiad.wmnet with OS bullseye |
[production] |
09:11 |
<tappof@cumin2002> |
START - Cookbook sre.hosts.reboot-single for host centrallog2002.codfw.wmnet |
[production] |
09:01 |
<moritzm> |
drain ganeti2026 T373104 |
[production] |
08:41 |
<tappof> |
centrallog2002 upgrade to bookworm in progress https://phabricator.wikimedia.org/T353912 |
[production] |
08:32 |
<elukey> |
install openjdk-17-jdk on puppetserver1002 to get some useful tools like jmap - T373527 |
[production] |
08:30 |
<jnuche@deploy1003> |
rebuilt and synchronized wikiversions files: group0 to 1.43.0-wmf.23 refs T373642 |
[production] |
08:25 |
<brouberol@deploy1003> |
helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. |
[production] |
08:25 |
<brouberol@deploy1003> |
helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. |
[production] |
08:21 |
<jmm@cumin2002> |
END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2017.codfw.wmnet |
[production] |
08:16 |
<jmm@cumin2002> |
START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2017.codfw.wmnet |
[production] |
08:15 |
<jnuche@deploy1003> |
rebuilt and synchronized wikiversions files: group1 to 1.43.0-wmf.23 refs T373642 |
[production] |
07:45 |
<volans@cumin1002> |
END (PASS) - Cookbook sre.dns.netbox (exit_code=0) |
[production] |
07:45 |
<volans@cumin1002> |
END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Fixed asset tag for db1179 - volans@cumin1002" |
[production] |
07:43 |
<volans@cumin1002> |
START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Fixed asset tag for db1179 - volans@cumin1002" |
[production] |
07:33 |
<volans@cumin1002> |
START - Cookbook sre.dns.netbox |
[production] |
06:39 |
<moritzm> |
installing curl security updates |
[production] |
06:05 |
<arnaudb@cumin1002> |
dbctl commit (dc=all): 'T374807', diff saved to https://phabricator.wikimedia.org/P69250 and previous config saved to /var/cache/conftool/dbconfig/20240918-060549-arnaudb.json |
[production] |
06:03 |
<arnaudb@cumin1002> |
dbctl commit (dc=all): 'Promote db2220 to s7 primary T374807', diff saved to https://phabricator.wikimedia.org/P69249 and previous config saved to /var/cache/conftool/dbconfig/20240918-060332-arnaudb.json |
[production] |
06:02 |
<arnaudb> |
Starting s7 codfw failover from db2218 to db2220 - T374807 |
[production] |
05:49 |
<arnaudb@cumin1002> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 27 hosts with reason: Primary switchover s7 T374807 |
[production] |
05:49 |
<arnaudb@cumin1002> |
dbctl commit (dc=all): 'Remove db2220 from API/vslow/dump T374807', diff saved to https://phabricator.wikimedia.org/P69248 and previous config saved to /var/cache/conftool/dbconfig/20240918-054921-arnaudb.json |
[production] |
05:49 |
<arnaudb@cumin1002> |
dbctl commit (dc=all): 'Set db2220 with weight 0 T374807', diff saved to https://phabricator.wikimedia.org/P69247 and previous config saved to /var/cache/conftool/dbconfig/20240918-054909-arnaudb.json |
[production] |
05:48 |
<arnaudb@cumin1002> |
START - Cookbook sre.hosts.downtime for 1:00:00 on 27 hosts with reason: Primary switchover s7 T374807 |
[production] |
05:47 |
<arnaudb@cumin1002> |
dbctl commit (dc=all): 'T374804', diff saved to https://phabricator.wikimedia.org/P69246 and previous config saved to /var/cache/conftool/dbconfig/20240918-054729-arnaudb.json |
[production] |
05:45 |
<arnaudb@cumin1002> |
dbctl commit (dc=all): 'Promote db2179 to s4 primary T374804', diff saved to https://phabricator.wikimedia.org/P69245 and previous config saved to /var/cache/conftool/dbconfig/20240918-054515-arnaudb.json |
[production] |
05:43 |
<arnaudb> |
Starting s4 codfw failover from db2140 to db2179 - T374804 |
[production] |
05:38 |
<arnaudb@cumin1002> |
dbctl commit (dc=all): 'Remove db2179 from API/vslow/dump T374804', diff saved to https://phabricator.wikimedia.org/P69244 and previous config saved to /var/cache/conftool/dbconfig/20240918-053807-arnaudb.json |
[production] |
05:37 |
<arnaudb@cumin1002> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 32 hosts with reason: Primary switchover s4 T374804 |
[production] |
05:36 |
<arnaudb@cumin1002> |
dbctl commit (dc=all): 'Set db2179 with weight 0 T374804', diff saved to https://phabricator.wikimedia.org/P69243 and previous config saved to /var/cache/conftool/dbconfig/20240918-053633-arnaudb.json |
[production] |
05:36 |
<arnaudb@cumin1002> |
START - Cookbook sre.hosts.downtime for 1:00:00 on 32 hosts with reason: Primary switchover s4 T374804 |
[production] |
05:33 |
<arnaudb@cumin1002> |
dbctl commit (dc=all): 'T375047', diff saved to https://phabricator.wikimedia.org/P69242 and previous config saved to /var/cache/conftool/dbconfig/20240918-053357-arnaudb.json |
[production] |