2021-11-17
§
|
16:27 |
<cmooney@cumin2002> |
END (FAIL) - Cookbook sre.hosts.decommission (exit_code=99) for hosts rpki2001.codfw.wmnet |
[production] |
16:21 |
<XioNoX> |
move cr1-codfw<->cr2-eqdfw link to BO cable |
[production] |
16:19 |
<cmooney@cumin2002> |
START - Cookbook sre.hosts.decommission for hosts rpki2001.codfw.wmnet |
[production] |
16:06 |
<XioNoX> |
move cr1-codfw:xe-5/3/0 to BO cable |
[production] |
16:04 |
<XioNoX> |
re-enable Telia BGP on cr1-codfw |
[production] |
16:01 |
<btullis@cumin1001> |
START - Cookbook sre.kafka.roll-restart-brokers for Kafka A:kafka-test-eqiad cluster: Roll restart of jvm daemons for openjdk upgrade. |
[production] |
15:59 |
<bblack> |
netbox: added ganeti01 and ganeti02 cluster definitions for drmrs |
[production] |
15:58 |
<XioNoX> |
disable Telia BGP on cr1-codfw |
[production] |
15:55 |
<XioNoX> |
move codfw-ulsfo link to break-out cable |
[production] |
15:46 |
<mutante> |
restarting pybal on lvs1015 |
[production] |
15:43 |
<_joe_> |
restarting pybal on lvs2009 |
[production] |
15:42 |
<mutante> |
restarting pybal on lvs1016 |
[production] |
15:39 |
<_joe_> |
restarting pybal on lvs2010 |
[production] |
15:35 |
<XioNoX> |
drain ulsfo-codfw link |
[production] |
14:47 |
<moritzm> |
installing perl bugfix updates from Bullseye point release |
[production] |
14:22 |
<jmm@cumin2002> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 0:00:00 on ganeti-test[2001-2003].codfw.wmnet with reason: Ganeti update tests |
[production] |
14:22 |
<jmm@cumin2002> |
START - Cookbook sre.hosts.downtime for 5 days, 0:00:00 on ganeti-test[2001-2003].codfw.wmnet with reason: Ganeti update tests |
[production] |
13:49 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'Change weights on s5 special slaves in eqiad T263127', diff saved to https://phabricator.wikimedia.org/P17755 and previous config saved to /var/cache/conftool/dbconfig/20211117-134942-marostegui.json |
[production] |
13:48 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'Remove recentchanges from s5 eqiad T263127', diff saved to https://phabricator.wikimedia.org/P17754 and previous config saved to /var/cache/conftool/dbconfig/20211117-134835-marostegui.json |
[production] |
13:20 |
<aborrero@cumin1001> |
END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host cloudbackup1001-dev.eqiad.wmnet |
[production] |
13:10 |
<aborrero@cumin1001> |
START - Cookbook sre.ganeti.makevm for new host cloudbackup1001-dev.eqiad.wmnet |
[production] |
13:02 |
<Lucas_WMDE> |
UTC morning backport+config window done |
[production] |
12:54 |
<jmm@cumin2002> |
END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti-test2002.codfw.wmnet |
[production] |
12:50 |
<jmm@cumin2002> |
START - Cookbook sre.hosts.reboot-single for host ganeti-test2002.codfw.wmnet |
[production] |
12:26 |
<mwdebug-deploy@deploy1002> |
helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' . |
[production] |
12:24 |
<lucaswerkmeister-wmde@deploy1002> |
Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:739467|Enable disambiguator notifications on 6 Wikipedias (T293319)]] (duration: 01m 04s) |
[production] |
12:22 |
<mwdebug-deploy@deploy1002> |
helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' . |
[production] |
12:22 |
<btullis@cumin1001> |
END (PASS) - Cookbook sre.presto.roll-restart-workers (exit_code=0) for Presto analytics cluster: Roll restart of all Presto's jvm daemons. |
[production] |
12:17 |
<topranks> |
Re-pooling ulsfo after completing routing changes on cr3-ulsfo and cr4-ulsfo (T295672) |
[production] |
12:12 |
<mwdebug-deploy@deploy1002> |
helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' . |
[production] |
12:11 |
<btullis@cumin1001> |
START - Cookbook sre.presto.roll-restart-workers for Presto analytics cluster: Roll restart of all Presto's jvm daemons. |
[production] |
12:11 |
<moritzm> |
failover ganeti master in test cluster to ganeti-test2003 |
[production] |
12:09 |
<lucaswerkmeister-wmde@deploy1002> |
Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:739391|Enable more languages for Section Translation in testwiki (T294223)]] (duration: 01m 52s) |
[production] |
12:08 |
<mwdebug-deploy@deploy1002> |
helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' . |
[production] |
11:09 |
<moritzm> |
installing testvm2002 |
[production] |
10:51 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'Remove recentchangeslinked from s5 eqiad T263127', diff saved to https://phabricator.wikimedia.org/P17753 and previous config saved to /var/cache/conftool/dbconfig/20211117-105120-marostegui.json |
[production] |
10:45 |
<dcausse> |
restarting blazegraph on wdqs1013 (jvm stuck) |
[production] |
10:45 |
<topranks> |
Commencing manual config on cr3-ulsfo and cr4-ulsfo (site depooled) to reconfigure iBGP (T295672) |
[production] |
10:42 |
<hnowlan> |
replaced all references to deploy1001 with deploy1002 in all .git/DEPLOY_HEAD directories on deploy1002:/srv/deployment |
[production] |
10:41 |
<ema> |
A:cp re-enable puppet after testing https://gerrit.wikimedia.org/r/c/operations/puppet/+/738949/ T293879 |
[production] |
10:37 |
<jayme> |
imported wmf-certificates 0~20211110-1 to stretch-wikimedia,buster-wikimedia,bullseye-wikimedia |
[production] |
10:31 |
<ema> |
A:cp disable-puppet to merge and test https://gerrit.wikimedia.org/r/c/operations/puppet/+/738949/ T293879 |
[production] |
10:28 |
<jmm@cumin2002> |
END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host testvm2002.codfw.wmnet |
[production] |
10:18 |
<mmandere@cumin1001> |
END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp6016.drmrs.wmnet with OS buster |
[production] |
10:18 |
<jmm@cumin2002> |
START - Cookbook sre.ganeti.makevm for new host testvm2002.codfw.wmnet |
[production] |
10:14 |
<topranks> |
De-pool ulsfo in DNS to allow safe reconfiguration / test of changes to CR routers iBGP (T295672) |
[production] |
10:01 |
<hnowlan@deploy1002> |
helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'production' . |
[production] |
10:01 |
<hnowlan@deploy1002> |
helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'staging' . |
[production] |
10:00 |
<moritzm> |
running "gnt-cluster upgrade --to 2.16" on ganeti test cluster |
[production] |
09:59 |
<hnowlan@deploy1002> |
helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'production' . |
[production] |