2022-10-05
12:18 |
<jmm@cumin2002> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti1030.eqiad.wmnet with reason: host reimage |
[production] |
12:16 |
<aborrero@cumin1001> |
END (FAIL) - Cookbook sre.hosts.reimage (exit_code=1) for host cloudnet1006.eqiad.wmnet with OS bullseye |
[production] |
12:15 |
<jmm@cumin2002> |
START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti1030.eqiad.wmnet with reason: host reimage |
[production] |
12:13 |
<aborrero@cumin1001> |
END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudnet1005.eqiad.wmnet with OS bullseye |
[production] |
12:02 |
<jmm@cumin2002> |
START - Cookbook sre.hosts.reimage for host ganeti1030.eqiad.wmnet with OS bullseye |
[production] |
11:54 |
<aborrero@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudnet1006.eqiad.wmnet with reason: host reimage |
[production] |
11:53 |
<XioNoX> |
fix MTU between eqiad core routers and cloudsw - T315838 |
[production] |
11:52 |
<jmm@cumin2002> |
END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti1029.eqiad.wmnet with OS bullseye |
[production] |
11:52 |
<aborrero@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudnet1005.eqiad.wmnet with reason: host reimage |
[production] |
11:49 |
<aborrero@cumin1001> |
START - Cookbook sre.hosts.downtime for 2:00:00 on cloudnet1006.eqiad.wmnet with reason: host reimage |
[production] |
11:49 |
<aborrero@cumin1001> |
START - Cookbook sre.hosts.downtime for 2:00:00 on cloudnet1005.eqiad.wmnet with reason: host reimage |
[production] |
11:37 |
<jmm@cumin2002> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti1029.eqiad.wmnet with reason: host reimage |
[production] |
11:33 |
<aborrero@cumin1001> |
START - Cookbook sre.hosts.reimage for host cloudnet1006.eqiad.wmnet with OS bullseye |
[production] |
11:33 |
<aborrero@cumin1001> |
START - Cookbook sre.hosts.reimage for host cloudnet1005.eqiad.wmnet with OS bullseye |
[production] |
11:33 |
<jmm@cumin2002> |
START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti1029.eqiad.wmnet with reason: host reimage |
[production] |
11:20 |
<jmm@cumin2002> |
START - Cookbook sre.hosts.reimage for host ganeti1029.eqiad.wmnet with OS bullseye |
[production] |
11:04 |
<moritzm> |
running "gnt-cluster upgrade --to 3.0" for ganeti/eqiad T311687 |
[production] |
11:01 |
<vgutierrez> |
repool cp2036 - T319394 |
[production] |
10:53 |
<vgutierrez> |
powercycle cp2036 - T319394 |
[production] |
10:52 |
<mwdebug-deploy@deploy1002> |
helmfile [codfw] DONE helmfile.d/services/mwdebug: apply |
[production] |
10:51 |
<mwdebug-deploy@deploy1002> |
helmfile [codfw] START helmfile.d/services/mwdebug: apply |
[production] |
10:51 |
<mwdebug-deploy@deploy1002> |
helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply |
[production] |
10:50 |
<mwdebug-deploy@deploy1002> |
helmfile [eqiad] START helmfile.d/services/mwdebug: apply |
[production] |
10:48 |
<vgutierrez@puppetmaster1001> |
conftool action : set/pooled=no; selector: name=cp2036.codfw.wmnet |
[production] |
10:46 |
<hoo> |
Running extensions/Wikibase/client/maintenance/populateUnexpectedUnconnectedPagePageProp.php for commonswiki |
[production] |
10:45 |
<mwdebug-deploy@deploy1002> |
helmfile [codfw] DONE helmfile.d/services/mwdebug: apply |
[production] |
10:44 |
<hoo@deploy1002> |
Synchronized wmf-config/InitialiseSettings.php: Disable UnconnectedPagePagePropMigrationLegacyFormat for commonswiki (duration: 03m 51s) |
[production] |
10:44 |
<mwdebug-deploy@deploy1002> |
helmfile [codfw] START helmfile.d/services/mwdebug: apply |
[production] |
10:44 |
<mwdebug-deploy@deploy1002> |
helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply |
[production] |
10:43 |
<mwdebug-deploy@deploy1002> |
helmfile [eqiad] START helmfile.d/services/mwdebug: apply |
[production] |
10:36 |
<moritzm> |
installing gdk-pixbuf security updates |
[production] |
09:52 |
<hoo> |
Running extensions/Wikibase/client/maintenance/populateUnexpectedUnconnectedPagePageProp.php for all of ruwikinews |
[production] |
09:51 |
<hoo> |
Ran extensions/Wikibase/client/maintenance/populateUnexpectedUnconnectedPagePageProp.php for all of arwiki |
[production] |
09:32 |
<mwdebug-deploy@deploy1002> |
helmfile [codfw] DONE helmfile.d/services/mwdebug: apply |
[production] |
09:31 |
<mwdebug-deploy@deploy1002> |
helmfile [codfw] START helmfile.d/services/mwdebug: apply |
[production] |
09:31 |
<mwdebug-deploy@deploy1002> |
helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply |
[production] |
09:31 |
<hoo@deploy1002> |
Synchronized wmf-config/InitialiseSettings.php: Disable UnconnectedPagePagePropMigrationLegacyFormat for ruwikinews (duration: 03m 39s) |
[production] |
09:30 |
<mwdebug-deploy@deploy1002> |
helmfile [eqiad] START helmfile.d/services/mwdebug: apply |
[production] |
09:21 |
<moritzm> |
upgrading ganeti/eqiad nodes to Ganeti 3 T311687 |
[production] |
09:20 |
<dcausse> |
restarting blazegraph on wdqs1014 (BlazegraphFreeAllocatorsDecreasingRapidly) |
[production] |
09:15 |
<mwdebug-deploy@deploy1002> |
helmfile [codfw] DONE helmfile.d/services/mwdebug: apply |
[production] |
09:11 |
<mwdebug-deploy@deploy1002> |
helmfile [codfw] START helmfile.d/services/mwdebug: apply |
[production] |
09:11 |
<mwdebug-deploy@deploy1002> |
helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply |
[production] |
09:10 |
<mwdebug-deploy@deploy1002> |
helmfile [eqiad] START helmfile.d/services/mwdebug: apply |
[production] |
09:09 |
<hoo@deploy1002> |
Synchronized wmf-config/InitialiseSettings.php: Disable UnconnectedPagePagePropMigrationLegacyFormat for arwiki (duration: 03m 49s) |
[production] |
09:06 |
<moritzm> |
reimport ganeti 3.0.1-1~bpo10+1 to component/ganeti3 (got removed alongside via a reprepro bug/misfeature when the bullseye component was removed) |
[production] |
07:54 |
<elukey> |
restart kafka on kafka-logging1003 to pick up new PKI TLS settings |
[production] |
07:50 |
<elukey@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:20:00 on kafka-logging1003.eqiad.wmnet with reason: Kafka PKI upgrade |
[production] |
07:49 |
<elukey@cumin1001> |
START - Cookbook sre.hosts.downtime for 0:20:00 on kafka-logging1003.eqiad.wmnet with reason: Kafka PKI upgrade |
[production] |
06:55 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'es2030 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35360 and previous config saved to /var/cache/conftool/dbconfig/20221005-065519-root.json |
[production] |