2020-12-14
|
10:34 |
<godog> |
add 100G to prometheus 'global' in codfw |
[production] |
10:32 |
<akosiaris> |
Adding kubernetes codfw staging cluster configuration to cr*-codfw |
[production] |
10:17 |
<marostegui> |
Stop mysql on db2131 to clone db2142 |
[production] |
10:16 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'Depool db2131 to clone db2142', diff saved to https://phabricator.wikimedia.org/P13542 and previous config saved to /var/cache/conftool/dbconfig/20201214-101611-marostegui.json |
[production] |
10:12 |
<ladsgroup@deploy1001> |
Synchronized php-1.36.0-wmf.21/extensions/Wikibase/client/includes: [[gerrit:648283|Avoid loading the whole item in every client page view (T269960)]] (duration: 00m 25s) |
[production] |
10:03 |
<ladsgroup@deploy1001> |
Scap failed!: 4/9 canaries failed their endpoint checks(https://en.wikipedia.org) |
[production] |
09:51 |
<godog> |
swift codfw-prod: more weight to ms-be20[58-61] - T269337 |
[production] |
09:45 |
<aborrero@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6 days, 0:00:00 on cloudvirt1024.eqiad.wmnet with reason: T269419 |
[production] |
09:45 |
<aborrero@cumin1001> |
START - Cookbook sre.hosts.downtime for 6 days, 0:00:00 on cloudvirt1024.eqiad.wmnet with reason: T269419 |
[production] |
08:40 |
<godog> |
swift eqiad-prod: add weight to ms-be106[0-3] - T268435 |
[production] |
2020-12-11
|
22:05 |
<dduvall@deploy1001> |
helmfile [staging] Ran 'sync' command on namespace 'blubberoid' for release 'staging' . |
[production] |
22:02 |
<dduvall@deploy1001> |
helmfile [codfw] Ran 'sync' command on namespace 'blubberoid' for release 'production' . |
[production] |
21:59 |
<dduvall@deploy1001> |
helmfile [eqiad] Ran 'sync' command on namespace 'blubberoid' for release 'production' . |
[production] |
21:57 |
<akosiaris> |
add docker-ce_18.06.3~ce~3-0~debian_amd64.deb to apt.wikimedia.org stretch-wikimedia/thirdparty/k8s |
[production] |
21:46 |
<Amir1> |
Running schema changes on wikitech database for T269348 |
[production] |
21:45 |
<akosiaris@deploy1001> |
helmfile [staging-codfw] START helmfile.d/admin 'sync'. |
[production] |
21:42 |
<akosiaris@deploy1001> |
helmfile [staging-codfw] START helmfile.d/admin 'sync'. |
[production] |
21:41 |
<dduvall@deploy1001> |
helmfile [codfw] Ran 'sync' command on namespace 'blubberoid' for release 'production' . |
[production] |
21:38 |
<dduvall@deploy1001> |
helmfile [eqiad] Ran 'sync' command on namespace 'blubberoid' for release 'production' . |
[production] |
21:35 |
<akosiaris@deploy1001> |
helmfile [staging-codfw] START helmfile.d/admin 'sync'. |
[production] |
21:33 |
<dduvall@deploy1001> |
helmfile [staging] Ran 'sync' command on namespace 'blubberoid' for release 'staging' . |
[production] |
20:27 |
<razzi@cumin1001> |
END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) |
[production] |
20:11 |
<otto@deploy1001> |
Synchronized wmf-config/InitialiseSettings.php: Un-migrate Growth EventLogging schema HomepageVisit back to EventLogging-backend on all wikis (this is a server side event which is not yet ready to migrate) - T267333 (duration: 00m 58s) |
[production] |
19:28 |
<razzi@cumin1001> |
START - Cookbook sre.ganeti.makevm |
[production] |
19:18 |
<razzi@cumin1001> |
END (ERROR) - Cookbook sre.hosts.decommission (exit_code=97) |
[production] |
18:47 |
<razzi@cumin1001> |
END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) |
[production] |
18:30 |
<elukey@cumin1001> |
END (PASS) - Cookbook sre.hadoop.upgrade-bigtop-distro (exit_code=0) for Hadoop test cluster: Change Hadoop distribution - elukey@cumin1001 |
[production] |
18:19 |
<elukey@cumin1001> |
START - Cookbook sre.hadoop.upgrade-bigtop-distro for Hadoop test cluster: Change Hadoop distribution - elukey@cumin1001 |
[production] |
18:19 |
<elukey@cumin1001> |
END (PASS) - Cookbook sre.hadoop.stop-cluster (exit_code=0) for Hadoop test cluster: Stop the Hadoop cluster before maintenance. - elukey@cumin1001 |
[production] |
18:13 |
<elukey@cumin1001> |
START - Cookbook sre.hadoop.stop-cluster for Hadoop test cluster: Stop the Hadoop cluster before maintenance. - elukey@cumin1001 |
[production] |
18:13 |
<mutante> |
doc1001 restarted apache2 just in case after DOC_PATH change |
[production] |
17:53 |
<razzi@cumin1001> |
START - Cookbook sre.hosts.decommission |
[production] |
17:52 |
<razzi@cumin1001> |
START - Cookbook sre.ganeti.makevm |
[production] |
17:48 |
<elukey@cumin1001> |
END (FAIL) - Cookbook sre.hadoop.upgrade-bigtop-distro (exit_code=99) for Hadoop test cluster: Change Hadoop distribution - elukey@cumin1001 |
[production] |
17:41 |
<elukey@cumin1001> |
START - Cookbook sre.hadoop.upgrade-bigtop-distro for Hadoop test cluster: Change Hadoop distribution - elukey@cumin1001 |
[production] |
16:40 |
<elukey@cumin1001> |
END (FAIL) - Cookbook sre.hadoop.upgrade-bigtop-distro (exit_code=99) for Hadoop test cluster: Change Hadoop distribution - elukey@cumin1001 |
[production] |
16:28 |
<elukey@cumin1001> |
START - Cookbook sre.hadoop.upgrade-bigtop-distro for Hadoop test cluster: Change Hadoop distribution - elukey@cumin1001 |
[production] |
16:15 |
<elukey@cumin1001> |
END (PASS) - Cookbook sre.hadoop.stop-cluster (exit_code=0) for Hadoop test cluster: Stop the Hadoop cluster before maintenance. - elukey@cumin1001 |
[production] |
16:10 |
<elukey@cumin1001> |
START - Cookbook sre.hadoop.stop-cluster for Hadoop test cluster: Stop the Hadoop cluster before maintenance. - elukey@cumin1001 |
[production] |
15:35 |
<jbond@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest1002.eqiad.wmnet with reason: REIMAGE |
[production] |
15:33 |
<jbond@cumin1001> |
START - Cookbook sre.hosts.downtime for 2:00:00 on sretest1002.eqiad.wmnet with reason: REIMAGE |
[production] |
15:20 |
<elukey@cumin1001> |
END (FAIL) - Cookbook sre.hadoop.upgrade-bigtop-distro (exit_code=99) for Hadoop test cluster: Change Hadoop distribution - elukey@cumin1001 |
[production] |
15:15 |
<jbond@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest1001.eqiad.wmnet with reason: REIMAGE |
[production] |
15:12 |
<jbond@cumin1001> |
START - Cookbook sre.hosts.downtime for 2:00:00 on sretest1001.eqiad.wmnet with reason: REIMAGE |
[production] |
15:10 |
<jayme@deploy1001> |
helmfile [staging-codfw] DONE helmfile.d/admin 'sync'. |
[production] |
15:06 |
<elukey@cumin1001> |
START - Cookbook sre.hadoop.upgrade-bigtop-distro for Hadoop test cluster: Change Hadoop distribution - elukey@cumin1001 |
[production] |
14:59 |
<jayme@deploy1001> |
helmfile [staging-codfw] START helmfile.d/admin 'sync'. |
[production] |
14:45 |
<jayme@deploy1001> |
helmfile [staging-codfw] START helmfile.d/admin 'sync'. |
[production] |
14:30 |
<jbond@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest1001.eqiad.wmnet with reason: REIMAGE |
[production] |
14:28 |
<jbond@cumin1001> |
START - Cookbook sre.hosts.downtime for 2:00:00 on sretest1001.eqiad.wmnet with reason: REIMAGE |
[production] |