2020-02-07
22:20 |
<jeh> |
ceph: round 2 OSD failover and recovery testing on cloudcephosd1003.wikimedia.org T240718 |
[production] |
20:47 |
<mutante> |
OS install on new install_server VMs worked on second attempt, issues are gone. Signed puppet certs for install1003.eqiad.wmnet, install2003.codfw.wmnet, initial puppet runs (T224576) |
[production] |
20:42 |
<jeh> |
ceph: OSD failover and recovery testing on cloudcephosd1003.wikimedia.org T240718 |
[production] |
20:32 |
<mutante> |
ganeti: attempting to reinstall install1003 which failed last time |
[production] |
17:38 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'Slowly repool es1019 after on-site maintenance T243963', diff saved to https://phabricator.wikimedia.org/P10350 and previous config saved to /var/cache/conftool/dbconfig/20200207-173850-marostegui.json |
[production] |
17:36 |
<twentyafterfour@deploy1001> |
Synchronized wmf-config/InitialiseSettings.php: sync InitialiseSettings again for lols refs T233866 (duration: 01m 03s) |
[production] |
17:32 |
<twentyafterfour@deploy1001> |
Synchronized wmf-config/InitialiseSettings.php: sync https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/570929 refs T233866 (duration: 01m 02s) |
[production] |
17:25 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'Slowly repool es1019 after on-site maintenance T243963', diff saved to https://phabricator.wikimedia.org/P10349 and previous config saved to /var/cache/conftool/dbconfig/20200207-172541-marostegui.json |
[production] |
17:22 |
<twentyafterfour@deploy1001> |
rebuilt and synchronized wikiversions files: roll back all wikis to 1.35.0-wmf.16 refs T233866 |
[production] |
17:19 |
<marostegui> |
Start MySQL on es1019 after onsite maintenance T243963 |
[production] |
16:46 |
<filippo@cumin1001> |
END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) |
[production] |
16:38 |
<filippo@cumin1001> |
START - Cookbook sre.ganeti.makevm |
[production] |
16:13 |
<XioNoX> |
remove MSS clamping from eqiad/eqord/knams/esams |
[production] |
16:05 |
<andrew@deploy1001> |
Finished deploy [horizon/deploy@bc777d6]: Fix for T243422 (duration: 03m 45s) |
[production] |
16:04 |
<vgutierrez> |
pooling cp4030 with buster - T242093 |
[production] |
16:03 |
<bblack> |
removing GRE MTU mitigations from cp[135]xxx - T232602 |
[production] |
16:01 |
<andrew@deploy1001> |
Started deploy [horizon/deploy@bc777d6]: Fix for T243422 |
[production] |
15:50 |
<vgutierrez@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) |
[production] |
15:48 |
<vgutierrez@cumin1001> |
START - Cookbook sre.hosts.downtime |
[production] |
15:25 |
<vgutierrez> |
depool & reimage cp4030 as buster - T242093 |
[production] |
15:21 |
<vgutierrez> |
pooling cp4031 with buster - T242093 |
[production] |
15:20 |
<vgutierrez> |
pooling ncredir3001 running buster - T243391 |
[production] |
15:18 |
<marostegui> |
Restart all instances on db1124 and db1125 to pick up a new replication filter - T240094 |
[production] |
15:11 |
<marostegui> |
Restart all instances on db2094 and db2095 to pick up a new replication filter - T240094 |
[production] |
14:56 |
<vgutierrez@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) |
[production] |
14:53 |
<vgutierrez@cumin1001> |
START - Cookbook sre.hosts.downtime |
[production] |
14:43 |
<hoo@deploy1001> |
Synchronized wmf-config/Wikibase.php: REVERT: Wikibase Client: Fix setting name typo (T244529) (duration: 01m 40s) |
[production] |
14:43 |
<Amir1> |
ladsgroup@mwmaint1002:~$ mwscript createAndPromote.php --wiki=zhwiki --force "Amir Sarabadani (WMDE)" --sysop (T244578) |
[production] |
14:40 |
<hoo@deploy1001> |
Scap failed!: 9/11 canaries failed their endpoint checks(http://en.wikipedia.org) |
[production] |
14:38 |
<hoo@deploy1001> |
Synchronized wmf-config/Wikibase.php: Wikibase Client: Fix setting name typo (T244529) (duration: 01m 20s) |
[production] |
14:33 |
<vgutierrez> |
depool and reimage ncredir3001 as buster - T243391 |
[production] |
14:32 |
<vgutierrez> |
depool & reimage cp4031 as buster - T242093 |
[production] |
14:23 |
<vgutierrez> |
pooling ncredir3002 running buster - T243391 |
[production] |
13:26 |
<vgutierrez> |
pooling cp4021 with buster - T242093 |
[production] |
13:05 |
<vgutierrez@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) |
[production] |
13:03 |
<vgutierrez@cumin1001> |
START - Cookbook sre.hosts.downtime |
[production] |
12:51 |
<vgutierrez> |
depool and reimage ncredir3002 as buster - T243391 |
[production] |
12:42 |
<vgutierrez> |
depool & reimage cp4021 as buster - T242093 |
[production] |
12:08 |
<akosiaris@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) |
[production] |
12:08 |
<akosiaris@cumin1001> |
START - Cookbook sre.hosts.downtime |
[production] |
11:58 |
<akosiaris@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) |
[production] |
11:57 |
<akosiaris@cumin1001> |
START - Cookbook sre.hosts.downtime |
[production] |
11:25 |
<vgutierrez> |
pooling ncredir5001 running buster - T243391 |
[production] |
11:24 |
<vgutierrez> |
pooling cp4022 with buster - T242093 |
[production] |
11:09 |
<akosiaris> |
undo wikifeeds experiments |
[production] |
11:07 |
<akosiaris@deploy1001> |
helmfile [EQIAD] Ran 'sync' command on namespace 'wikifeeds' for release 'production' . |
[production] |
10:42 |
<akosiaris@deploy1001> |
helmfile [EQIAD] Ran 'apply' command on namespace 'wikifeeds' for release 'production' . |
[production] |
10:40 |
<vgutierrez@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) |
[production] |
10:37 |
<vgutierrez@cumin1001> |
START - Cookbook sre.hosts.downtime |
[production] |
10:36 |
<akosiaris> |
conduct experiments with stopping/starting uwsgi-ores on ores2001 T242705 |
[production] |