2023-03-27
|
06:36 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'db1120 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P45944 and previous config saved to /var/cache/conftool/dbconfig/20230327-063642-root.json |
[production] |
05:40 |
<kart_> |
Updated cxserver to 2023-03-17-133444-production (T332379 + build changes) |
[production] |
05:38 |
<kartik@deploy2002> |
helmfile [codfw] DONE helmfile.d/services/cxserver: apply |
[production] |
05:37 |
<kartik@deploy2002> |
helmfile [codfw] START helmfile.d/services/cxserver: apply |
[production] |
05:28 |
<kartik@deploy2002> |
helmfile [eqiad] DONE helmfile.d/services/cxserver: apply |
[production] |
05:28 |
<kartik@deploy2002> |
helmfile [eqiad] START helmfile.d/services/cxserver: apply |
[production] |
05:24 |
<kartik@deploy2002> |
helmfile [staging] DONE helmfile.d/services/cxserver: apply |
[production] |
05:23 |
<kartik@deploy2002> |
helmfile [staging] START helmfile.d/services/cxserver: apply |
[production] |
05:19 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'Depool db1120 T332292', diff saved to https://phabricator.wikimedia.org/P45942 and previous config saved to /var/cache/conftool/dbconfig/20230327-051941-root.json |
[production] |
05:14 |
<marostegui@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db[2132,2160].codfw.wmnet,db[1101,1117,1164].eqiad.wmnet with reason: m1 master switch T331510 |
[production] |
05:14 |
<marostegui@cumin1001> |
START - Cookbook sre.hosts.downtime for 1:00:00 on db[2132,2160].codfw.wmnet,db[1101,1117,1164].eqiad.wmnet with reason: m1 master switch T331510 |
[production] |
2023-03-25
|
07:54 |
<hashar@deploy2002> |
Finished deploy [integration/docroot@ab848e3]: build: Updating eslint-config-wikimedia to 0.24.0 (duration: 00m 08s) |
[production] |
07:54 |
<hashar@deploy2002> |
Started deploy [integration/docroot@ab848e3]: build: Updating eslint-config-wikimedia to 0.24.0 |
[production] |
00:59 |
<dzahn@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on doc1002.eqiad.wmnet with reason: WIP-known-to-be-debugged-new-host |
[production] |
00:58 |
<dzahn@cumin1001> |
START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on doc1002.eqiad.wmnet with reason: WIP-known-to-be-debugged-new-host |
[production] |
00:57 |
<mutante> |
doc1002 - issue is mismatched UIDs again, most likely. doc-uploader is debmonitor on new host |
[production] |
00:56 |
<mutante> |
doc1002 - manually running rsync to doc2002 - which failed with status 23 when started by timer |
[production] |
00:09 |
<tzatziki> |
removing 2 files for legal compliance |
[production] |
2023-03-24
|
23:58 |
<denisse@cumin1001> |
END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "doc2002 - denisse@cumin1001 - T332819" |
[production] |
23:57 |
<denisse@cumin1001> |
START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "doc2002 - denisse@cumin1001 - T332819" |
[production] |
23:50 |
<tzatziki> |
removing 1 file for legal compliance |
[production] |
21:08 |
<mutante> |
mwmaint1002 ferm rules for rsyncd_access from miscweb removed by puppet after I4fe17f397856361 which reverted a8af0339bde14018e8. manually deleted rsyncd config and stopped rsync service. complete noop on mwmaint2002 which is currently the active mwmaint server. T328907 |
[production] |
18:50 |
<ebernhardson@deploy2002> |
Finished deploy [airflow-dags/search@fc69bf4]: Make mw rev recommendation create start_date configurable (duration: 00m 13s) |
[production] |
18:50 |
<ebernhardson@deploy2002> |
Started deploy [airflow-dags/search@fc69bf4]: Make mw rev recommendation create start_date configurable |
[production] |
18:30 |
<ebernhardson@deploy2002> |
Finished deploy [airflow-dags/search@220221d]: set start dates from transfer_to_es dags (duration: 00m 16s) |
[production] |
18:30 |
<ebernhardson@deploy2002> |
Started deploy [airflow-dags/search@220221d]: set start dates from transfer_to_es dags |
[production] |
18:00 |
<ebernhardson@deploy2002> |
Finished deploy [airflow-dags/search@e3c41fb]: bump discolytics to 0.10.0, and add transfer_to_es dag (duration: 00m 20s) |
[production] |
18:00 |
<ebernhardson@deploy2002> |
Started deploy [airflow-dags/search@e3c41fb]: bump discolytics to 0.10.0, and add transfer_to_es dag |
[production] |
17:55 |
<ebernhardson@deploy2002> |
Finished deploy [airflow-dags/search@822dfed]: dump discolytics to 0.10.0, and add transfer_to_es dag (duration: 00m 06s) |
[production] |
17:55 |
<ebernhardson@deploy2002> |
Started deploy [airflow-dags/search@822dfed]: dump discolytics to 0.10.0, and add transfer_to_es dag |
[production] |
15:39 |
<elukey@deploy2002> |
helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' . |
[production] |
15:39 |
<elukey@deploy2002> |
helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' . |
[production] |
15:37 |
<elukey@deploy2002> |
helmfile [eqiad] DONE helmfile.d/services/changeprop: sync |
[production] |
15:36 |
<elukey@deploy2002> |
helmfile [eqiad] START helmfile.d/services/changeprop: sync |
[production] |
15:35 |
<elukey@deploy2002> |
helmfile [codfw] DONE helmfile.d/services/changeprop: sync |
[production] |
15:35 |
<elukey@deploy2002> |
helmfile [codfw] START helmfile.d/services/changeprop: sync |
[production] |
15:09 |
<elukey@deploy2002> |
helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' . |
[production] |
14:59 |
<elukey@deploy2002> |
helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' . |
[production] |
14:24 |
<zabe> |
zabe@mwmaint2002:~$ mwscript extensions/Translate/scripts/moveTranslatableBundle.php --wiki wikimaniawiki "2024:Expressions of Interest" "Wikimania:Expressions of Interest" "Zabe" --reason "per request [[:phab:T332917|T332917]]" # T332917 |
[production] |
11:45 |
<mvernon@cumin2002> |
END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ms-be2067.codfw.wmnet |
[production] |
11:44 |
<mvernon@cumin2002> |
START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ms-be2067.codfw.wmnet |
[production] |
11:01 |
<elukey@deploy2002> |
helmfile [staging] DONE helmfile.d/services/changeprop: sync |
[production] |
11:01 |
<elukey@deploy2002> |
helmfile [staging] START helmfile.d/services/changeprop: sync |
[production] |
10:55 |
<jmm@cumin2002> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 21 days, 0:00:00 on krb2002.codfw.wmnet with reason: Non-functional, WIP for Bullseye update |
[production] |
10:55 |
<jmm@cumin2002> |
START - Cookbook sre.hosts.downtime for 21 days, 0:00:00 on krb2002.codfw.wmnet with reason: Non-functional, WIP for Bullseye update |
[production] |
10:35 |
<elukey@deploy2002> |
helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' . |
[production] |
10:00 |
<marostegui> |
Upgrade db1204 to mariadb 10.6 T330861 |
[production] |
08:57 |
<hashar> |
Fixed up Gerrit > GitHub replication which broke at 5:00 UTC by updating the GitHub RSA SSH host key T332972 |
[production] |
05:37 |
<hashar> |
gerrit: refreshed ssh host key for `github.com` |
[production] |
05:28 |
<hashar> |
Restarted Gerrit |
[production] |