2024-04-25
|
07:48 |
<hashar@deploy1002> |
hashar: Backport for [[gerrit:1023441|logging: do not explicitly set blackhole handler (T228838)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) |
[production] |
07:47 |
<arnaudb@cumin1002> |
START - Cookbook sre.hosts.reimage for host db1241.eqiad.wmnet with OS bookworm |
[production] |
07:45 |
<hashar@deploy1002> |
Started scap: Backport for [[gerrit:1023441|logging: do not explicitly set blackhole handler (T228838)]] |
[production] |
07:45 |
<arnaudb@cumin1002> |
dbctl commit (dc=all): 'Depool db1241', diff saved to https://phabricator.wikimedia.org/P61195 and previous config saved to /var/cache/conftool/dbconfig/20240425-074516-arnaudb.json |
[production] |
07:44 |
<arnaudb@cumin1002> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1241.eqiad.wmnet with reason: T362746 |
[production] |
07:44 |
<arnaudb@cumin1002> |
START - Cookbook sre.hosts.downtime for 2:00:00 on db1241.eqiad.wmnet with reason: T362746 |
[production] |
07:38 |
<jmm@cumin2002> |
END (PASS) - Cookbook sre.swift.roll-restart-reboot-swift-thanos-proxies (exit_code=0) rolling restart_daemons on A:thanos-fe |
[production] |
07:33 |
<jmm@cumin2002> |
START - Cookbook sre.swift.roll-restart-reboot-swift-thanos-proxies rolling restart_daemons on A:thanos-fe |
[production] |
07:15 |
<jelto@cumin1002> |
END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab1004.wikimedia.org with reason: Upgrade GitLab Replica to new version |
[production] |
07:08 |
<jelto@cumin1002> |
START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1004.wikimedia.org with reason: Upgrade GitLab Replica to new version |
[production] |
06:58 |
<moritzm> |
installing glibc security updates |
[production] |
06:34 |
<moritzm> |
uninstalling redis on netbox hosts; it has been using the central Redis servers for a while now |
[production] |
05:54 |
<ladsgroup@cumin1002> |
dbctl commit (dc=all): 'Depooling db2170 (T352010)', diff saved to https://phabricator.wikimedia.org/P61194 and previous config saved to /var/cache/conftool/dbconfig/20240425-055431-ladsgroup.json |
[production] |
05:54 |
<ladsgroup@cumin1002> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2170.codfw.wmnet with reason: Maintenance |
[production] |
05:54 |
<ladsgroup@cumin1002> |
START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2170.codfw.wmnet with reason: Maintenance |
[production] |
05:54 |
<ladsgroup@cumin1002> |
dbctl commit (dc=all): 'Repooling after maintenance db2153 (T352010)', diff saved to https://phabricator.wikimedia.org/P61193 and previous config saved to /var/cache/conftool/dbconfig/20240425-055408-ladsgroup.json |
[production] |
05:39 |
<ladsgroup@cumin1002> |
dbctl commit (dc=all): 'Repooling after maintenance db2153', diff saved to https://phabricator.wikimedia.org/P61192 and previous config saved to /var/cache/conftool/dbconfig/20240425-053901-ladsgroup.json |
[production] |
05:36 |
<ladsgroup@cumin1002> |
dbctl commit (dc=all): 'Depooling db2161 (T352010)', diff saved to https://phabricator.wikimedia.org/P61191 and previous config saved to /var/cache/conftool/dbconfig/20240425-053608-ladsgroup.json |
[production] |
05:36 |
<ladsgroup@cumin1002> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2161.codfw.wmnet with reason: Maintenance |
[production] |
05:35 |
<ladsgroup@cumin1002> |
START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2161.codfw.wmnet with reason: Maintenance |
[production] |
05:35 |
<ladsgroup@cumin1002> |
dbctl commit (dc=all): 'Repooling after maintenance db2154 (T352010)', diff saved to https://phabricator.wikimedia.org/P61190 and previous config saved to /var/cache/conftool/dbconfig/20240425-053545-ladsgroup.json |
[production] |
05:23 |
<ladsgroup@cumin1002> |
dbctl commit (dc=all): 'Repooling after maintenance db2153', diff saved to https://phabricator.wikimedia.org/P61189 and previous config saved to /var/cache/conftool/dbconfig/20240425-052354-ladsgroup.json |
[production] |
05:20 |
<ladsgroup@cumin1002> |
dbctl commit (dc=all): 'Repooling after maintenance db2154', diff saved to https://phabricator.wikimedia.org/P61188 and previous config saved to /var/cache/conftool/dbconfig/20240425-052038-ladsgroup.json |
[production] |
05:08 |
<ladsgroup@cumin1002> |
dbctl commit (dc=all): 'Repooling after maintenance db2153 (T352010)', diff saved to https://phabricator.wikimedia.org/P61187 and previous config saved to /var/cache/conftool/dbconfig/20240425-050845-ladsgroup.json |
[production] |
05:05 |
<ladsgroup@cumin1002> |
dbctl commit (dc=all): 'Repooling after maintenance db2154', diff saved to https://phabricator.wikimedia.org/P61186 and previous config saved to /var/cache/conftool/dbconfig/20240425-050531-ladsgroup.json |
[production] |
04:50 |
<ladsgroup@cumin1002> |
dbctl commit (dc=all): 'Repooling after maintenance db2154 (T352010)', diff saved to https://phabricator.wikimedia.org/P61185 and previous config saved to /var/cache/conftool/dbconfig/20240425-045023-ladsgroup.json |
[production] |
2024-04-24
|
21:52 |
<dzahn@cumin2002> |
END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab1003.wikimedia.org with reason: security release T363349 |
[production] |
21:36 |
<ryankemper> |
[Elastic] T361268 Pooled new hosts: `elastic110[3-7]` |
[production] |
21:35 |
<ryankemper@puppetmaster1001> |
conftool action : set/weight=10:pooled=yes; selector: name=elastic110[3-7]\.eqiad\.wmnet |
[production] |
20:38 |
<denisse> |
Disabling Puppet on the Prometheus PoP hosts as part of the cergen to CFSSL migration - T360414 |
[production] |
20:38 |
<denisse@cumin2002> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on prometheus6002.drmrs.wmnet,prometheus5002.eqsin.wmnet,prometheus3003.esams.wmnet,prometheus4002.ulsfo.wmnet with reason: Downtiming the Prometheus PoP hosts as part of the cergen to CFSSL migration - T360414 |
[production] |
20:37 |
<denisse@cumin2002> |
START - Cookbook sre.hosts.downtime for 0:30:00 on prometheus6002.drmrs.wmnet,prometheus5002.eqsin.wmnet,prometheus3003.esams.wmnet,prometheus4002.ulsfo.wmnet with reason: Downtiming the Prometheus PoP hosts as part of the cergen to CFSSL migration - T360414 |
[production] |
20:37 |
<denisse> |
Downtiming the Prometheus PoP hosts as part of the cergen to CFSSL migration - T360414 |
[production] |
20:24 |
<ebernhardson@deploy1002> |
helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply |
[production] |
20:24 |
<ebernhardson@deploy1002> |
helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply |
[production] |
19:32 |
<dzahn@cumin2002> |
START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1003.wikimedia.org with reason: security release T363349 |
[production] |
19:27 |
<cstone> |
payments-wiki upgraded from 1895e43b to c7ab847d |
[production] |
19:15 |
<bking@deploy1002> |
helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply |
[production] |
19:15 |
<bking@deploy1002> |
helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply |
[production] |
19:14 |
<eevans@cumin1002> |
END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:restbase-eqiad: Apply truststore changes — T352647 - eevans@cumin1002 |
[production] |
19:08 |
<inflatador> |
bking@deploy1002 stop `consumer-cloudelastic` release to test alerting T359213 |
[production] |
19:07 |
<bking@deploy1002> |
helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply |
[production] |
19:06 |
<bking@deploy1002> |
helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply |
[production] |
19:03 |
<ladsgroup@cumin1002> |
dbctl commit (dc=all): 'Depooling db2153 (T352010)', diff saved to https://phabricator.wikimedia.org/P61181 and previous config saved to /var/cache/conftool/dbconfig/20240424-190237-ladsgroup.json |
[production] |
19:03 |
<ladsgroup@cumin1002> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2153.codfw.wmnet with reason: Maintenance |
[production] |
19:02 |
<ladsgroup@cumin1002> |
START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2153.codfw.wmnet with reason: Maintenance |
[production] |
19:02 |
<ladsgroup@cumin1002> |
dbctl commit (dc=all): 'Repooling after maintenance db2146 (T352010)', diff saved to https://phabricator.wikimedia.org/P61180 and previous config saved to /var/cache/conftool/dbconfig/20240424-190214-ladsgroup.json |
[production] |
18:57 |
<amastilovic@deploy1002> |
Finished deploy [airflow-dags/analytics@3f994d5]: (no justification provided) (duration: 00m 28s) |
[production] |
18:57 |
<amastilovic@deploy1002> |
Started deploy [airflow-dags/analytics@3f994d5]: (no justification provided) |
[production] |
18:47 |
<ladsgroup@cumin1002> |
dbctl commit (dc=all): 'Repooling after maintenance db2146', diff saved to https://phabricator.wikimedia.org/P61179 and previous config saved to /var/cache/conftool/dbconfig/20240424-184707-ladsgroup.json |
[production] |