2023-01-19
10:24 | <filippo@cumin1001> | START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on webperf2004.codfw.wmnet with reason: decom | [production]
10:19 | <jmm@cumin2002> | END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "new ping host - jmm@cumin2002" | [production]
10:17 | <claime> | Restarted maintenance scripts on mwmaint1002.eqiad.wmnet | [production]
10:17 | <jmm@cumin2002> | START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "new ping host - jmm@cumin2002" | [production]
10:17 | <cgoubert@cumin1001> | END (PASS) - Cookbook sre.switchdc.mediawiki.08-start-maintenance (exit_code=0) | [production]
10:15 | <cgoubert@cumin1001> | START - Cookbook sre.switchdc.mediawiki.08-start-maintenance | [production]
10:13 | <cgoubert@cumin1001> | END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mwmaint1002.eqiad.wmnet | [production]
10:07 | <cgoubert@cumin1001> | START - Cookbook sre.hosts.reboot-single for host mwmaint1002.eqiad.wmnet | [production]
10:06 | <cgoubert@cumin1001> | END (PASS) - Cookbook sre.switchdc.mediawiki.01-stop-maintenance (exit_code=0) | [production]
10:06 | <cgoubert@cumin1001> | START - Cookbook sre.switchdc.mediawiki.01-stop-maintenance | [production]
10:05 | <claime> | Stopping maintenance scripts on mwmaint1002.eqiad.wmnet for reboot | [production]
09:55 | <moritzm> | installing ping3003 T273509 | [production]
09:27 | <jmm@cumin2002> | END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on ldap-corp[1001,2001].wikimedia.org with reason: Decommissioning | [production]
09:27 | <jmm@cumin2002> | START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on ldap-corp[1001,2001].wikimedia.org with reason: Decommissioning | [production]
09:24 | <jnuche@deploy1002> | rebuilt and synchronized wikiversions files: all wikis to 1.40.0-wmf.19 refs T325582 | [production]
09:17 | <ladsgroup@cumin1001> | END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2118.codfw.wmnet with reason: Maintenance | [production]
09:17 | <ladsgroup@cumin1001> | START - Cookbook sre.hosts.downtime for 8:00:00 on db2118.codfw.wmnet with reason: Maintenance | [production]
09:16 | <ladsgroup@cumin1001> | END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2118.codfw.wmnet with reason: Maintenance | [production]
09:16 | <ladsgroup@cumin1001> | START - Cookbook sre.hosts.downtime for 6:00:00 on db2118.codfw.wmnet with reason: Maintenance | [production]
08:26 | <moritzm> | installing sudo security updates | [production]
07:45 | <ladsgroup@cumin1001> | END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2118.codfw.wmnet with reason: Maintenance | [production]
07:45 | <ladsgroup@cumin1001> | START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2118.codfw.wmnet with reason: Maintenance | [production]
06:37 | <ladsgroup@cumin1001> | END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2118.codfw.wmnet with reason: Maintenance | [production]
06:36 | <ladsgroup@cumin1001> | START - Cookbook sre.hosts.downtime for 12:00:00 on db2118.codfw.wmnet with reason: Maintenance | [production]
06:35 | <ladsgroup@cumin1001> | END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1136.eqiad.wmnet with reason: Maintenance | [production]
06:35 | <ladsgroup@cumin1001> | START - Cookbook sre.hosts.downtime for 6:00:00 on db1136.eqiad.wmnet with reason: Maintenance | [production]
06:11 | <ladsgroup@cumin1001> | END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2118.codfw.wmnet with reason: Maintenance | [production]
06:11 | <ladsgroup@cumin1001> | START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2118.codfw.wmnet with reason: Maintenance | [production]
06:06 | <ladsgroup@cumin1001> | END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2118.codfw.wmnet with reason: Maintenance | [production]
06:06 | <ladsgroup@cumin1001> | START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2118.codfw.wmnet with reason: Maintenance | [production]
06:04 | <ladsgroup@cumin1001> | dbctl commit (dc=all): 'Depool db2118 T327372', diff saved to https://phabricator.wikimedia.org/P43190 and previous config saved to /var/cache/conftool/dbconfig/20230119-060449-ladsgroup.json | [production]
06:03 | <ladsgroup@cumin1001> | dbctl commit (dc=all): 'Promote db2121 to s7 primary T327372', diff saved to https://phabricator.wikimedia.org/P43189 and previous config saved to /var/cache/conftool/dbconfig/20230119-060316-ladsgroup.json | [production]
06:02 | <Amir1> | Starting s7 codfw failover from db2118 to db2121 - T327372 | [production]
05:42 | <ladsgroup@cumin1001> | dbctl commit (dc=all): 'Set db2121 with weight 0 T327372', diff saved to https://phabricator.wikimedia.org/P43188 and previous config saved to /var/cache/conftool/dbconfig/20230119-054243-ladsgroup.json | [production]
05:42 | <ladsgroup@cumin1001> | END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 30 hosts with reason: Primary switchover s7 T327372 | [production]
05:41 | <ladsgroup@cumin1001> | START - Cookbook sre.hosts.downtime for 1:00:00 on 30 hosts with reason: Primary switchover s7 T327372 | [production]
2023-01-18
23:47 | <zabe> | run populateCulComment.php on all group0 and group1 wikis # T327290 | [production]
23:42 | <cstone> | civicrm upgraded from 164270b0 to f6093fb2 | [production]
22:35 | <bking@cumin1001> | END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) Operation.RESTART (1 nodes at a time) for ElasticSearch cluster cloudelastic: raise heap memory to 12G - bking@cumin1001 - T323646 | [production]
22:03 | <bking@cumin1001> | START - Cookbook sre.elasticsearch.rolling-operation Operation.RESTART (1 nodes at a time) for ElasticSearch cluster cloudelastic: raise heap memory to 12G - bking@cumin1001 - T323646 | [production]
21:50 | <kindrobot> | close UTC late backport window | [production]
21:50 | <kindrobot@deploy1002> | Finished scap: Backport for [[gerrit:881462|[config]: Undeploy GDI Safety Survey Wave 4 (T327296)]] (duration: 10m 45s) | [production]
21:41 | <kindrobot@deploy1002> | essexigyan and kindrobot: Backport for [[gerrit:881462|[config]: Undeploy GDI Safety Survey Wave 4 (T327296)]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet | [production]
21:39 | <kindrobot@deploy1002> | Started scap: Backport for [[gerrit:881462|[config]: Undeploy GDI Safety Survey Wave 4 (T327296)]] | [production]
21:36 | <kindrobot@deploy1002> | Finished scap: Backport for [[gerrit:881451|Bump English Wikipedia event logging from 0.5 to 1% (T326892)]], [[gerrit:881431|Legacy Vector is not a responsive skin (T327256)]] (duration: 13m 01s) | [production]
21:25 | <kindrobot@deploy1002> | kindrobot and jdlrobson: Backport for [[gerrit:881451|Bump English Wikipedia event logging from 0.5 to 1% (T326892)]], [[gerrit:881431|Legacy Vector is not a responsive skin (T327256)]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet | [production]
21:23 | <kindrobot@deploy1002> | Started scap: Backport for [[gerrit:881451|Bump English Wikipedia event logging from 0.5 to 1% (T326892)]], [[gerrit:881431|Legacy Vector is not a responsive skin (T327256)]] | [production]
21:08 | <cwhite@cumin2002> | END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host logstash1037.eqiad.wmnet with OS bullseye | [production]
21:05 | <cwhite@cumin2002> | END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host logstash1036.eqiad.wmnet with OS bullseye | [production]
21:03 | <kindrobot> | start UTC late backport window | [production]