2022-06-03
§
|
16:06 |
<herron@cumin1001> |
END (PASS) - Cookbook sre.kafka.roll-restart-brokers (exit_code=0) for Kafka A:kafka-main-eqiad cluster: Roll restart of jvm daemons. |
[production] |
15:50 |
<balloons> |
fix g3.cores4.ram8.disk20.swap24.ephem20 flavor to include swap. Convert to fix g3.cores4.ram8.disk20.swap8.ephem20 flavor T309821 |
[tools] |
15:50 |
<balloons> |
temp add 1.0G swap to sgeweblight hosts T309821 |
[tools] |
15:50 |
<balloons> |
fix g3.cores4.ram8.disk20.swap24.ephem20 flavor to include swap. Convert to fix g3.cores4.ram8.disk20.swap8.ephem20 flavor T309821 |
[tools] |
15:49 |
<balloons> |
temp add 1.0G swap to sgeweblight hosts T309821 |
[tools] |
14:58 |
<jhathaway@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on mx1001.wikimedia.org with reason: BDAT |
[production] |
14:58 |
<jhathaway@cumin1001> |
START - Cookbook sre.hosts.downtime for 1:00:00 on mx1001.wikimedia.org with reason: BDAT |
[production] |
14:25 |
<herron@cumin1001> |
START - Cookbook sre.kafka.roll-restart-brokers for Kafka A:kafka-main-eqiad cluster: Roll restart of jvm daemons. |
[production] |
14:14 |
<inflatador> |
patching and restarting a few eqiad elastic hosts T309868 |
[production] |
14:02 |
<wm-bot> |
<root> Moved webservice from buster grid engine to kubernetes |
[tools.wikicontrib] |
13:58 |
<bd808> |
https://bitbucket.org/magnusmanske/glamtools/issues/91/fyi-once-flag-added-to-crontab-entries-for |
[tools.glamtools] |
13:43 |
<wm-bot> |
<root> Added -once flag to crontab entries for "next" and "just_added" tasks |
[tools.glamtools] |
13:25 |
<bd808> |
Upgrading fleet to tools-webservice 0.86 (T309821) |
[tools] |
13:20 |
<bd808> |
publish tools-webservice 0.86 (T309821) |
[tools] |
13:17 |
<bd808> |
publish tools-webservice 0.86 (T309821) |
[toolsbeta] |
12:46 |
<taavi> |
start webservicemonitor on tools-sgecron-01 T309821 |
[tools] |
12:37 |
<wm-bot> |
<jeanfred> Deploy 9e85ede (T309861) |
[tools.integraality] |
12:07 |
<ladsgroup@cumin1001> |
dbctl commit (dc=all): 'Depooling db1141 (T298560)', diff saved to https://phabricator.wikimedia.org/P29370 and previous config saved to /var/cache/conftool/dbconfig/20220603-120758-ladsgroup.json |
[production] |
12:07 |
<ladsgroup@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1141.eqiad.wmnet with reason: Maintenance |
[production] |
12:07 |
<ladsgroup@cumin1001> |
START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1141.eqiad.wmnet with reason: Maintenance |
[production] |
12:07 |
<ladsgroup@cumin1001> |
dbctl commit (dc=all): 'Repooling after maintenance db1121 (T298560)', diff saved to https://phabricator.wikimedia.org/P29369 and previous config saved to /var/cache/conftool/dbconfig/20220603-120750-ladsgroup.json |
[production] |
11:52 |
<ladsgroup@cumin1001> |
dbctl commit (dc=all): 'Repooling after maintenance db1121', diff saved to https://phabricator.wikimedia.org/P29368 and previous config saved to /var/cache/conftool/dbconfig/20220603-115244-ladsgroup.json |
[production] |
11:37 |
<ladsgroup@cumin1001> |
dbctl commit (dc=all): 'Repooling after maintenance db1121', diff saved to https://phabricator.wikimedia.org/P29367 and previous config saved to /var/cache/conftool/dbconfig/20220603-113739-ladsgroup.json |
[production] |
11:24 |
<taavi> |
restart all pods, fully unresponsive |
[tools.fourohfour] |
11:22 |
<ladsgroup@cumin1001> |
dbctl commit (dc=all): 'Repooling after maintenance db1121 (T298560)', diff saved to https://phabricator.wikimedia.org/P29366 and previous config saved to /var/cache/conftool/dbconfig/20220603-112234-ladsgroup.json |
[production] |
10:36 |
<taavi> |
draining each sgeweblight node one by one, and removing the jobs stuck in 'deleting' too |
[tools] |
09:28 |
<jmm@cumin2002> |
END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts idp-test1001.wikimedia.org |
[production] |
09:28 |
<jmm@cumin2002> |
END (PASS) - Cookbook sre.dns.netbox (exit_code=0) |
[production] |
09:24 |
<jmm@cumin2002> |
START - Cookbook sre.dns.netbox |
[production] |
09:21 |
<jmm@cumin2002> |
START - Cookbook sre.hosts.decommission for hosts idp-test1001.wikimedia.org |
[production] |
09:20 |
<jmm@cumin2002> |
END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts idp-test2001.wikimedia.org |
[production] |
09:20 |
<jmm@cumin2002> |
END (PASS) - Cookbook sre.dns.netbox (exit_code=0) |
[production] |
09:15 |
<jmm@cumin2002> |
START - Cookbook sre.dns.netbox |
[production] |
09:11 |
<jmm@cumin2002> |
START - Cookbook sre.hosts.decommission for hosts idp-test2001.wikimedia.org |
[production] |
09:00 |
<cmooney@cumin1001> |
END (PASS) - Cookbook sre.dns.netbox (exit_code=0) |
[production] |
08:58 |
<jnuche@deploy1002> |
install-world aborted: (duration: 00m 03s) |
[production] |
08:56 |
<cmooney@cumin1001> |
START - Cookbook sre.dns.netbox |
[production] |
08:56 |
<cmooney@cumin1001> |
END (FAIL) - Cookbook sre.dns.netbox (exit_code=99) |
[production] |
07:33 |
<jayme@deploy1002> |
Finished deploy [restbase/deploy@6e39559] (dev-cluster): (no justification provided) (duration: 12m 38s) |
[production] |
07:20 |
<jayme@deploy1002> |
Started deploy [restbase/deploy@6e39559] (dev-cluster): (no justification provided) |
[production] |
07:16 |
<jayme> |
imported scap 4.8.2 to stretch-/buster-/bullseye-wikimedia - T309116 |
[production] |
05:25 |
<wm-bot2> |
rebooted buster weblight grid workers - cookbook ran by taavi@runko |
[toolsbeta] |
05:20 |
<wm-bot2> |
rebooting buster weblight grid workers - cookbook ran by taavi@runko |
[toolsbeta] |
05:20 |
<wm-bot2> |
rebooting stretch weblight grid workers - cookbook ran by taavi@runko |
[toolsbeta] |
05:19 |
<marostegui> |
Stop mysql on db1128 for on-site maintenance T309291 |
[production] |
05:05 |
<taavi> |
removing duplicate (there should be only one per tool) web service jobs from the grid T309821 |
[tools] |
04:52 |
<taavi> |
revert bd808's changes to profile::toolforge::active_proxy_host |
[tools] |
03:21 |
<bd808> |
Cleared queue error states after deploying new toolforge-webservice package (T309821) |
[tools] |
03:10 |
<bd808> |
publish tools-webservice 0.85 with hack for T309821 |
[tools] |
02:44 |
<ejegg> |
re-enabled fundraising scheduled jobs |
[production] |