2021-04-29
|
15:55 |
<razzi> |
restart hadoop-yarn-nodemanager and hadoop-hdfs-datanode on an-worker1100 for hadoop to recognize new disk /dev/sdl |
[analytics] |
15:44 |
<ryankemper> |
T280382 `sudo -i wmf-auto-reimage-host -p T280382 --new wdqs1004.eqiad.wmnet` on `ryankemper@cumin1001` tmux session `reimage` (trying reimaging this host one final time, if this fails again will need to do a deeper investigation into what's going wrong here) |
[production] |
15:43 |
<ryankemper> |
[WDQS] `wdqs2001` is high on update lag but otherwise functioning; will repool when lag is caught up |
[production] |
15:38 |
<ottomata> |
enabling event_sanitized_main jobs - T273789 |
[analytics] |
15:37 |
<ryankemper> |
[WDQS] `sudo systemctl restart wdqs-blazegraph` && `sudo systemctl restart wdqs-updater` on `wdqs2001` |
[production] |
15:35 |
<ryankemper> |
[WDQS] ^ scratch that, depooled `wdqs2001` |
[production] |
15:34 |
<ryankemper> |
[WDQS] pooled `wdqs2001` |
[production] |
15:11 |
<dcaro> |
hard rebooting cloudmetrics1002, got hung again (T275605) |
[admin] |
14:57 |
<elukey> |
run mysql_upgrade on an-coord1001 to complete the buster upgrade - T278424 |
[analytics] |
14:44 |
<hnowlan> |
restored all eventlogging jobs to eventlog1003 |
[analytics] |
14:35 |
<hnowlan@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on eventlog[1002-1003].eqiad.wmnet with reason: eventlog1003 migration |
[production] |
14:35 |
<hnowlan@cumin1001> |
START - Cookbook sre.hosts.downtime for 1:00:00 on eventlog[1002-1003].eqiad.wmnet with reason: eventlog1003 migration |
[production] |
14:21 |
<hnowlan> |
bump eventlog1003 CPUs to 6 |
[analytics] |
13:53 |
<joal> |
Rerun failed pageview-hourly-wf-2021-4-29-11 and pageview-hourly-wf-2021-4-29-12 |
[analytics] |
13:44 |
<moritzm> |
installing Java security updates on stat* hosts |
[production] |
13:43 |
<hnowlan@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on eventlog1003.eqiad.wmnet with reason: eventlog1003 migration |
[production] |
13:43 |
<hnowlan@cumin1001> |
START - Cookbook sre.hosts.downtime for 1:00:00 on eventlog1003.eqiad.wmnet with reason: eventlog1003 migration |
[production] |
13:42 |
<hnowlan@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on eventlog1002.eqiad.wmnet with reason: eventlog1003 migration |
[production] |
13:42 |
<hnowlan@cumin1001> |
START - Cookbook sre.hosts.downtime for 1:00:00 on eventlog1002.eqiad.wmnet with reason: eventlog1003 migration |
[production] |
13:40 |
<otto@deploy1002> |
Finished deploy [analytics/refinery@b3c5820]: update event_sanitized_main allowlst on an-launcher1002 - T273789 (duration: 02m 59s) |
[production] |
13:37 |
<otto@deploy1002> |
Started deploy [analytics/refinery@b3c5820]: update event_sanitized_main allowlst on an-launcher1002 - T273789 |
[production] |
13:26 |
<arturo> |
project creation per T281277 |
[puppet-dev] |
13:26 |
<arturo> |
project creation per T281140 |
[image-suggestion-api] |
13:11 |
<moritzm> |
installing postgresql-11 security updates |
[production] |
13:09 |
<joal> |
Rerun failed pageview-hourly-wf-2021-4-29-11 |
[analytics] |
13:08 |
<jbond42> |
merge netbase change to manage /etc/services |
[production] |
13:07 |
<liw@deploy1002> |
Synchronized php: group1 wikis to 1.37.0-wmf.3 (duration: 01m 07s) |
[production] |
13:06 |
<liw@deploy1002> |
rebuilt and synchronized wikiversions files: group1 wikis to 1.37.0-wmf.3 |
[production] |
12:36 |
<Amir1> |
upgrading Quiddity to admin in mailman3 |
[production] |
12:36 |
<hnowlan@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on eventlog1002.eqiad.wmnet with reason: Testing migration of processors to eventlog1003 |
[production] |
12:36 |
<hnowlan@cumin1001> |
START - Cookbook sre.hosts.downtime for 1:00:00 on eventlog1002.eqiad.wmnet with reason: Testing migration of processors to eventlog1003 |
[production] |
12:35 |
<hnowlan> |
restarting 2 processors on eventlog1002 |
[analytics] |
12:26 |
<moritzm> |
installing grub2 updates from buster point release |
[production] |
12:19 |
<Majavah> |
dropping jade_diff_judgement, jade_diff_label, jade_revision_judgement, jade_revision_label tables on all-labs.dblist T281418 |
[releng] |
12:06 |
<jbond42> |
update debmonitor.discover.wmnet ssl cert |
[production] |
12:02 |
<hnowlan> |
stopping processors on eventlog1002 to migrate to eventlog1003 |
[analytics] |
11:59 |
<ladsgroup@deploy1002> |
Synchronized wmf-config/extension-list: Config: [[gerrit:683454|Undeploy JADE from production, Part III (T281418)]] (duration: 01m 07s) |
[production] |
11:54 |
<ladsgroup@deploy1002> |
Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:683453|Undeploy JADE from production, Part II (T281418)]], Part I (duration: 01m 06s) |
[production] |
11:50 |
<elukey> |
manual stop of one of the eventlog processors on eventlog1002 to see if 1003 takes it over |
[analytics] |
11:49 |
<ladsgroup@deploy1002> |
Synchronized wmf-config/CommonSettings.php: Config: [[gerrit:683452|Undeploy JADE from production, Part I (T281418)]] (duration: 01m 07s) |
[production] |
11:45 |
<jmm@cumin2001> |
END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sretest1002.eqiad.wmnet |
[production] |
11:40 |
<jmm@cumin2001> |
START - Cookbook sre.hosts.reboot-single for host sretest1002.eqiad.wmnet |
[production] |
11:38 |
<mbsantos@deploy1002> |
Synchronized wmf-config/CommonSettings.php: Config: [[gerrit:683548|Enable suggested values in TemplateData and VisualEditor CommonSettings (T273857)]] (duration: 01m 07s) |
[production] |
11:34 |
<ladsgroup@deploy1002> |
Synchronized php-1.37.0-wmf.1/extensions/ContentTranslation/specials/SpecialContentTranslation.php: Backport: [[gerrit:683534|Another fix for token cookie handling (T281346)]] (duration: 01m 07s) |
[production] |
11:32 |
<ladsgroup@deploy1002> |
Synchronized php-1.37.0-wmf.3/extensions/ContentTranslation/specials/SpecialContentTranslation.php: Backport: [[gerrit:683533|Another fix for token cookie handling (T281346)]] (duration: 01m 08s) |
[production] |
11:32 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'db1110 (re)pooling @ 100%: Repool db1110', diff saved to https://phabricator.wikimedia.org/P15658 and previous config saved to /var/cache/conftool/dbconfig/20210429-113211-root.json |
[production] |
11:24 |
<mbsantos@deploy1002> |
Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:683547|Enable suggested values in TemplateData and VisualEditor InitialiseSettings (T273857)]] (duration: 01m 07s) |
[production] |
11:17 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'db1110 (re)pooling @ 75%: Repool db1110', diff saved to https://phabricator.wikimedia.org/P15657 and previous config saved to /var/cache/conftool/dbconfig/20210429-111708-root.json |
[production] |
11:02 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'db1110 (re)pooling @ 50%: Repool db1110', diff saved to https://phabricator.wikimedia.org/P15656 and previous config saved to /var/cache/conftool/dbconfig/20210429-110204-root.json |
[production] |
10:59 |
<moritzm> |
updating apt on buster (SUA 198), which eases bullseye upgrades T275873 |
[production] |