2021-04-29
|
16:28 |
<liw@deploy1002> |
rebuilt and synchronized wikiversions files: Revert "group1 wikis to 1.37.0-wmf.1" |
[production] |
16:27 |
<liw@deploy1002> |
sync-wikiversions aborted: Revert "group[0|1] wikis to [VERSION]" (duration: 00m 01s) |
[production] |
16:22 |
<ryankemper> |
T281498 `ryankemper@wdqs2004:~$ sudo depool` |
[production] |
16:20 |
<ryankemper> |
T281498 `ryankemper@wdqs2004:~$ sudo run-puppet-agent` |
[production] |
16:18 |
<otto@deploy1002> |
Finished deploy [analytics/refinery@b3c5820] (hadoop-test): update event_sanitized_main allowlst on an-launcher1002 - T273789 (duration: 02m 39s) |
[production] |
16:16 |
<arturo> |
add 1MB 10MB test files |
[tools.network-tests] |
16:15 |
<otto@deploy1002> |
Started deploy [analytics/refinery@b3c5820] (hadoop-test): update event_sanitized_main allowlst on an-launcher1002 - T273789 |
[production] |
16:12 |
<papaul> |
powerdown thanos-fe2001 for memory swap |
[production] |
15:55 |
<razzi> |
restart hadoop-yarn-nodemanager and hadoop-hdfs-datanode on an-worker1100 for hadoop to recognize new disk /dev/sdl |
[analytics] |
15:44 |
<ryankemper> |
T280382 `sudo -i wmf-auto-reimage-host -p T280382 --new wdqs1004.eqiad.wmnet` on `ryankemper@cumin1001` tmux session `reimage` (trying reimaging this host one final time, if this fails again will need to do a deeper investigation into what's going wrong here) |
[production] |
15:43 |
<ryankemper> |
[WDQS] `wdqs2001` is high on update lag but otherwise functioning; will repool when lag is caught up |
[production] |
15:38 |
<ottomata> |
enabling event_sanitized_main jobs - T273789 |
[analytics] |
15:37 |
<ryankemper> |
[WDQS] `sudo systemctl restart wdqs-blazegraph` && `sudo systemctl restart wdqs-updater` on `wdqs2001` |
[production] |
15:35 |
<ryankemper> |
[WDQS] ^ scratch that, depooled `wdqs2001` |
[production] |
15:34 |
<ryankemper> |
[WDQS] pooled `wdqs2001` |
[production] |
15:11 |
<dcaro> |
hard rebooting cloudmetrics1002, got hung again (T275605) |
[admin] |
14:57 |
<elukey> |
run mysql_upgrade on an-coord1001 to complete the buster upgrade - T278424 |
[analytics] |
14:44 |
<hnowlan> |
restored all eventlogging jobs to eventlog1003 |
[analytics] |
14:35 |
<hnowlan@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on eventlog[1002-1003].eqiad.wmnet with reason: eventlog1003 migration |
[production] |
14:35 |
<hnowlan@cumin1001> |
START - Cookbook sre.hosts.downtime for 1:00:00 on eventlog[1002-1003].eqiad.wmnet with reason: eventlog1003 migration |
[production] |
14:21 |
<hnowlan> |
bump eventlog1003 CPUs to 6 |
[analytics] |
13:53 |
<joal> |
Rerun failed pageview-hourly-wf-2021-4-29-11 and pageview-hourly-wf-2021-4-29-12 |
[analytics] |
13:44 |
<moritzm> |
installing Java security updates on stat* hosts |
[production] |
13:43 |
<hnowlan@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on eventlog1003.eqiad.wmnet with reason: eventlog1003 migration |
[production] |
13:43 |
<hnowlan@cumin1001> |
START - Cookbook sre.hosts.downtime for 1:00:00 on eventlog1003.eqiad.wmnet with reason: eventlog1003 migration |
[production] |
13:42 |
<hnowlan@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on eventlog1002.eqiad.wmnet with reason: eventlog1003 migration |
[production] |
13:42 |
<hnowlan@cumin1001> |
START - Cookbook sre.hosts.downtime for 1:00:00 on eventlog1002.eqiad.wmnet with reason: eventlog1003 migration |
[production] |
13:40 |
<otto@deploy1002> |
Finished deploy [analytics/refinery@b3c5820]: update event_sanitized_main allowlst on an-launcher1002 - T273789 (duration: 02m 59s) |
[production] |
13:37 |
<otto@deploy1002> |
Started deploy [analytics/refinery@b3c5820]: update event_sanitized_main allowlst on an-launcher1002 - T273789 |
[production] |
13:26 |
<arturo> |
project creation per T281277 |
[puppet-dev] |
13:26 |
<arturo> |
project creation per T281140 |
[image-suggestion-api] |
13:11 |
<moritzm> |
installing postgresql-11 security updates |
[production] |
13:09 |
<joal> |
Rerun failed pageview-hourly-wf-2021-4-29-11 |
[analytics] |
13:08 |
<jbond42> |
merge netbase change to manage /etc/services |
[production] |
13:07 |
<liw@deploy1002> |
Synchronized php: group1 wikis to 1.37.0-wmf.3 (duration: 01m 07s) |
[production] |
13:06 |
<liw@deploy1002> |
rebuilt and synchronized wikiversions files: group1 wikis to 1.37.0-wmf.3 |
[production] |
12:36 |
<Amir1> |
upgrading Quiddity to admin in mailman3 |
[production] |
12:36 |
<hnowlan@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on eventlog1002.eqiad.wmnet with reason: Testing migration of processors to eventlog1003 |
[production] |
12:36 |
<hnowlan@cumin1001> |
START - Cookbook sre.hosts.downtime for 1:00:00 on eventlog1002.eqiad.wmnet with reason: Testing migration of processors to eventlog1003 |
[production] |
12:35 |
<hnowlan> |
restarting 2 processors on eventlog1002 |
[analytics] |
12:26 |
<moritzm> |
installing grub2 updates from buster point release |
[production] |
12:19 |
<Majavah> |
dropping jade_diff_judgement, jade_diff_label, jade_revision_judgement, jade_revision_label tables on all-labs.dblist T281418 |
[releng] |
12:06 |
<jbond42> |
update debmonitor.discover.wmnet ssl cert |
[production] |
12:02 |
<hnowlan> |
stopping processors on eventlog1002 to migrate to eventlog1003 |
[analytics] |
11:59 |
<ladsgroup@deploy1002> |
Synchronized wmf-config/extension-list: Config: [[gerrit:683454|Undeploy JADE from production, Part III (T281418)]] (duration: 01m 07s) |
[production] |
11:54 |
<ladsgroup@deploy1002> |
Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:683453|Undeploy JADE from production, Part II (T281418)]], Part I (duration: 01m 06s) |
[production] |
11:50 |
<elukey> |
manual stop of one of the eventlog processors on eventlog1002 to see if 1003 takes it over |
[analytics] |
11:49 |
<ladsgroup@deploy1002> |
Synchronized wmf-config/CommonSettings.php: Config: [[gerrit:683452|Undeploy JADE from production, Part I (T281418)]] (duration: 01m 07s) |
[production] |
11:45 |
<jmm@cumin2001> |
END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sretest1002.eqiad.wmnet |
[production] |
11:40 |
<jmm@cumin2001> |
START - Cookbook sre.hosts.reboot-single for host sretest1002.eqiad.wmnet |
[production] |