2021-04-29
16:22 <ryankemper> T281498 `ryankemper@wdqs2004:~$ sudo depool` [production]
16:20 <ryankemper> T281498 `ryankemper@wdqs2004:~$ sudo run-puppet-agent` [production]
16:18 <otto@deploy1002> Finished deploy [analytics/refinery@b3c5820] (hadoop-test): update event_sanitized_main allowlist on an-launcher1002 - T273789 (duration: 02m 39s) [production]
16:16 <arturo> add 1MB 10MB test files [tools.network-tests]
16:15 <otto@deploy1002> Started deploy [analytics/refinery@b3c5820] (hadoop-test): update event_sanitized_main allowlist on an-launcher1002 - T273789 [production]
16:12 <papaul> powerdown thanos-fe2001 for memory swap [production]
15:55 <razzi> restart hadoop-yarn-nodemanager and hadoop-hdfs-datanode on an-worker1100 for hadoop to recognize new disk /dev/sdl [analytics]
15:44 <ryankemper> T280382 `sudo -i wmf-auto-reimage-host -p T280382 --new wdqs1004.eqiad.wmnet` on `ryankemper@cumin1001` tmux session `reimage` (trying reimaging this host one final time, if this fails again will need to do a deeper investigation into what's going wrong here) [production]
15:43 <ryankemper> [WDQS] `wdqs2001` is high on update lag but otherwise functioning; will repool when lag is caught up [production]
15:38 <ottomata> enabling event_sanitized_main jobs - T273789 [analytics]
15:37 <ryankemper> [WDQS] `sudo systemctl restart wdqs-blazegraph` && `sudo systemctl restart wdqs-updater` on `wdqs2001` [production]
15:35 <ryankemper> [WDQS] ^ scratch that, depooled `wdqs2001` [production]
15:34 <ryankemper> [WDQS] pooled `wdqs2001` [production]
15:11 <dcaro> hard rebooting cloudmetrics1002, got hung again (T275605) [admin]
14:57 <elukey> run mysql_upgrade on an-coord1001 to complete the buster upgrade - T278424 [analytics]
14:44 <hnowlan> restored all eventlogging jobs to eventlog1003 [analytics]
14:35 <hnowlan@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on eventlog[1002-1003].eqiad.wmnet with reason: eventlog1003 migration [production]
14:35 <hnowlan@cumin1001> START - Cookbook sre.hosts.downtime for 1:00:00 on eventlog[1002-1003].eqiad.wmnet with reason: eventlog1003 migration [production]
14:21 <hnowlan> bump eventlog1003 CPUs to 6 [analytics]
13:53 <joal> Rerun failed pageview-hourly-wf-2021-4-29-11 and pageview-hourly-wf-2021-4-29-12 [analytics]
13:44 <moritzm> installing Java security updates on stat* hosts [production]
13:43 <hnowlan@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on eventlog1003.eqiad.wmnet with reason: eventlog1003 migration [production]
13:43 <hnowlan@cumin1001> START - Cookbook sre.hosts.downtime for 1:00:00 on eventlog1003.eqiad.wmnet with reason: eventlog1003 migration [production]
13:42 <hnowlan@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on eventlog1002.eqiad.wmnet with reason: eventlog1003 migration [production]
13:42 <hnowlan@cumin1001> START - Cookbook sre.hosts.downtime for 1:00:00 on eventlog1002.eqiad.wmnet with reason: eventlog1003 migration [production]
13:40 <otto@deploy1002> Finished deploy [analytics/refinery@b3c5820]: update event_sanitized_main allowlist on an-launcher1002 - T273789 (duration: 02m 59s) [production]
13:37 <otto@deploy1002> Started deploy [analytics/refinery@b3c5820]: update event_sanitized_main allowlist on an-launcher1002 - T273789 [production]
13:26 <arturo> project creation per T281277 [puppet-dev]
13:26 <arturo> project creation per T281140 [image-suggestion-api]
13:11 <moritzm> installing postgresql-11 security updates [production]
13:09 <joal> Rerun failed pageview-hourly-wf-2021-4-29-11 [analytics]
13:08 <jbond42> merge netbase change to manage /etc/services [production]
13:07 <liw@deploy1002> Synchronized php: group1 wikis to 1.37.0-wmf.3 (duration: 01m 07s) [production]
13:06 <liw@deploy1002> rebuilt and synchronized wikiversions files: group1 wikis to 1.37.0-wmf.3 [production]
12:36 <Amir1> upgrading Quiddity to admin in mailman3 [production]
12:36 <hnowlan@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on eventlog1002.eqiad.wmnet with reason: Testing migration of processors to eventlog1003 [production]
12:36 <hnowlan@cumin1001> START - Cookbook sre.hosts.downtime for 1:00:00 on eventlog1002.eqiad.wmnet with reason: Testing migration of processors to eventlog1003 [production]
12:35 <hnowlan> restarting 2 processors on eventlog1002 [analytics]
12:26 <moritzm> installing grub2 updates from buster point release [production]
12:19 <Majavah> dropping jade_diff_judgement, jade_diff_label, jade_revision_judgement, jade_revision_label tables on all-labs.dblist T281418 [releng]
12:06 <jbond42> update debmonitor.discover.wmnet ssl cert [production]
12:02 <hnowlan> stopping processors on eventlog1002 to migrate to eventlog1003 [analytics]
11:59 <ladsgroup@deploy1002> Synchronized wmf-config/extension-list: Config: [[gerrit:683454|Undeploy JADE from production, Part III (T281418)]] (duration: 01m 07s) [production]
11:54 <ladsgroup@deploy1002> Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:683453|Undeploy JADE from production, Part II (T281418)]], Part I (duration: 01m 06s) [production]
11:50 <elukey> manual stop of one of the eventlog processors on eventlog1002 to see if 1003 takes it over [analytics]
11:49 <ladsgroup@deploy1002> Synchronized wmf-config/CommonSettings.php: Config: [[gerrit:683452|Undeploy JADE from production, Part I (T281418)]] (duration: 01m 07s) [production]
11:45 <jmm@cumin2001> END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sretest1002.eqiad.wmnet [production]
11:40 <jmm@cumin2001> START - Cookbook sre.hosts.reboot-single for host sretest1002.eqiad.wmnet [production]
11:38 <mbsantos@deploy1002> Synchronized wmf-config/CommonSettings.php: Config: [[gerrit:683548|Enable suggested values in TemplateData and VisualEditor CommonSettings (T273857)]] (duration: 01m 07s) [production]
11:34 <ladsgroup@deploy1002> Synchronized php-1.37.0-wmf.1/extensions/ContentTranslation/specials/SpecialContentTranslation.php: Backport: [[gerrit:683534|Another fix for token cookie handling (T281346)]] (duration: 01m 07s) [production]