2021-04-29
18:23 <bstorm> removing one more etcd node via cookbook T279723 [tools]
18:19 <Majavah> deploy changes for T211393 [tools.openstack-browser]
18:12 <bstorm> removing an etcd node via cookbook T279723 [tools]
18:10 <krinkle@deploy1002> Synchronized php-1.37.0-wmf.3/includes/libs/objectcache/MemcachedBagOStuff.php: I926797a9d494a31, T281480 (duration: 01m 09s) [production]
18:10 <bstorm> added and removed an etcd node [toolsbeta]
17:13 <pt1979@cumin2001> END (PASS) - Cookbook sre.dns.netbox (exit_code=0) [production]
17:10 <pt1979@cumin2001> START - Cookbook sre.dns.netbox [production]
17:01 <pt1979@cumin2001> START - Cookbook sre.dns.netbox [production]
16:29 <ryankemper> T281498 `sudo -E cumin 'C:role::lvs::balancer' 'sudo run-puppet-agent'` [production]
16:28 <liw@deploy1002> rebuilt and synchronized wikiversions files: Revert "group1 wikis to 1.37.0-wmf.1" [production]
16:27 <liw@deploy1002> sync-wikiversions aborted: Revert "group[0|1] wikis to [VERSION]" (duration: 00m 01s) [production]
16:22 <ryankemper> T281498 `ryankemper@wdqs2004:~$ sudo depool` [production]
16:20 <ryankemper> T281498 `ryankemper@wdqs2004:~$ sudo run-puppet-agent` [production]
16:18 <otto@deploy1002> Finished deploy [analytics/refinery@b3c5820] (hadoop-test): update event_sanitized_main allowlist on an-launcher1002 - T273789 (duration: 02m 39s) [production]
16:16 <arturo> add 1MB and 10MB test files [tools.network-tests]
16:15 <otto@deploy1002> Started deploy [analytics/refinery@b3c5820] (hadoop-test): update event_sanitized_main allowlist on an-launcher1002 - T273789 [production]
16:12 <papaul> powerdown thanos-fe2001 for memory swap [production]
15:55 <razzi> restart hadoop-yarn-nodemanager and hadoop-hdfs-datanode on an-worker1100 for hadoop to recognize new disk /dev/sdl [analytics]
15:44 <ryankemper> T280382 `sudo -i wmf-auto-reimage-host -p T280382 --new wdqs1004.eqiad.wmnet` on `ryankemper@cumin1001` tmux session `reimage` (trying reimaging this host one final time, if this fails again will need to do a deeper investigation into what's going wrong here) [production]
15:43 <ryankemper> [WDQS] `wdqs2001` is high on update lag but otherwise functioning; will repool when lag is caught up [production]
15:38 <ottomata> enabling event_sanitized_main jobs - T273789 [analytics]
15:37 <ryankemper> [WDQS] `sudo systemctl restart wdqs-blazegraph` && `sudo systemctl restart wdqs-updater` on `wdqs2001` [production]
15:35 <ryankemper> [WDQS] ^ scratch that, depooled `wdqs2001` [production]
15:34 <ryankemper> [WDQS] pooled `wdqs2001` [production]
15:11 <dcaro> hard rebooting cloudmetrics1002, got hung again (T275605) [admin]
14:57 <elukey> run mysql_upgrade on an-coord1001 to complete the buster upgrade - T278424 [analytics]
14:44 <hnowlan> restored all eventlogging jobs to eventlog1003 [analytics]
14:35 <hnowlan@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on eventlog[1002-1003].eqiad.wmnet with reason: eventlog1003 migration [production]
14:35 <hnowlan@cumin1001> START - Cookbook sre.hosts.downtime for 1:00:00 on eventlog[1002-1003].eqiad.wmnet with reason: eventlog1003 migration [production]
14:21 <hnowlan> bump eventlog1003 CPUs to 6 [analytics]
13:53 <joal> Rerun failed pageview-hourly-wf-2021-4-29-11 and pageview-hourly-wf-2021-4-29-12 [analytics]
13:44 <moritzm> installing Java security updates on stat* hosts [production]
13:43 <hnowlan@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on eventlog1003.eqiad.wmnet with reason: eventlog1003 migration [production]
13:43 <hnowlan@cumin1001> START - Cookbook sre.hosts.downtime for 1:00:00 on eventlog1003.eqiad.wmnet with reason: eventlog1003 migration [production]
13:42 <hnowlan@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on eventlog1002.eqiad.wmnet with reason: eventlog1003 migration [production]
13:42 <hnowlan@cumin1001> START - Cookbook sre.hosts.downtime for 1:00:00 on eventlog1002.eqiad.wmnet with reason: eventlog1003 migration [production]
13:40 <otto@deploy1002> Finished deploy [analytics/refinery@b3c5820]: update event_sanitized_main allowlist on an-launcher1002 - T273789 (duration: 02m 59s) [production]
13:37 <otto@deploy1002> Started deploy [analytics/refinery@b3c5820]: update event_sanitized_main allowlist on an-launcher1002 - T273789 [production]
13:26 <arturo> project creation per T281277 [puppet-dev]
13:26 <arturo> project creation per T281140 [image-suggestion-api]
13:11 <moritzm> installing postgresql-11 security updates [production]
13:09 <joal> Rerun failed pageview-hourly-wf-2021-4-29-11 [analytics]
13:08 <jbond42> merge netbase change to manage /etc/services [production]
13:07 <liw@deploy1002> Synchronized php: group1 wikis to 1.37.0-wmf.3 (duration: 01m 07s) [production]
13:06 <liw@deploy1002> rebuilt and synchronized wikiversions files: group1 wikis to 1.37.0-wmf.3 [production]
12:36 <Amir1> upgrading Quiddity to admin in mailman3 [production]
12:36 <hnowlan@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on eventlog1002.eqiad.wmnet with reason: Testing migration of processors to eventlog1003 [production]
12:36 <hnowlan@cumin1001> START - Cookbook sre.hosts.downtime for 1:00:00 on eventlog1002.eqiad.wmnet with reason: Testing migration of processors to eventlog1003 [production]
12:35 <hnowlan> restarting 2 processors on eventlog1002 [analytics]
12:26 <moritzm> installing grub2 updates from buster point release [production]