2021-04-26
09:45 <dcaro> draining cloudvirt2001-dev with the new cookbooks (T280641) [admin]
09:42 <moritzm> installing clamav security updates on otrs1001 [production]
09:38 <godog> reboot ms-be1062, kernel backtrace saved [production]
09:26 <filippo@cumin1001> conftool action : set/pooled=false; selector: dnsdisc=swift,name=eqiad [production]
09:26 <jmm@cumin2001> START - Cookbook sre.ganeti.makevm for new host ldap-replica2006.wikimedia.org [production]
09:24 <jmm@cumin2001> END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host ldap-replica2005.wikimedia.org [production]
09:15 <jayme@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on conf2005.codfw.wmnet with reason: for initial etcd replication [production]
09:15 <jayme@cumin1001> START - Cookbook sre.hosts.downtime for 1:00:00 on conf2005.codfw.wmnet with reason: for initial etcd replication [production]
09:13 <jayme> imported etcd-mirror_0.0.6-2 to buster-wikimedia [production]
09:10 <jmm@cumin2001> START - Cookbook sre.ganeti.makevm for new host ldap-replica2005.wikimedia.org [production]
09:07 <jmm@cumin2001> END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host ldap-replica2005failoid1002.wikimedia.org [production]
09:04 <jayme> imported etcd-mirror_0.0.6-1 to buster-wikimedia [production]
08:55 <jmm@cumin2001> START - Cookbook sre.ganeti.makevm for new host ldap-replica2005failoid1002.wikimedia.org [production]
08:49 <urbanecm@deploy1002> Synchronized wmf-config/InitialiseSettings.php: NOOP: f01a6dab70f74938dd51668809a181a8f551b6c8: GrowthExperiments: Enable community configuration on testwiki (T274520) (duration: 00m 57s) [production]
08:42 <urbanecm@deploy1002> Synchronized wmf-config/InitialiseSettings.php: NOOP: 88da8226823e59d1d19db9aeca3b5a5140c0c60c: GrowthExperiments: Do not enable community configuration outside of beta wikis (T274520) (duration: 00m 59s) [production]
08:28 <moritzm> update debmonitor to 0.2.9 on remaining hosts T281090 [production]
08:13 <moritzm> installing lxml security updates on stretch [production]
08:01 <elukey> restart hadoop-mapreduce-historyserver on an-master1001 after changes to the yarn ui user [analytics]
07:54 <jayme@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on conf2005.codfw.wmnet with reason: for initial etcd replication [production]
07:54 <jayme@cumin1001> START - Cookbook sre.hosts.downtime for 1:00:00 on conf2005.codfw.wmnet with reason: for initial etcd replication [production]
07:53 <filippo@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on thanos-fe1001.eqiad.wmnet with reason: REIMAGE [production]
07:51 <filippo@cumin1001> START - Cookbook sre.hosts.downtime for 2:00:00 on thanos-fe1001.eqiad.wmnet with reason: REIMAGE [production]
07:36 <elukey> re-enable timers after setting the capacity scheduler [analytics]
07:32 <godog> swift eqiad-prod: less weight for ms-be[1019-1026] / more weight to ms-be106[0-3] - T272836 [production]
07:31 <elukey> restart hadoop RM on an-master* to pick up capacity scheduler changes [analytics]
07:24 <moritzm> installing pear security updates [production]
07:09 <moritzm> removed rawdog from bullseye-wikimedia, needs Py2 T280989 [production]
06:44 <elukey> stop timers on an-launcher1002 again as prep step for capacity scheduler changes [analytics]
06:32 <elukey> roll restart of hadoop-yarn-nodemanagers to pick up new log4j settings - T276906 [analytics]
06:25 <elukey> re-enable timers [analytics]
06:24 <elukey> reboot an-coord1001 to pick up kernel security settings (after reimage) [production]
06:20 <elukey> reboot an-coord1001 to pick up kernel security settings [analytics]
05:57 <elukey> stop timers on an-launcher1002 to allow a reboot of an-coord1001 [analytics]
05:47 <marostegui@cumin1001> dbctl commit (dc=all): 'Add db1158 to dbctl, depooled, T258361', diff saved to https://phabricator.wikimedia.org/P15521 and previous config saved to /var/cache/conftool/dbconfig/20210426-054700-marostegui.json [production]
05:32 <marostegui@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1124.eqiad.wmnet with reason: REIMAGE [production]
05:30 <marostegui@cumin1001> START - Cookbook sre.hosts.downtime for 2:00:00 on db1124.eqiad.wmnet with reason: REIMAGE [production]
03:43 <kart_> Updated cxserver to 2021-04-21-044024-production (T279045) [production]
03:41 <kartik@deploy1002> helmfile [eqiad] Ran 'sync' command on namespace 'cxserver' for release 'production' . [production]
03:37 <kartik@deploy1002> helmfile [codfw] Ran 'sync' command on namespace 'cxserver' for release 'production' . [production]
03:32 <kartik@deploy1002> helmfile [staging] Ran 'sync' command on namespace 'cxserver' for release 'staging' . [production]
2021-04-25
15:23 <Amir1> sudo -u list /var/lib/mailman/bin/change_pw -l wikica-l -p $(pwgen -c1 -s 12) (T281066) [production]
2021-04-24
22:24 <bstorm> Rebooting labstore1007 from ilo after crash [production]
17:47 <James_F> Zuul: [mediawiki/extensions/MultimediaViewer] Drop Ruby selenium test job [releng]
16:19 <arturo> deleting 2 leaked VMs by hand: 6aefef6f-0723-499d-895f-314f4804c377 | fullstackd-20210424153344 and af8bc9bd-ea0a-4789-b8dd-cf5cf96c31cc | fullstackd-20210424074938 (puppet check step timed out) [admin-monitoring]
08:03 <joal> Rerun failed webrequest-druid-hourly-wf-2021-4-23-13 [analytics]
2021-04-23
22:14 <Krinkle> Reloading Zuul to deploy https://gerrit.wikimedia.org/r/682029 [releng]
21:36 <foks> removing 1 file for legal compliance [production]
21:02 <wm-bot> <root> Hard restart in an attempt to reset state information at the Toolforge front proxy [tools.simple]
20:59 <wm-bot> <root> Restarting webservice which seems to have died due to grid engine instability [tools.simple]
20:15 <mutante> [apt1001:~] $ sudo -i reprepro -C main includedeb bullseye-wikimedia /home/dzahn/rawdog_2.23-2_all.deb (T280989) [production]