2020-12-14
09:45 <aborrero@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6 days, 0:00:00 on cloudvirt1024.eqiad.wmnet with reason: T269419 [production]
09:45 <aborrero@cumin1001> START - Cookbook sre.hosts.downtime for 6 days, 0:00:00 on cloudvirt1024.eqiad.wmnet with reason: T269419 [production]
08:40 <godog> swift eqiad-prod: add weight to ms-be106[0-3] - T268435 [production]
08:34 <hashar> deployment-prep restart puppetdb process on deployment-puppetdb03 # T248041 [releng]
07:55 <elukey> roll restart yarn daemons to pick up https://gerrit.wikimedia.org/r/c/operations/puppet/+/649126 [analytics]
2020-12-13
09:11 <_dcaro> running backup purge script on cloudvirt1024 (T269419) [admin]
00:49 <wm-bot> <lucaswerkmeister> deployed bb0cbfc6cb (language code in parentheses) [tools.lexeme-forms]
2020-12-12
18:57 <wm-bot> <lucaswerkmeister> deployed 0ec650ea2f (autonyms on index page) [tools.lexeme-forms]
2020-12-11
23:31 <bstorm> increasing the output throttle for toolsbeta-test-k8s-haproxy-* nodes in order to figure out what's up with the timeouts [toolsbeta]
22:05 <dduvall@deploy1001> helmfile [staging] Ran 'sync' command on namespace 'blubberoid' for release 'staging' . [production]
22:02 <dduvall@deploy1001> helmfile [codfw] Ran 'sync' command on namespace 'blubberoid' for release 'production' . [production]
21:59 <dduvall@deploy1001> helmfile [eqiad] Ran 'sync' command on namespace 'blubberoid' for release 'production' . [production]
21:57 <akosiaris> add docker-ce_18.06.3~ce~3-0~debian_amd64.deb to apt.wikimedia.org stretch-wikimedia/thirdparty/k8s [production]
21:52 <marxarelli> rolling back blubberoid:2020-12-11-212149-production [releng]
21:46 <Amir1> Running schema changes on wikitech database for T269348 [production]
21:45 <akosiaris@deploy1001> helmfile [staging-codfw] START helmfile.d/admin 'sync'. [production]
21:42 <akosiaris@deploy1001> helmfile [staging-codfw] START helmfile.d/admin 'sync'. [production]
21:41 <dduvall@deploy1001> helmfile [codfw] Ran 'sync' command on namespace 'blubberoid' for release 'production' . [production]
21:38 <dduvall@deploy1001> helmfile [eqiad] Ran 'sync' command on namespace 'blubberoid' for release 'production' . [production]
21:35 <akosiaris@deploy1001> helmfile [staging-codfw] START helmfile.d/admin 'sync'. [production]
21:33 <dduvall@deploy1001> helmfile [staging] Ran 'sync' command on namespace 'blubberoid' for release 'staging' . [production]
21:32 <marxarelli> deploying blubberoid:2020-12-11-212149-production (refs https://gerrit.wikimedia.org/r/c/blubber/+/647120 and T263597) [releng]
20:27 <razzi@cumin1001> END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) [production]
20:11 <otto@deploy1001> Synchronized wmf-config/InitialiseSettings.php: Un-migrtate Growth EventLogging schema HomepageVisit back to EventLogging-backend on all wikis (this is a server side event which is not yet ready to migrate) - T267333 (duration: 00m 58s) [production]
19:30 <ottomata> now ingesting Growth EventLogging schemas using event platform refine job; they are exclude-listed from eventlogging-processor. - T267333 [analytics]
19:28 <razzi@cumin1001> START - Cookbook sre.ganeti.makevm [production]
19:18 <razzi@cumin1001> END (ERROR) - Cookbook sre.hosts.decommission (exit_code=97) [production]
18:47 <razzi@cumin1001> END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) [production]
18:30 <elukey@cumin1001> END (PASS) - Cookbook sre.hadoop.upgrade-bigtop-distro (exit_code=0) for Hadoop test cluster: Change Hadoop distribution - elukey@cumin1001 [production]
18:29 <bstorm> certificatesigningrequest.certificates.k8s.io "tool-production-error-tasks-metrics" deleted to stop maintain-kubeusers issues [tools]
18:19 <elukey@cumin1001> START - Cookbook sre.hadoop.upgrade-bigtop-distro for Hadoop test cluster: Change Hadoop distribution - elukey@cumin1001 [production]
18:19 <elukey@cumin1001> END (PASS) - Cookbook sre.hadoop.stop-cluster (exit_code=0) for Hadoop test cluster: Stop the Hadoop cluster before maintenance. - elukey@cumin1001 [production]
18:13 <elukey@cumin1001> START - Cookbook sre.hadoop.stop-cluster for Hadoop test cluster: Stop the Hadoop cluster before maintenance. - elukey@cumin1001 [production]
18:13 <mutante> doc1001 restarted apache2 just in case after DOC_PATH change [production]
17:53 <razzi@cumin1001> START - Cookbook sre.hosts.decommission [production]
17:52 <razzi@cumin1001> START - Cookbook sre.ganeti.makevm [production]
17:48 <elukey@cumin1001> END (FAIL) - Cookbook sre.hadoop.upgrade-bigtop-distro (exit_code=99) for Hadoop test cluster: Change Hadoop distribution - elukey@cumin1001 [production]
17:41 <elukey@cumin1001> START - Cookbook sre.hadoop.upgrade-bigtop-distro for Hadoop test cluster: Change Hadoop distribution - elukey@cumin1001 [production]
16:40 <elukey@cumin1001> END (FAIL) - Cookbook sre.hadoop.upgrade-bigtop-distro (exit_code=99) for Hadoop test cluster: Change Hadoop distribution - elukey@cumin1001 [production]
16:28 <elukey@cumin1001> START - Cookbook sre.hadoop.upgrade-bigtop-distro for Hadoop test cluster: Change Hadoop distribution - elukey@cumin1001 [production]
16:15 <elukey@cumin1001> END (PASS) - Cookbook sre.hadoop.stop-cluster (exit_code=0) for Hadoop test cluster: Stop the Hadoop cluster before maintenance. - elukey@cumin1001 [production]
16:10 <elukey@cumin1001> START - Cookbook sre.hadoop.stop-cluster for Hadoop test cluster: Stop the Hadoop cluster before maintenance. - elukey@cumin1001 [production]
15:35 <jbond@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest1002.eqiad.wmnet with reason: REIMAGE [production]
15:33 <jbond@cumin1001> START - Cookbook sre.hosts.downtime for 2:00:00 on sretest1002.eqiad.wmnet with reason: REIMAGE [production]
15:20 <elukey@cumin1001> END (FAIL) - Cookbook sre.hadoop.upgrade-bigtop-distro (exit_code=99) for Hadoop test cluster: Change Hadoop distribution - elukey@cumin1001 [production]
15:15 <jbond@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest1001.eqiad.wmnet with reason: REIMAGE [production]
15:12 <jbond@cumin1001> START - Cookbook sre.hosts.downtime for 2:00:00 on sretest1001.eqiad.wmnet with reason: REIMAGE [production]
15:10 <jayme@deploy1001> helmfile [staging-codfw] DONE helmfile.d/admin 'sync'. [production]
15:06 <elukey@cumin1001> START - Cookbook sre.hadoop.upgrade-bigtop-distro for Hadoop test cluster: Change Hadoop distribution - elukey@cumin1001 [production]
14:59 <jayme@deploy1001> helmfile [staging-codfw] START helmfile.d/admin 'sync'. [production]