2021-08-12
15:07 <filippo@cumin1001> END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on thanos-fe1003.eqiad.wmnet with reason: REIMAGE [production]
15:04 <filippo@cumin1001> START - Cookbook sre.hosts.downtime for 2:00:00 on thanos-fe1003.eqiad.wmnet with reason: REIMAGE [production]
15:00 <btullis@cumin1001> START - Cookbook sre.hosts.decommission for hosts druid1002.eqiad.wmnet [production]
14:48 <papaul> reset to factory ps-test-d8-codfw [production]
14:35 <filippo@cumin1001> END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on thanos-fe1002.eqiad.wmnet with reason: REIMAGE [production]
14:33 <filippo@cumin1001> START - Cookbook sre.hosts.downtime for 2:00:00 on thanos-fe1002.eqiad.wmnet with reason: REIMAGE [production]
14:33 <papaul> reset to factory ps2-test-d8-codfw [production]
14:25 <hnowlan> reenabling puppet on P:cassandra [production]
13:57 <hnowlan> disabling puppet on P:cassandra to test removal of cassandra-metrics-agent [production]
13:50 <effie> disable puppet on mediawiki hosts to merge 705852 [production]
13:39 <hnowlan@cumin1001> END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:sessionstore: Restarting to pick up Java security updates - hnowlan@cumin1001 [production]
13:31 <btullis@cumin1001> END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts druid1003.eqiad.wmnet [production]
13:20 <btullis@cumin1001> START - Cookbook sre.hosts.decommission for hosts druid1003.eqiad.wmnet [production]
13:03 <hnowlan@cumin1001> START - Cookbook sre.cassandra.roll-restart for nodes matching A:sessionstore: Restarting to pick up Java security updates - hnowlan@cumin1001 [production]
12:43 <godog> upgrade NIC firmware on thanos-be2* / thanos-fe2* - T286722 [production]
12:28 <hnowlan@cumin1001> END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:restbase-codfw: Restarting to pick up Java security updates - hnowlan@cumin1001 [production]
12:23 <filippo@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on thanos-fe1001.eqiad.wmnet with reason: REIMAGE [production]
12:18 <filippo@cumin1001> START - Cookbook sre.hosts.downtime for 2:00:00 on thanos-fe1001.eqiad.wmnet with reason: REIMAGE [production]
12:09 <godog> upgrade NIC firmware on thanos-be1* - T286722 [production]
12:08 <godog> upgrade NIC firmware on thanos-fe100[34] - T286722 [production]
12:04 <godog> upgrade NIC firmware on thanos-fe100[12] - T286722 [production]
11:56 <moritzm> installing openexr security updates [production]
11:47 <moritzm> installing bluez security updates on buster [production]
10:22 <jmm@cumin2002> END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Holger Knust out of all services on: 1743 hosts [production]
10:22 <jmm@cumin2002> START - Cookbook sre.idm.logout Logging Holger Knust out of all services on: 1743 hosts [production]
10:18 <marostegui@cumin1001> dbctl commit (dc=all): 'Pool db2107 into API', diff saved to https://phabricator.wikimedia.org/P17016 and previous config saved to /var/cache/conftool/dbconfig/20210812-101840-marostegui.json [production]
10:18 <mvolz@deploy1002> helmfile [eqiad] Ran 'sync' command on namespace 'citoid' for release 'production' . [production]
10:13 <mvolz@deploy1002> helmfile [codfw] Ran 'sync' command on namespace 'citoid' for release 'production' . [production]
10:08 <mvolz@deploy1002> helmfile [staging] Ran 'sync' command on namespace 'citoid' for release 'staging' . [production]
09:49 <hnowlan@cumin1001> START - Cookbook sre.cassandra.roll-restart for nodes matching A:restbase-codfw: Restarting to pick up Java security updates - hnowlan@cumin1001 [production]
09:38 <mwdebug-deploy@deploy1002> helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' . [production]
09:36 <mwdebug-deploy@deploy1002> helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' . [production]
09:31 <lucaswerkmeister-wmde@deploy1002> Synchronized php-1.37.0-wmf.18/extensions/Wikibase/: Backport: [[gerrit:711714|Revert "Inject NamespaceInfo into EntitySourceDefinitionsConfigParser" (T288724)]] (2/2) (duration: 01m 12s) [production]
09:30 <kormat@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 8 hosts with reason: Reconfiguring replication tree T284825 [production]
09:30 <kormat@cumin1001> START - Cookbook sre.hosts.downtime for 1:00:00 on 8 hosts with reason: Reconfiguring replication tree T284825 [production]
09:30 <mwdebug-deploy@deploy1002> helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' . [production]
09:29 <lucaswerkmeister-wmde@deploy1002> Synchronized php-1.37.0-wmf.18/extensions/Wikibase/data-access/: Backport: [[gerrit:711714|Revert "Inject NamespaceInfo into EntitySourceDefinitionsConfigParser" (T288724)]] (1/2) (duration: 01m 08s) [production]
09:29 <mwdebug-deploy@deploy1002> helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' . [production]
09:29 <marostegui@cumin1001> dbctl commit (dc=all): 'db2107 (re)pooling @ 100%: After reimage', diff saved to https://phabricator.wikimedia.org/P17015 and previous config saved to /var/cache/conftool/dbconfig/20210812-092909-root.json [production]
09:28 <kormat> reconfiguring replication tree for pc1 T284825 [production]
09:27 <kormat@deploy1002> Synchronized wmf-config/ProductionServices.php: Promote pc2011 to primary of pc1 T284825 (duration: 01m 10s) [production]
09:14 <marostegui@cumin1001> dbctl commit (dc=all): 'db2107 (re)pooling @ 80%: After reimage', diff saved to https://phabricator.wikimedia.org/P17014 and previous config saved to /var/cache/conftool/dbconfig/20210812-091406-root.json [production]
08:59 <marostegui@cumin1001> dbctl commit (dc=all): 'db2107 (re)pooling @ 60%: After reimage', diff saved to https://phabricator.wikimedia.org/P17013 and previous config saved to /var/cache/conftool/dbconfig/20210812-085902-root.json [production]
08:57 <mwdebug-deploy@deploy1002> helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' . [production]
08:56 <mwdebug-deploy@deploy1002> helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' . [production]
08:55 <dcaro@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on cloudservices[1003-1004].wikimedia.org with reason: T288725 [production]
08:55 <dcaro@cumin1001> START - Cookbook sre.hosts.downtime for 1:00:00 on cloudservices[1003-1004].wikimedia.org with reason: T288725 [production]
08:53 <kormat@deploy1002> Synchronized wmf-config/ProductionServices.php: Adding new pc hosts (duration: 01m 09s) [production]
08:48 <jmm@cumin2002> END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host copernicium.wikimedia.org [production]
08:48 <jmm@cumin1001> END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host theemin.codfw.wmnet [production]