2021-08-12
16:15 <mbsantos@deploy1002> Started deploy [tilerator/deploy@b88cf50]: maps2009: [production]
16:14 <elukey@deploy1002> helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'. [production]
16:14 <elukey@deploy1002> helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'. [production]
16:14 <mbsantos@deploy1002> Finished deploy [tilerator/deploy@b88cf50]: maps2010: (duration: 00m 23s) [production]
16:14 <mbsantos@deploy1002> Started deploy [tilerator/deploy@b88cf50]: maps2010: [production]
16:14 <elukey@deploy1002> helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'. [production]
16:14 <elukey@deploy1002> helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'. [production]
16:13 <mbsantos@deploy1002> Finished deploy [tilerator/deploy@b88cf50]: Deploy tilerator 1.1.7-beta.5 (duration: 02m 30s) [production]
16:10 <mbsantos@deploy1002> Started deploy [tilerator/deploy@b88cf50]: Deploy tilerator 1.1.7-beta.5 [production]
15:50 <papaul> powerdown ms-be2060 for relocation [production]
15:49 <mutante> netbox - deleted 2620:0:863:1:198:35:26:6/64 (along with 198.35.26.6) due to the previous error when running makevm cookbook (T288630) [production]
15:47 <mutante> netbox - deleted 198.35.26.6 (doh4002) [production]
15:44 <pt1979@cumin2002> END (PASS) - Cookbook sre.dns.netbox (exit_code=0) [production]
15:37 <dzahn@cumin1001> END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host doh4002.wikimedia.org [production]
15:36 <pt1979@cumin2002> START - Cookbook sre.dns.netbox [production]
15:35 <dzahn@cumin1001> START - Cookbook sre.ganeti.makevm for new host doh4002.wikimedia.org [production]
15:33 <moritzm> importing openjdk-8 8u302-b08-1+deb11u1 to apt.wikimedia.org/component/jdk8 T287960 [production]
15:10 <btullis@cumin1001> END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts druid1002.eqiad.wmnet [production]
15:07 <filippo@cumin1001> END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on thanos-fe1003.eqiad.wmnet with reason: REIMAGE [production]
15:04 <filippo@cumin1001> START - Cookbook sre.hosts.downtime for 2:00:00 on thanos-fe1003.eqiad.wmnet with reason: REIMAGE [production]
15:00 <btullis@cumin1001> START - Cookbook sre.hosts.decommission for hosts druid1002.eqiad.wmnet [production]
14:48 <papaul> reset to factory ps-test-d8-codfw [production]
14:35 <filippo@cumin1001> END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on thanos-fe1002.eqiad.wmnet with reason: REIMAGE [production]
14:33 <filippo@cumin1001> START - Cookbook sre.hosts.downtime for 2:00:00 on thanos-fe1002.eqiad.wmnet with reason: REIMAGE [production]
14:33 <papaul> reset to factory ps2-test-d8-codfw [production]
14:25 <hnowlan> reenabling puppet on P:cassandra [production]
13:57 <hnowlan> disabling puppet on P:cassandra to test removal of cassandra-metrics-agent [production]
13:50 <effie> disable puppet on mediawiki hosts to merge 705852 [production]
13:39 <hnowlan@cumin1001> END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:sessionstore: Restarting to pick up Java security updates - hnowlan@cumin1001 [production]
13:31 <btullis@cumin1001> END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts druid1003.eqiad.wmnet [production]
13:20 <btullis@cumin1001> START - Cookbook sre.hosts.decommission for hosts druid1003.eqiad.wmnet [production]
13:03 <hnowlan@cumin1001> START - Cookbook sre.cassandra.roll-restart for nodes matching A:sessionstore: Restarting to pick up Java security updates - hnowlan@cumin1001 [production]
12:43 <godog> upgrade NIC firmware on thanos-be2* / thanos-fe2* - T286722 [production]
12:28 <hnowlan@cumin1001> END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:restbase-codfw: Restarting to pick up Java security updates - hnowlan@cumin1001 [production]
12:23 <filippo@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on thanos-fe1001.eqiad.wmnet with reason: REIMAGE [production]
12:18 <filippo@cumin1001> START - Cookbook sre.hosts.downtime for 2:00:00 on thanos-fe1001.eqiad.wmnet with reason: REIMAGE [production]
12:09 <godog> upgrade NIC firmware on thanos-be1* - T286722 [production]
12:08 <godog> upgrade NIC firmware on thanos-fe100[34] - T286722 [production]
12:04 <godog> upgrade NIC firmware on thanos-fe100[12] - T286722 [production]
11:56 <moritzm> installing openexr security updates [production]
11:47 <moritzm> installing bluez security updates on buster [production]
10:22 <jmm@cumin2002> END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Holger Knust out of all services on: 1743 hosts [production]
10:22 <jmm@cumin2002> START - Cookbook sre.idm.logout Logging Holger Knust out of all services on: 1743 hosts [production]
10:18 <marostegui@cumin1001> dbctl commit (dc=all): 'Pool db2107 into API', diff saved to https://phabricator.wikimedia.org/P17016 and previous config saved to /var/cache/conftool/dbconfig/20210812-101840-marostegui.json [production]
10:18 <mvolz@deploy1002> helmfile [eqiad] Ran 'sync' command on namespace 'citoid' for release 'production' . [production]
10:13 <mvolz@deploy1002> helmfile [codfw] Ran 'sync' command on namespace 'citoid' for release 'production' . [production]
10:08 <mvolz@deploy1002> helmfile [staging] Ran 'sync' command on namespace 'citoid' for release 'staging' . [production]
09:49 <hnowlan@cumin1001> START - Cookbook sre.cassandra.roll-restart for nodes matching A:restbase-codfw: Restarting to pick up Java security updates - hnowlan@cumin1001 [production]
09:38 <mwdebug-deploy@deploy1002> helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' . [production]
09:36 <mwdebug-deploy@deploy1002> helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' . [production]