2021-09-24 §
07:17 <elukey@cumin1001> END (PASS) - Cookbook sre.hadoop.roll-restart-workers (exit_code=0) restart workers for Hadoop test cluster: Roll restart of jvm daemons for openjdk upgrade. - elukey@cumin1001 [production]
07:01 <elukey@cumin1001> START - Cookbook sre.hadoop.roll-restart-workers restart workers for Hadoop test cluster: Roll restart of jvm daemons for openjdk upgrade. - elukey@cumin1001 [production]
07:01 <elukey@cumin1001> END (ERROR) - Cookbook sre.hadoop.roll-restart-workers (exit_code=97) restart workers for Hadoop test cluster: Roll restart of jvm daemons for openjdk upgrade. - elukey@cumin1001 [production]
07:00 <elukey@cumin1001> START - Cookbook sre.hadoop.roll-restart-workers restart workers for Hadoop test cluster: Roll restart of jvm daemons for openjdk upgrade. - elukey@cumin1001 [production]
06:55 <elukey@cumin1001> START - Cookbook sre.kafka.roll-restart-brokers for Kafka A:kafka-test-eqiad cluster: Roll restart of jvm daemons for openjdk upgrade. - elukey@cumin1001 [production]
06:53 <elukey@cumin1001> END (PASS) - Cookbook sre.druid.roll-restart-workers (exit_code=0) for Druid test cluster: Roll restart of Druid jvm daemons. - elukey@cumin1001 [production]
06:44 <elukey@cumin1001> START - Cookbook sre.druid.roll-restart-workers for Druid test cluster: Roll restart of Druid jvm daemons. - elukey@cumin1001 [production]
06:41 <elukey@cumin1001> END (PASS) - Cookbook sre.presto.roll-restart-workers (exit_code=0) for Presto analytics cluster: Roll restart of all Presto's jvm daemons. - elukey@cumin1001 [production]
06:30 <elukey@cumin1001> START - Cookbook sre.presto.roll-restart-workers for Presto analytics cluster: Roll restart of all Presto's jvm daemons. - elukey@cumin1001 [production]
06:26 <elukey> restart archiva on archiva1002 to pick up new openjdk upgrades [production]
2021-09-23 §
16:13 <elukey> reboot an-worker1096 to see if megacli status for a new disk changes - T290805 [production]
15:09 <elukey@deploy1002> helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'. [production]
15:09 <elukey@deploy1002> helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'. [production]
15:06 <elukey@deploy1002> helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'. [production]
15:06 <elukey@deploy1002> helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'. [production]
14:19 <elukey@deploy1002> helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'. [production]
14:19 <elukey@deploy1002> helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'. [production]
13:09 <elukey> update pcc facts (after change in puppetdb's fact filter list, to allow partitions for analytics) [production]
07:01 <elukey@deploy1002> helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'. [production]
07:01 <elukey@deploy1002> helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'. [production]
06:59 <elukey@deploy1002> helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'. [production]
06:59 <elukey@deploy1002> helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'. [production]
06:57 <elukey@deploy1002> helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'. [production]
06:57 <elukey@deploy1002> helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'. [production]
06:55 <elukey@deploy1002> helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'. [production]
06:55 <elukey@deploy1002> helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'. [production]
06:55 <elukey@deploy1002> helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'. [production]
06:55 <elukey@deploy1002> helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'. [production]
2021-09-22 §
06:02 <elukey> update pcc facts [production]
2021-09-21 §
17:39 <elukey> update pcc facts [production]
15:39 <elukey> update pcc facts [production]
11:55 <elukey@cumin1001> END (PASS) - Cookbook sre.dns.netbox (exit_code=0) [production]
11:46 <elukey@cumin1001> START - Cookbook sre.dns.netbox [production]
2021-09-20 §
13:39 <elukey@cumin1001> END (PASS) - Cookbook sre.ores.roll-restart-workers (exit_code=0) for ORES codfw cluster: Roll restart of ORES's daemons. - elukey@cumin1001 [production]
13:20 <elukey@cumin1001> START - Cookbook sre.ores.roll-restart-workers for ORES codfw cluster: Roll restart of ORES's daemons. - elukey@cumin1001 [production]
2021-09-16 §
07:48 <elukey@puppetmaster1001> conftool action : set/pooled=true; selector: dnsdisc=helm-charts,name=eqiad [production]
2021-09-15 §
06:57 <elukey> shutdown ms-be2045 (again) after seeing T290881 [production]
06:02 <elukey> powercycle ms-be2045 - no ssh, no remote tty available [production]
2021-09-13 §
09:18 <elukey> upgrade rsyslog* on ml-serve* nodes to 8.1901.0-1+wmf2 [production]
09:11 <elukey> upload rsyslog* 8.1901.0-1+wmf2 to buster-wikimedia component/rsyslog-k8s - T277739 [production]
2021-09-10 §
08:14 <elukey@deploy1002> helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'. [production]
08:14 <elukey@deploy1002> helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'. [production]
08:14 <elukey@deploy1002> helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'. [production]
08:13 <elukey@deploy1002> helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'. [production]
08:12 <elukey@deploy1002> helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'. [production]
08:12 <elukey@deploy1002> helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'. [production]
07:31 <elukey@deploy1002> helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'. [production]
07:31 <elukey@deploy1002> helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'. [production]
06:02 <elukey@puppetmaster1001> conftool action : set/pooled=inactive; selector: name=mw2280.codfw.wmnet [production]
05:56 <elukey> powercycle mw2280 - no tty available in mgmt, no ssh, host frozen [production]