2020-10-28
16:34 <hnowlan@cumin1001> START - Cookbook sre.hosts.downtime [production]
16:34 <hnowlan@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) [production]
16:34 <hnowlan@cumin1001> START - Cookbook sre.hosts.downtime [production]
16:22 <hnowlan@puppetmaster1001> conftool action : set/pooled=no; selector: dc=eqiad,cluster=maps,service=kartotherian-ssl,name=maps1004.eqiad.wmnet [production]
16:22 <hnowlan@puppetmaster1001> conftool action : set/pooled=no; selector: dc=eqiad,cluster=maps,service=kartotherian,name=maps1004.eqiad.wmnet [production]
16:18 <hnowlan@puppetmaster1001> conftool action : set/pooled=no; selector: dc=eqiad,cluster=kartotherian,service=kartotherian,name=maps1004.eqiad.wmnet [production]
16:16 <hnowlan> Disabling tilerator in eqiad [production]
16:15 <hnowlan@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) [production]
16:15 <hnowlan@cumin1001> START - Cookbook sre.hosts.downtime [production]
16:06 <ppchelko@deploy1001> helmfile [eqiad] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' . [production]
16:05 <ppchelko@deploy1001> helmfile [codfw] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' . [production]
16:03 <ppchelko@deploy1001> helmfile [staging] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'staging' . [production]
15:51 <Amir1> restarting uwsgi on ores in eqiad [production]
15:49 <elukey@cumin1001> END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) [production]
15:33 <ppchelko@deploy1001> helmfile [eqiad] Ran 'sync' command on namespace 'mobileapps' for release 'nontls' . [production]
15:33 <ppchelko@deploy1001> helmfile [eqiad] Ran 'sync' command on namespace 'mobileapps' for release 'production' . [production]
15:24 <ppchelko@deploy1001> helmfile [codfw] Ran 'sync' command on namespace 'mobileapps' for release 'production' . [production]
15:24 <ppchelko@deploy1001> helmfile [codfw] Ran 'sync' command on namespace 'mobileapps' for release 'nontls' . [production]
15:23 <ppchelko@deploy1001> helmfile [staging] Ran 'sync' command on namespace 'mobileapps' for release 'staging' . [production]
15:10 <godog> roll restart logstash5 in codfw [production]
14:50 <elukey@cumin1001> START - Cookbook sre.ganeti.makevm [production]
14:05 <jayme@deploy1001> helmfile [eqiad] Ran 'sync' command on namespace 'kube-system' for release 'eventrouter' . [production]
13:54 <jayme@deploy1001> helmfile [codfw] Ran 'sync' command on namespace 'kube-system' for release 'eventrouter' . [production]
12:39 <moritzm> installing libdatetime-timezone-perl updates [production]
11:46 <XioNoX> configure urpf strict log-only on cr3-ulsfo:et-0/0/1.501 - T266561 [production]
10:39 <ema> due to T266651, cancel the entry above: A:cp upgrade libvmod-netmapper to 1.9-1 T266567 T264398 [production]
10:38 <elukey> clean up 10.64.5.7 and 2620:0:861:104:10:64:5:7 from Netbox (records mistakenly allocated via the makevm cookbook) - T266648 [production]
10:35 <elukey@cumin1001> END (ERROR) - Cookbook sre.ganeti.makevm (exit_code=97) [production]
10:25 <ema> A:cp (except cp3052, running varnish 5) upgrade libvmod-netmapper to 1.9-1 T266567 T264398 [production]
10:20 <elukey@cumin1001> START - Cookbook sre.ganeti.makevm [production]
09:54 <elukey@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) [production]
09:52 <elukey@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) [production]
09:50 <elukey@cumin1001> START - Cookbook sre.hosts.downtime [production]
09:49 <elukey@cumin1001> START - Cookbook sre.hosts.downtime [production]
09:26 <jayme> imported kubeyaml 0.0.3~20201027+git5f5556c-1 to buster-wikimedia [production]
09:04 <elukey@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) [production]
09:02 <elukey@cumin1001> START - Cookbook sre.hosts.downtime [production]
08:37 <jynus> updated dump grants on db2093 [production]
07:53 <volans> upgraded python3-wmflib to 0.0.3 on the cumin hosts - T257905 [production]
07:40 <godog> update thanos-fe1002 to thanos 0.16.0 - T261281 [production]
07:22 <godog> swift codfw-prod: bump object weight for ms-be2057 - T261633 [production]
04:43 <ryankemper> T266492 Finished rolling restart of codfw cirrus cluster [production]
04:43 <ryankemper@cumin2001> END (PASS) - Cookbook sre.elasticsearch.rolling-restart (exit_code=0) [production]
02:58 <ryankemper> T266492 Beginning rolling restart of codfw cirrus cluster, 3 nodes at a time, on `ryankemper@cumin2001` tmux session `elasticsearch_restart_codfw` [production]
02:57 <ryankemper@cumin2001> START - Cookbook sre.elasticsearch.rolling-restart [production]
02:12 <eileen> tools revision changed from a2a91d6c6a to 087a596d3a [production]
00:40 <eileen> civicrm revision changed from 4fdfb8408b to e1d65b0f3a, config revision is f16003ab62 [production]
2020-10-27
22:20 <mutante> systemctl reset-failed on various servers to see which come back later from failed auto_restart and which don't [production]
21:40 <mutante> mwmaint2001 - systemctl reset-failed - mediawiki_job_parser_cache_purging.service [production]
20:56 <mutante> ms-be1057 is network down but running, NO-CARRIER on NIC, cable disconnected? [production]