2020-11-30 §
08:36 <marostegui> Compare data between clouddb1016:3315 and labsdb1012 T267090 [production]
07:45 <marostegui@cumin1001> END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) [production]
07:41 <marostegui@cumin1001> START - Cookbook sre.hosts.decommission [production]
07:25 <marostegui@cumin1001> END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) [production]
07:18 <marostegui@cumin1001> START - Cookbook sre.hosts.decommission [production]
07:11 <marostegui> Deploy schema change on s1 codfw - T268004 [production]
07:05 <marostegui> Stop mysql on db1124:3318 to clone clouddb1016:3318, lag will show up on wikireplicas on s8 T267090 [production]
06:47 <marostegui@cumin1001> END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) [production]
06:43 <marostegui@cumin1001> START - Cookbook sre.hosts.decommission [production]
04:26 <kart_> Updated cxserver to 2020-11-23-050106-production (T262253, T268410) [production]
04:18 <kartik@deploy1001> helmfile [eqiad] Ran 'sync' command on namespace 'cxserver' for release 'production' . [production]
04:14 <kartik@deploy1001> helmfile [codfw] Ran 'sync' command on namespace 'cxserver' for release 'production' . [production]
04:11 <kartik@deploy1001> helmfile [staging] Ran 'sync' command on namespace 'cxserver' for release 'staging' . [production]
2020-11-29 §
21:11 <wm-bot> <lucaswerkmeister> deployed 915eb4016f (clarify German templates) [tools.lexeme-forms]
17:18 <andrewbogott> cleaning up some logfiles in tools-sgecron-01 — drive is full [admin]
2020-11-28 §
23:35 <Krenair> Re-scheduled 4 continuous jobs from tools-sgeexec-0908 as it appears to be broken, at about 23:20 UTC [tools]
18:23 <James_F> Zuul: Install CI for mediawiki/extensions/EncryptedUploads [releng]
05:00 <legoktm> reloading Zuul for https://gerrit.wikimedia.org/r/634626 [releng]
04:35 <Krenair> Ran `sudo -i kubectl -n tool-mdbot delete cm maintain-kubeusers` on tools-k8s-control-1 for T268904, seems to have regenerated ~tools.mdbot/.kube/config [tools]
2020-11-27 §
17:30 <hnowlan@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) [production]
17:30 <hnowlan@cumin1001> START - Cookbook sre.hosts.downtime [production]
16:08 <hashar> Successfully tagged docker-registry.discovery.wmnet/releng/helm-linter:0.2.11 for jayme / T251305 [releng]
15:50 <hnowlan@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) [production]
15:50 <hnowlan@cumin1001> START - Cookbook sre.hosts.downtime [production]
15:13 <elukey@cumin1001> END (PASS) - Cookbook sre.zookeeper.roll-restart-zookeeper (exit_code=0) [production]
15:06 <elukey@cumin1001> START - Cookbook sre.zookeeper.roll-restart-zookeeper [production]
14:56 <elukey@cumin1001> END (PASS) - Cookbook sre.zookeeper.roll-restart-zookeeper (exit_code=0) [production]
14:51 <elukey> roll restart zookeeper on druid* nodes for openjdk upgrades [analytics]
14:50 <elukey> roll restart zookeeper on druid* nodes for openjdk upgrades [production]
14:50 <elukey@cumin1001> START - Cookbook sre.zookeeper.roll-restart-zookeeper [production]
10:52 <jayme> updated helmfile to 0.135.0-1 on deploy*,contint* [production]
10:51 <jayme> updated helm-diff to 3.1.3-1 on contint* [production]
10:49 <jayme> updated helm to 2.17.0-1 on deploy*,contint*,chartmuseum* [production]
10:29 <elukey> restart eventlogging_to_druid_editattemptstep_hourly on an-launcher1002 (failed) to see if the hive metastore works [analytics]
10:27 <elukey> restart oozie and presto-server on an-coord1001 for openjdk upgrades [analytics]
10:27 <elukey> restart hive server and metastore on an-coord1001 - openjdk upgrades + problem with high GC caused by a job [analytics]
10:06 <jayme> updated helm and helmfile on deploy2001 [production]
10:04 <jayme@deploy2001> helmfile [staging] Ran 'sync' command on namespace 'blubberoid' for release 'staging' . [production]
10:00 <jayme> imported helm 2.17.0 into buster-wikimedia and stretch-wikimedia [production]
08:55 <elukey@cumin1001> END (PASS) - Cookbook sre.druid.roll-restart-workers (exit_code=0) [production]
08:05 <elukey> roll restart druid public cluster for openjdk upgrades [production]
08:05 <elukey> roll restart druid public cluster for openjdk upgrades [analytics]
08:04 <elukey@cumin1001> START - Cookbook sre.druid.roll-restart-workers [production]
06:39 <marostegui> Stop mysql on es1015 T268810 [production]
06:38 <marostegui@cumin1001> dbctl commit (dc=all): 'Remove es1015 from dbctl', diff saved to https://phabricator.wikimedia.org/P13454 and previous config saved to /var/cache/conftool/dbconfig/20201127-063846-marostegui.json [production]
06:30 <marostegui> Remove es1016 from tendril and zarcillo T268812 [production]
06:29 <marostegui@cumin1001> END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) [production]
06:25 <marostegui@cumin1001> START - Cookbook sre.hosts.decommission [production]
06:19 <marostegui@cumin1001> dbctl commit (dc=all): 'Depool es1015 for decommissioning T268810', diff saved to https://phabricator.wikimedia.org/P13453 and previous config saved to /var/cache/conftool/dbconfig/20201127-061929-marostegui.json [production]
2020-11-26 §
22:58 <andrewbogott> deleting /var/log/haproxy logs older than 7 days in cloudcontrol100x. We need log rotation here it seems. [admin]