2023-04-24
08:43 <marostegui@cumin1001> START - Cookbook sre.hosts.downtime for 0:15:00 on 34 hosts with reason: Enabling replication T335266 [production]
08:33 <marostegui> Enable replication eqiad -> codfw on s5 dbmaint eqiad T335266 [production]
08:32 <cgoubert@deploy2002> Finished scap: testing T329857 (duration: 14m 29s) [production]
08:32 <marostegui@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on 26 hosts with reason: Enabling replication T335266 [production]
08:32 <marostegui@cumin1001> START - Cookbook sre.hosts.downtime for 0:15:00 on 26 hosts with reason: Enabling replication T335266 [production]
08:29 <marostegui@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on 27 hosts with reason: Enabling replication T335266 [production]
08:28 <marostegui@cumin1001> START - Cookbook sre.hosts.downtime for 0:15:00 on 27 hosts with reason: Enabling replication T335266 [production]
08:28 <marostegui> Enable replication eqiad -> codfw on s6 dbmaint eqiad T335266 [production]
08:27 <marostegui@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on 27 hosts with reason: Enabling replication T335266 [production]
08:26 <marostegui@cumin1001> START - Cookbook sre.hosts.downtime for 0:15:00 on 27 hosts with reason: Enabling replication T335266 [production]
08:26 <marostegui> Enable replication eqiad -> codfw on s2 dbmaint eqiad T335266 [production]
08:25 <btullis@cumin1001> START - Cookbook sre.hosts.dhcp for host an-worker1110.eqiad.wmnet [production]
08:21 <btullis@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on an-worker1110.eqiad.wmnet with reason: Upgrading RAID controller firmware [production]
08:21 <btullis@cumin1001> START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on an-worker1110.eqiad.wmnet with reason: Upgrading RAID controller firmware [production]
08:20 <marostegui@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on 10 hosts with reason: Enabling replication T335266 [production]
08:20 <marostegui@cumin1001> START - Cookbook sre.hosts.downtime for 0:15:00 on 10 hosts with reason: Enabling replication T335266 [production]
08:20 <marostegui> Enable replication eqiad -> codfw on x1 dbmaint eqiad T335266 [production]
08:18 <cgoubert@deploy2002> Started scap: testing T329857 [production]
08:17 <marostegui> Enable replication eqiad -> codfw on es5 dbmaint eqiad T335266 [production]
08:14 <claime> Deploying 909302 on deploy2002 for T329857 [production]
08:10 <claime> Disabling puppet on deploy2002 - T329857 [production]
08:09 <claime> Deploying 909302 on deploy1002 for T329857 [production]
08:08 <marostegui@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on 6 hosts with reason: Enabling replication T335266 [production]
08:08 <marostegui> Enable replication eqiad -> codfw on es4 dbmaint eqiad T335266 [production]
08:08 <marostegui@cumin1001> START - Cookbook sre.hosts.downtime for 0:15:00 on 6 hosts with reason: Enabling replication T335266 [production]
08:07 <marostegui> Enable replication eqiad -> codfw on pc3 dbmaint eqiad T335266 [production]
08:06 <marostegui> Enable replication eqiad -> codfw on pc2 dbmaint eqiad T335266 [production]
08:05 <marostegui> Enable replication eqiad -> codfw on pc1 dbmaint eqiad T335266 [production]
07:53 <mvernon@cumin2002> END (PASS) - Cookbook sre.swift.remove-ghost-objects (exit_code=0) from container wikipedia-commons-local-public.41 in codfw [production]
07:51 <mvernon@cumin2002> START - Cookbook sre.swift.remove-ghost-objects from container wikipedia-commons-local-public.41 in codfw [production]
07:45 <jelto@cumin2002> END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host gitlab1004.wikimedia.org with OS bullseye [production]
07:44 <mvernon@cumin2002> END (PASS) - Cookbook sre.swift.remove-ghost-objects (exit_code=0) from container wikipedia-commons-local-public.59 in codfw [production]
07:42 <mvernon@cumin2002> START - Cookbook sre.swift.remove-ghost-objects from container wikipedia-commons-local-public.59 in codfw [production]
07:39 <dcausse> restarting blazegraph on wdqs1005 (stuck for 3+days) [production]
07:38 <mvernon@cumin2002> END (PASS) - Cookbook sre.swift.remove-ghost-objects (exit_code=0) from container wikipedia-commons-local-public.4a in codfw [production]
07:36 <mvernon@cumin2002> START - Cookbook sre.swift.remove-ghost-objects from container wikipedia-commons-local-public.4a in codfw [production]
07:24 <jelto@cumin2002> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on gitlab1004.wikimedia.org with reason: host reimage [production]
07:21 <jelto@cumin2002> START - Cookbook sre.hosts.downtime for 2:00:00 on gitlab1004.wikimedia.org with reason: host reimage [production]
07:06 <jelto@cumin2002> START - Cookbook sre.hosts.reimage for host gitlab1004.wikimedia.org with OS bullseye [production]
2023-04-23
18:41 <wm-bot> <lucaswerkmeister> deployed 0f96d60736 (Punjabi adverbs) [tools.lexeme-forms]
18:00 <wm-bot> <lucaswerkmeister> deployed 75af96b851 (Punjabi adjectives) [tools.lexeme-forms]
16:55 <Krinkle> Fix profile::tlsproxy::envoy::global_cert_name in Horizon for webperf host to use '%{facts.fqdn}' instead of performance.discovery.wmnet as the latter doesn't resolve / would be an invalid cert for https://deployment-webperf21, ref T291015 [deployment-prep]
16:55 <Krinkle> Fix profile::tlsproxy::envoy::global_cert_name in Horizon for webperf host to use '%{facts.fqdn}' instead of performance.discovery.wmnet as the latter doesn't resolve / would be an invalid cert for https://deployment-webperf21, ref T291015 [releng]
15:27 <wm-bot> <lucaswerkmeister> deployed 934f5cffdb (Yoruba adjectives) [tools.lexeme-forms]
2023-04-22
21:50 <hashar> Reloaded Zuul for https://gerrit.wikimedia.org/r/c/integration/config/+/908941 [releng]
21:47 <hashar> Reloaded Zuul for https://gerrit.wikimedia.org/r/c/integration/config/+/909624 [releng]
19:35 <Krinkle> Create excimer_ui_server db user in Beta Cluster on deployment-db13 based on prod grants. Password stored in deployment-puppetmaster04:/var/lib/git/labs/private/ (passwords::excimer_ui_server::$excimer_db_pass). Picked db13 because recommendationapi is here (also prod m2), and because mdb (created for this purpose originally) appears broken since several OS iterations (likely forgotten due to unusual name) T301637, T331956 [releng]
18:05 <Krinkle> Fix database hostname dns error at https://performance.wikimedia.beta.wmflabs.org/xhgui/, switch from mdb01 to mdb02, ref T301637 [releng]
18:00 <Krinkle> Move deployment-xhgui03 config from to deployment-xhgui prefix, https://gerrit.wikimedia.org/g/cloud/instance-puppet/+/1d03d4cf84a148bff7e055d9939c44f30c618d85/deployment-prep/deployment-xhgui03.deployment-prep.eqiad1.wikimedia.cloud.yaml https://gerrit.wikimedia.org/r/plugins/gitiles/cloud/instance-puppet/+/ab1c9d2968ba3c292b81ba76f8c0775151630f94%5E%21/, ref T301637 [releng]
16:18 <wm-bot> <lucaswerkmeister> deployed a074fd9c64 (trim spaces) [tools.lexeme-forms]