2024-08-08
14:24 <fnegri@cumin1002> START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - fnegri@cumin1002" [production]
14:02 <ladsgroup@deploy1003> ladsgroup: Backport for [[gerrit:1060839|Add missing close tags to #contentSub message (T372054)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) [production]
14:01 <stevemunene@deploy1003> Finished deploy [airflow-dags/analytics_test@2a3060e]: (no justification provided) (duration: 00m 33s) [production]
14:00 <stevemunene@deploy1003> Started deploy [airflow-dags/analytics_test@2a3060e]: (no justification provided) [production]
13:59 <ladsgroup@deploy1003> Started scap sync-world: Backport for [[gerrit:1060839|Add missing close tags to #contentSub message (T372054)]] [production]
13:51 <kevinbazira@deploy1003> helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'llm' for release 'main' . [production]
13:48 <ayounsi@cumin1002> END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox-canary [production]
13:47 <ayounsi@cumin1002> START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox-canary [production]
13:44 <fnegri@cumin1002> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on clouddb1018.eqiad.wmnet with reason: host reimage [production]
13:41 <fnegri@cumin1002> START - Cookbook sre.hosts.downtime for 2:00:00 on clouddb1018.eqiad.wmnet with reason: host reimage [production]
13:28 <fnegri@cumin1002> START - Cookbook sre.hosts.reimage for host clouddb1018.eqiad.wmnet with OS bookworm [production]
13:25 <fnegri@cumin1002> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on clouddb1018.eqiad.wmnet with reason: Reimaging clouddb1018 T365424 [production]
13:25 <fnegri@cumin1002> START - Cookbook sre.hosts.downtime for 1:00:00 on clouddb1018.eqiad.wmnet with reason: Reimaging clouddb1018 T365424 [production]
13:24 <fnegri@cumin1002> conftool action : set/pooled=no; selector: name=clouddb1018.eqiad.wmnet,service=s7 [production]
13:24 <fnegri@cumin1002> conftool action : set/pooled=no; selector: name=clouddb1018.eqiad.wmnet,service=s2 [production]
12:47 <jnuche@deploy1003> rebuilt and synchronized wikiversions files: group2 to 1.43.0-wmf.17 refs T366962 [production]
12:23 <samtar@deploy1003> Finished scap: Backport for [[gerrit:1060765|mswikisource: add custom logos (T372031)]] (duration: 08m 47s) [production]
12:22 <dcausse> T371401: reindexing wikidatawiki@codfw to index mul labels [production]
12:18 <samtar@deploy1003> chlod, samtar: Continuing with sync [production]
12:18 <samtar@deploy1003> chlod, samtar: Backport for [[gerrit:1060765|mswikisource: add custom logos (T372031)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) [production]
12:14 <samtar@deploy1003> Started scap sync-world: Backport for [[gerrit:1060765|mswikisource: add custom logos (T372031)]] [production]
12:11 <samtar@deploy1003> Finished scap: Backport for [[gerrit:1060764|bdrwiki: add custom logos (T372031)]] (duration: 09m 20s) [production]
12:06 <samtar@deploy1003> chlod, samtar: Continuing with sync [production]
12:05 <samtar@deploy1003> chlod, samtar: Backport for [[gerrit:1060764|bdrwiki: add custom logos (T372031)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) [production]
12:01 <samtar@deploy1003> Started scap sync-world: Backport for [[gerrit:1060764|bdrwiki: add custom logos (T372031)]] [production]
11:58 <samtar@deploy1003> Finished scap: Backport for [[gerrit:1060763|dtpwiki: add custom logos (T372031)]] (duration: 10m 10s) [production]
11:53 <samtar@deploy1003> chlod, samtar: Continuing with sync [production]
11:52 <samtar@deploy1003> chlod, samtar: Backport for [[gerrit:1060763|dtpwiki: add custom logos (T372031)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) [production]
11:48 <samtar@deploy1003> Started scap sync-world: Backport for [[gerrit:1060763|dtpwiki: add custom logos (T372031)]] [production]
11:35 <jelto@cumin1002> END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab2002.wikimedia.org with reason: Upgrade GitLab Replica to new version [production]
10:39 <jnuche@deploy1003> rebuilt and synchronized wikiversions files: group1 to 1.43.0-wmf.17 refs T366962 [production]
09:53 <jnuche@deploy1003> rebuilt and synchronized wikiversions files: group2 to 1.43.0-wmf.17 refs T366962 [production]
09:38 <elukey@cumin1002> END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:aqs-codfw: Openjdk upgrade - elukey@cumin1002 [production]
09:37 <btullis@cumin1002> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on dbstore1009.eqiad.wmnet with reason: Rebooting due to CPU soft lockup [production]
09:37 <btullis@cumin1002> START - Cookbook sre.hosts.downtime for 1:00:00 on dbstore1009.eqiad.wmnet with reason: Rebooting due to CPU soft lockup [production]
09:32 <dreamyjazz@deploy1003> Finished scap: Backport for [[gerrit:1060760|Fix DefaultPresenter rejecting IPCountInfo instances (T371966)]] (duration: 10m 38s) [production]
09:27 <dreamyjazz@deploy1003> dreamyjazz: Continuing with sync [production]
09:24 <elukey> powercycle ml-serve2004 - host frozen, no ssh access, get sel shows "Multi-bit memory errors detected on a memory device at location(s) DIMM_A2." [production]
09:23 <dreamyjazz@deploy1003> dreamyjazz: Backport for [[gerrit:1060760|Fix DefaultPresenter rejecting IPCountInfo instances (T371966)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug) [production]
09:21 <dreamyjazz@deploy1003> Started scap sync-world: Backport for [[gerrit:1060760|Fix DefaultPresenter rejecting IPCountInfo instances (T371966)]] [production]
08:45 <jelto@cumin1002> START - Cookbook sre.gitlab.upgrade on GitLab host gitlab2002.wikimedia.org with reason: Upgrade GitLab Replica to new version [production]
08:30 <dcausse> T371401: reindexing wikidatawiki@eqiad to index mul labels [production]
08:23 <ayounsi@cumin1002> END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "vtrs1003+gerrit1004 - ayounsi@cumin1002" [production]
08:23 <ayounsi@cumin1002> START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "vtrs1003+gerrit1004 - ayounsi@cumin1002" [production]
08:19 <elukey> restart dump_ip_reputation.service on puppetserver1001 [production]
08:13 <elukey> restart tomcat on idp[1,2]003 to pick up the new openjdk [production]
08:10 <marostegui@cumin1002> dbctl commit (dc=all): 'Depooling db1232 (T367856)', diff saved to https://phabricator.wikimedia.org/P67252 and previous config saved to /var/cache/conftool/dbconfig/20240808-081041-marostegui.json [production]
08:10 <marostegui@cumin1002> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 6:00:00 on db1232.eqiad.wmnet with reason: Maintenance [production]
08:10 <marostegui@cumin1002> START - Cookbook sre.hosts.downtime for 1 day, 6:00:00 on db1232.eqiad.wmnet with reason: Maintenance [production]
08:10 <marostegui@cumin1002> dbctl commit (dc=all): 'Repooling after maintenance db1219 (T367856)', diff saved to https://phabricator.wikimedia.org/P67251 and previous config saved to /var/cache/conftool/dbconfig/20240808-081019-marostegui.json [production]