2023-07-06
11:03 <mvernon@cumin1001> END (PASS) - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies (exit_code=0) rolling reboot on A:swift-fe [production]
10:58 <taavi@deploy1002> Finished scap: Backport for [[gerrit:935997|extdist: REL1_40 is stable, REL1_38 is EOL]] (duration: 08m 21s) [production]
10:54 <stevemunene@cumin1001> START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: analytics1062.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - stevemunene@cumin1001" [production]
10:53 <btullis@deploy1002> helmfile [staging] START helmfile.d/services/datahub: apply on main [production]
10:51 <taavi@deploy1002> taavi: Backport for [[gerrit:935997|extdist: REL1_40 is stable, REL1_38 is EOL]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet [production]
10:49 <taavi@deploy1002> Started scap: Backport for [[gerrit:935997|extdist: REL1_40 is stable, REL1_38 is EOL]] [production]
10:47 <stevemunene@cumin1001> START - Cookbook sre.dns.netbox [production]
10:41 <stevemunene@cumin1001> START - Cookbook sre.hosts.decommission for hosts analytics1062.eqiad.wmnet [production]
10:10 <stevemunene@cumin1001> END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts analytics1061.eqiad.wmnet [production]
10:10 <stevemunene@cumin1001> END (PASS) - Cookbook sre.dns.netbox (exit_code=0) [production]
10:10 <stevemunene@cumin1001> END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: analytics1061.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - stevemunene@cumin1001" [production]
10:08 <stevemunene@cumin1001> START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: analytics1061.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - stevemunene@cumin1001" [production]
10:05 <stevemunene@cumin1001> START - Cookbook sre.dns.netbox [production]
09:58 <stevemunene@cumin1001> START - Cookbook sre.hosts.decommission for hosts analytics1061.eqiad.wmnet [production]
09:35 <mvernon@cumin1001> START - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies rolling reboot on A:swift-fe [production]
09:28 <btullis@deploy1002> helmfile [staging] DONE helmfile.d/services/datahub: sync on main [production]
09:13 <btullis@deploy1002> helmfile [staging] START helmfile.d/services/datahub: apply on main [production]
09:11 <elukey> restart kube-apiserver on ml-serve-ctrl2* as an attempt to fix LIST-related latency issues [production]
09:10 <hashar@deploy1002> rebuilt and synchronized wikiversions files: all wikis to 1.41.0-wmf.16 refs T340244 [production]
08:55 <oblivian@deploy1002> helmfile [eqiad] DONE helmfile.d/services/cxserver: apply [production]
08:55 <oblivian@deploy1002> helmfile [eqiad] START helmfile.d/services/cxserver: apply [production]
08:51 <oblivian@deploy1002> helmfile [codfw] DONE helmfile.d/services/cxserver: apply [production]
08:50 <oblivian@deploy1002> helmfile [codfw] START helmfile.d/services/cxserver: apply [production]
08:49 <oblivian@deploy1002> helmfile [staging] DONE helmfile.d/services/cxserver: apply [production]
08:49 <oblivian@deploy1002> helmfile [staging] START helmfile.d/services/cxserver: apply [production]
08:45 <fabfur> reenabled puppet on cp1075.eqiad.wmnet, cp2027.codfw.wmnet, cp3050.esams.wmnet [production]
08:39 <mvernon@cumin1001> END (PASS) - Cookbook sre.swift.roll-restart-reboot-swift-thanos-proxies (exit_code=0) rolling reboot on A:thanos-fe [production]
08:17 <fabfur> disabling puppet temporarily on cp1075.eqiad.wmnet, cp2027.codfw.wmnet, cp3050.esams.wmnet to apply 935760 (T340983) [production]
08:03 <jelto@cumin1001> END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab1004.wikimedia.org with reason: GitLab minor version upgrade [production]
07:31 <kart_> Updated MinT to 2023-07-06-051402-production [production]
07:29 <mvernon@cumin1001> START - Cookbook sre.swift.roll-restart-reboot-swift-thanos-proxies rolling reboot on A:thanos-fe [production]
07:29 <kartik@deploy1002> helmfile [eqiad] DONE helmfile.d/services/machinetranslation: apply [production]
07:25 <kartik@deploy1002> helmfile [eqiad] START helmfile.d/services/machinetranslation: apply [production]
07:23 <kartik@deploy1002> helmfile [codfw] DONE helmfile.d/services/machinetranslation: apply [production]
07:17 <kartik@deploy1002> helmfile [codfw] START helmfile.d/services/machinetranslation: apply [production]
07:12 <kartik@deploy1002> helmfile [staging] DONE helmfile.d/services/machinetranslation: apply [production]
07:09 <kartik@deploy1002> helmfile [staging] START helmfile.d/services/machinetranslation: apply [production]
07:04 <stevemunene@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on 9 hosts with reason: Stopping puppet and hadoop-hdfs-datanode services then decommissioning the hosts [production]
07:04 <stevemunene@cumin1001> START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on 9 hosts with reason: Stopping puppet and hadoop-hdfs-datanode services then decommissioning the hosts [production]
06:54 <jelto@cumin1001> START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1004.wikimedia.org with reason: GitLab minor version upgrade [production]
02:17 <rzl@deploy1002> helmfile [eqiad] DONE helmfile.d/services/opentelemetry-collector: apply [production]
02:16 <rzl@deploy1002> helmfile [eqiad] START helmfile.d/services/opentelemetry-collector: apply [production]
02:06 <rzl@deploy1002> helmfile [codfw] DONE helmfile.d/services/opentelemetry-collector: apply [production]
02:05 <rzl@deploy1002> helmfile [codfw] START helmfile.d/services/opentelemetry-collector: apply [production]
02:05 <rzl@deploy1002> helmfile [codfw] DONE helmfile.d/services/opentelemetry-collector: apply [production]
02:05 <rzl@deploy1002> helmfile [codfw] START helmfile.d/services/opentelemetry-collector: apply [production]
00:22 <eileen> civicrm upgraded from 4ca2008d to 0ddd1a51 [production]
00:03 <rzl@deploy1002> helmfile [staging] DONE helmfile.d/services/opentelemetry-collector: apply [production]
00:02 <rzl@deploy1002> helmfile [staging] START helmfile.d/services/opentelemetry-collector: apply [production]
2023-07-05
22:52 <bking@cumin1001> END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99) [production]