2021-07-27
14:53 <marostegui@cumin1001> dbctl commit (dc=all): 'Repool db1162 T287230', diff saved to https://phabricator.wikimedia.org/P16917 and previous config saved to /var/cache/conftool/dbconfig/20210727-145352-marostegui.json [production]
14:53 <moritzm> disabling puppet for upcoming row B maintenance [production]
14:52 <mmandere> depool lvs1014 - T286061 [production]
14:52 <elukey@cumin1001> START - Cookbook sre.hosts.reboot-single for host ml-serve2002.codfw.wmnet [production]
14:52 <elukey@cumin1001> END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve2001.codfw.wmnet [production]
14:51 <mmandere@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on authdns1001.wikimedia.org with reason: Eqiad row B maintenance [production]
14:51 <mmandere@cumin1001> START - Cookbook sre.hosts.downtime for 1:00:00 on authdns1001.wikimedia.org with reason: Eqiad row B maintenance [production]
14:48 <jmm@cumin2002> END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sretest1002.eqiad.wmnet [production]
14:47 <elukey@cumin1001> START - Cookbook sre.hosts.reboot-single for host ml-serve2001.codfw.wmnet [production]
14:47 <mmandere@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on cp[1079-1082].eqiad.wmnet with reason: Eqiad row B maintenance [production]
14:46 <mmandere@cumin1001> START - Cookbook sre.hosts.downtime for 1:00:00 on cp[1079-1082].eqiad.wmnet with reason: Eqiad row B maintenance [production]
14:45 <jmm@cumin2002> START - Cookbook sre.hosts.reboot-single for host sretest1002.eqiad.wmnet [production]
14:43 <marostegui@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1129.eqiad.wmnet with reason: REIMAGE [production]
14:41 <jmm@cumin2002> END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sretest1002.eqiad.wmnet [production]
14:40 <marostegui@cumin1001> START - Cookbook sre.hosts.downtime for 2:00:00 on db1129.eqiad.wmnet with reason: REIMAGE [production]
14:40 <mmandere> depool authdns1001 - T286061 [production]
14:40 <elukey@cumin1001> END (ERROR) - Cookbook sre.hosts.reboot-single (exit_code=97) for host ml-serve-ctrl2002.codfw.wmnet [production]
14:36 <jmm@cumin2002> START - Cookbook sre.hosts.reboot-single for host sretest1002.eqiad.wmnet [production]
14:34 <elukey@cumin1001> START - Cookbook sre.hosts.reboot-single for host ml-serve-ctrl2002.codfw.wmnet [production]
14:33 <mmandere> depool cp10[79-82].eqiad.wmnet - T286061 [production]
14:33 <elukey@cumin1001> END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve-ctrl2001.codfw.wmnet [production]
14:30 <topranks> Add peering to AS398196 - Cobalt Ridge at DE-CIX Dallas on cr2-codfw. [production]
14:29 <elukey> reduce vcores for ml-serve-ctrl[12]00[12] after performance testing - T287238 [production]
14:28 <elukey@cumin1001> START - Cookbook sre.hosts.reboot-single for host ml-serve-ctrl2001.codfw.wmnet [production]
14:25 <marostegui@cumin1001> dbctl commit (dc=all): 'Depool db1129 T287230', diff saved to https://phabricator.wikimedia.org/P16916 and previous config saved to /var/cache/conftool/dbconfig/20210727-142520-marostegui.json [production]
14:19 <otto@deploy1002> helmfile [codfw] Ran 'sync' command on namespace 'eventgate-main' for release 'production' . [production]
14:16 <otto@deploy1002> helmfile [codfw] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' . [production]
14:13 <otto@deploy1002> helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-main' for release 'production' . [production]
14:13 <otto@deploy1002> helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' . [production]
14:11 <moritzm> installing aspell security updates [production]
14:11 <otto@deploy1002> helmfile [staging] Ran 'sync' command on namespace 'eventgate-main' for release 'production' . [production]
14:07 <otto@deploy1002> helmfile [codfw] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' . [production]
14:07 <otto@deploy1002> helmfile [codfw] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' . [production]
14:03 <otto@deploy1002> helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' . [production]
14:03 <otto@deploy1002> helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' . [production]
14:00 <otto@deploy1002> helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' . [production]
13:59 <otto@deploy1002> helmfile [codfw] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' . [production]
13:59 <otto@deploy1002> helmfile [codfw] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' . [production]
13:54 <otto@deploy1002> helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' . [production]
13:54 <otto@deploy1002> helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' . [production]
13:52 <otto@deploy1002> helmfile [staging] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' . [production]
13:42 <volans@deploy1002> Finished deploy [netbox/deploy@660ad14]: Deploy v2.10.4-wmf5 (duration: 02m 29s) [production]
13:40 <otto@deploy1002> helmfile [codfw] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' . [production]
13:39 <volans@deploy1002> Started deploy [netbox/deploy@660ad14]: Deploy v2.10.4-wmf5 [production]
13:36 <otto@deploy1002> helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' . [production]
13:34 <otto@deploy1002> helmfile [codfw] Ran 'sync' command on namespace 'eventgate-analytics' for release 'canary' . [production]
13:30 <ottomata> deploying eventgate-analytics with native prometheus support. Doing this slowly on canary release first to ensure https://wikitech.wikimedia.org/wiki/Incident_documentation/2021-07-14_eventgate-analytics_latency_spike_caused_MW_app_server_overload is fixed. [production]
13:29 <otto@deploy1002> helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics' for release 'canary' . [production]
12:56 <elukey> created component/iptables185 for buster-wikimedia + imported packages from buster-backports [production]
12:50 <dcausse@deploy1002> Finished deploy [wikimedia/discovery/analytics@346ac10]: (no justification provided) (duration: 06m 13s) [production]