| 2021-01-27 |
    
  | 01:48 | <ryankemper> | [WDQS Deploy] Gearing up for deploy of wdqs `0.3.61`. Pre-deploy tests passing on canary `wdqs1003` | [production] | 
            
  | 01:39 | <ebernhardson@deploy1001> | Finished deploy [wikimedia/discovery/analytics@ee948e0]: transfer_to_es: Enable catchup (duration: 01m 11s) | [production] | 
            
  | 01:38 | <ebernhardson@deploy1001> | Started deploy [wikimedia/discovery/analytics@ee948e0]: transfer_to_es: Enable catchup | [production] | 
            
  | 01:25 | <legoktm@cumin1001> | conftool action : set/pooled=yes; selector: name=mw2296.codfw.wmnet | [production] | 
            
  | 01:25 | <legoktm@cumin1001> | conftool action : set/pooled=yes; selector: name=mw2295.codfw.wmnet | [production] | 
            
  | 01:23 | <ryankemper> | T272713 [Deploy envoy for `wdqs-internal`] Roll-out complete. Will monitor `wdqs-internal` for any issues. All the remaining `WDQS SPARQL` alerts should clear shortly | [production] | 
            
  | 01:21 | <ryankemper> | T272713 [Deploy envoy for `wdqs-internal`] Test queries to `wdqs1003.eqiad.wmnet` passed, and metrics in Grafana (https://grafana.wikimedia.org/d/000000489/wikidata-query-service?orgId=1&var-cluster_name=wdqs-internal&from=1611706751381&to=1611710190405) look good. Rolling out to rest of fleet | [production] | 
            
  | 01:21 | <legoktm@cumin1001> | conftool action : set/pooled=no; selector: name=mw2296.codfw.wmnet | [production] | 
            
  | 01:20 | <legoktm@cumin1001> | conftool action : set/pooled=no; selector: name=mw2295.codfw.wmnet | [production] | 
            
  | 01:14 | <ebernhardson@deploy1001> | Finished deploy [wikimedia/discovery/analytics@246b640]: remove link recommendations from hourly transfer deps (duration: 03m 31s) | [production] | 
            
  | 01:10 | <ebernhardson@deploy1001> | Started deploy [wikimedia/discovery/analytics@246b640]: remove link recommendations from hourly transfer deps | [production] | 
            
  | 00:54 | <legoktm@cumin1001> | END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2296.codfw.wmnet with reason: REIMAGE | [production] | 
            
  | 00:52 | <legoktm@cumin1001> | END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2295.codfw.wmnet with reason: REIMAGE | [production] | 
            
  | 00:51 | <ryankemper> | T272713 [Deploy envoy for `wdqs-internal`] Fixed typo in private key in commit `ea152df802b55e939d34494a4965ed83a80a24f2`. Puppet run on `wdqs1003` was successful as a result. Monitoring... | [production] | 
            
  | 00:49 | <legoktm@cumin1001> | START - Cookbook sre.hosts.downtime for 2:00:00 on mw2295.codfw.wmnet with reason: REIMAGE | [production] | 
            
  | 00:49 | <legoktm@cumin1001> | START - Cookbook sre.hosts.downtime for 2:00:00 on mw2296.codfw.wmnet with reason: REIMAGE | [production] | 
            
  | 00:45 | <ryankemper> | T272713 [Deploy envoy for `wdqs-internal`] Discovered source of the above failure; the secret key in the puppetmaster `/srv/private` repo has a typo in its name (my error): it had `wqds` instead of `wdqs`. Opening up a patch now | [production] | 
            
  | 00:44 | <ryankemper> | T272713 [Deploy envoy for `wdqs-internal`] `...Error while evaluating a Function Call, secret(): invalid secret ssl/wdqs-internal.discovery.wmnet.key (file: /etc/puppet/modules/sslcert/manifests/certificate.pp, line: 91, column: 26) (file: /etc/puppet/modules/profile/manifests/tlsproxy/envoy.pp, line: 129) on node wdqs1003.eqiad.wmnet` | [production] | 
            
  | 00:36 | <ryankemper> | [Deploy envoy for `wdqs-internal`] `...Error while evaluating a Function Call, secret(): invalid secret ssl/wdqs-internal.discovery.wmnet.key (file: /etc/puppet/modules/sslcert/manifests/certificate.pp, line: 91, column: 26) (file: /etc/puppet/modules/profile/manifests/tlsproxy/envoy.pp, line: 129) on node wdqs1003.eqiad.wmnet` | [production] | 
            
  | 00:20 | <ryankemper> | T272713 [Deploy envoy for `wdqs-internal`] Disabled puppet on all `wdqs-internal` hosts; merged https://gerrit.wikimedia.org/r/c/operations/puppet/+/657913 | [production] | 
            
  | 00:16 | <ryankemper@cumin1001> | END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs2008.codfw.wmnet with reason: Enabling envoy for wdqs-internal | [production] | 
            
  | 00:16 | <ryankemper@cumin1001> | START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs2008.codfw.wmnet with reason: Enabling envoy for wdqs-internal | [production] | 
            
  | 00:15 | <ryankemper> | T272713 [Deploy envoy for `wdqs-internal`] Downtimed all `wdqs-internal` hosts on icinga | [production] | 
            
  | 00:15 | <ryankemper@cumin1001> | END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs2006.codfw.wmnet with reason: Enabling envoy for wdqs-internal | [production] | 
            
  | 00:15 | <ryankemper@cumin1001> | START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs2006.codfw.wmnet with reason: Enabling envoy for wdqs-internal | [production] | 
            
  | 00:15 | <ryankemper@cumin1001> | END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs2005.codfw.wmnet with reason: Enabling envoy for wdqs-internal | [production] | 
            
  | 00:15 | <ryankemper@cumin1001> | START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs2005.codfw.wmnet with reason: Enabling envoy for wdqs-internal | [production] | 
            
  | 00:15 | <ryankemper@cumin1001> | END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs2004.codfw.wmnet with reason: Enabling envoy for wdqs-internal | [production] | 
            
  | 00:15 | <ryankemper@cumin1001> | START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs2004.codfw.wmnet with reason: Enabling envoy for wdqs-internal | [production] | 
            
  | 00:15 | <ryankemper@cumin1001> | END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs1011.eqiad.wmnet with reason: Enabling envoy for wdqs-internal | [production] | 
            
  | 00:15 | <ryankemper@cumin1001> | START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs1011.eqiad.wmnet with reason: Enabling envoy for wdqs-internal | [production] | 
            
  | 00:15 | <ryankemper@cumin1001> | END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs1008.eqiad.wmnet with reason: Enabling envoy for wdqs-internal | [production] | 
            
  | 00:15 | <ryankemper@cumin1001> | START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs1008.eqiad.wmnet with reason: Enabling envoy for wdqs-internal | [production] | 
            
  | 00:14 | <ryankemper@cumin1001> | END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs1003.eqiad.wmnet with reason: Enabling envoy for wdqs-internal | [production] | 
            
  | 00:14 | <ryankemper@cumin1001> | START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs1003.eqiad.wmnet with reason: Enabling envoy for wdqs-internal | [production] | 
            
  
| 2021-01-26 |
    
  | 23:43 | <dzahn@cumin1001> | conftool action : set/pooled=yes; selector: name=mw2297.codfw.wmnet | [production] | 
            
  | 23:41 | <dzahn@cumin1001> | conftool action : set/pooled=yes; selector: name=mw2298.codfw.wmnet | [production] | 
            
  | 23:40 | <dzahn@cumin1001> | conftool action : set/pooled=yes; selector: name=mw2302.codfw.wmnet | [production] | 
            
  | 23:37 | <dzahn@cumin1001> | conftool action : set/pooled=yes; selector: name=mw1264.eqiad.wmnet | [production] | 
            
  | 23:32 | <dzahn@cumin1001> | conftool action : set/pooled=no; selector: name=mw2297.codfw.wmnet | [production] | 
            
  | 23:31 | <dzahn@cumin1001> | conftool action : set/pooled=no; selector: name=mw1264.eqiad.wmnet | [production] | 
            
  | 23:31 | <dzahn@cumin1001> | conftool action : set/pooled=no; selector: name=mw2298.codfw.wmnet | [production] | 
            
  | 23:30 | <dzahn@cumin1001> | conftool action : set/pooled=no; selector: name=mw2299.codfw.wmnet | [production] | 
            
  | 23:30 | <dzahn@cumin1001> | conftool action : set/pooled=no; selector: name=mw2302.codfw.wmnet | [production] | 
            
  | 22:35 | <ebernhardson@deploy1001> | Finished deploy [wikimedia/discovery/analytics@a276626]: correct execution_date_fn in ores_predictions_hourly (duration: 01m 07s) | [production] | 
            
  | 22:34 | <ebernhardson@deploy1001> | Started deploy [wikimedia/discovery/analytics@a276626]: correct execution_date_fn in ores_predictions_hourly | [production] | 
            
  | 22:30 | <legoktm@cumin1001> | conftool action : set/pooled=yes; selector: name=mw2300.codfw.wmnet | [production] | 
            
  | 22:27 | <legoktm@cumin1001> | conftool action : set/pooled=no; selector: name=mw2300.codfw.wmnet | [production] | 
            
  | 22:26 | <dzahn@cumin1001> | END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1264.eqiad.wmnet with reason: REIMAGE | [production] | 
            
  | 22:24 | <dzahn@cumin1001> | START - Cookbook sre.hosts.downtime for 2:00:00 on mw1264.eqiad.wmnet with reason: REIMAGE | [production] |