| 2020-06-25
      
    
  | 09:53 | <jmm@cumin2001> | START - Cookbook sre.hosts.reboot-single | [production] | 
            
  | 09:37 | <volans@cumin1001> | END (PASS) - Cookbook sre.dns.netbox (exit_code=0) | [production] | 
            
  | 09:34 | <volans@cumin1001> | START - Cookbook sre.dns.netbox | [production] | 
            
  | 09:28 | <akosiaris> | schedule downtime for eqiad wikifeeds as it's flapping too much without yet knowing why. T256358 | [production] | 
            
  | 09:28 | <godog> | extend lv on thanos-fe2001 and restart thanos-compact | [production] | 
            
  | 09:21 | <vgutierrez> | rolling restart of ncredir instances to catch up on kernel updates | [production] | 
            
  | 09:13 | <joal@deploy1001> | Finished deploy [analytics/refinery@4aba370] (thin): Analytics fix over weekly train THIN [analytics/refinery@4aba370] (duration: 00m 10s) | [production] | 
            
  | 09:13 | <joal@deploy1001> | Started deploy [analytics/refinery@4aba370] (thin): Analytics fix over weekly train THIN [analytics/refinery@4aba370] | [production] | 
            
  | 09:13 | <joal@deploy1001> | Finished deploy [analytics/refinery@4aba370]: Analytics fix over weekly train [analytics/refinery@4aba370] (duration: 16m 27s) | [production] | 
            
  | 09:01 | <vgutierrez> | restarting acme-chief instances to catch up on kernel updates | [production] | 
            
  | 08:56 | <joal@deploy1001> | Started deploy [analytics/refinery@4aba370]: Analytics fix over weekly train [analytics/refinery@4aba370] | [production] | 
            
  | 08:42 | <hashar> | releases2002: restarted bacula-fd to take into account the Puppet-provided configuration  # T247652 | [production] | 
            
  | 08:14 | <jynus> | restarting bacula-dir on backup1001 | [production] | 
            
  | 08:09 | <akosiaris> | restart etherpad-lite on etherpad1002 | [production] | 
            
  | 08:03 | <marostegui> | Failover m1 from db1135 to db1097 - T254556 | [production] | 
            
  | 07:52 | <jynus> | stop bacula-director on backup1001 for db maintenance T254556 | [production] | 
            
  | 07:49 | <akosiaris@cumin1001> | END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) | [production] | 
            
  | 07:49 | <akosiaris@cumin1001> | START - Cookbook sre.ganeti.makevm | [production] | 
            
  | 07:49 | <akosiaris@cumin1001> | END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) | [production] | 
            
  | 07:49 | <akosiaris@cumin1001> | START - Cookbook sre.ganeti.makevm | [production] | 
            
  | 07:49 | <akosiaris@cumin1001> | END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) | [production] | 
            
  | 07:48 | <akosiaris@cumin1001> | START - Cookbook sre.ganeti.makevm | [production] | 
            
  | 07:48 | <akosiaris@cumin1001> | END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) | [production] | 
            
  | 07:47 | <akosiaris@cumin1001> | START - Cookbook sre.ganeti.makevm | [production] | 
            
  | 07:36 | <elukey> | reboot an-launcher1001 for kernel upgrades | [production] | 
            
  | 07:18 | <elukey> | reboot kafkamon* vms for kernel upgrades | [production] | 
            
  | 07:08 | <marostegui> | Start pre switchover steps on m1 T254556 | [production] | 
            
  | 06:40 | <elukey> | reboot matomo1002 for kernel upgrades | [production] | 
            
  | 06:35 | <elukey> | reboot archiva1002 (new vm, not yet in service) for kernel upgrades | [production] | 
            
  | 06:34 | <elukey> | reboot archiva for kernel upgrades | [production] | 
            
  | 06:31 | <elukey> | force puppet run on ores1003/1005 to restore celery (killed by the OOM killer) | [production] | 
            
  | 06:24 | <elukey> | reboot an-tool* vms for kernel upgrades | [production] | 
            
  | 06:23 | <elukey> | reboot analytics-tool1004 for kernel upgrades (Superset host) | [production] | 
            
  | 06:22 | <elukey> | reboot analytics-tool1001 for kernel upgrades | [production] | 
            
  | 06:19 | <elukey> | execute ip addr flush ens5 on an-airflow1001 to clear RTNETLINK answers: File exists (error from ifup@ens5.service) | [production] | 
            
  | 06:03 | <elukey> | reboot an-airflow1001 for kernel upgrades | [production] | 
            
  | 04:26 | <marostegui> | Remove triggers from db2095:3312 - T238966 | [production] | 
            
  | 04:25 | <marostegui> | Deploy schema change on s2 codfw - T238966 | [production] | 
            
  | 00:48 | <twentyafterfour> | restart php-fpm on phab1001 to fix T256343 | [production] | 
            
  | 00:12 | <twentyafterfour> | phabricator updated, all seems normal | [production] | 
            
  | 00:11 | <twentyafterfour> | updating phabricator to release/2020-06-25/1, momentary (<1 minute) downtime expected. | [production] | 
            
  
    | 2020-06-24
      
    
  | 23:44 | <mutante> | releases2002 - systemctl stop jenkins, kill 15244 (rogue jenkins process), start jenkins with systemctl start jenkins (T247652) | [production] | 
            
  | 23:43 | <mutante> | releases1002 - kill rogue jenkins process, start jenkins with systemctl start jenkins (T247652) | [production] | 
            
  | 23:02 | <mutante> | releases1002/2002 - disabling puppet, removing failing cron job to pull deployment_charts (because /srv/deployment-charts does not exist yet) | [production] | 
            
  | 21:45 | <shdubsh> | install mtail 3.0.0~rc35+wmf2 on logstash1007 - T255776 | [production] | 
            
  | 20:42 | <brennen@deploy1001> | Synchronized php: group1 wikis to 1.35.0-wmf.38 (duration: 01m 06s) | [production] | 
            
  | 20:41 | <brennen@deploy1001> | rebuilt and synchronized wikiversions files: group1 wikis to 1.35.0-wmf.38 | [production] | 
            
  | 20:41 | <brennen> | train 1.35.0-wmf.38: attempting to roll forward to group1 after php-fpm restart on mw1287 (T256305, T254175) | [production] | 
            
  | 20:32 | <cdanis> | restarting php-fpm on mw1287 T256305 | [production] | 
            
  | 20:32 | <bsitzmann@deploy1001> | helmfile [CODFW] Ran 'sync' command on namespace 'mobileapps' for release 'production'. | [production] |