2019-12-13
|
18:46 |
<bstorm_> |
updated tools-k8s-control-2 and 3 to the new config as well |
[tools] |
18:04 |
<hashar> |
Successfully tagged docker-registry.discovery.wmnet/releng/doxygen:0.6.1 |
[releng] |
17:56 |
<bstorm_> |
updated tools-k8s-control-1 to the new control plane configuration |
[tools] |
17:47 |
<bstorm_> |
edited kubeadm-config configMap object to match the new init config |
[tools] |
17:32 |
<bstorm_> |
rebooting tools-k8s-control-2 to correct mount issue |
[tools] |
16:39 |
<twentyafterfour> |
Reloading Zuul to deploy https://gerrit.wikimedia.org/r/c/integration/config/+/557001 |
[releng] |
16:28 |
<twentyafterfour> |
deployed codehealth jenkins jobs |
[releng] |
14:53 |
<jeh> |
restart maps-tiles1.maps.eqiad.wmflabs to resolve NFS issues after Dec 12th 2019 maintenance |
[maps] |
14:51 |
<onimisionipe> |
depool maps1003 after postgres init - T239728 |
[production] |
14:37 |
<onimisionipe> |
pool maps1002 after postgres init - T239728 |
[production] |
12:54 |
<addshore> |
reload zuul for https://gerrit.wikimedia.org/r/#/c/integration/config/+/556998/ |
[releng] |
11:46 |
<moritzm> |
installing tiff security updates |
[production] |
11:08 |
<hashar> |
deployment-mediawiki07: removing faulty entry `mwdeploy:x:497:498::/var/lib/mwdeploy:/bin/bash` in /etc/passwd # T73480 |
[releng] |
10:52 |
<moritzm> |
rebooting mw2164 for microcode tests |
[production] |
10:52 |
<jmm@cumin2001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) |
[production] |
10:52 |
<jmm@cumin2001> |
START - Cookbook sre.hosts.downtime |
[production] |
10:30 |
<moritzm> |
uploaded doxygen 1.8.16-1~exp4~deb10+wmf1 to buster-wikimedia/component/ci T239482 |
[production] |
10:17 |
<ema> |
cp4028: restart ats-be to enable xdebug plugin |
[production] |
09:55 |
<_joe_> |
restarting pybal on lvs in esams (3007, then 3006 and 3005) |
[production] |
09:50 |
<rlazarus> |
rzl@conf1006:~$ sudo systemctl restart etcd.service |
[production] |
08:48 |
<andrewbogott> |
rebooting cloudvirt1023 to investigate some nova things |
[production] |
08:10 |
<elukey> |
rm user.log.1, messages.1, daemon.log.1, kafkatee.log.1 and syslog.1 under /var/log on netflow2001 to free space (logs spammed with the same error message over and over) |
[production] |
08:07 |
<elukey> |
restart kafkatee-webrequest.service on netflow1001 (spamming logs about not being able to bind to address:port) |
[production] |
08:07 |
<elukey> |
restart fastmon on netflow2001 in an attempt to stop the log spam (failed) |
[production] |
08:06 |
<elukey> |
restart kafkatee-webrequest.service on netflow2001 (spamming logs about not being able to bind to address:port) |
[production] |
07:56 |
<onimisionipe> |
depool maps1002 for postgres init - T239728 |
[production] |
07:55 |
<elukey> |
execute `clear bfd session address fe80::ee38:7300:17e8:a04e` on cr3-knams to restore BFD session with eqdfw (OSPF3 status ok on cr3-knams) |
[production] |
07:42 |
<elukey> |
execute reset-failed for monitor_refine_mediawiki_job_events on an-coord1001 |
[analytics] |
06:30 |
<moritzm> |
installing libice security updates |
[production] |
02:57 |
<Krinkle> |
Restarting deployment-mediawiki-07 - T180761 |
[releng] |
02:32 |
<Krinkle> |
It appears puppet-agent has been locally disabled on deployment-mediawiki-07 for at least three days with "no reason given". Re-enabling now to unbreak https://gerrit.wikimedia.org/r/556854 for T180761 |
[releng] |
02:09 |
<Krinkle> |
Create 'mongo' security group and apply to deployment-xhgui01 (ingress tcp/27017). T180761 |
[releng] |
01:26 |
<Krinkle> |
Set `profile::webperf::site::xhgui_host: deployment-xhgui01.deployment-prep.eqiad.wmflabs` in Hiera for deployment-webperf11. T180761 |
[releng] |
00:44 |
<bstorm_> |
rebooting tools-static-13 |
[tools] |
00:32 |
<catrope@deploy1001> |
Synchronized wmf-config/InitialiseSettings.php: GrowthExperiments: Begin "initiation test" for suggested edits (T238888) (duration: 00m 55s) |
[production] |
00:28 |
<bstorm_> |
rebooting the k8s master to clear NFS errors |
[tools] |
00:27 |
<bstorm_> |
rebooting the paws master since it is in a bad state after the openstack maintenance as well. |
[paws] |
00:22 |
<Krinkle> |
Apply Puppet role class "xhgui::app" to deployment-xhgui01. T238788, T180761 |
[releng] |
00:21 |
<catrope@deploy1001> |
Synchronized php-1.35.0-wmf.10/extensions/GrowthExperiments/: GrowthExperiments: record suggestededits pre-activation as a preference (T238888) (duration: 00m 55s) |
[production] |
00:15 |
<bstorm_> |
switch tools-acme-chief config to match the new authdns_servers format upstream |
[tools] |
00:10 |
<catrope@deploy1001> |
Synchronized wmf-config/InitialiseSettings.php: GrowthExperiments: Align help panel new account enabling with homepage (T232396) (duration: 00m 56s) |
[production] |
2019-12-12
|
23:36 |
<bstorm_> |
rebooting toolschecker after downtiming the services |
[tools] |
23:23 |
<bstorm_> |
restarting service because it is using massive amounts of CPU |
[tools.para] |
22:58 |
<bstorm_> |
rebooting tools-acme-chief-01 |
[tools] |
22:53 |
<bstorm_> |
rebooting the cron server, tools-sgecron-01, as it hasn't recovered from last night's maintenance |
[tools] |
22:48 |
<eileen> |
process-control config revision is d195531033; jobs temporarily disabled |
[production] |
22:33 |
<eileen> |
civicrm revision changed from 2043c27a0e to ad2303ef72, config revision is 4d25b656e2 |
[production] |
21:31 |
<arlolra@deploy1001> |
Finished deploy [parsoid/deploy@75d72e8]: Updating Parsoid to 28d7c21 (duration: 07m 41s) |
[production] |
21:24 |
<jeh> |
schedule downtime until Jan 6th 2020 on cloudvirt1015 (bad hardware) T220853 |
[openstack] |
21:23 |
<arlolra@deploy1001> |
Started deploy [parsoid/deploy@75d72e8]: Updating Parsoid to 28d7c21 |
[production] |