2019-09-30
|
13:49 |
<jmm@cumin2001> |
START - Cookbook sre.hosts.downtime |
[production] |
12:53 |
<jbond@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) |
[production] |
12:51 |
<jbond@cumin1001> |
START - Cookbook sre.hosts.downtime |
[production] |
12:33 |
<kart_> |
Update cxserver to 2019-09-26-034732-production (T233834, T232674, T233085) |
[production] |
12:29 |
<@> |
helmfile [EQIAD] Ran 'apply' command on namespace 'cxserver' for release 'production' . |
[production] |
12:29 |
<jbond42> |
offline puppetmaster2002 to reimage https://gerrit.wikimedia.org/r/c/operations/puppet/+/539322 |
[production] |
12:27 |
<@> |
helmfile [CODFW] Ran 'apply' command on namespace 'cxserver' for release 'production' . |
[production] |
12:24 |
<@> |
helmfile [STAGING] Ran 'apply' command on namespace 'cxserver' for release 'staging' . |
[production] |
12:00 |
<Urbanecm> |
EU SWAT done #2 |
[production] |
12:00 |
<urbanecm@deploy1001> |
Synchronized wmf-config/throttle.php: SWAT: 3f4f242: New throttle rule for Czech wiki course (T234113) (duration: 00m 56s) |
[production] |
11:57 |
<Urbanecm> |
Reopen EU SWAT to deploy throttle rule for October 02 (T234113) |
[production] |
11:54 |
<raynor> |
EU SWAT finished |
[production] |
11:54 |
<pmiazga@deploy1001> |
Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:538296|Enable alternate mobile link for it, nl, ko wikis. (T206497)]] (duration: 00m 57s) |
[production] |
11:27 |
<kartik@deploy1001> |
Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:539517|Enable CX out of beta in Tagalog and Central Bikol WPs (T233006, T233007)]] (duration: 00m 59s) |
[production] |
11:20 |
<hashar> |
Restarting Docker on integration-agent-puppet-docker-1001 # T234197 |
[production] |
11:08 |
<hashar> |
Restarting Docker on CI agents to clear out some docker/iptables oddity # T234197 |
[production] |
10:48 |
<hashar> |
CI outage is tracked in https://phabricator.wikimedia.org/T234197 |
[production] |
10:42 |
<moritzm> |
draining ganeti2004 for upcoming reboot (combined kernel/qemu security updates) |
[production] |
10:40 |
<hashar> |
CI down due to some DNS related failure on the hosts :-\ |
[production] |
10:30 |
<jmm@cumin2001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) |
[production] |
10:30 |
<jmm@cumin2001> |
START - Cookbook sre.hosts.downtime |
[production] |
09:30 |
<moritzm> |
uploading ferm 2.4.1+wmf2+deb9u1 for stretch-wikimedia, fixes AAAA lookups (T153468) |
[production] |
09:11 |
<moritzm> |
draining ganeti2002 for upcoming reboot (combined kernel/qemu security updates) |
[production] |
09:10 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'Depool db2091:3314 for a schema change - T233625', diff saved to https://phabricator.wikimedia.org/P9217 and previous config saved to /var/cache/conftool/dbconfig/20190930-091043-marostegui.json |
[production] |
08:00 |
<moritzm> |
installing e2fsprogs security updates on Stretch/Buster |
[production] |
07:56 |
<marostegui> |
Stop dbstore1003:3311 for troubleshooting |
[production] |
06:47 |
<moritzm> |
installing exim security updates on buster |
[production] |
2019-09-27
|
22:44 |
<mutante> |
phab2001 - apt-get autoremove - remove unused python and ruby packages |
[production] |
22:36 |
<mutante> |
phab2001 - upgrade php7.2 packages to 7.2.22 (T230024) |
[production] |
22:03 |
<mutante> |
webperf1001, webperf2001: restart envoyproxy to pick up new cert with the right subject alt. names |
[production] |
18:22 |
<mutante> |
mwdebug1001, mwdebug1002 - deleted from /srv/mediawiki/: php-1.34.0-wmf.16, .17, .18, .19 and .20 (current is .24) - usage back to about 57% (T234063) |
[production] |
18:17 |
<mutante> |
mwdebug1001, mwdebug1002 - apt-get clean saves about 3GB and gets usage down from 94% to 87% on / (T234063) |
[production] |
16:01 |
<XioNoX> |
delete BGP to AS34305 on cr2-esams |
[production] |
15:34 |
<elukey> |
update pcc facts to add new hosts |
[production] |
15:02 |
<moritzm> |
installing usb.ids update from Buster 10.1 point release |
[production] |
14:45 |
<moritzm> |
installing ncurses bugfix update from Buster 10.1 point release |
[production] |
14:39 |
<moritzm> |
installing postgresql-common bugfix update from Buster 10.1 point release |
[production] |
14:32 |
<effie> |
Disable puppet and reload apache on mw* for 539465 and 539488 - T229792 |
[production] |
13:33 |
<marostegui> |
Set candidate masters in dbctl T234039 |
[production] |
13:31 |
<jmm@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) |
[production] |
13:29 |
<jmm@cumin1001> |
START - Cookbook sre.hosts.downtime |
[production] |
13:16 |
<moritzm> |
reimaging auth1002 to buster |
[production] |
13:09 |
<akosiaris> |
reboot ganeti2001 T233906 |
[production] |
13:08 |
<marostegui@cumin1001> |
END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) |
[production] |
13:08 |
<marostegui@cumin1001> |
START - Cookbook sre.hosts.decommission |
[production] |
13:03 |
<effie> |
Disable puppet on mwmaint1002 to test noc.wikimedia.org with PHP7 |
[production] |
12:58 |
<elukey@cumin1001> |
END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) |
[production] |
12:56 |
<elukey@cumin1001> |
START - Cookbook sre.hosts.downtime |
[production] |
12:48 |
<moritzm> |
installing openldap security updates on Buster |
[production] |