2019-12-04
§
|
13:52 |
<bblack@cumin1001> |
START - Cookbook sre.hosts.downtime |
[production] |
13:50 |
<bblack@cumin1001> |
START - Cookbook sre.hosts.downtime |
[production] |
13:45 |
<bblack@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) |
[production] |
13:43 |
<bblack@cumin1001> |
START - Cookbook sre.hosts.downtime |
[production] |
13:24 |
<bblack@cumin1001> |
conftool action : set/pooled=no; selector: name=dns[345]001.wikimedia.org |
[production] |
13:24 |
<onimisionipe> |
downtimed maps1004 - T239728 |
[production] |
13:23 |
<bblack> |
dns[345]001 - starting downtimes/etc for reimage to buster... |
[production] |
12:31 |
<filippo@cumin1001> |
conftool action : set/pooled=no; selector: name=ms-fe2007.codfw.wmnet |
[production] |
12:29 |
<Urbanecm> |
EU SWAT done |
[production] |
12:28 |
<urbanecm@deploy1001> |
Synchronized php-1.35.0-wmf.5/extensions/WikimediaMessages/: SWAT: bbf2a33: Change Schema Revision of WMDEBannerEvents (T239430) (duration: 01m 02s) |
[production] |
12:26 |
<urbanecm@deploy1001> |
Synchronized php-1.35.0-wmf.8/extensions/WikimediaMessages/: SWAT: b3ef5cd: Change Schema Revision of WMDEBannerEvents (T239430) (duration: 01m 04s) |
[production] |
11:38 |
<jbond42> |
puppet enabled across the fleet and new CA certificate installed |
[production] |
11:31 |
<akosiaris> |
drain kubernetes1002 for test of nf_conntrack changes |
[production] |
11:23 |
<jbond42> |
enable puppet in eqiad and deploy updated CA |
[production] |
11:13 |
<gehel@cumin1001> |
END (FAIL) - Cookbook sre.wdqs.restart (exit_code=99) |
[production] |
10:54 |
<jbond42> |
enable puppet in codfw and deploy updated CA |
[production] |
10:46 |
<jbond42> |
enable puppet in esams and deploy updated CA |
[production] |
10:42 |
<jbond42> |
enable puppet in ulsfo and deploy updated CA |
[production] |
10:31 |
<gehel@cumin1001> |
START - Cookbook sre.wdqs.restart |
[production] |
10:31 |
<gehel> |
rolling restart of wdqs for config change (event logging) - T101013 |
[production] |
10:30 |
<jbond42> |
enable puppet in eqsin and deploy updated CA |
[production] |
10:24 |
<marostegui> |
stop replication and mysql on db2107 (s2 codfw master) to test puppet CA changes |
[production] |
10:21 |
<marostegui> |
stop replication and mysql on db2071 to test puppet CA changes |
[production] |
10:02 |
<jbond42> |
disabling puppet across the fleet to start CA update change 548241 |
[production] |
09:29 |
<godog> |
roll-restart logstash7 in codfw/eqiad after https://gerrit.wikimedia.org/r/c/operations/puppet/+/554472 |
[production] |
09:15 |
<marostegui> |
Reload labsdb1010 after reimporting wikidatawiki.page - T238399 |
[production] |
09:06 |
<moritzm> |
updated jenkins on apt.wikimedia.org to 2.190.3 (T239586) |
[production] |
08:05 |
<effie> |
Restart php7-fpm on mw1348 |
[production] |
07:09 |
<marostegui> |
Depool labsdb1010 to reimport wikidatawiki.page - T238399 |
[production] |
07:02 |
<marostegui> |
Repool labsdb1011 |
[production] |
06:36 |
<mutante> |
removed LVS IP for git-ssh from interface on phab1003 |
[production] |
06:25 |
<dzahn@cumin1001> |
conftool action : set/weight=10; selector: name=phab1001-vcs.eqiad.wmnet |
[production] |
06:13 |
<mutante> |
phab1001 - running rsync of /srv/repos with --delete because it's larger than the source by about 5GB - deleting objects to match phab1003, former prod server. now both 50G (T238956) |
[production] |
06:04 |
<marostegui> |
Depool labsdb1011 |
[production] |
06:01 |
<mutante> |
rsyncing /srv/repos data once again. pulling from phab1003 to phab1001 (T238956) |
[production] |
05:51 |
<marostegui> |
Deploy schema change on s3 primary master (db1123) |
[production] |
04:59 |
<mutante> |
removed downtime for phabricator.wikimedia.org meta service (paging) |
[production] |
04:58 |
<mutante> |
phabricator maintenance ended for today - now running on phab1001 (buster) |
[production] |
04:58 |
<mutante> |
install1002 - restarted isc-dhcpd |
[production] |
04:39 |
<mutante> |
phab1001 - rebooting for BIOS config change |
[production] |
02:06 |
<mutante> |
re-enabling puppet on phab1003 and phab1001.. switching active_server for puppet |
[production] |
01:55 |
<dzahn@cumin1001> |
conftool action : set/pooled=yes; selector: name=phab1001-vcs.eqiad.wmnet |
[production] |
01:46 |
<mutante> |
switching phab-vcs in conftool-data from phab1003 to phab1001, running puppet on conf* |
[production] |
01:45 |
<dzahn@cumin1001> |
conftool action : set/pooled=inactive; selector: name=phab1003-vcs.eqiad.wmnet |
[production] |
01:40 |
<dzahn@cumin1001> |
conftool action : set/pooled=no; selector: name=phab1003-vcs.eqiad.wmnet |
[production] |
01:37 |
<twentyafterfour> |
re-enable phabricator writes (disable cluster.read-only) |
[production] |
01:33 |
<twentyafterfour> |
phab1001.eqiad.wmnet : sudo chown root.www-data /srv/phab/phabricator/conf/local/www.json |
[production] |
01:29 |
<mutante> |
phabricator currently under maintenance - db connection error is known |
[production] |
01:20 |
<mutante> |
running puppet on cp-eqiad |
[production] |
00:49 |
<ejegg> |
changed donations queue consumer and thank you mailer to use 3 minute cycles |
[production] |