2019-10-10
|
19:11 |
<dduvall@deploy1001> |
rebuilt and synchronized wikiversions files: labswiki to 1.35.0-wmf.1 |
[production] |
19:09 |
<dduvall@deploy1001> |
Synchronized php-1.35.0-wmf.1/extensions/OpenStackManager: labswiki to 1.35.0-wmf.1 (duration: 01m 00s) |
[production] |
19:04 |
<marxarelli> |
promoting labswiki to 1.35.0-wmf.1 cc: T233849 |
[production] |
17:07 |
<jbond42> |
puppetmaster1001 has been upgraded and is back serving requests |
[production] |
16:21 |
<urandom> |
Upgrading sessionstore200[1-3].codfw.wmnet to Cassandra 3.11.4 -- T200803 |
[production] |
16:18 |
<urandom> |
Upgrading sessionstore1003.eqiad.wmnet to Cassandra 3.11.4 -- T200803 |
[production] |
16:16 |
<urandom> |
Upgrading sessionstore1002.eqiad.wmnet to Cassandra 3.11.4 -- T200803 |
[production] |
16:11 |
<@> |
helmfile [EQIAD] Ran 'apply' command on namespace 'termbox' for release 'production' . |
[production] |
16:07 |
<@> |
helmfile [CODFW] Ran 'apply' command on namespace 'termbox' for release 'production' . |
[production] |
16:04 |
<thcipriani> |
restarting gerrit due to T224448 |
[production] |
16:04 |
<@> |
helmfile [STAGING] Ran 'apply' command on namespace 'termbox' for release 'staging' . |
[production] |
16:01 |
<urandom> |
Upgrading sessionstore1001.eqiad.wmnet to Cassandra 3.11.4 -- T200803 |
[production] |
15:42 |
<@> |
helmfile [STAGING] Ran 'apply' command on namespace 'termbox' for release 'test' . |
[production] |
15:23 |
<mholloway-shell@deploy1001> |
Finished deploy [mobileapps/deploy@1adf74e]: Update mobileapps to c89aa55 (duration: 05m 39s) |
[production] |
15:18 |
<mholloway-shell@deploy1001> |
Started deploy [mobileapps/deploy@1adf74e]: Update mobileapps to c89aa55 |
[production] |
14:57 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'Fully repool db1074 after getting its BBU replaced T231638', diff saved to https://phabricator.wikimedia.org/P9306 and previous config saved to /var/cache/conftool/dbconfig/20191010-145737-marostegui.json |
[production] |
14:54 |
<moritzm> |
ran systemctl reset-failed on puppetmaster1001 (puppet-master.service after reimage) |
[production] |
14:42 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'Slowly repool db1074 after BBU replacement T231638', diff saved to https://phabricator.wikimedia.org/P9305 and previous config saved to /var/cache/conftool/dbconfig/20191010-144201-marostegui.json |
[production] |
14:39 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'Fully repool db1112 into recentchanges and remove db1078 from it after PDU maintenance', diff saved to https://phabricator.wikimedia.org/P9304 and previous config saved to /var/cache/conftool/dbconfig/20191010-143924-marostegui.json |
[production] |
14:36 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'Fully repool to db1084 db1083 db1076 db1112 db1118 after PDU maintenance', diff saved to https://phabricator.wikimedia.org/P9303 and previous config saved to /var/cache/conftool/dbconfig/20191010-143633-marostegui.json |
[production] |
14:23 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'More traffic to db1084 db1083 db1076 db1112 db1118 after PDU maintenance', diff saved to https://phabricator.wikimedia.org/P9302 and previous config saved to /var/cache/conftool/dbconfig/20191010-142323-marostegui.json |
[production] |
14:13 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'Slowly repool db1084 db1083 db1076 db1112 db1118 after PDU maintenance', diff saved to https://phabricator.wikimedia.org/P9301 and previous config saved to /var/cache/conftool/dbconfig/20191010-141303-marostegui.json |
[production] |
14:04 |
<marostegui@deploy1001> |
Synchronized wmf-config/db-eqiad.php: Fully repool es1013, es1014 after PDU maintenance (duration: 00m 59s) |
[production] |
14:03 |
<jbond42> |
re-enable puppet now that the CA has been correctly moved |
[production] |
13:58 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'Slowly repool db1112 after PDU maintenance', diff saved to https://phabricator.wikimedia.org/P9300 and previous config saved to /var/cache/conftool/dbconfig/20191010-135806-marostegui.json |
[production] |
13:57 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'Slowly repool db1084 db1083 db1076 db1118 after PDU maintenance', diff saved to https://phabricator.wikimedia.org/P9299 and previous config saved to /var/cache/conftool/dbconfig/20191010-135659-marostegui.json |
[production] |
13:50 |
<jbond42> |
disable puppet fleet-wide as puppetmaster2002 is struggling |
[production] |
13:32 |
<jbond42> |
reimage puppetmaster1001 |
[production] |
13:27 |
<marostegui> |
Repool labsdb1011 after reclone - T235016 |
[production] |
13:16 |
<arturo> |
added flannel 0.5.5-4 to buster-wikimedia (T235059) |
[production] |
13:05 |
<marostegui@deploy1001> |
Synchronized wmf-config/db-eqiad.php: More traffic to es1013, es1014 after PDU maintenance (duration: 00m 58s) |
[production] |
13:00 |
<jbond@cumin2001> |
END (PASS) - Cookbook sre.hosts.ipmi-password-reset (exit_code=0) |
[production] |
12:41 |
<marostegui@deploy1001> |
Synchronized wmf-config/db-eqiad.php: Slowly repool es1013, es1014 after PDU maintenance (duration: 00m 59s) |
[production] |
11:57 |
<jbond@cumin2001> |
Updating IPMI password on 1253 hosts - jbond@cumin2001 |
[production] |
11:57 |
<jbond@cumin2001> |
START - Cookbook sre.hosts.ipmi-password-reset |
[production] |
11:48 |
<jbond@cumin2001> |
END (PASS) - Cookbook sre.hosts.ipmi-password-reset (exit_code=0) |
[production] |
11:46 |
<jbond@cumin2001> |
Updating IPMI password on 35 hosts - jbond@cumin2001 |
[production] |
11:46 |
<jbond@cumin2001> |
START - Cookbook sre.hosts.ipmi-password-reset |
[production] |
11:41 |
<lucaswerkmeister-wmde@deploy1001> |
Synchronized wmf-config/Wikibase.php: [[gerrit:542087|Fix typo in beta repo data bridge config (T235033)]] (duration: 00m 59s) |
[production] |
11:40 |
<marostegui> |
Deploy schema change on s7 codfw master (db2118), this will generate lag on s7 codfw - T234066 T233135 |
[production] |
11:38 |
<jbond@cumin2001> |
END (FAIL) - Cookbook sre.hosts.ipmi-password-reset (exit_code=99) |
[production] |
11:38 |
<jbond@cumin2001> |
Updating IPMI password on 1253 hosts - jbond@cumin2001 |
[production] |
11:38 |
<jbond@cumin2001> |
START - Cookbook sre.hosts.ipmi-password-reset |
[production] |
11:37 |
<arturo> |
icinga downtime cloudvirt1023 for 2h (T227536) |
[production] |
11:36 |
<arturo> |
icinga downtime cloudvirt1025 for 2h (T227536) |
[production] |
11:36 |
<jbond@cumin2001> |
END (FAIL) - Cookbook sre.hosts.ipmi-password-reset (exit_code=99) |
[production] |
11:36 |
<jbond@cumin2001> |
Updating IPMI password on 1253 hosts - jbond@cumin2001 |
[production] |
11:36 |
<jbond@cumin2001> |
START - Cookbook sre.hosts.ipmi-password-reset |
[production] |
11:35 |
<arturo> |
icinga downtime cloudvirt1026 for 2h (T227536) |
[production] |
11:35 |
<marostegui> |
Stop replication on db2077 to change triggers on db2095:3317 - T234704 |
[production] |