2020-12-21
|
17:33 |
<robh@cumin1001> |
END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on rdb1012.eqiad.wmnet with reason: REIMAGE |
[production] |
17:31 |
<robh@cumin1001> |
START - Cookbook sre.hosts.downtime for 2:00:00 on rdb1012.eqiad.wmnet with reason: REIMAGE |
[production] |
14:43 |
<jbond42> |
disable puppet to upgrade puppet master packages |
[production] |
14:43 |
<jbond42> |
upload puppet_5.5.22-1 to wikimedia-buster |
[production] |
14:20 |
<jbond42> |
update puppet on puppetmaster1001 |
[production] |
14:16 |
<jbond42> |
update puppet on puppetmaster1003 |
[production] |
14:15 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'db1112 (re)pooling @ 100%: After cloning db1154:3313', diff saved to https://phabricator.wikimedia.org/P13616 and previous config saved to /var/cache/conftool/dbconfig/20201221-141555-root.json |
[production] |
14:15 |
<moritzm> |
installing sleuthkit security updates on buster |
[production] |
14:00 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'db1112 (re)pooling @ 75%: After cloning db1154:3313', diff saved to https://phabricator.wikimedia.org/P13615 and previous config saved to /var/cache/conftool/dbconfig/20201221-140051-root.json |
[production] |
13:45 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'db1112 (re)pooling @ 50%: After cloning db1154:3313', diff saved to https://phabricator.wikimedia.org/P13614 and previous config saved to /var/cache/conftool/dbconfig/20201221-134548-root.json |
[production] |
13:30 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'db1112 (re)pooling @ 25%: After cloning db1154:3313', diff saved to https://phabricator.wikimedia.org/P13613 and previous config saved to /var/cache/conftool/dbconfig/20201221-133044-root.json |
[production] |
12:57 |
<hashar> |
Gerrit briefly paused due to erroneous run of `jmap -clstats` |
[production] |
12:22 |
<hashar> |
Running jhat on gerrit1001 to analyze a heap dump, expect CPU usage |
[production] |
11:48 |
<moritzm> |
installing libxstream-java security updates on buster |
[production] |
11:31 |
<moritzm> |
installing php-pear security updates on buster |
[production] |
09:47 |
<_joe_> |
logging out of the long-running root screen session on maps1010 |
[production] |
09:46 |
<_joe_> |
logging out of the long-running root screen session on maps1001 |
[production] |
09:46 |
<_joe_> |
systemctl reset-failed on deneb, timeout downloading a docker image from the registry |
[production] |
09:24 |
<dcausse> |
depooling wdqs1011 (lag) |
[production] |
09:19 |
<_joe_> |
powercycling wdqs1011, unresponsive to ssh |
[production] |
08:31 |
<dcausse@deploy1001> |
Finished deploy [wdqs/wdqs@512d713]: GUI updates (T269224+i18n updates) (duration: 08m 57s) |
[production] |
08:22 |
<dcausse@deploy1001> |
Started deploy [wdqs/wdqs@512d713]: GUI updates (T269224+i18n updates) |
[production] |
08:15 |
<marostegui> |
Add ips to the x2 instances on dbctl T269324 |
[production] |
07:52 |
<jiji@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc2023.codfw.wmnet with reason: REIMAGE |
[production] |
07:49 |
<jiji@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1023.eqiad.wmnet with reason: REIMAGE |
[production] |
07:49 |
<jiji@cumin1001> |
START - Cookbook sre.hosts.downtime for 2:00:00 on mc2023.codfw.wmnet with reason: REIMAGE |
[production] |
07:47 |
<jiji@cumin1001> |
START - Cookbook sre.hosts.downtime for 2:00:00 on mc1023.eqiad.wmnet with reason: REIMAGE |
[production] |
07:07 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'Depool db1112 T268742 ', diff saved to https://phabricator.wikimedia.org/P13609 and previous config saved to /var/cache/conftool/dbconfig/20201221-070748-marostegui.json |
[production] |
06:48 |
<marostegui@cumin1001> |
END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) |
[production] |
06:35 |
<marostegui> |
Compress clouddb1017:3313 clouddb1013:3313 T270473 |
[production] |
2020-12-18
|
21:52 |
<thcipriani> |
flushing gerrit project cache |
[production] |
20:50 |
<ejegg> |
updated payments-wiki from 3d3055c478 to c3e6c5f9f4 |
[production] |
19:41 |
<nskaggs@cumin1001> |
END (PASS) - Cookbook wmcs.wikireplicas.add_wiki (exit_code=0) |
[production] |
19:41 |
<nskaggs@cumin1001> |
Added views for new wiki: skrwiki T268412 |
[production] |
19:31 |
<mutante> |
restarted wikibugs (phab, gerrit and irc jobs) |
[production] |
19:30 |
<nskaggs@cumin1001> |
START - Cookbook wmcs.wikireplicas.add_wiki |
[production] |
19:24 |
<mutante> |
tools.wikibugs@tools-sgebastion-07:~/wikibugs2$ qdel 1766104 |
[production] |
19:22 |
<mutante> |
bug: wikibugs stopped reporting bugs, attempting to restart bug bot to continue reporting bugs |
[production] |
19:07 |
<nskaggs@cumin1001> |
END (PASS) - Cookbook wmcs.wikireplicas.add_wiki (exit_code=0) |
[production] |
19:07 |
<nskaggs@cumin1001> |
Added views for new wiki: madwiki T269440 |
[production] |
18:56 |
<nskaggs@cumin1001> |
START - Cookbook wmcs.wikireplicas.add_wiki |
[production] |
18:39 |
<andrew@deploy1001> |
Finished deploy [horizon/deploy@89b308c]: update codfw1dev deploy (duration: 02m 17s) |
[production] |
18:36 |
<andrew@deploy1001> |
Started deploy [horizon/deploy@89b308c]: update codfw1dev deploy |
[production] |
18:36 |
<andrew@deploy1001> |
Finished deploy [horizon/deploy@89b308c]: update codfw1dev deploy (duration: 00m 09s) |
[production] |
18:36 |
<andrew@deploy1001> |
Started deploy [horizon/deploy@89b308c]: update codfw1dev deploy |
[production] |
18:07 |
<andrew@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on labstore1005.eqiad.wmnet with reason: REIMAGE |
[production] |
18:05 |
<andrew@cumin1001> |
START - Cookbook sre.hosts.downtime for 2:00:00 on labstore1005.eqiad.wmnet with reason: REIMAGE |
[production] |
17:59 |
<cmjohnson@cumin1001> |
END (PASS) - Cookbook sre.dns.netbox (exit_code=0) |
[production] |
17:40 |
<cmjohnson@cumin1001> |
START - Cookbook sre.dns.netbox |
[production] |
17:28 |
<andrew@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudweb2001-dev.wikimedia.org with reason: REIMAGE |
[production] |