2021-02-11
|
07:44 |
<XioNoX> |
push improved loopback dhcp term to all routers |
[production] |
07:39 |
<jiji@cumin1001> |
START - Cookbook sre.hosts.reboot-single for host mc1021.eqiad.wmnet |
[production] |
07:25 |
<effie> |
pool thumbor1001 |
[production] |
07:06 |
<elukey@puppetmaster1001> |
conftool action : set/pooled=no; selector: name=thumbor1001.eqiad.wmnet |
[production] |
07:06 |
<elukey> |
powercycle thumbor1001 - no ssh, no mgmt serial tty available, no racadm getsel info |
[production] |
06:45 |
<kart_> |
Updated cxserver to 2021-02-10-134029-production (T274133, T273456, T271980) |
[production] |
06:41 |
<kartik@deploy1001> |
helmfile [eqiad] Ran 'sync' command on namespace 'cxserver' for release 'production' . |
[production] |
06:35 |
<kartik@deploy1001> |
helmfile [codfw] Ran 'sync' command on namespace 'cxserver' for release 'production' . |
[production] |
06:33 |
<kartik@deploy1001> |
helmfile [staging] Ran 'sync' command on namespace 'cxserver' for release 'staging' . |
[production] |
03:10 |
<rzl@cumin1001> |
dbctl commit (dc=all): 'depool db1134', diff saved to https://phabricator.wikimedia.org/P14310 and previous config saved to /var/cache/conftool/dbconfig/20210211-031048-rzl.json |
[production] |
03:10 |
<rzl> |
depooled db1134 |
[production] |
02:18 |
<milimetric@deploy1001> |
Finished deploy [analytics/refinery@01d811f] (thin): Fix spelling error in mediacounts job (duration: 00m 06s) |
[production] |
02:18 |
<milimetric@deploy1001> |
Started deploy [analytics/refinery@01d811f] (thin): Fix spelling error in mediacounts job |
[production] |
02:18 |
<milimetric@deploy1001> |
Finished deploy [analytics/refinery@01d811f]: Fix spelling error in mediacounts job (duration: 11m 06s) |
[production] |
02:07 |
<milimetric@deploy1001> |
Started deploy [analytics/refinery@01d811f]: Fix spelling error in mediacounts job |
[production] |
02:05 |
<dwisehaupt> |
move payments1* and frpig1* out of maintenance mode |
[production] |
02:04 |
<eileen> |
process-control config revision is 726db3446a |
[production] |
02:02 |
<dwisehaupt> |
move civi1001 out of maintenance mode |
[production] |
01:54 |
<eileen> |
civicrm revision changed from 3776363c90 to b81cb5e702, config revision is f216d8fe8e |
[production] |
01:35 |
<dwisehaupt> |
applying new civicrm triggers to frdb1002 |
[production] |
01:14 |
<eileen> |
civicrm revision changed from 2ce8194c07 to 3776363c90, config revision is f216d8fe8e |
[production] |
01:06 |
<dwisehaupt> |
stopping mariadb replication on frdev1001 and frdb1004 |
[production] |
01:05 |
<dwisehaupt> |
Move payments/civi/frpig into maint mode for civi upgrade |
[production] |
01:04 |
<eileen> |
process-control config revision is f216d8fe8e |
[production] |
00:26 |
<legoktm@deploy1001> |
Synchronized wmf-config/profiler.php: Revert "profiler: Send data to excimer-buster pipeline" (duration: 02m 00s) |
[production] |
00:03 |
<milimetric@deploy1001> |
Finished deploy [analytics/refinery@3da19b6] (thin): More fixes for jobs after cluster upgrade (duration: 00m 07s) |
[production] |
00:03 |
<milimetric@deploy1001> |
Started deploy [analytics/refinery@3da19b6] (thin): More fixes for jobs after cluster upgrade |
[production] |
2021-02-10
|
23:53 |
<milimetric@deploy1001> |
Finished deploy [analytics/refinery@3da19b6]: More fixes for jobs after cluster upgrade (duration: 14m 23s) |
[production] |
23:49 |
<legoktm@cumin1001> |
conftool action : set/pooled=yes; selector: name=mw1328.eqiad.wmnet |
[production] |
23:49 |
<legoktm@cumin1001> |
conftool action : set/pooled=yes; selector: name=mw1327.eqiad.wmnet |
[production] |
23:49 |
<legoktm@cumin1001> |
conftool action : set/pooled=yes; selector: name=mw1326.eqiad.wmnet |
[production] |
23:49 |
<legoktm@cumin1001> |
conftool action : set/pooled=yes; selector: name=mw1325.eqiad.wmnet |
[production] |
23:38 |
<milimetric@deploy1001> |
Started deploy [analytics/refinery@3da19b6]: More fixes for jobs after cluster upgrade |
[production] |
23:36 |
<eileen> |
civicrm revision changed from ae24f87158 to 2ce8194c07, config revision is a48a7db0a2 |
[production] |
22:37 |
<legoktm@cumin1001> |
conftool action : set/pooled=no; selector: name=mw1328.eqiad.wmnet |
[production] |
22:37 |
<legoktm@cumin1001> |
conftool action : set/pooled=no; selector: name=mw1327.eqiad.wmnet |
[production] |
22:37 |
<legoktm@cumin1001> |
conftool action : set/pooled=no; selector: name=mw1326.eqiad.wmnet |
[production] |
22:37 |
<legoktm@cumin1001> |
conftool action : set/pooled=no; selector: name=mw1325.eqiad.wmnet |
[production] |
22:32 |
<ebernhardson@deploy1001> |
Finished deploy [wikimedia/discovery/analytics@d97f7d9]: query_clicks: Remove result file merging (duration: 01m 27s) |
[production] |
22:30 |
<ebernhardson@deploy1001> |
Started deploy [wikimedia/discovery/analytics@d97f7d9]: query_clicks: Remove result file merging |
[production] |
22:24 |
<dzahn@cumin1001> |
conftool action : set/pooled=yes; selector: name=mw1377.eqiad.wmnet |
[production] |
22:23 |
<dzahn@cumin1001> |
conftool action : set/pooled=yes; selector: name=mw1369.eqiad.wmnet |
[production] |
22:14 |
<dzahn@cumin1001> |
conftool action : set/pooled=no; selector: name=mw1377.eqiad.wmnet |
[production] |
22:13 |
<dzahn@cumin1001> |
conftool action : set/pooled=no; selector: name=mw1369.eqiad.wmnet |
[production] |
22:07 |
<mutante> |
mw1369, mw1377 - all servers in this section now consistently fail to reboot when triggered as the last step of wmf-reimage script |
[production] |
21:43 |
<legoktm@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1328.eqiad.wmnet with reason: REIMAGE |
[production] |
21:41 |
<legoktm@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1327.eqiad.wmnet with reason: REIMAGE |
[production] |
21:41 |
<legoktm@cumin1001> |
START - Cookbook sre.hosts.downtime for 2:00:00 on mw1328.eqiad.wmnet with reason: REIMAGE |
[production] |
21:39 |
<legoktm@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1326.eqiad.wmnet with reason: REIMAGE |
[production] |
21:39 |
<legoktm@cumin1001> |
START - Cookbook sre.hosts.downtime for 2:00:00 on mw1327.eqiad.wmnet with reason: REIMAGE |
[production] |