2022-03-01
11:09 |
<vgutierrez@cumin1001> |
START - Cookbook sre.hosts.downtime for 2:00:00 on cp3062.esams.wmnet with reason: host reimage |
[production] |
11:07 |
<_joe_> |
restarted pybal on lvs2010, T244843 |
[production] |
11:02 |
<mmandere> |
restart purged on cp60[09,10,11] |
[production] |
11:00 |
<cmooney@cumin1001> |
START - Cookbook sre.hosts.provision for host an-worker1148.mgmt.eqiad.wmnet with reboot policy FORCED |
[production] |
10:47 |
<cmooney@cumin1001> |
END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-worker1147.mgmt.eqiad.wmnet with reboot policy FORCED |
[production] |
10:40 |
<vgutierrez@cumin1001> |
START - Cookbook sre.hosts.reimage for host cp3062.esams.wmnet with OS buster |
[production] |
10:40 |
<jmm@cumin2002> |
END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Ema out of all services on: 259 hosts |
[production] |
10:40 |
<jmm@cumin2002> |
START - Cookbook sre.idm.logout Logging Ema out of all services on: 259 hosts |
[production] |
10:40 |
<jmm@cumin2002> |
END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Ema out of all services on: 1353 hosts |
[production] |
10:39 |
<jmm@cumin2002> |
START - Cookbook sre.idm.logout Logging Ema out of all services on: 1353 hosts |
[production] |
10:31 |
<mmandere> |
restart purged on cp600[6-8] |
[production] |
10:28 |
<cmooney@cumin1001> |
END (PASS) - Cookbook sre.dns.netbox (exit_code=0) |
[production] |
10:24 |
<cmooney@cumin1001> |
START - Cookbook sre.dns.netbox |
[production] |
10:05 |
<vgutierrez> |
pool cp2039 running HAProxy as TLS termination layer - T290005 T271421 |
[production] |
09:48 |
<elukey> |
elukey@stat1004:~$ sudo kill `pgrep -u zpapierski` (offboarded user, puppet broken on the host) |
[production] |
09:45 |
<vgutierrez@cumin1001> |
END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp2039.codfw.wmnet with OS buster |
[production] |
09:33 |
<_joe_> |
restarted pybal on lvs1019, removed the mw api from ipvsadm, the mw api is internally fully encrypted |
[production] |
09:31 |
<_joe_> |
restart pybal on lvs1020 |
[production] |
09:25 |
<jmm@cumin2002> |
END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Amuigai out of all services on: 1881 hosts |
[production] |
09:25 |
<elukey> |
restart varnishkafka-webrequest on cp6009 as an attempt to clear a weird librdkafka state (delivery errors to kafka) |
[production] |
09:25 |
<_joe_> |
manually removed ipvs entries on lvs2*, so only now is the http api actually no longer available in codfw |
[production] |
09:24 |
<jmm@cumin2002> |
START - Cookbook sre.idm.logout Logging Amuigai out of all services on: 1881 hosts |
[production] |
09:24 |
<jmm@cumin2002> |
END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging ZPapierski out of all services on: 1881 hosts |
[production] |
09:22 |
<jmm@cumin2002> |
START - Cookbook sre.idm.logout Logging ZPapierski out of all services on: 1881 hosts |
[production] |
09:22 |
<_joe_> |
restarted pybal on lvs2009, the mw api is now effectively https-only in codfw T287820 |
[production] |
09:20 |
<_joe_> |
restarted pybal on lvs2010 |
[production] |
09:14 |
<vgutierrez@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp2039.codfw.wmnet with reason: host reimage |
[production] |
09:12 |
<vgutierrez@cumin1001> |
START - Cookbook sre.hosts.downtime for 2:00:00 on cp2039.codfw.wmnet with reason: host reimage |
[production] |
09:06 |
<elukey> |
restart purged on cp6005 |
[production] |
08:57 |
<elukey> |
restart purged on cp6004 |
[production] |
08:54 |
<vgutierrez@cumin1001> |
START - Cookbook sre.hosts.reimage for host cp2039.codfw.wmnet with OS buster |
[production] |
08:27 |
<urbanecm> |
UTC morning B&C window done |
[production] |
08:25 |
<elukey> |
restart purged on cp6003 |
[production] |
08:16 |
<moritzm> |
drain instances off ganeti2008 for eventual decom |
[production] |
08:08 |
<urbanecm@deploy1002> |
Synchronized wmf-config/ProductionServices.php: d149208dfd7e5fbf51f44dd0bf7dae3b2e2f5159: Use service-proxy to connect to linkrecommendation (T302719) (duration: 00m 49s) |
[production] |
07:59 |
<elukey> |
restart purged on cp6002 |
[production] |
06:58 |
<oblivian@deploy1002> |
Finished deploy [restbase/deploy@0848b15] (dev-cluster): T302464 test (duration: 00m 17s) |
[production] |
06:57 |
<oblivian@deploy1002> |
Started deploy [restbase/deploy@0848b15] (dev-cluster): T302464 test |
[production] |
06:56 |
<elukey> |
restart purged on cp6001 to clear stale kafka TLS consumer state (or attempting to) |
[production] |
06:46 |
<_joe_> |
uploaded scap 4.4.1 to {stretch,buster,bullseye} T302464 |
[production] |
06:46 |
<_joe_> |
uploaded scap 4.4.1 to {stretch,buster,bullseye} |
[production] |
02:59 |
<ladsgroup@cumin1001> |
dbctl commit (dc=all): 'Repooling after maintenance db1104 (T302185)', diff saved to https://phabricator.wikimedia.org/P21618 and previous config saved to /var/cache/conftool/dbconfig/20220301-025938-ladsgroup.json |
[production] |
02:44 |
<ladsgroup@cumin1001> |
dbctl commit (dc=all): 'Repooling after maintenance db1104', diff saved to https://phabricator.wikimedia.org/P21617 and previous config saved to /var/cache/conftool/dbconfig/20220301-024433-ladsgroup.json |
[production] |
02:29 |
<ladsgroup@cumin1001> |
dbctl commit (dc=all): 'Repooling after maintenance db1104', diff saved to https://phabricator.wikimedia.org/P21616 and previous config saved to /var/cache/conftool/dbconfig/20220301-022928-ladsgroup.json |
[production] |
02:14 |
<ladsgroup@cumin1001> |
dbctl commit (dc=all): 'Repooling after maintenance db1104 (T302185)', diff saved to https://phabricator.wikimedia.org/P21615 and previous config saved to /var/cache/conftool/dbconfig/20220301-021424-ladsgroup.json |
[production] |
01:14 |
<ladsgroup@cumin1001> |
dbctl commit (dc=all): 'Depooling db1104 (T302185)', diff saved to https://phabricator.wikimedia.org/P21614 and previous config saved to /var/cache/conftool/dbconfig/20220301-011404-ladsgroup.json |
[production] |
01:14 |
<ladsgroup@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1104.eqiad.wmnet with reason: Maintenance |
[production] |
01:13 |
<ladsgroup@cumin1001> |
START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1104.eqiad.wmnet with reason: Maintenance |
[production] |
00:17 |
<mutante> |
15.wikipedia.org on k8s (staging): [deploy1002:~] $ curl -s --resolve "15.wikipedia.org:4111:staging.svc.eqiad.wmnet" 'https://15.wikipedia.org' | grep grandpa => "“Wikipedia is like an all-knowing grandpa.”" | T300171 |
[production] |