2021-03-30
|
09:41 |
<hnowlan@deploy1002> |
helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'staging' . |
[production] |
09:35 |
<hnowlan@deploy1002> |
helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'production' . |
[production] |
09:35 |
<hnowlan@deploy1002> |
helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'staging' . |
[production] |
09:05 |
<hnowlan@deploy1002> |
helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'production' . |
[production] |
09:04 |
<hnowlan@deploy1002> |
helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' . |
[production] |
08:36 |
<jynus> |
mariadb upgrade of all buster source backup hosts to 10.4.18 T250666 |
[production] |
08:05 |
<dcausse> |
refreshing wdqs entities (T278693) |
[production] |
07:37 |
<elukey> |
restart-php7.2-fpm on mw1304, jobrunner completely overwhelmed by ffmpeg/transcode jobs (not publishing metrics, erroring out for memcached timeouts) - T278734 |
[production] |
07:28 |
<hashar@deploy1002> |
rebuilt and synchronized wikiversions files: all wikis to 1.36.0-wmf.36 - T274940 |
[production] |
07:26 |
<Majavah> |
shutoff deployment-mediawiki-09 T278664 |
[releng] |
06:25 |
<Majavah> |
switch w-beta.wmflabs.org web proxy to deployment-mediawiki11 |
[releng] |
06:18 |
<Majavah> |
restart restbase on deployment-restbase03 to pick up config changes to use deployment-mediawiki11 |
[releng] |
06:06 |
<elukey> |
powercycle cp1087 (no ssh, no mgmt console tty) |
[production] |
06:04 |
<elukey@puppetmaster1001> |
conftool action : set/pooled=no; selector: name=cp1087.eqiad.wmnet |
[production] |
01:39 |
<wm-bot> |
<samwilson> Updating to version |
[tools.ocr-test] |
01:35 |
<wm-bot> |
<samwilson> T278461. Test site is up and running at https://ocr-test.toolforge.org/ . |
[tools.ocr-test] |
01:20 |
<wm-bot> |
<samwilson> Added new Google Cloud Platform API key for ocr-test. |
[tools.ocr-test] |
2021-03-29
|
21:12 |
<Krinkle> |
Restarted cvn-app8 and cvn-app9 |
[cvn] |
21:07 |
<Krinkle> |
Flags +AV were set on cyberzeus in #cvn-wp-en. |
[cvn] |
19:06 |
<hnowlan@puppetmaster1001> |
conftool action : set/pooled=yes; selector: name=aqs1004.eqiad.wmnet |
[production] |
18:02 |
<Operator873|CVN> |
restarted CVNBot7-10, 15, and 19. Failed to regain nick |
[cvn] |
17:47 |
<volans@cumin1001> |
END (PASS) - Cookbook sre.dns.netbox (exit_code=0) |
[production] |
17:37 |
<volans@cumin1001> |
START - Cookbook sre.dns.netbox |
[production] |
16:15 |
<hnowlan@puppetmaster1001> |
conftool action : set/pooled=no; selector: name=aqs1004.eqiad.wmnet |
[production] |
16:11 |
<hnowlan> |
depooled aqs1004 for transfer of large tables to aqs1010 |
[production] |
15:53 |
<jbond@cumin1001> |
END (PASS) - Cookbook sre.dns.netbox (exit_code=0) |
[production] |
15:47 |
<jbond@cumin1001> |
START - Cookbook sre.dns.netbox |
[production] |
15:45 |
<jbond@cumin1001> |
END (PASS) - Cookbook sre.dns.netbox (exit_code=0) |
[production] |
15:39 |
<jbond@cumin1001> |
START - Cookbook sre.dns.netbox |
[production] |
15:37 |
<Majavah> |
hard reboot deployment-sessionstore03 T263617 |
[releng] |
15:16 |
<Majavah> |
manually run puppet on deployment-sessionstore03, starting Cassandra (which was stopped) T263617 |
[releng] |
13:26 |
<jiji@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on parse2001.codfw.wmnet with reason: REIMAGE |
[production] |
13:24 |
<jiji@cumin1001> |
START - Cookbook sre.hosts.downtime for 2:00:00 on parse2001.codfw.wmnet with reason: REIMAGE |
[production] |
13:04 |
<Majavah> |
cherry pick https://gerrit.wikimedia.org/r/c/operations/puppet/+/675503/ on deployment-puppetmaster04 (T278664), also apply same change on horizon. this will switch traffic from deployment-mediawiki-07 to deployment-mediawiki11 |
[releng] |
13:03 |
<ema> |
cp4027: rollback luajit experiment https://github.com/apache/trafficserver/issues/7423#issuecomment-809354214 |
[production] |
12:36 |
<ema> |
cp4027: re-enable JIT compilation in all ats-be lua scripts -- https://github.com/apache/trafficserver/issues/7423 |
[production] |
11:57 |
<ema> |
cp4027: re-enable JIT compilation in normalize-path.lua -- https://github.com/apache/trafficserver/issues/7423 |
[production] |
11:32 |
<ema> |
cp4027: install libluajit 2.1.0~beta3+dfsg-6wm1 with P15083 applied -- https://github.com/apache/trafficserver/issues/7423 |
[production] |
10:29 |
<Majavah> |
remove deployment-mediawiki10, too much live debugging, not in use |
[releng] |
09:59 |
<jbond@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pki2001.codfw.wmnet with reason: REIMAGE |
[production] |
09:57 |
<jbond@cumin1001> |
START - Cookbook sre.hosts.downtime for 2:00:00 on pki2001.codfw.wmnet with reason: REIMAGE |
[production] |
09:56 |
<Majavah> |
taavi@deployment-mediawiki10:~$ sudo ln -s /usr/local/share/ca-certificates/Puppet_Internal_CA.crt /etc/ssl/certs/aeffde42.0 && sudo update-ca-certificates |
[releng] |
09:29 |
<Urbanecm> |
Manually run puppet on mediawiki10 |
[releng] |
09:28 |
<Urbanecm> |
Re-enable puppet on mediawiki10 |
[releng] |
09:16 |
<ryankemper> |
T267927 `sudo -i cookbook sre.wdqs.data-reload wdqs2008.codfw.wmnet --task-id T267927 --reload-data wikidata --reason 'T267927: Reload wikidata jnl from fresh dumps' --reuse-downloaded-dump --depool` |
[production] |
09:15 |
<ryankemper@cumin2001> |
START - Cookbook sre.wdqs.data-reload |
[production] |
08:49 |
<Urbanecm> |
Disable puppet on mediawiki10 - investigating failing curl certificate check
[releng] |
08:47 |
<filippo@deploy1002> |
Finished deploy [librenms/librenms@df69efe]: deploy I156f32925f693 (duration: 00m 08s) |
[production] |
08:47 |
<filippo@deploy1002> |
Started deploy [librenms/librenms@df69efe]: deploy I156f32925f693 |
[production] |
07:59 |
<hashar@deploy1002> |
Synchronized php: group1 wikis to 1.36.0-wmf.36 (duration: 01m 06s) |
[production] |