2021-03-10
|
04:56 |
<ryankemper> |
T266470 In the `/srv/private` repo, `/srv/private/modules/secret/secrets/certificates/certificate.manifests.d/wdqs.certs.yaml` has been edited to add the relevant `alt_names` |
[production] |
04:55 |
<ryankemper> |
T266470 Certificate revoked: `ryankemper@puppetmaster1001:/srv/private$ sudo puppet cert clean wdqs.discovery.wmnet` |
[production] |
04:53 |
<ryankemper> |
T266470 `ryankemper@cumin1001:~$ sudo -E cumin 'A:wdqs-all' 'sudo disable-puppet "revoking old cert and generating new one with new alt_names - T266470"'` |
[production] |
04:52 |
<ryankemper> |
T266470 Temporarily disabling puppet on all `wdqs*` hosts in preparation for `wdqs.discovery.wmnet` certificate revocation |
[production] |
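The T266470 entries above read bottom-up. As a minimal sketch, the logged steps can be laid out in chronological order as a dry-run plan. Nothing here is executed. The helper name `revocation_plan` and the tuple layout are hypothetical. The hosts, commands, and paths are copied from the log entries. The log does not say which host the manifest edit was made on, so that field is left unset.

```python
import shlex

# Values copied from the log entries for T266470.
TASK = "T266470"
REASON = f"revoking old cert and generating new one with new alt_names - {TASK}"
CERT_NAME = "wdqs.discovery.wmnet"
MANIFEST = (
    "/srv/private/modules/secret/secrets/certificates/"
    "certificate.manifests.d/wdqs.certs.yaml"
)


def revocation_plan():
    """Return (host, description, argv) tuples in chronological order.

    Hypothetical helper for illustration only; argv is None for manual steps,
    and host is None where the log does not state it.
    """
    return [
        (
            "cumin1001",
            "disable puppet on all wdqs* hosts",
            ["sudo", "-E", "cumin", "A:wdqs-all", f'sudo disable-puppet "{REASON}"'],
        ),
        (
            "puppetmaster1001",
            "revoke the old certificate",
            ["sudo", "puppet", "cert", "clean", CERT_NAME],
        ),
        (
            None,
            f"edit {MANIFEST} to add the relevant alt_names",
            None,
        ),
    ]


if __name__ == "__main__":
    for number, (host, description, argv) in enumerate(revocation_plan(), start=1):
        where = host or "host not stated in log"
        print(f"{number}. [{where}] {description}")
        if argv is not None:
            # shlex.join is used for display only; its quoting may differ
            # from what was typed at the prompt.
            print(f"   $ {shlex.join(argv)}")
```

The plan stops where the log entries in this section stop. Any later steps (issuing the new certificate, re-enabling puppet) are not recorded here and are deliberately left out.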
01:08 |
<krinkle@deploy1002> |
Synchronized php-1.36.0-wmf.34/extensions/NavigationTiming/modules/ext.navigationTiming.js: T276826 Ibd9ddf14d64 (duration: 01m 14s) |
[production] |
00:28 |
<marxarelli> |
mariadb successfully started on db07 following transfer/extraction using mariabackup and following mysql_upgrade (T276968) |
[releng] |
00:10 |
<marxarelli> |
restore of db06 failed yet again. trying mariabackup db06 -> db07 instead of mysqldump (after fixing docs/usage of the former) (T276968) |
[releng] |
00:02 |
<robh@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-backup1002.eqiad.wmnet with reason: REIMAGE |
[production] |
00:00 |
<robh@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-backup1001.eqiad.wmnet with reason: REIMAGE |
[production] |
2021-03-09
|
23:59 |
<robh@cumin1001> |
START - Cookbook sre.hosts.downtime for 2:00:00 on ms-backup1002.eqiad.wmnet with reason: REIMAGE |
[production] |
23:58 |
<robh@cumin1001> |
START - Cookbook sre.hosts.downtime for 2:00:00 on ms-backup1001.eqiad.wmnet with reason: REIMAGE |
[production] |
22:04 |
<mutante> |
phab1001 - manually running phab public task dump script after making changes to redirect stdout |
[production] |
22:00 |
<razzi> |
rebalance kafka partitions for webrequest_upload partition 14 |
[analytics] |
21:54 |
<marxarelli> |
restoring from db06 dump on db07 and db08 following `DROP VIEW IF EXISTS user` workaround (T276968) |
[releng] |
20:53 |
<marxarelli> |
restore on db07 failed. appears to be a bug w/ mariadb/mysqldump 10.4 compat https://jira.mariadb.org/browse/MDEV-22127 (T276968) |
[releng] |
20:42 |
<elukey> |
reimaged an-worker1091 to buster |
[production] |
20:42 |
<elukey> |
reimaged an-worker1091 to buster |
[analytics] |
20:41 |
<bstorm> |
depooled labsdb1009 T276980 |
[production] |
20:39 |
<marxarelli> |
doing `--skip-grant-tables` on deployment-db08 and creating a new root@127.0.0.1 user (T276968) |
[releng] |
20:33 |
<Majavah> |
install mariadb on deployment-db08 T276968 |
[releng] |
20:25 |
<elukey@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1091.eqiad.wmnet with reason: REIMAGE |
[production] |
20:25 |
<bstorm> |
downtimed labsdb1009 so it doesn't keep paging T276980 |
[production] |
20:23 |
<elukey@cumin1001> |
START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1091.eqiad.wmnet with reason: REIMAGE |
[production] |
20:09 |
<brennen> |
train status: 1.36.0-wmf.34 (T274938) on group0 at 20:06:32 UTC; logs initially quiet. |
[production] |
20:06 |
<brennen@deploy1002> |
rebuilt and synchronized wikiversions files: group0 wikis to 1.36.0-wmf.34 |
[production] |
19:59 |
<marxarelli> |
creating new instance deployment-db08 to use as new beta replica db (T276968) |
[releng] |
19:56 |
<marxarelli> |
deleting deployment-db05 to free up quota for new replica (T276968) |
[releng] |
19:50 |
<marxarelli> |
restoring database dump on deployment-db07 (T276968) |
[releng] |
19:05 |
<brennen@deploy1002> |
Pruned MediaWiki: 1.36.0-wmf.31 (duration: 03m 34s) |
[production] |
19:04 |
<pt1979@cumin2001> |
END (PASS) - Cookbook sre.dns.netbox (exit_code=0) |
[production] |
18:59 |
<pt1979@cumin2001> |
START - Cookbook sre.dns.netbox |
[production] |
18:54 |
<brennen@deploy1002> |
Finished scap: testwikis wikis to 1.36.0-wmf.34 (duration: 47m 25s) |
[production] |
18:52 |
<elukey@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1087.eqiad.wmnet with reason: REIMAGE |
[production] |
18:49 |
<marxarelli> |
restarting db dump on db06 `mysqldump -h 127.0.0.1 --events --routines --triggers --all-databases -f --single-transaction` (T276968) |
[releng] |
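For reference, a minimal sketch of the restarted dump invocation as an argument list. The flags are copied from the entry above. The comments reflect documented mysqldump behaviour: `--single-transaction` dumps from a consistent snapshot and turns off `--lock-tables`, and `-f` (`--force`) lets the dump continue past SQL errors such as an invalid view. The 18:25 entry reports exactly that kind of error under LOCK TABLES. The function name `mysqldump_argv` is hypothetical, and the sketch only builds the command without running it.

```python
import shlex


def mysqldump_argv(host="127.0.0.1"):
    """Build the argv for the restarted dump (hypothetical helper)."""
    return [
        "mysqldump",
        "-h", host,
        "--events",
        "--routines",
        "--triggers",
        "--all-databases",
        "-f",                    # --force: keep going after SQL errors (e.g. invalid views)
        "--single-transaction",  # consistent snapshot; disables --lock-tables
    ]


if __name__ == "__main__":
    # Prints the same command line that appears in the log entry.
    print(shlex.join(mysqldump_argv()))
```

Keeping the flags in a list makes it straightforward to pass them to `subprocess.run` without shell quoting, should the dump ever be scripted.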
18:49 |
<elukey@cumin1001> |
START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1087.eqiad.wmnet with reason: REIMAGE |
[production] |
18:47 |
<dcausse> |
re-pool wdqs1004 |
[production] |
18:38 |
<Majavah> |
installing mariadb 10.4 via role::mariadb::beta to db07 T276968 |
[releng] |
18:37 |
<mbsantos@deploy1002> |
helmfile [eqiad] Ran 'sync' command on namespace 'mobileapps' for release 'production' . |
[production] |
18:35 |
<mbsantos@deploy1002> |
helmfile [codfw] Ran 'sync' command on namespace 'mobileapps' for release 'production' . |
[production] |
18:34 |
<pt1979@cumin2001> |
END (PASS) - Cookbook sre.dns.netbox (exit_code=0) |
[production] |
18:29 |
<pt1979@cumin2001> |
START - Cookbook sre.dns.netbox |
[production] |
18:26 |
<elukey> |
reimage an-worker1087 to buster |
[production] |
18:26 |
<elukey> |
reimage an-worker1087 to buster |
[analytics] |
18:25 |
<marxarelli> |
"View 'labswiki.tag_summary' references invalid table(s) or column(s) or function(s) or definer/invoker of view lack rights to use them" when using LOCK TABLES during mysqldump on db06 (T276968) |
[releng] |
18:21 |
<Majavah> |
create deployment-db07 as g2.cores8.ram16.disk160 Buster T276968 |
[releng] |
18:20 |
<marxarelli> |
disabled puppet on deployment-db06 and started mysqldump (T276968) |
[releng] |
18:16 |
<mbsantos@deploy1002> |
helmfile [codfw] Ran 'sync' command on namespace 'proton' for release 'production' . |
[production] |
18:13 |
<mbsantos@deploy1002> |
helmfile [staging] Ran 'sync' command on namespace 'proton' for release 'production' . |
[production] |