2020-06-11
§
|
13:35 |
<filippo@cumin1001> |
conftool action : set/pooled=false; selector: dnsdisc=thanos-swift,name=eqiad |
[production] |
13:33 |
<wm-bot> |
<zppixbot> auto-update@website: Synced website repo in 95.s |
[tools.zppixbot] |
13:16 |
<wm-bot> |
<zppixbot> auto-update@website: Synced website repo in 45.s |
[tools.zppixbot] |
12:42 |
<arturo> |
introduce puppet profile 'toolsbeta-docker-registry' and relocate some hiera config there |
[toolsbeta] |
12:39 |
<ayounsi@cumin1001> |
END (PASS) - Cookbook sre.network.prepare-upgrade (exit_code=0) |
[production] |
12:39 |
<arturo> |
for the record, k8s etcd servers certificate changed (puppet based) and k8s just kept working |
[toolsbeta] |
12:36 |
<elukey> |
updated pcc facts |
[production] |
12:35 |
<arturo> |
according to `aborrero@cloud-cumin-01:~$ sudo cumin --force -x 'O{project:toolsbeta}' 'run-puppet-agent'` we are mostly back in business |
[toolsbeta] |
12:28 |
<jayme@deploy1001> |
helmfile [EQIAD] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' . |
[production] |
12:28 |
<jayme@deploy1001> |
helmfile [EQIAD] Ran 'sync' command on namespace 'eventgate-main' for release 'production' . |
[production] |
12:28 |
<marostegui@cumin1001> |
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) |
[production] |
12:25 |
<marostegui@cumin1001> |
START - Cookbook sre.hosts.downtime |
[production] |
12:15 |
<jayme@deploy1001> |
helmfile [EQIAD] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' . |
[production] |
12:15 |
<jayme@deploy1001> |
helmfile [EQIAD] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' . |
[production] |
12:14 |
<arturo> |
try switching all VMs to toolsbeta-puppetmaster-04 |
[toolsbeta] |
12:14 |
<arturo> |
poweroff toolsbeta-puppetmaster-03 |
[toolsbeta] |
12:12 |
<arturo> |
copy over labs/private from toolsbeta-puppetmaster-03 to toolsbeta-puppetmaster-04 |
[toolsbeta] |
12:04 |
<jforrester@deploy1001> |
Synchronized php-1.35.0-wmf.36/includes/title/NamespaceInfo.php: T253098 NamespaceInfo::makeValidNamespace: Don't throw for -1 or -2 (duration: 01m 06s) |
[production] |
12:03 |
<marostegui> |
Reimage es2023 (es5 codfw master) |
[production] |
11:54 |
<marostegui@cumin1001> |
dbctl commit (dc=all): 'Repool db2075 T254139', diff saved to https://phabricator.wikimedia.org/P11469 and previous config saved to /var/cache/conftool/dbconfig/20200611-115430-marostegui.json |
[production] |
11:53 |
<arturo> |
create VM toolsbeta-puppetmaster-04 |
[toolsbeta] |
11:46 |
<marostegui> |
Deploy schema change on s6 codfw - T250066 |
[production] |
11:44 |
<volans@deploy1001> |
Finished deploy [homer/deploy@df83901]: Release v0.2.3 (duration: 00m 25s) |
[production] |
11:44 |
<volans@deploy1001> |
Started deploy [homer/deploy@df83901]: Release v0.2.3 |
[production] |
11:36 |
<ayounsi@cumin1001> |
START - Cookbook sre.network.prepare-upgrade |
[production] |
11:36 |
<matthiasmullie> |
EU BACON done |
[production] |
11:35 |
<arturo> |
try reinstalling the python3 stack in toolsbeta-puppetmaster-03, because everything python-related segfaults |
[toolsbeta] |
11:35 |
<mlitn@deploy1001> |
Synchronized php-1.35.0-wmf.36/extensions/GrowthExperiments: Help panel: Update guidance behavior rules (duration: 01m 06s) |
[production] |
11:34 |
<jayme@deploy1001> |
helmfile [EQIAD] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' . |
[production] |
11:34 |
<jayme@deploy1001> |
helmfile [EQIAD] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' . |
[production] |
11:33 |
<arturo> |
reboot toolsbeta-puppetmaster-03 to try cleaning up potential kernel/filesystem problems |
[toolsbeta] |
11:32 |
<arturo> |
apparently every python script segfaults in toolsbeta-puppetmaster-03 |
[toolsbeta] |
11:28 |
<kartik@deploy1001> |
Synchronized php-1.35.0-wmf.36/extensions/ContentTranslation/modules/tools/mw.cx.tools.IssueTrackingTool.js: Backport: [[gerrit|604587|IssueTrackingTool: Fix js error in getCurrentNodeId method (T254965)]] (duration: 01m 07s) |
[production] |
11:27 |
<arturo> |
puppetdb wasn't the problem. The problem is puppet-enc segfaulting in toolsbeta-puppetmaster-03 |
[toolsbeta] |
11:21 |
<arturo> |
puppet not working bc puppetdb, run `aborrero@toolsbeta-puppetdb-02:~ $ sudo systemctl restart puppetdb` |
[toolsbeta] |
11:11 |
<arturo> |
deployed nginx-ingress for some early testing (not definitive) with code https://github.com/crookedstorm/paws/commit/bee62b3fd57f9804aa27e7b8b41fde50bd93df94 (T195217) |
[paws] |
11:08 |
<jayme@deploy1001> |
helmfile [EQIAD] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' . |
[production] |
11:04 |
<legoktm> |
restarting everything after gerrit-replica 502s fixed T255094 T255125 |
[codesearch] |
11:04 |
<mlitn@deploy1001> |
Synchronized php-1.35.0-wmf.36/extensions/MachineVision: $aliases should be an array of strings, not AliasGroup objects (duration: 01m 07s) |
[production] |
10:47 |
<moritzm> |
repooling mw1318,mw2139,mw2145,mw2147,mw2221,mw2219,mw2250,mw2350 (these were depooled, but seem all fine in Icinga and were probably just forgotten) |
[production] |
10:41 |
<filippo@cumin1001> |
conftool action : set/pooled=yes; selector: cluster=thanos,service=thanos-swift |
[production] |
10:40 |
<filippo@cumin1001> |
conftool action : set/pooled=yes; selector: cluster=thanos,service=thanos-query |
[production] |
10:37 |
<Urbanecm> |
Run `update page set page_content_model="json" where page_content_model = "CollaborationListContent" OR page_content_model = "CollaborationHubContent";` at beta enwiki (T255107) |
[releng] |
10:37 |
<moritzm> |
installing buster kernel security updates (no reboots yet, on hold for regression-free microcode update) |
[production] |
10:32 |
<godog> |
roll-restart pybal in eqiad lvs low-traffic |
[production] |
10:21 |
<mutante> |
restarting gerrit on gerrit-replica (gerrit2001) - java.lang.OutOfMemoryError: Java heap space |
[production] |
10:21 |
<Urbanecm> |
Run scap pull at mwdebug1001 to revert temporary changes |
[production] |
10:18 |
<RhinosF1> |
tools.zppixbot-test@tools-sgebastion-08:~$ grep -r -D skip "last_event_at" (in case anything seems slow, may take a while, please don't kill anything while I do it) END |
[tools.zppixbot-test] |
10:15 |
<arturo> |
added role (just a label) for ingress nodes: `kubectl label node paws-k8s-ingress-1 kubernetes.io/role=ingress` (T195217) |
[paws] |
10:14 |
<RhinosF1> |
tools.zppixbot-test@tools-sgebastion-08:~$ grep -r -D skip "last_event_at" (in case anything seems slow, may take a while, please don't kill anything while I do it) |
[tools.zppixbot-test] |